Latest Mixture of Experts Research Papers

The newest Mixture of Experts papers from across the field (arXiv, NeurIPS, CVPR, Nature, and more), refreshed daily and ranked by relevance. Distill AI tracks Mixture of Experts so you don’t have to: get the standout work delivered to your inbox every morning, with two-sentence summaries and the option to chat with any paper.

Recent papers

DASTMoE: A Spatio-Temporal Mixture-of-Experts Model for Distributed Acoustic Sensing
Michel Dione, Bilal Faye, Jerry Lonlac, H. Louis et al. · HAL (Le Centre pour la Comm... · Sep 14, 2026
Spatially Validated Graph-Based Mixture-of-Experts Modelling and Geodetector Attribution of Heavy Metal Contamination in European Topsoils
Li Niu, Yongzhang Zhou, Biaobiao Zhu, Chengyue Shi et al. · Applied Sciences · Jul 21, 2026
Continental-scale assessment of soil heavy metal contamination is complicated by the contrasting environmental behaviour of individual metals and by spatial autocorrelation, which can lead to overly optimistic model evaluation. This study a…
MoOE (mixture of optimized experts): Bounded Expert Streaming and Neural Orchestration for Sub-Latency Sparse MoE Inference on Consumer Hardware
Dwij Shukla · Zenodo (CERN European Organ... · Jul 20, 2026
Executing high-parameter mixture-of-experts (MoE) architectures on asymmetric, consumer-grade hardware is often bottlenecked by weight-transfer latency across the PCI Express (PCIe) bus. We present MooE, a routing and memory-management framew…
Mixture of Experts Approach for Domain Adaptation in Finance
Masahiro Suzuki, Hiroki Sakaji, Kiyoshi Izumi · Journal of Advanced Computa... · Jul 19, 2026
Pre-trained language models (PLMs) have demonstrated high performance across various tasks and domains. Among these PLMs, Mixture of Experts (MoE) models also exhibit high performance with fewer active parameters. In domain adaptation, gene…
Understanding Mixture of Experts (MoE) Through Academic Administration: A Conceptual Analogy
Musa Gülmak · Zenodo (CERN European Organ... · Jul 17, 2026
This technical paper presents a novel conceptual framework for understanding the Mixture of Experts (MoE) architecture in Large Language Models (LLMs) through a comprehensive academic administration analogy. By mapping routers, experts, and…
Capacity-bounded expansion: Transferability-driven mixture of experts for continual process monitoring
Yihui Wang, Zixuan Chen, Qinzhe Wang, Keying Ding et al. · Expert Systems with Applica... · Jul 15, 2026
Tail-calibrated mixture of experts for ultra-low false-alarm intrusion detection in IEC 61850 communications
Livinus Obiora Nweke · Discover Computing · Jul 11, 2026
IEC 61850 communications in digital substations require intrusion detection methods that can operate under very low false-alarm budgets while remaining sensitive to both availability disruption and integrity manipulation. This paper present…
A Sovereign, Open-Source Foundation Model for German and English
The Soofi-Team, Benedikt Droste, David Fitzek et al. · arXiv · Jul 10, 2026
We present Soofi S 30B-A3B, a sovereign, open-source Mixture-of-Experts (MoE) hybrid Mamba Transformer foundation model for German and English. Its hybrid design activates only 3B of 30B parameters per token and keeps the inference cache ne…
Deep learning embedded latent class joint modelling of time-to-event and longitudinal data
Tristan Harris, Najmeh Nakhaei Rad, Sphiwe B. Skhosana · Statistical Methods in Medi... · Jul 10, 2026
Joint modelling of longitudinal and time-to-event data is a powerful tool for analysing complex medical data. Joint latent class models (JLCMs), in particular, provide additional prognostic insight by clustering subjects into latent subgrou…
Uncertainty-Aware Expert Allocation for Efficient Multitask Fine-Tuning of Large Language Models
Maab Elhassan, Minhee Jun, Hanseok Ko · Data · Jul 10, 2026
Large Language Models increasingly rely on Mixture-of-Experts architectures to scale model capacity while controlling computational cost. However, most MoE systems employ static expert routing strategies that allocate identical computationa…
A dynamic-gated Mixture-of-Experts framework improves and interprets daily streamflow simulation
Wenrui Yuan, Shi Hu, Chesheng Zhan, Zhonghui Lin · Communications Earth & Envi... · Jul 9, 2026
Accurately forecasting daily streamflow remains challenging as existing models struggle to balance flexibility for rapid process shifts with physical consistency. Here we present HydroMoE, a dynamic-gated Mixture-of-Experts framewo…
TriA Pipeline: A Large-Scale Automatic Audio Annotation Pipeline For Audio Classification In Specific Scenarios
Hong Lyu, Mingru Yang, Qianhua He, Yanxiong Li et al. · arXiv · Jul 7, 2026
There are some datasets of varying scales for audio classification (AC) applied to different tasks. However, annotated data is limited for most scenarios, such as domestic environments. To address this challenge, we propose an $\textbf{A}$u…
Cascaded spectral operator transformer with mixture-of-experts for urban wind field prediction
Xinyu Huang, J J Liu, Siqi Wang, Xinhai Chen et al. · Engineering Applications of... · Jul 7, 2026
Development of a Mixture of Experts with Optimized EfficientNet Features (MoEffNet)-Powered Automated Identification System for Images of Agricultural Equipment
Wang Xingxing · Zenodo (CERN European Organ... · Jul 1, 2026
How to Run: % 1. Edit data path in config/hyperparams.m cfg.dataRoot = 'data/'; % point to downloaded dataset % 2. Launch >> main...
FreqMoE: Robust Time Series Forecasting via Frequency-Domain Mixture of Experts for Out-of-Distribution Scenarios
A. Shen, Zesheng Lai, Tianwei Wang · Electronics · Jul 1, 2026
Time series forecasting plays a fundamental role across diverse domains including energy systems, transportation, healthcare, etc. Despite significant advancements in forecasting models, their practical application often encounters challeng…
Organ-aware mixture-of-experts framework for generalized pan-tumor segmentation
Hancang Mi, Hong‐Seng Gan, Dong Ma, M. Alper Selver · Biomedical Signal Processin... · Jul 1, 2026
A Mixture-of-Experts Network for Infectious Keratitis Classification Using Multimodal Slit-Lamp Images: A Multicenter Study
Fen-Fen Li, Gao-Xiang Li, Xin-Xin Yu, Xiao-Yu Chen et al. · Translational Vision Scienc... · Jun 25, 2026
Purpose: Infectious keratitis (IK) remains one of the leading causes of corneal blindness worldwide, and subtype distinguishing continues to pose significant clinical challenges. Existing deep learning (DL) approaches typically rely on a si…
ViT-KANMoE: a vision transformer enhanced with a pure Kolmogorov–Arnold network-based mixture-of-experts for malaria blood smear image classification
Aluri Jitendra Chowdary, Mukesh Medikonda, Jasti Ramakrishna, Jyostna Devi Bodapati · Machine Vision and Applicat... · Jun 25, 2026
Two-Field Product Bit-Width Allocation for Mixed-Precision LLM Quantization Method, and a Negative Scaling Result on Large Mixture-of-Experts Models
Oleg Yuryevich Kirichenko · Zenodo (CERN European Organ... · Jun 24, 2026
Mixed-precision post-training quantization assigns each group of weights a bit-width so as to minimize task-loss degradation at a fixed average budget. Established methods allocate bits from a single field (weight magnitude/activation scale…
CrossPool: Efficient Multi-LLM Serving for Cold MoE Models through KV-Cache and Weight Disaggregation
Zhuoren Ye, Tianyu Wo, Dinghao Xue, Mingming Zhang et al. · arXiv · Jun 23, 2026
Emerging LLM services increasingly host many sparse MoE models, yet most models receive sparse requests and remain cold. This creates a GPU memory problem: model weights are stable and model-determined, while KV-cache is transient and deman…
Hedgementation = Hedgerow Segmentation: A Remote Sensing Benchmark
Nathan Senyard, Salem Hamdani, Astrid Zhang, Derek Wang et al. · arXiv · Jun 22, 2026
We propose Hedgementation: a new benchmark to evaluate machine learning models for hedgerow mapping from remote sensing data at country scale and 10m$^2$ spatial resolution. We combine and harmonize multiple remote sensing data products and…
Scaling Linear Mode Connectivity and Merging to Billion Parameter Pretrained Transformers
Tianyi Li, Zhiqiang Shen · arXiv · Jun 22, 2026
Linear mode connectivity (LMC) provides a promising foundation for understanding and merging independently trained neural networks, but existing methods typically optimize the interpolation path from only one model endpoint, limiting their …
Mixture of Experts Architecture for Scalable Security Classification
Ali Ali Safa Mohamed · Zenodo (CERN European Organ... · Jun 22, 2026
research - Document 03 — Mixture of Experts vs Eigenvector Routing — Multimodal AI, Vision-Language, Neural Networks, Sovereign AI, and Post-Cloud Architecture (Inte11Ect)
Lois-Kleinner Alpasan · Zenodo (CERN European Organ... · Jun 20, 2026
This document provides a comparative analysis of Mixture of Experts (MoE) routing mechanisms and the Inte11ect platform's novel Eigenvector Routing approach (GOD-11). MoE, as popularized by the Mixtral 8x7B architecture and Google's Switch …
Heterogeneous Mixture-of-Experts Adaptation for Finetuning Pretrained Time-series Foundation Models
Priyanka Nihalchandani, Naman Srivastava, Varun Ojha, Pandarasamy Arjunan · OpenAlex · Jun 20, 2026
Short-term load forecasting (STLF) is essential for energy-efficient building operation but remains challenging due to heterogeneous temporal patterns across buildings. Time-series foundation models (TSFMs) provide strong pretrained represe…
MoE-B2RNet: Mixture of experts-based bidirectional boosted recurrent network for phishing attack detection
Sangeeta Vhatkar, Zahir Aalam, Ranjita Asati, Rahul Neve · Journal of Computer Virolog... · Jun 20, 2026
