Mixture-of-Experts LLMs: Transformer Models

23 mixture-of-experts LLM projects are tracked. One scores above 50 (Established tier). The highest-rated is EfficientMoE/MoE-Infinity at 50/100 with 288 stars.

Get all 23 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=mixture-of-experts-llms&limit=23"
```

Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
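A minimal sketch of filtering the returned projects by score, assuming the endpoint returns a JSON array of objects with `name`, `score`, and `tier` fields (the exact response schema is not documented here, so the sample payload below is hypothetical):

```python
import json

# Hypothetical sample of the API response; the real schema may differ.
payload = json.loads("""
[
  {"name": "EfficientMoE/MoE-Infinity", "score": 50, "tier": "Established"},
  {"name": "raymin0223/mixture_of_recursions", "score": 47, "tier": "Emerging"},
  {"name": "rioyokotalab/optimal-sparsity", "score": 29, "tier": "Experimental"}
]
""")

def top_projects(projects, min_score=40):
    """Keep projects at or above a score threshold, sorted high-to-low."""
    kept = [p for p in projects if p["score"] >= min_score]
    return sorted(kept, key=lambda p: p["score"], reverse=True)

for p in top_projects(payload):
    print(f'{p["score"]:>3}  {p["tier"]:<12} {p["name"]}')
```

To run this against the live endpoint, replace the sample payload with the body of the curl response above.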

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | EfficientMoE/MoE-Infinity | PyTorch library for cost-effective, fast and easy serving of MoE models. | 50 | Established |
| 2 | raymin0223/mixture_of_recursions | Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive... | 47 | Emerging |
| 3 | AviSoori1x/makeMoE | From scratch implementation of a sparse mixture of experts language model... | 46 | Emerging |
| 4 | thu-nics/MoA | [CoLM'25] The official implementation of the paper | 46 | Emerging |
| 5 | jaisidhsingh/pytorch-mixtures | One-stop solutions for Mixture of Expert modules in PyTorch. | 46 | Emerging |
| 6 | CASE-Lab-UMD/Unified-MoE-Compression | The official implementation of the paper "Towards Efficient Mixture of... | 44 | Emerging |
| 7 | MoonshotAI/MoBA | MoBA: Mixture of Block Attention for Long-Context LLMs | 44 | Emerging |
| 8 | efeslab/fiddler | [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration | 42 | Emerging |
| 9 | FareedKhan-dev/qwen3-MoE-from-scratch | A Step-by-Step Implementation of Qwen 3 MoE Architecture from Scratch | 39 | Emerging |
| 10 | ByteDance-Seed/FlexPrefill | Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse... | 38 | Emerging |
| 11 | lliai/D2MoE | D^2-MoE: Delta Decompression for MoE-based LLMs Compression | 37 | Emerging |
| 12 | SkyworkAI/MoE-plus-plus | [ICLR 2025] MoE++: Accelerating Mixture-of-Experts Methods with... | 36 | Emerging |
| 13 | dmis-lab/Monet | [ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers | 35 | Emerging |
| 14 | CASE-Lab-UMD/Router-Tuning-Mixture-of-Depths | The open-source Mixture of Depths code and the official implementation of... | 34 | Emerging |
| 15 | cmu-flame/FLAME-MoE | Official repository for FLAME-MoE: A Transparent End-to-End Research... | 32 | Emerging |
| 16 | rioyokotalab/optimal-sparsity | [ICLR 2026 Oral] Optimal Sparsity of Mixture-of-Experts Language Models for... | 29 | Experimental |
| 17 | robinzixuan/FROST | [ICLR 2026] FROST: Filtering Reasoning Outliers with Attention for Efficient... | 28 | Experimental |
| 18 | UNITES-Lab/HEXA-MoE | Official code for the paper "HEXA-MoE: Efficient and Heterogeneous-Aware MoE... | 24 | Experimental |
| 19 | Spico197/MoE-SFT | 🍼 Official implementation of Dynamic Data Mixing Maximizes Instruction... | 23 | Experimental |
| 20 | zhongshsh/MoExtend | ACL 2024 (SRW), Official Codebase of our Paper: "MoExtend: Tuning New... | 21 | Experimental |
| 21 | lorenzflow/robust-moa | This is the official repository for the paper: This is your Doge: Exploring... | 20 | Experimental |
| 22 | RoyZry98/T-REX-Pytorch | [Arxiv 2025] Official code for T-REX: Mixture-of-Rank-One-Experts with... | 20 | Experimental |
| 23 | Devanik21/HAG-MoE | HAG-MoE introduces a revolutionary approach to artificial intelligence by... | 18 | Experimental |