CASE-Lab-UMD/Unified-MoE-Compression
The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques," published in TMLR.
This project helps machine-learning engineers and researchers run large Mixture-of-Experts (MoE) models, such as those used for complex language tasks, more efficiently. It applies a variety of compression techniques to existing MoE models, producing smaller, faster models that retain most of their performance. It is aimed at professionals building and deploying large-scale AI applications who need to optimize model size and speed.
Use this if you are working with large Mixture-of-Experts models and need to reduce their computational cost, memory footprint, or inference time without significantly sacrificing performance.
Not ideal if you are working with simpler, non-MoE models or are not concerned with the efficiency of very large language models.
Stars
89
Forks
6
Language
Python
License
Apache-2.0
Last pushed
Feb 28, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/CASE-Lab-UMD/Unified-MoE-Compression"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
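The curl command above can be wrapped in a small client. A minimal sketch follows, assuming only that the endpoint returns a JSON body; the response schema is not documented on this page, so the script just decodes and pretty-prints whatever the server sends back, and the helper names (`build_url`, `fetch_quality`) are hypothetical.

```python
"""Hypothetical client sketch for the pt-edge quality API shown above."""
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(ecosystem: str, owner: str, repo: str) -> str:
    # URL structure inferred from the curl example on this page.
    return f"{BASE}/{ecosystem}/{owner}/{repo}"


def fetch_quality(ecosystem: str, owner: str, repo: str) -> dict:
    # Decode the JSON payload; raises urllib.error.HTTPError on failure.
    url = build_url(ecosystem, owner, repo)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


# Example usage (performs a network request):
#   data = fetch_quality("transformers", "CASE-Lab-UMD", "Unified-MoE-Compression")
#   print(json.dumps(data, indent=2))
```

Note that the free tier allows 100 requests/day, so any batch use of `fetch_quality` should cache responses locally.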
Higher-rated alternatives
EfficientMoE/MoE-Infinity
PyTorch library for cost-effective, fast and easy serving of MoE models.
raymin0223/mixture_of_recursions
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation...
AviSoori1x/makeMoE
From scratch implementation of a sparse mixture of experts language model inspired by Andrej...
thu-nics/MoA
[CoLM'25] The official implementation of the paper
jaisidhsingh/pytorch-mixtures
One-stop solutions for Mixture of Expert modules in PyTorch.