jxiw/MambaInLlama
[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models
This project helps machine learning engineers and researchers create smaller, faster language models while maintaining high quality. It takes a large, powerful transformer language model (such as Llama) and distills its knowledge into a more efficient hybrid Mamba model. The result is a compact, high-performing model suitable for a range of applications, especially those that need faster inference.
238 stars. No commits in the last 6 months.
Use this if you need to deploy a high-quality large language model but are constrained by computational resources, inference speed, or model size.
Not ideal if you prefer to train large transformer models from scratch without distillation, or if you need the absolute cutting-edge performance of the largest models, regardless of efficiency.
Stars: 238
Forks: 21
Language: Python
License: Apache-2.0
Category:
Last pushed: Oct 14, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/jxiw/MambaInLlama"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
ZHZisZZ/dllm
dLLM: Simple Diffusion Language Modeling
pengzhangzhi/Open-dLLM
Open diffusion language model for code generation — releasing pretraining, evaluation,...
EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. ACM...
THUDM/LongWriter
[ICLR 2025] LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
AIoT-MLSys-Lab/SVD-LLM
[ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2