jxiw/MambaInLlama

[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Quality score: 41 / 100 (Emerging)

This project helps machine learning engineers and researchers create smaller, faster language models while maintaining high quality. It distills the knowledge of a large transformer language model (such as Llama) into a more efficient hybrid Mamba model. The result is a compact, high-performing model suited to applications that need faster inference.

238 stars. No commits in the last 6 months.

Use this if you need to deploy a high-quality large language model but are constrained by computational resources, inference speed, or model size.

Not ideal if you prefer to train large transformer models from scratch without distillation, or if you need the absolute cutting-edge performance of the largest models, regardless of efficiency.

Tags: large-language-models, model-optimization, ai-inference, natural-language-processing, machine-learning-engineering
Stale for 6 months · no published package · no known dependents
Maintenance: 2 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 13 / 25


Stars: 238
Forks: 21
Language: Python
License: Apache-2.0
Last pushed: Oct 14, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/jxiw/MambaInLlama"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
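
Below is a minimal Python sketch of the same request, assuming the endpoint returns JSON; the response fields are not documented here, so the example simply prints the whole payload for inspection.

import json
import urllib.request

# Quality endpoint for this repository (same URL as the curl example above).
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/jxiw/MambaInLlama"

# Fetch the response and decode it as JSON (assumption: the API returns JSON).
with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.loads(resp.read().decode("utf-8"))

# Pretty-print everything to see which fields are available.
print(json.dumps(data, indent=2))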