jxiw/MambaInLlama

[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Quality score: 41 / 100 (Emerging)

This project helps machine learning engineers and researchers create smaller, faster language models while maintaining high quality. It distills the knowledge of a large transformer language model (such as Llama) into a more efficient hybrid Mamba model. The result is a compact, high-performing model suited to applications that need faster inference.

238 stars. No commits in the last 6 months.

Use this if you need to deploy a high-quality large language model but are constrained by computational resources, inference speed, or model size.

Not ideal if you prefer to train large transformer models from scratch without distillation, or if you need the absolute cutting-edge performance of the largest models, regardless of efficiency.

Tags: large-language-models, model-optimization, ai-inference, natural-language-processing, machine-learning-engineering
Stale for 6 months · no published package · no known dependents
Maintenance: 2 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 13 / 25


Stars: 238
Forks: 21
Language: Python
License: Apache-2.0
Last pushed: Oct 14, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/jxiw/MambaInLlama"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
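
Below is a minimal Python sketch of the same request, assuming the endpoint returns JSON; the response fields are not documented here, so the example simply prints the whole payload for inspection.

import json
import urllib.request

# Quality endpoint for this repository (same URL as the curl example above).
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/jxiw/MambaInLlama"

# Fetch the response and decode it as JSON (assumption: the API returns JSON).
with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.loads(resp.read().decode("utf-8"))

# Pretty-print everything to see which fields are available.
print(json.dumps(data, indent=2))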