FasterDecoding/Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
This project accelerates text generation for large language models (LLMs). It augments an existing LLM with extra decoding heads that predict multiple tokens per step, reducing response latency. Developers and engineers building LLM applications such as chatbots or content-creation tools will find it useful for improving user experience and reducing computational costs.
2,717 stars. No commits in the last 6 months.
Use this if you are a developer or MLOps engineer looking to significantly speed up text generation for your deployed LLMs, especially for single-user queries.
Not ideal if you are primarily focused on training LLMs from scratch or only require batch inference, as the current focus is on single-query acceleration.
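The draft-then-verify idea behind Medusa can be illustrated with a toy sketch. This is not the project's actual API: `base_model_next` and `medusa_heads` below are hypothetical stand-ins for the real base model and the trained extra heads, chosen only to show how drafted tokens are verified and the longest correct prefix is accepted in one step.

```python
def base_model_next(tokens):
    # Hypothetical deterministic "base model": next token = (last + 1) % 10.
    return (tokens[-1] + 1) % 10

def medusa_heads(tokens, k=3):
    # Hypothetical draft heads: guess the next k tokens in parallel.
    # The last guess is deliberately wrong to show partial acceptance.
    guesses = [(tokens[-1] + i) % 10 for i in range(1, k + 1)]
    guesses[-1] = 0  # an incorrect draft
    return guesses

def medusa_step(tokens):
    # Verify the drafted tokens against the base model, accepting the
    # longest prefix that matches what the base model would have produced.
    drafts = medusa_heads(tokens)
    accepted = []
    ctx = list(tokens)
    for g in drafts:
        if g == base_model_next(ctx):
            accepted.append(g)
            ctx.append(g)
        else:
            break
    # As in standard speculative decoding, always make at least one
    # token of progress: fall back to the base model's token on mismatch.
    if len(accepted) < len(drafts):
        accepted.append(base_model_next(ctx))
    return tokens + accepted

print(medusa_step([1, 2]))  # → [1, 2, 3, 4, 5]: three tokens in one step
```

Because verification of all drafted tokens happens in a single forward pass in the real system, accepting even two or three drafts per step translates directly into a wall-clock speedup.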
Stars: 2,717
Forks: 195
Language: Jupyter Notebook
License: Apache-2.0
Category:
Last pushed: Jun 25, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/FasterDecoding/Medusa"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
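The same endpoint can be called from Python with the standard library. This is a minimal sketch assuming the endpoint returns JSON; the `Accept` header and timeout are reasonable defaults, not documented requirements, and the key-based higher limit is not shown because its header name is not specified above.

```python
import json
import urllib.request

# Endpoint from the curl command above.
API_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/FasterDecoding/Medusa"

def fetch_quality(url=API_URL, timeout=10):
    # Anonymous access: 100 requests/day, no key needed.
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality()` returns the parsed JSON payload for this repository.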
Higher-rated alternatives
NX-AI/xlstm
Official repository of the xLSTM.
sinanuozdemir/oreilly-hands-on-gpt-llm
Mastering the Art of Scalable and Efficient AI Model Deployment
DashyDashOrg/pandas-llm
Pandas-LLM
wxhcore/bumblecore
An LLM training framework built from the ground up, featuring a custom BumbleBee architecture...
MiniMax-AI/MiniMax-01
The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model &...