FasterDecoding/Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Quality score: 45 / 100 (Emerging)

This project accelerates text generation for large language models (LLMs). It augments an existing LLM with extra decoding heads that predict several tokens per step, yielding faster responses. Developers and engineers building LLM applications such as chatbots or content-creation tools will find it useful for improving user experience and reducing compute costs.
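To illustrate the idea, here is a toy sketch of Medusa-style decoding: extra "heads" cheaply guess several future tokens in one step, and the base model verifies them, accepting the longest matching prefix plus one guaranteed token. Everything below is a hypothetical stand-in (deterministic toy functions), not the real Medusa heads or model; in the actual system, verification of all candidates happens in a single batched forward pass, which is where the speedup comes from.

```python
def base_model(prefix):
    """Stand-in for the LLM's next-token prediction: sum of prefix mod 10."""
    return sum(prefix) % 10

def medusa_heads(prefix, k=3):
    """Stand-in for k Medusa heads: cheap guesses for the next k tokens.
    In this toy they happen to agree with the base model, so guesses are
    always accepted; real heads are approximate and only some survive."""
    guesses = []
    ctx = list(prefix)
    for _ in range(k):
        tok = sum(ctx) % 10
        guesses.append(tok)
        ctx.append(tok)
    return guesses

def generate(prefix, n_tokens):
    """Generate n_tokens, counting base-model passes (the expensive step)."""
    out = list(prefix)
    target = len(prefix) + n_tokens
    passes = 0
    while len(out) < target:
        passes += 1  # one "forward pass" of the base model per loop
        guesses = medusa_heads(out)
        # Verify: accept guesses while they match what the base model
        # itself would produce at each position.
        for tok in guesses:
            if len(out) < target and base_model(out) == tok:
                out.append(tok)
            else:
                break
        if len(out) < target:
            out.append(base_model(out))  # guaranteed token from this pass
    return out[len(prefix):], passes

tokens, passes = generate([1, 2, 3], 5)
print(tokens, passes)  # 5 tokens emitted in only 2 passes instead of 5
```

With perfect guesses and 3 heads, each pass emits up to 4 tokens (3 accepted guesses plus 1 base token), which is the source of Medusa's wall-clock speedup for single-query decoding.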

2,717 stars. No commits in the last 6 months.

Use this if you are a developer or MLOps engineer who wants to significantly speed up text generation from deployed LLMs, especially for single-user queries.

Not ideal if you are primarily focused on training LLMs from scratch or only require batch inference, as the current focus is on single-query acceleration.

Tags: LLM deployment, AI application development, model inference, language model optimization, MLOps
Badges: Stale (6m), No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 19 / 25


Stars: 2,717
Forks: 195
Language: Jupyter Notebook
License: Apache-2.0
Last pushed: Jun 25, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/FasterDecoding/Medusa"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
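The same endpoint can be called from Python using only the standard library. This is a minimal sketch: the `Authorization: Bearer` header for keyed access and the shape of the returned JSON are assumptions, so check the API documentation before relying on them.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"

def fetch_quality(repo_path, api_key=None):
    """Fetch the quality report for a repo path such as
    'transformers/FasterDecoding/Medusa'. Returns the parsed JSON body
    (its schema is an assumption here)."""
    req = urllib.request.Request(f"{BASE_URL}/{repo_path}")
    if api_key is not None:
        # Header name/scheme is an assumption; consult the API docs.
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# The endpoint from the curl example above:
url = f"{BASE_URL}/transformers/FasterDecoding/Medusa"
```

Calling `fetch_quality("transformers/FasterDecoding/Medusa")` performs the same request as the curl command, subject to the same daily limits.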