FasterDecoding/Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
This project accelerates text generation for large language models (LLMs). It augments an existing LLM with extra decoding heads that predict multiple tokens per step, reducing response latency. Developers and engineers building LLM applications such as chatbots or content-creation tools will find it useful for improving user experience and reducing computational costs.
2,717 stars. No commits in the last 6 months.
Use this if you are a developer or MLOps engineer looking to significantly speed up text generation for your deployed LLMs, especially for single-user queries.
Not ideal if you are primarily focused on training LLMs from scratch or only require batch inference, as the current focus is on single-query acceleration.
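The draft-then-verify idea behind Medusa can be illustrated with a toy sketch. This is not the project's actual API: `base_model_next` and `medusa_heads` below are hypothetical stand-ins for the real base model and the trained extra heads, chosen only to show how drafted tokens are verified and the longest correct prefix is accepted in one step.

```python
def base_model_next(tokens):
    # Hypothetical deterministic "base model": next token = (last + 1) % 10.
    return (tokens[-1] + 1) % 10

def medusa_heads(tokens, k=3):
    # Hypothetical draft heads: guess the next k tokens in parallel.
    # The last guess is deliberately wrong to show partial acceptance.
    guesses = [(tokens[-1] + i) % 10 for i in range(1, k + 1)]
    guesses[-1] = 0  # an incorrect draft
    return guesses

def medusa_step(tokens):
    # Verify the drafted tokens against the base model, accepting the
    # longest prefix that matches what the base model would have produced.
    drafts = medusa_heads(tokens)
    accepted = []
    ctx = list(tokens)
    for g in drafts:
        if g == base_model_next(ctx):
            accepted.append(g)
            ctx.append(g)
        else:
            break
    # As in standard speculative decoding, always make at least one
    # token of progress: fall back to the base model's token on mismatch.
    if len(accepted) < len(drafts):
        accepted.append(base_model_next(ctx))
    return tokens + accepted

print(medusa_step([1, 2]))  # → [1, 2, 3, 4, 5]: three tokens in one step
```

Because verification of all drafted tokens happens in a single forward pass in the real system, accepting even two or three drafts per step translates directly into a wall-clock speedup.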
Stars: 2,717
Forks: 195
Language: Jupyter Notebook
License: Apache-2.0
Category:
Last pushed: Jun 25, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/FasterDecoding/Medusa"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
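The same endpoint can be called from Python with the standard library. This is a minimal sketch assuming the endpoint returns JSON; the `Accept` header and timeout are reasonable defaults, not documented requirements, and the key-based higher limit is not shown because its header name is not specified above.

```python
import json
import urllib.request

# Endpoint from the curl command above.
API_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/FasterDecoding/Medusa"

def fetch_quality(url=API_URL, timeout=10):
    # Anonymous access: 100 requests/day, no key needed.
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality()` returns the parsed JSON payload for this repository.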
Higher-rated alternatives
NX-AI/xlstm
Official repository of the xLSTM.
sinanuozdemir/oreilly-hands-on-gpt-llm
Mastering the Art of Scalable and Efficient AI Model Deployment
DashyDashOrg/pandas-llm
Pandas-LLM
wxhcore/bumblecore
An LLM training framework built from the ground up, featuring a custom BumbleBee architecture...
MiniMax-AI/MiniMax-01
The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model &...