sgl-project/SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
For those working with large language models, SpecForge helps you train specialized 'draft' models for speculative decoding: the small draft model cheaply proposes tokens that your main LLM verifies in a single pass, which can significantly speed up response generation. You point SpecForge at your target LLM and it produces a trained draft model ready to serve with the SGLang framework. It is aimed at AI practitioners and researchers looking to optimize LLM inference performance.
729 stars. Actively maintained with 27 commits in the last 30 days. Available on PyPI.
Use this if you are a machine learning engineer or researcher looking to accelerate the inference speed of your large language models by training and deploying specialized speculative decoding models.
Not ideal if you're not already working with large language models or are not familiar with model training and deployment concepts.
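To make the idea concrete, here is a minimal conceptual sketch of the draft-then-verify loop behind speculative decoding. This is not SpecForge's implementation: the "models" are toy deterministic functions over integer token sequences, chosen only to illustrate how accepted draft tokens shorten the number of expensive target-model steps.

```python
# Conceptual sketch of speculative decoding (NOT SpecForge's code).
# A cheap draft model proposes k tokens autoregressively; the expensive
# target model verifies them and keeps the longest agreeing prefix.

def target_model(ctx):
    """Toy 'large' model: deterministic next token for a context."""
    return (sum(ctx) * 7 + 3) % 10

def draft_model(ctx, k):
    """Toy 'small' model: agrees with the target for the first two
    proposals, then diverges, to mimic an imperfect draft model."""
    out, c = [], list(ctx)
    for i in range(k):
        tok = target_model(c) if i < 2 else (target_model(c) + 1) % 10
        out.append(tok)
        c.append(tok)
    return out

def speculative_step(prefix, k=4):
    """One decoding step: accept the agreeing prefix of the draft's
    proposals, then emit one corrective token from the target."""
    accepted, ctx = [], list(prefix)
    for tok in draft_model(ctx, k):
        if target_model(ctx) == tok:   # target agrees: accept for free
            accepted.append(tok)
            ctx.append(tok)
        else:
            break                      # first disagreement: discard the rest
    accepted.append(target_model(ctx))  # always make progress
    return accepted
```

With a good draft model, most proposals are accepted, so several tokens are produced per target-model pass; output is guaranteed identical to decoding with the target model alone.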
Stars: 729
Forks: 179
Language: Python
License: MIT
Category:
Last pushed: Mar 11, 2026
Commits (30d): 27
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/sgl-project/SpecForge"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
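The same endpoint can be called from Python. A minimal sketch using only the standard library, assuming the endpoint path shown in the curl example above and a JSON response (the response schema is an assumption; the function and variable names here are illustrative, not part of the API):

```python
# Hypothetical Python client for the quality endpoint shown above.
# The URL path is taken from the curl example; the JSON schema is assumed.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repository."""
    return f"{API_BASE}/{ecosystem}/{owner}/{repo}"

def fetch_quality(ecosystem: str, owner: str, repo: str) -> dict:
    """Fetch quality data; anonymous access is limited to 100 requests/day."""
    with urllib.request.urlopen(quality_url(ecosystem, owner, repo)) as resp:
        return json.load(resp)

# Example (performs a network request):
# data = fetch_quality("transformers", "sgl-project", "SpecForge")
```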
Related models
structuredllm/syncode
Efficient and general syntactical decoding for Large Language Models
SafeAILab/EAGLE
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
romsto/Speculative-Decoding
Implementation of the paper Fast Inference from Transformers via Speculative Decoding, Leviathan...
hao-ai-lab/JacobiForcing
Jacobi Forcing: Fast and Accurate Diffusion-style Decoding
kssteven418/BigLittleDecoder
[NeurIPS'23] Speculative Decoding with Big Little Decoder