sgl-project/SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
For those working with large language models, SpecForge helps you train specialized 'draft' models for speculative decoding: the small draft model cheaply proposes tokens that your main LLM verifies in a single pass, which can significantly speed up response generation. You point SpecForge at your target LLM and it produces a trained draft model ready to serve with the SGLang framework. It is aimed at AI practitioners and researchers looking to optimize LLM inference performance.
729 stars. Actively maintained with 27 commits in the last 30 days. Available on PyPI.
Use this if you are a machine learning engineer or researcher looking to accelerate the inference speed of your large language models by training and deploying specialized speculative decoding models.
Not ideal if you're not already working with large language models or are not familiar with model training and deployment concepts.
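To make the idea concrete, here is a minimal conceptual sketch of the draft-then-verify loop behind speculative decoding. This is not SpecForge's implementation: the "models" are toy deterministic functions over integer token sequences, chosen only to illustrate how accepted draft tokens shorten the number of expensive target-model steps.

```python
# Conceptual sketch of speculative decoding (NOT SpecForge's code).
# A cheap draft model proposes k tokens autoregressively; the expensive
# target model verifies them and keeps the longest agreeing prefix.

def target_model(ctx):
    """Toy 'large' model: deterministic next token for a context."""
    return (sum(ctx) * 7 + 3) % 10

def draft_model(ctx, k):
    """Toy 'small' model: agrees with the target for the first two
    proposals, then diverges, to mimic an imperfect draft model."""
    out, c = [], list(ctx)
    for i in range(k):
        tok = target_model(c) if i < 2 else (target_model(c) + 1) % 10
        out.append(tok)
        c.append(tok)
    return out

def speculative_step(prefix, k=4):
    """One decoding step: accept the agreeing prefix of the draft's
    proposals, then emit one corrective token from the target."""
    accepted, ctx = [], list(prefix)
    for tok in draft_model(ctx, k):
        if target_model(ctx) == tok:   # target agrees: accept for free
            accepted.append(tok)
            ctx.append(tok)
        else:
            break                      # first disagreement: discard the rest
    accepted.append(target_model(ctx))  # always make progress
    return accepted
```

With a good draft model, most proposals are accepted, so several tokens are produced per target-model pass; output is guaranteed identical to decoding with the target model alone.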
Stars: 729
Forks: 179
Language: Python
License: MIT
Category:
Last pushed: Mar 11, 2026
Commits (30d): 27
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/sgl-project/SpecForge"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
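The same endpoint can be called from Python. A minimal sketch using only the standard library, assuming the endpoint path shown in the curl example above and a JSON response (the response schema is an assumption; the function and variable names here are illustrative, not part of the API):

```python
# Hypothetical Python client for the quality endpoint shown above.
# The URL path is taken from the curl example; the JSON schema is assumed.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repository."""
    return f"{API_BASE}/{ecosystem}/{owner}/{repo}"

def fetch_quality(ecosystem: str, owner: str, repo: str) -> dict:
    """Fetch quality data; anonymous access is limited to 100 requests/day."""
    with urllib.request.urlopen(quality_url(ecosystem, owner, repo)) as resp:
        return json.load(resp)

# Example (performs a network request):
# data = fetch_quality("transformers", "sgl-project", "SpecForge")
```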
Related models
structuredllm/syncode
Efficient and general syntactical decoding for Large Language Models
SafeAILab/EAGLE
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
romsto/Speculative-Decoding
Implementation of the paper Fast Inference from Transformers via Speculative Decoding, Leviathan...
hao-ai-lab/JacobiForcing
Jacobi Forcing: Fast and Accurate Diffusion-style Decoding
kssteven418/BigLittleDecoder
[NeurIPS'23] Speculative Decoding with Big Little Decoder