Infini-AI-Lab/Sequoia
scalable and robust tree-based speculative decoding algorithm
This project offers a scalable and robust way to speed up text generation from large language models (LLMs). A smaller 'draft' model proposes candidate tokens, and the larger 'target' model verifies them, so the target model produces its output faster without changing what it generates. It is aimed at researchers and engineers who develop, evaluate, and deploy LLMs.
372 stars. No commits in the last 6 months.
Use this if you are developing or evaluating large language models and need to accelerate their text generation while maintaining output quality, especially on specific hardware setups.
Not ideal if you are a general user looking for a ready-to-use chatbot or an application for everyday text generation, as this is a low-level optimization tool.
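The draft/target scheme described above can be sketched in a few lines. This is a minimal greedy, single-chain sketch, not Sequoia's actual algorithm (Sequoia verifies whole token trees and uses sampling-based acceptance); the toy `draft` and `target` functions are hypothetical stand-ins for real models over integer tokens.

```python
def speculative_decode(draft, target, prompt, k=4, max_new=12):
    """Greedy speculative decoding sketch: the draft proposes k tokens,
    the target verifies them and keeps the agreed prefix.

    `draft` and `target` map a token list to the next (greedy) token."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        ctx, proposal = out[:], []
        for _ in range(k):
            proposal.append(draft(ctx))
            ctx.append(proposal[-1])
        # 2. The target verifies; in a real system all k positions are
        #    scored in one parallel forward pass instead of k serial ones.
        ctx = out[:]
        for t in proposal:
            want = target(ctx)
            ctx.append(want)       # always keep the target's own token
            if want != t:          # first mismatch: discard the rest of
                break              # the proposal and re-draft from here
        out = ctx
    return out[:len(prompt) + max_new]


# Hypothetical toy "models": the target counts mod 10; the draft agrees
# except after token 4, where it guesses wrong and gets corrected.
target = lambda ctx: (ctx[-1] + 1) % 10
draft = lambda ctx: (ctx[-1] + 2) % 10 if ctx[-1] == 4 else (ctx[-1] + 1) % 10

result = speculative_decode(draft, target, [0], k=4, max_new=12)
# The output is identical to decoding greedily with the target alone;
# the speedup comes from verifying several draft tokens per target pass.
assert result == [i % 10 for i in range(13)]
```

The key property the sketch preserves is that acceptance is decided by the target, so output quality matches running the target by itself.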
Stars: 372
Forks: 37
Language: Python
License: —
Category:
Last pushed: Jan 28, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Infini-AI-Lab/Sequoia"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
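For scripted access, the same endpoint can be called from Python. The sketch below simply mirrors the curl example's URL layout; how an API key for the 1,000/day tier is passed is not documented here, so it is omitted.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem, owner, repo):
    """Build the endpoint URL, mirroring the curl example above."""
    return f"{BASE}/{ecosystem}/{owner}/{repo}"

def fetch_quality(ecosystem, owner, repo):
    """Fetch the quality record as a dict (anonymous tier: 100 requests/day)."""
    with urllib.request.urlopen(quality_url(ecosystem, owner, repo)) as resp:
        return json.load(resp)

url = quality_url("transformers", "Infini-AI-Lab", "Sequoia")
```

Calling `fetch_quality("transformers", "Infini-AI-Lab", "Sequoia")` would retrieve this repository's record, assuming the endpoint returns JSON as the curl example suggests.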
Higher-rated alternatives
sgl-project/SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
structuredllm/syncode
Efficient and general syntactical decoding for Large Language Models
SafeAILab/EAGLE
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
romsto/Speculative-Decoding
Implementation of the paper Fast Inference from Transformers via Speculative Decoding, Leviathan...
hao-ai-lab/JacobiForcing
Jacobi Forcing: Fast and Accurate Diffusion-style Decoding