Infini-AI-Lab/Sequoia
scalable and robust tree-based speculative decoding algorithm
This project offers a scalable and robust way to speed up text generation from large language models (LLMs). A smaller 'draft' model proposes candidate tokens, and the larger 'target' model verifies them, so the target model produces its output faster without changing what it generates. It is aimed at researchers and engineers who develop, evaluate, and deploy LLMs.
372 stars. No commits in the last 6 months.
Use this if you are developing or evaluating large language models and need to accelerate their text generation while maintaining output quality, especially on specific hardware setups.
Not ideal if you are a general user looking for a ready-to-use chatbot or an application for everyday text generation, as this is a low-level optimization tool.
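The draft/target scheme described above can be sketched in a few lines. This is a minimal greedy, single-chain sketch, not Sequoia's actual algorithm (Sequoia verifies whole token trees and uses sampling-based acceptance); the toy `draft` and `target` functions are hypothetical stand-ins for real models over integer tokens.

```python
def speculative_decode(draft, target, prompt, k=4, max_new=12):
    """Greedy speculative decoding sketch: the draft proposes k tokens,
    the target verifies them and keeps the agreed prefix.

    `draft` and `target` map a token list to the next (greedy) token."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        ctx, proposal = out[:], []
        for _ in range(k):
            proposal.append(draft(ctx))
            ctx.append(proposal[-1])
        # 2. The target verifies; in a real system all k positions are
        #    scored in one parallel forward pass instead of k serial ones.
        ctx = out[:]
        for t in proposal:
            want = target(ctx)
            ctx.append(want)       # always keep the target's own token
            if want != t:          # first mismatch: discard the rest of
                break              # the proposal and re-draft from here
        out = ctx
    return out[:len(prompt) + max_new]


# Hypothetical toy "models": the target counts mod 10; the draft agrees
# except after token 4, where it guesses wrong and gets corrected.
target = lambda ctx: (ctx[-1] + 1) % 10
draft = lambda ctx: (ctx[-1] + 2) % 10 if ctx[-1] == 4 else (ctx[-1] + 1) % 10

result = speculative_decode(draft, target, [0], k=4, max_new=12)
# The output is identical to decoding greedily with the target alone;
# the speedup comes from verifying several draft tokens per target pass.
assert result == [i % 10 for i in range(13)]
```

The key property the sketch preserves is that acceptance is decided by the target, so output quality matches running the target by itself.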
Stars: 372
Forks: 37
Language: Python
License: —
Category:
Last pushed: Jan 28, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Infini-AI-Lab/Sequoia"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
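For scripted access, the same endpoint can be called from Python. The sketch below simply mirrors the curl example's URL layout; how an API key for the 1,000/day tier is passed is not documented here, so it is omitted.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem, owner, repo):
    """Build the endpoint URL, mirroring the curl example above."""
    return f"{BASE}/{ecosystem}/{owner}/{repo}"

def fetch_quality(ecosystem, owner, repo):
    """Fetch the quality record as a dict (anonymous tier: 100 requests/day)."""
    with urllib.request.urlopen(quality_url(ecosystem, owner, repo)) as resp:
        return json.load(resp)

url = quality_url("transformers", "Infini-AI-Lab", "Sequoia")
```

Calling `fetch_quality("transformers", "Infini-AI-Lab", "Sequoia")` would retrieve this repository's record, assuming the endpoint returns JSON as the curl example suggests.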
Higher-rated alternatives
sgl-project/SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
structuredllm/syncode
Efficient and general syntactical decoding for Large Language Models
SafeAILab/EAGLE
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
romsto/Speculative-Decoding
Implementation of the paper Fast Inference from Transformers via Speculative Decoding, Leviathan...
hao-ai-lab/JacobiForcing
Jacobi Forcing: Fast and Accurate Diffusion-style Decoding