mscheong01/speculative_decoding.c
Minimal C implementation of speculative decoding, based on llama2.c
This tool helps developers speed up text generation for large language models (LLMs). It takes a pre-trained base LLM plus a smaller, faster "draft" model as input: the draft model proposes several tokens at a time, and the base model verifies them, so the output matches what the base model alone would produce but arrives much more quickly. Developers working with local LLMs, especially on resource-constrained devices, will find this useful for improving performance.
No commits in the last 6 months.
Use this if you are a developer looking to accelerate the inference (text generation) speed of your Llama2-based large language models, particularly in a pure C environment.
Not ideal if you are an end-user without programming experience or if you need to generate very long sequences of text beyond the draft model's capacity.
Stars: 28
Forks: 2
Language: C
License: MIT
Category:
Last pushed: Jul 15, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/mscheong01/speculative_decoding.c"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
sgl-project/SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
structuredllm/syncode
Efficient and general syntactical decoding for Large Language Models
SafeAILab/EAGLE
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
romsto/Speculative-Decoding
Implementation of the paper Fast Inference from Transformers via Speculative Decoding, Leviathan...
hao-ai-lab/JacobiForcing
Jacobi Forcing: Fast and Accurate Diffusion-style Decoding