mscheong01/speculative_decoding.c
Minimal C implementation of speculative decoding, based on llama2.c
This tool helps developers speed up text generation for large language models (LLMs). It takes a pre-trained base LLM plus a smaller, faster "draft" model as input: the draft model proposes several tokens at a time, and the base model verifies them, so the output matches what the base model alone would produce but arrives much more quickly. Developers working with local LLMs, especially on resource-constrained devices, will find this useful for improving performance.
No commits in the last 6 months.
Use this if you are a developer looking to accelerate the inference (text generation) speed of your Llama2-based large language models, particularly in a pure C environment.
Not ideal if you are an end-user without programming experience or if you need to generate very long sequences of text beyond the draft model's capacity.
Stars: 28
Forks: 2
Language: C
License: MIT
Category:
Last pushed: Jul 15, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/mscheong01/speculative_decoding.c"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
sgl-project/SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
structuredllm/syncode
Efficient and general syntactical decoding for Large Language Models
SafeAILab/EAGLE
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
romsto/Speculative-Decoding
Implementation of the paper Fast Inference from Transformers via Speculative Decoding, Leviathan...
hao-ai-lab/JacobiForcing
Jacobi Forcing: Fast and Accurate Diffusion-style Decoding