mscheong01/speculative_decoding.c

minimal C implementation of speculative decoding based on llama2.c

Score: 30 / 100 (Emerging)

This tool helps developers speed up text generation for large language models (LLMs). It takes a pre-trained base LLM and a smaller, faster "draft" model as input: the draft model cheaply proposes several tokens ahead, and the base model verifies them in a single pass, so the output matches what the base model would have generated on its own but arrives much more quickly. Developers working with local LLMs, especially on resource-constrained devices, will find this useful for improving performance.

No commits in the last 6 months.

Use this if you are a developer looking to accelerate the inference (text generation) speed of your Llama2-based large language models, particularly in a pure C environment.

Not ideal if you are an end-user without programming experience or if you need to generate very long sequences of text beyond the draft model's capacity.

Tags: LLM inference, edge AI, model optimization, C programming, text generation
Badges: Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 7 / 25

How are scores calculated?

Stars: 28
Forks: 2
Language: C
License: MIT
Last pushed: Jul 15, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/mscheong01/speculative_decoding.c"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.