keith2018/TinyGPT

Tiny C++ LLM inference implementation from scratch

Quality score: 50 / 100 (Established)

This project helps software developers integrate large language model (LLM) inference capabilities directly into their applications. It takes pre-trained LLM model files (like GPT-2, Llama, Qwen, or Mistral) and provides a fast, efficient way to generate text completions or chat responses. The end-user is typically a C++ or Python developer building applications that require local, high-performance LLM functionality.


Use this if you are a developer looking to embed efficient, in-process LLM inference directly into your C++ or Python application, or if you need to run a local OpenAI-compatible LLM server.

Not ideal if you are an end-user without programming experience, or if you need advanced LLM features like distributed inference or continuous batching, which are not yet supported.
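As a rough illustration of the OpenAI-compatible server use case mentioned above, the sketch below builds a standard `/v1/chat/completions` request. The base URL, port, and model name are assumptions for illustration; TinyGPT's actual server address and supported model identifiers may differ.

```python
import json
from urllib.request import Request, urlopen

def chat_request(base_url: str, model: str, prompt: str) -> Request:
    """Build an OpenAI-style chat completion request for a local server."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# Hypothetical local endpoint and model name, for illustration only.
req = chat_request("http://localhost:8000", "gpt2", "Hello!")
# resp = json.load(urlopen(req))  # requires a running server
print(req.full_url)
```

Because the request shape follows the widely used OpenAI chat API, any OpenAI-compatible client library should also work against such a server.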

Tags: LLM-development, NLP-engineering, inference-optimization, application-development, API-creation
No package published; no dependents.
Maintenance 10 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 15 / 25


Stars: 106
Forks: 15
Language: C++
License: MIT
Last pushed: Mar 12, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/keith2018/TinyGPT"

Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
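The same endpoint can be called from Python instead of curl. This sketch only constructs the URL from the path shown in the example above; the response schema is not documented here, so the actual fetch is left commented out.

```python
import json
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, repo: str) -> str:
    """Build the quality-score endpoint URL for a given repo."""
    return f"{BASE}/{ecosystem}/{repo}"

url = quality_url("transformers", "keith2018/TinyGPT")
# data = json.load(urlopen(url))  # requires network access; schema not shown here
print(url)
```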