keith2018/TinyGPT
Tiny C++ LLM inference implementation from scratch
This project helps software developers integrate large language model (LLM) inference capabilities directly into their applications. It takes pre-trained LLM model files (like GPT-2, Llama, Qwen, or Mistral) and provides a fast, efficient way to generate text completions or chat responses. The end-user is typically a C++ or Python developer building applications that require local, high-performance LLM functionality.
Use this if you are a developer looking to embed efficient, in-process LLM inference directly into your C++ or Python application, or if you need to run a local OpenAI-compatible LLM server.
Not ideal if you are an end-user without programming experience, or if you need advanced LLM features like distributed inference or continuous batching, which are not yet supported.
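Because the project can run as a local OpenAI-compatible server, any client that speaks the standard chat-completions request shape can talk to it. The sketch below is a minimal Python example; the host, port, endpoint path, and model name are illustrative assumptions rather than TinyGPT's documented defaults, so adjust them to whatever the server you start actually exposes.

import requests

# Assumed local endpoint; replace host/port/path with the TinyGPT server's actual address.
URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "qwen2.5-0.5b-instruct",   # placeholder id, depends on the weights you load
    "messages": [
        {"role": "user", "content": "Summarize what KV caching does in one sentence."}
    ],
    "max_tokens": 64,
}

resp = requests.post(URL, json=payload, timeout=60)
resp.raise_for_status()

# OpenAI-style responses put the generated text under choices[0].message.content.
print(resp.json()["choices"][0]["message"]["content"])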
Stars: 106
Forks: 15
Language: C++
License: MIT
Category:
Last pushed: Mar 12, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/keith2018/TinyGPT"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
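For scripted use, the same request can be made from Python; this simply mirrors the curl call above and makes no assumption about the shape of the JSON it returns.

import requests

# Same endpoint as the curl example; anonymous access is limited to 100 requests/day.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/keith2018/TinyGPT"

resp = requests.get(url, timeout=30)
resp.raise_for_status()
print(resp.json())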
Related models
tabularis-ai/be_great
A novel approach for synthesizing tabular data using pretrained large language models
EleutherAI/gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron...
shibing624/textgen
TextGen: Implementation of Text Generation models, include LLaMA, BLOOM, GPT2, BART, T5, SongNet...
ai-forever/ru-gpts
Russian GPT3 models.
AdityaNG/kan-gpt
The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold...