keith2018/TinyGPT
Tiny C++ LLM inference implementation from scratch
This project helps software developers integrate large language model (LLM) inference capabilities directly into their applications. It takes pre-trained LLM model files (like GPT-2, Llama, Qwen, or Mistral) and provides a fast, efficient way to generate text completions or chat responses. The end-user is typically a C++ or Python developer building applications that require local, high-performance LLM functionality.
Use this if you are a developer looking to embed efficient, in-process LLM inference directly into your C++ or Python application, or if you need to run a local OpenAI-compatible LLM server.
Not ideal if you are an end-user without programming experience, or if you need advanced LLM features like distributed inference or continuous batching, which are not yet supported.
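Because the project can run as a local OpenAI-compatible server, any client that speaks the standard chat-completions request shape can talk to it. The sketch below is a minimal Python example; the host, port, endpoint path, and model name are illustrative assumptions rather than TinyGPT's documented defaults, so adjust them to whatever the server you start actually exposes.

import requests

# Assumed local endpoint; replace host/port/path with the TinyGPT server's actual address.
URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "qwen2.5-0.5b-instruct",   # placeholder id, depends on the weights you load
    "messages": [
        {"role": "user", "content": "Summarize what KV caching does in one sentence."}
    ],
    "max_tokens": 64,
}

resp = requests.post(URL, json=payload, timeout=60)
resp.raise_for_status()

# OpenAI-style responses put the generated text under choices[0].message.content.
print(resp.json()["choices"][0]["message"]["content"])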
Stars: 106
Forks: 15
Language: C++
License: MIT
Category:
Last pushed: Mar 12, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/keith2018/TinyGPT"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
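For scripted use, the same request can be made from Python; this simply mirrors the curl call above and makes no assumption about the shape of the JSON it returns.

import requests

# Same endpoint as the curl example; anonymous access is limited to 100 requests/day.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/keith2018/TinyGPT"

resp = requests.get(url, timeout=30)
resp.raise_for_status()
print(resp.json())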
Related models
tabularis-ai/be_great
A novel approach for synthesizing tabular data using pretrained large language models
EleutherAI/gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron...
shibing624/textgen
TextGen: Implementation of Text Generation models, include LLaMA, BLOOM, GPT2, BART, T5, SongNet...
ai-forever/ru-gpts
Russian GPT3 models.
AdityaNG/kan-gpt
The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold...