thansen0/fastllm.cpp

A low-latency, fault-tolerant API for accessing LLMs, written in C++ using llama.cpp.

Score: 23 / 100 (Experimental)

This project helps developers and system architects deploy large language models (LLMs) on their own infrastructure with low-latency response times. It takes a pre-trained LLM (in GGUF format) and exposes it as an API service that other applications can call. This is ideal for backend engineers or MLOps specialists building applications that rely on immediate LLM responses.
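
For illustration, here is a minimal client sketch. It assumes the service exposes an HTTP completion endpoint in the style of llama.cpp's built-in server (POST /completion with a JSON prompt field); fastllm.cpp's actual route, port, and payload fields may differ, so treat every name below as a placeholder.

import json
import urllib.request

# Hypothetical endpoint: llama.cpp-style servers expose POST /completion,
# but fastllm.cpp's actual route, port, and schema may differ.
URL = "http://localhost:8080/completion"

payload = json.dumps({
    "prompt": "Summarize the benefits of self-hosted inference.",
    "n_predict": 64,  # placeholder generation-length parameter
}).encode("utf-8")

req = urllib.request.Request(
    URL,
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req, timeout=30) as resp:
    # Print the raw JSON body; field names depend on the server's schema.
    print(resp.read().decode("utf-8"))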

No commits in the last 6 months.

Use this if you need to integrate a private, high-speed LLM inference service directly into your applications, avoiding the latency of external cloud APIs.

Not ideal if you are looking for a pre-packaged, ready-to-use LLM without local setup, or if your application can tolerate the higher latency of cloud-based LLM providers.

Tags: LLM deployment, API development, low-latency systems, MLOps, backend engineering
Status: Stale (6 months), No Package, No Dependents
Maintenance: 2 / 25
Adoption: 5 / 25
Maturity: 16 / 25
Community: 0 / 25
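
The overall score is the sum of the four subscores: 2 + 5 + 16 + 0 = 23 out of 100.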

Stars: 11
Forks:
Language: C++
License: Unlicense
Last pushed: Jun 14, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/thansen0/fastllm.cpp"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
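
As a minimal sketch, the same endpoint can also be called programmatically; the example below uses only the Python standard library and prints the raw response, since the JSON schema is not documented on this page.

import urllib.request

# Public quality endpoint from the curl command above; no API key is
# needed for up to 100 requests/day.
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/thansen0/fastllm.cpp"

with urllib.request.urlopen(URL, timeout=30) as resp:
    # Print the body as-is; no response fields are assumed here.
    print(resp.read().decode("utf-8"))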