efeslab/Nanoflow

A throughput-oriented high-performance serving framework for LLMs

Score: 38 / 100 (Emerging)

Nanoflow is designed for engineers and MLOps professionals who need to run large language models (LLMs) efficiently in production. It serves common open models (such as Llama 2, Llama 3, or Qwen2) faster and more efficiently than comparable systems, yielding a highly responsive LLM service that can handle more user requests on the same hardware.


Use this if you need to serve large language models to many users and want to maximize the number of requests your existing GPU hardware can handle.

Not ideal if you are a data scientist performing one-off model experiments or if you are serving models other than large language models.

Tags: LLM-serving, model-deployment, production-AI, machine-learning-operations, AI-infrastructure
No license · No package · No dependents
Maintenance 6 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 14 / 25

How are scores calculated?
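The four subscores above are each out of 25 and appear to sum directly to the headline score. A minimal check, using only the numbers shown on this card:

```shell
# Subscores from the card: Maintenance 6, Adoption 10, Maturity 8, Community 14.
# Their sum matches the headline 38 / 100 score.
total=$((6 + 10 + 8 + 14))
echo "$total"   # 38
```

This assumes a plain unweighted sum, which is consistent with the numbers shown but not confirmed by the site.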

Stars: 949
Forks: 47
Language: Jupyter Notebook
License: none
Last pushed: Oct 29, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/efeslab/Nanoflow"

Open to everyone: 100 requests per day with no key needed. Get a free key for 1,000 per day.
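The endpoint URL can be built from its parts for any repository. A sketch, assuming the path shape shown in the curl example above (the meaning of the middle segment, "transformers" in the example, is an assumption taken verbatim from it):

```shell
# Assemble the quality-API URL from its components.
base="https://pt-edge.onrender.com/api/v1/quality"
section="transformers"   # copied from the example; its semantics are unverified
owner="efeslab"
repo="Nanoflow"
url="${base}/${section}/${owner}/${repo}"
echo "$url"
# Fetch the data with: curl "$url"
```

Swapping `owner` and `repo` would target a different repository, assuming the service follows this path pattern for all of them.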