zhihu/ZhiLight

A highly optimized LLM inference acceleration engine for Llama and its variants.

Quality score: 59 / 100 (Established)

ZhiLight is a specialized inference engine that speeds up text generation from large language models (LLMs) such as Llama and its variants. It takes a trained LLM and, by optimizing how the model runs on NVIDIA GPUs, delivers lower latency and higher throughput. It is aimed at AI engineers and MLOps specialists who deploy and manage LLMs in production.

905 stars. Actively maintained with 4 commits in the last 30 days.

Use this if you need to speed up Llama-based language models, especially on PCIe-based NVIDIA GPUs, to serve more concurrent requests or cut response times.

Not ideal if your LLM infrastructure does not primarily use NVIDIA GPUs or if you are not deploying Llama or similar models.

Tags: LLM deployment · AI infrastructure · GPU optimization · model serving · MLOps
No Package · No Dependents
Maintenance: 13 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 20 / 25


Stars: 905
Forks: 102
Language: C++
License: Apache-2.0
Last pushed: Mar 11, 2026
Commits (30d): 4

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/zhihu/ZhiLight"

Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
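As a minimal sketch of working with the API response in code, the snippet below parses a hypothetical JSON payload and recomputes the 100-point total from the four 25-point category scores shown above. The field names (`score`, `breakdown`) are assumptions about the response shape, not the documented schema; check the actual API output before relying on them.

```python
import json

# Hypothetical response shape for the /quality endpoint -- field names are
# assumed for illustration and may differ from the real API schema.
sample = json.loads("""
{
  "project": "zhihu/ZhiLight",
  "score": 59,
  "breakdown": {"maintenance": 13, "adoption": 10, "maturity": 16, "community": 20}
}
""")

def total_score(payload):
    # Sum the four category scores (each out of 25) into the 100-point total.
    return sum(payload["breakdown"].values())

# 13 + 10 + 16 + 20 == 59, matching the reported overall score.
print(total_score(sample))  # prints 59
```

In a real client you would fetch the payload with `curl` (as shown above) or an HTTP library, then apply the same parsing step.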