ModelTC/LightLLM
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
LightLLM helps machine learning engineers and MLOps teams deploy and manage LLMs efficiently. It takes a trained model as input and serves it behind a high-speed, scalable framework, so applications get responses from the model quickly. It targets professionals building and maintaining systems that depend on fast, reliable LLM interactions.
3,944 stars. Actively maintained with 23 commits in the last 30 days.
Use this if you need to serve large language models with high performance and scalability, ensuring quick responses for your applications.
Not ideal if you are looking for a tool to train LLMs or a pre-built application that uses LLMs, rather than a serving infrastructure.
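In practice, "getting responses from the model" means sending HTTP requests to a running LightLLM server. The sketch below builds a generation payload in the shape LightLLM's HTTP API describes (an `inputs` string plus a `parameters` object); the server address, the `/generate` path, and the exact parameter names should be checked against LightLLM's own documentation, as they are assumptions here.

```python
import json

# Assumption: a LightLLM api_server running locally on port 8080.
SERVER_URL = "http://127.0.0.1:8080/generate"

def build_request(prompt: str, max_new_tokens: int = 64) -> dict:
    """Build a generation payload in the shape LightLLM's HTTP API expects.

    Parameter names here ("inputs", "parameters", "max_new_tokens") follow
    LightLLM's README examples but should be verified against current docs.
    """
    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }

payload = build_request("What is model quantization?", max_new_tokens=32)
body = json.dumps(payload)
print(body)

# To actually send it (requires a running LightLLM server):
# import urllib.request
# req = urllib.request.Request(
#     SERVER_URL,
#     data=body.encode("utf-8"),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode("utf-8"))
```

The request is kept separate from the send step so the payload can be logged or validated before it ever reaches the server.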
Stars: 3,944
Forks: 307
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 23
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ModelTC/LightLLM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
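The same endpoint can be called from Python using only the standard library. This is a minimal sketch: the response schema is not documented on this page, so the example just returns the raw JSON body, and the key-based authentication mechanism is omitted because its header or parameter name is not given above.

```python
import urllib.request

REPO = "ModelTC/LightLLM"
URL = f"https://pt-edge.onrender.com/api/v1/quality/transformers/{REPO}"

def fetch_quality(url: str = URL, timeout: float = 10.0) -> str:
    """Fetch the raw JSON quality data for a repository.

    The response schema is not documented here, so the body is
    returned as-is for the caller to inspect.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

if __name__ == "__main__":
    print(fetch_quality())
```

Keyless access is rate-limited to 100 requests per day, so callers polling many repositories should cache responses or register for a key.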
Related models
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...