InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Deploying and serving large language models (LLMs) or visual language models (VLMs) is complex and resource-intensive. LMDeploy compresses these models and serves them efficiently, so your hardware delivers more responses per second. It takes an existing language or visual model and produces an optimized, ready-to-serve deployment, which makes it well suited to engineers and MLOps teams running AI inference infrastructure.
7,680 stars. Actively maintained with 56 commits in the last 30 days.
Use this if you need to serve large language models or visual language models on your own infrastructure and want to maximize efficiency and throughput while minimizing hardware costs.
Not ideal if you're a casual user looking for a pre-built chatbot or an API-based service, as this tool requires technical expertise to set up and manage.
Stars: 7,680
Forks: 661
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 56
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/InternLM/lmdeploy"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
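The same endpoint can be called from code. A minimal Python sketch, assuming only the URL pattern visible in the curl command above; the JSON response fields are not documented on this page, so the helper returns the decoded payload without assuming its shape:

```python
"""Query the quality API shown above for a given repo."""
import json
import urllib.request
from urllib.parse import quote

# Base URL taken from the curl example on this page.
BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, e.g. category='transformers'."""
    return f"{BASE}/{quote(category)}/{quote(owner)}/{quote(repo)}"


def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload (field names unspecified here)."""
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Reproduces the curl example's URL for this repository.
    print(quality_url("transformers", "InternLM", "lmdeploy"))
```

Unauthenticated calls are limited to 100 requests/day, so a client polling many repos should cache responses or use a free key.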
Related projects
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...