InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Deploying and serving large language models (LLMs) or visual language models (VLMs) is complex and resource-intensive. LMDeploy compresses these models and serves them efficiently, so your hardware delivers more responses per second. It takes an existing language or visual model and produces an optimized, ready-to-serve deployment, which makes it well suited to engineers and MLOps teams running AI inference infrastructure.
7,680 stars. Actively maintained with 56 commits in the last 30 days.
Use this if you need to serve large language models or visual language models on your own infrastructure and want to maximize efficiency and throughput while minimizing hardware costs.
Not ideal if you're a casual user looking for a pre-built chatbot or an API-based service, as this tool requires technical expertise to set up and manage.
Stars: 7,680
Forks: 661
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 56
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/InternLM/lmdeploy"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
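The same endpoint can be called from code. A minimal Python sketch, assuming only the URL pattern visible in the curl command above; the JSON response fields are not documented on this page, so the helper returns the decoded payload without assuming its shape:

```python
"""Query the quality API shown above for a given repo."""
import json
import urllib.request
from urllib.parse import quote

# Base URL taken from the curl example on this page.
BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, e.g. category='transformers'."""
    return f"{BASE}/{quote(category)}/{quote(owner)}/{quote(repo)}"


def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload (field names unspecified here)."""
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Reproduces the curl example's URL for this repository.
    print(quality_url("transformers", "InternLM", "lmdeploy"))
```

Unauthenticated calls are limited to 100 requests/day, so a client polling many repos should cache responses or use a free key.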
Related projects
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...