alibaba/rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
RTP-LLM is a high-performance engine for deploying large language models (LLMs) in real-world applications. It takes a trained LLM, optionally with multimodal inputs such as images alongside text, and serves responses efficiently to large numbers of concurrent users. It is aimed at engineers and AI product managers who run LLM-powered services, such as AI assistants or smart search features, at scale.
1,065 stars. Actively maintained with 163 commits in the last 30 days.
Use this if you need to run large language models reliably and quickly for many users within a production environment, especially for applications like AI chatbots, intelligent customer support, or advanced search.
Not ideal if you are only experimenting with LLMs on a personal machine or do not need enterprise-grade performance and scalability.
Stars
1,065
Forks
159
Language
Cuda
License
Apache-2.0
Category
Last pushed
Mar 13, 2026
Commits (30d)
163
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/alibaba/rtp-llm"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000 requests/day.
Related models
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...