NetEase-Media/grps_trtllm

Higher-performance OpenAI LLM serving than `vllm serve`: a pure C++ OpenAI-compatible LLM service built on GRPS + TensorRT-LLM + Tokenizers.cpp, supporting chat and function calling, AI agents, distributed multi-GPU inference, multimodal inputs, and a Gradio chat interface.

Score: 42 / 100 (Emerging)

This project helps large organizations and tech companies deploy high-performance AI large language models (LLMs) and multimodal models for various internal and external applications. It takes in user prompts, images, and other data, processing them through advanced AI models to generate text, facilitate AI agent workflows, and execute function calls. This is ideal for AI product managers, machine learning operations engineers, and technical leaders who need to serve advanced AI capabilities with maximum efficiency.

158 stars.

Use this if you need to run large language models and multimodal AI services with superior speed and efficiency compared to existing solutions, especially for AI agents or function calling applications.

Not ideal if you are a single user or small team without significant GPU resources, as this project is designed for high-scale, production-grade AI inference.

AI-service-deployment large-language-models multimodal-AI AI-agent-orchestration high-performance-inference
No package · No dependents
Maintenance: 6 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 10 / 25


Stars: 158
Forks: 11
Language: Python
License: Apache-2.0
Last pushed: Dec 08, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/NetEase-Media/grps_trtllm"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
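The same endpoint can be called from any HTTP client. A minimal Python sketch using only the standard library; the URL pattern comes from the curl example above, but the JSON response schema is an assumption and should be checked against the actual API output:

```python
import json
import urllib.request

# Base of the pt-edge quality API, taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(ecosystem: str, repo: str) -> str:
    """Build the report URL for an ecosystem and an owner/name repo slug."""
    return f"{BASE_URL}/{ecosystem}/{repo}"


def fetch_quality(ecosystem: str, repo: str) -> dict:
    """Fetch and decode the JSON quality report (requires network access).

    The returned fields (score, stars, forks, ...) are an assumption based
    on the stats shown on this page, not a documented schema.
    """
    with urllib.request.urlopen(quality_url(ecosystem, repo)) as resp:
        return json.load(resp)


# Example: the URL for this project
# quality_url("transformers", "NetEase-Media/grps_trtllm")
```

No authentication header is needed for the free tier (100 requests/day); how a key for the 1,000/day tier is passed is not documented here.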