NetEase-Media/grps_trtllm
A higher-performance OpenAI-compatible LLM service than vLLM serve: a pure C++ implementation built on GRPS + TensorRT-LLM + Tokenizers.cpp, supporting chat and function calling, AI agents, distributed multi-GPU inference, multimodal models, and a Gradio chat interface.
This project helps large organizations and technology companies deploy high-performance large language models (LLMs) and multimodal models for internal and external applications. It accepts user prompts, images, and other inputs, runs them through the deployed models, and generates text, drives AI-agent workflows, and executes function calls. It is aimed at AI product managers, MLOps engineers, and technical leads who need to serve advanced AI capabilities with maximum efficiency.
Use this if you need to run large language model and multimodal AI services with greater speed and efficiency than existing solutions, especially for AI-agent or function-calling applications.
Not ideal if you are a single user or small team without significant GPU resources, as this project is designed for high-scale, production-grade AI inference.
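Because the service exposes an OpenAI-compatible API, a deployed instance can be queried with a standard chat-completions request. A minimal stdlib-only sketch follows; the host, port, and model name (`localhost:9997`, `"llm"`) are placeholders for your own deployment, not values defined by this listing.

```python
import json
import urllib.request


def build_chat_request(base_url, messages, model="llm"):
    """Build an OpenAI-style chat-completions request for a grps_trtllm server.

    Returns the endpoint URL and the JSON-encoded request body. The model
    name defaults to a hypothetical "llm"; use whatever your deployment serves.
    """
    url = f"{base_url}/v1/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return url, body


def chat(base_url, messages, model="llm"):
    """POST the request to a running server and decode the JSON response."""
    url, body = build_chat_request(base_url, messages, model)
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Splitting request construction from transport keeps the payload logic testable without a live GPU server; `chat()` is only a thin wrapper around `urllib`.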
Stars: 158
Forks: 11
Language: Python
License: Apache-2.0
Category:
Last pushed: Dec 08, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/NetEase-Media/grps_trtllm"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
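The same endpoint can be fetched from Python with the standard library. A minimal sketch; the URL structure is taken from the curl example above, and no assumptions are made about the JSON fields the API returns.

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner, repo):
    """Build the per-repository quality endpoint shown in the curl example."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner, repo):
    """GET the endpoint and decode the JSON body (response schema not assumed)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.loads(resp.read())
```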
Higher-rated alternatives
hassancs91/SimplerLLM
Simplify interactions with Large Language Models
tylerelyt/LLM-Workshop
🌟 Learn Large Language Model development through hands-on projects and real-world implementations
avilum/minrlm
Token-efficient Recursive Language Model. 3.6x fewer tokens than vanilla LLMs. Data never enters...
kyegomez/SingLoRA
This repository provides a minimal, single-file implementation of SingLoRA (Single Matrix...
parvbhullar/superpilot
LLMs based multi-model framework for building AI apps.