lix19937/llm-deploy
AI infrastructure for LLM inference: TensorRT-LLM and vLLM
This project helps AI infrastructure engineers optimize large language model (LLM) inference. It collects techniques and framework guidance for accelerating LLM serving, reducing response latency and increasing request throughput. The intended users are engineers who deploy and maintain LLM-powered applications in production environments.
Use this if you are an AI infrastructure engineer looking to maximize the performance and efficiency of large language model deployments, especially in high-concurrency scenarios or when system resources are limited.
Not ideal if you are a developer simply using LLMs via an API or a researcher prototyping models without needing production-level optimization.
Stars: 22
Forks: 1
Language: Python
License: —
Category:
Last pushed: Mar 07, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/lix19937/llm-deploy"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
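The same endpoint can also be called from Python using only the standard library. The sketch below assumes nothing beyond the URL shown above: the response schema is not documented here, so the body is simply parsed and printed as JSON, and the `quality_url` helper is an illustrative name, not part of any published client.

```python
"""Fetch repository quality data from the pt-edge API (sketch).

Only the endpoint URL comes from the page above; the response
schema is undocumented here, so the result is printed as raw JSON.
"""
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a GitHub owner/repo pair."""
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """GET the endpoint and parse the JSON response body."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Anonymous access is rate-limited to 100 requests/day per the note above.
    print(json.dumps(fetch_quality("lix19937", "llm-deploy"), indent=2))
```

With a free API key (1,000 requests/day), the key would presumably be attached as a header or query parameter; the exact mechanism is not specified on this page, so it is left out of the sketch.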
Higher-rated alternatives
PaddlePaddle/FastDeploy: High-performance inference and deployment toolkit for LLMs and VLMs based on PaddlePaddle
mlc-ai/mlc-llm: Universal LLM deployment engine with ML compilation
skyzh/tiny-llm: A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny...
ServerlessLLM/ServerlessLLM: Serverless LLM serving for everyone
AXERA-TECH/ax-llm: Explore LLM model deployment based on AXera's AI chips