lix19937/llm-deploy

AI infrastructure for LLM inference (TensorRT-LLM / vLLM)

Score: 28 / 100 (Experimental)

This project helps AI infrastructure engineers optimize large language model (LLM) inference. It covers techniques and frameworks (such as TensorRT-LLM and vLLM) for accelerating LLM serving, reducing response time (latency) and increasing the number of requests handled per second (throughput). The intended users are engineers who deploy and maintain LLM-powered applications in production environments.

Use this if you are an AI infrastructure engineer looking to maximize the performance and efficiency of large language model deployments, especially in high-concurrency scenarios or when system resources are limited.

Not ideal if you are a developer simply using LLMs via an API or a researcher prototyping models without needing production-level optimization.

AI-infrastructure LLM-deployment model-serving GPU-optimization inference-acceleration
No License · No Package · No Dependents
Maintenance: 10 / 25
Adoption: 6 / 25
Maturity: 8 / 25
Community: 4 / 25

Stars: 22
Forks: 1
Language: Python
License: none
Last pushed: Mar 07, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/lix19937/llm-deploy"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
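The curl command above can also be called programmatically. The sketch below builds the same endpoint URL and fetches the report with Python's standard library; note that the response schema and the `X-API-Key` header name are assumptions (the page only shows the URL format), so check the service's docs before relying on them.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, repo: str) -> str:
    """Build the quality-score endpoint URL, e.g.
    .../quality/transformers/lix19937/llm-deploy"""
    return f"{BASE_URL}/{ecosystem}/{repo}"

def fetch_quality(ecosystem: str, repo: str, api_key: str = "") -> dict:
    """Fetch the quality report as parsed JSON.

    Passing an API key (header name 'X-API-Key' is an assumption)
    is what raises the limit from 100 to 1,000 requests/day.
    """
    req = urllib.request.Request(quality_url(ecosystem, repo))
    if api_key:
        req.add_header("X-API-Key", api_key)  # assumed header name
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

# URL construction only (no network call made here):
print(quality_url("transformers", "lix19937/llm-deploy"))
```

Without a key, staying under the 100 requests/day limit is the caller's responsibility; a simple approach is to cache responses locally and refresh them at most once per day per repository.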