lix19937/llm-deploy
AI infrastructure for LLM inference: TensorRT-LLM and vLLM
This project helps AI infrastructure engineers optimize large language model (LLM) inference. It collects techniques and framework guidance for accelerating LLM serving, reducing response latency and increasing request throughput. The intended users are engineers who deploy and maintain LLM-powered applications in production environments.
Use this if you are an AI infrastructure engineer looking to maximize the performance and efficiency of large language model deployments, especially in high-concurrency scenarios or when system resources are limited.
Not ideal if you are a developer simply using LLMs via an API or a researcher prototyping models without needing production-level optimization.
Stars: 22
Forks: 1
Language: Python
License: —
Category:
Last pushed: Mar 07, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/lix19937/llm-deploy"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
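The same endpoint can also be called from Python using only the standard library. The sketch below assumes nothing beyond the URL shown above: the response schema is not documented here, so the body is simply parsed and printed as JSON, and the `quality_url` helper is an illustrative name, not part of any published client.

```python
"""Fetch repository quality data from the pt-edge API (sketch).

Only the endpoint URL comes from the page above; the response
schema is undocumented here, so the result is printed as raw JSON.
"""
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a GitHub owner/repo pair."""
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """GET the endpoint and parse the JSON response body."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Anonymous access is rate-limited to 100 requests/day per the note above.
    print(json.dumps(fetch_quality("lix19937", "llm-deploy"), indent=2))
```

With a free API key (1,000 requests/day), the key would presumably be attached as a header or query parameter; the exact mechanism is not specified on this page, so it is left out of the sketch.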
Higher-rated alternatives
PaddlePaddle/FastDeploy: High-performance inference and deployment toolkit for LLMs and VLMs based on PaddlePaddle
mlc-ai/mlc-llm: Universal LLM deployment engine with ML compilation
skyzh/tiny-llm: A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny...
ServerlessLLM/ServerlessLLM: Serverless LLM serving for everyone
AXERA-TECH/ax-llm: Explore LLM model deployment based on AXera's AI chips