datawhalechina/llm-deploy

Large Model/LLM Inference and Deployment: Theory and Practice

/ 100

Emerging

This project offers theoretical foundations and practical guidance for deploying large language models (LLMs) to production. It covers turning a trained LLM into a live service that handles user requests efficiently, with the goal of a robust, optimized serving system. It is aimed at algorithm engineers and anyone interested in the technical side of LLM deployment.

381 stars. No commits in the last 6 months.

Use this if you are an algorithm engineer or student who needs to understand the end-to-end process of taking a large language model from development to a live, performant service.

Not ideal if you are looking for an introduction to training LLMs or their applications, as this focuses specifically on the deployment and inference stages.

LLM deployment, model serving, inference optimization, AI engineering, machine learning operations

No License, Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 19 / 25

Stars: 381

Forks:

Language: —

License: —

Higher-rated alternatives

PaddlePaddle/FastDeploy

High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

mlc-ai/mlc-llm

Universal LLM Deployment Engine with ML Compilation

skyzh/tiny-llm

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny...

ServerlessLLM/ServerlessLLM

Serverless LLM Serving for Everyone.

AXERA-TECH/ax-llm

Explore LLM model deployment based on AXera's AI chips
