FastDeploy and llm-deploy

PaddlePaddle/FastDeploy is an ecosystem sibling to lix19937/llm-deploy: the former is a high-performance deployment toolkit for inference, which could be leveraged as a backend by the latter, an AI infrastructure project for LLM inference that integrates runtimes such as TensorRT-LLM and vLLM.

                 FastDeploy          llm-deploy
Overall score    73 (Verified)       28 (Experimental)
Maintenance      22/25               10/25
Adoption         10/25               6/25
Maturity         16/25               8/25
Community        25/25               4/25
Stars            3,659               22
Forks            720                 1
Downloads        —                   —
Commits (30d)    221                 0
Language         Python              Python
License          Apache-2.0          none
Package          none                none
Dependents       none                none

About FastDeploy

PaddlePaddle/FastDeploy

High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

This tool helps machine learning engineers and AI researchers deploy large language models (LLMs) and vision-language models (VLMs) efficiently. It takes trained PaddlePaddle-based models and optimizes them for high-performance inference, outputting a production-ready deployment solution. You would use this if you need to serve advanced AI models like ERNIE-4.5 or PaddleOCR-VL in real-world applications with speed and reliability.
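FastDeploy's LLM serving is commonly described as exposing an OpenAI-compatible HTTP API. As a minimal sketch, assuming such an endpoint and a hypothetical model name and port (neither confirmed by this page), the request body you would POST looks like this; the snippet only builds the payload and does not require a running server:

```python
import json

# Hypothetical local endpoint -- adjust host, port, and path to your deployment.
ENDPOINT = "http://localhost:8180/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> bytes:
    """Encode an OpenAI-style chat-completion request body as JSON bytes."""
    payload = {
        "model": model,                                     # e.g. a deployed ERNIE model (name is an assumption)
        "messages": [{"role": "user", "content": prompt}],  # single-turn chat
        "max_tokens": max_tokens,                           # cap on generated tokens
    }
    return json.dumps(payload).encode("utf-8")

body = build_chat_request("ernie-4.5", "Summarize FastDeploy in one sentence.")
# To actually send it, POST `body` to ENDPOINT with
# Content-Type: application/json (e.g. via urllib.request).
```

Because the wire format follows the OpenAI chat-completions convention, the same payload works against other OpenAI-compatible servers (such as vLLM's) by changing only the endpoint URL.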

AI model deployment Large Language Models Vision-Language Models AI inference optimization Machine Learning Engineering

About llm-deploy

lix19937/llm-deploy

AI Infra LLM infer/ tensorrt-llm/ vllm

This project helps AI infrastructure engineers optimize large language model (LLM) inference. It provides techniques and frameworks to accelerate the processing of LLMs, reducing the time it takes to get responses (latency) and increasing the number of requests handled per second (throughput). The end-users are engineers responsible for deploying and maintaining LLM-powered applications in production environments.
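The latency and throughput trade-off described above can be made concrete with a back-of-the-envelope model (illustrative formulas, not taken from the project): per-request latency is dominated by time-to-first-token plus a per-output-token decode time, while aggregate throughput grows with batch size because each decode step emits one token per batched sequence.

```python
def generation_latency(ttft_s: float, tpot_s: float, output_tokens: int) -> float:
    """End-to-end latency for one request: time to first token (prefill)
    plus per-output-token decode time for the remaining tokens."""
    return ttft_s + tpot_s * (output_tokens - 1)

def decode_throughput(batch_size: int, tpot_s: float) -> float:
    """Aggregate tokens/second when every sequence in the batch emits
    one token per decode step of duration tpot_s."""
    return batch_size / tpot_s

# Example: 0.5 s prefill, 20 ms per token, 101 output tokens -> 2.5 s latency.
latency = generation_latency(0.5, 0.02, 101)
# Example: batch of 32 at 20 ms per step -> 1600 tokens/s aggregate.
tput = decode_throughput(32, 0.02)
```

This is why techniques like continuous batching help throughput (larger effective batch per step) while quantization and optimized kernels help both, by shrinking the per-token step time itself.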

AI-infrastructure LLM-deployment model-serving GPU-optimization inference-acceleration


Scores updated daily from GitHub, PyPI, and npm data.