PaddlePaddle/FastDeploy
High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
FastDeploy helps machine learning engineers and AI researchers deploy large language models (LLMs) and vision-language models (VLMs) efficiently. It takes trained PaddlePaddle models and optimizes them for high-performance inference, producing a production-ready serving solution. Use it when you need to serve advanced AI models such as ERNIE-4.5 or PaddleOCR-VL in real-world applications with speed and reliability.
3,659 stars. Actively maintained with 221 commits in the last 30 days.
Use this if you need to rapidly deploy and serve large language or vision-language models from the PaddlePaddle ecosystem with high performance and broad hardware compatibility.
Not ideal if your primary focus is training new models, or if you are not working with PaddlePaddle-based LLMs or VLMs.
Stars: 3,659
Forks: 720
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 221
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/PaddlePaddle/FastDeploy"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
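The curl command above can also be scripted. The sketch below builds the endpoint URL shown in that command and parses a response body; note that the field names in the sample payload (`stars`, `forks`, `commits_30d`) are assumptions mirroring the stats on this page, not a documented schema.

```python
import json
import urllib.parse

# Base path taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repository.

    Each path segment is percent-encoded so unusual repo names
    cannot break the URL.
    """
    path = "/".join(
        urllib.parse.quote(part, safe="") for part in (ecosystem, owner, repo)
    )
    return f"{BASE}/{path}"

url = quality_url("transformers", "PaddlePaddle", "FastDeploy")
print(url)

# Illustrative response handling; the real schema may differ.
# This sample payload is hypothetical, based only on the numbers
# displayed on this page.
sample = json.loads('{"stars": 3659, "forks": 720, "commits_30d": 221}')
print(sample["stars"], sample["commits_30d"])
```

Fetching the URL with any HTTP client (curl, `urllib.request`, requests) and feeding the body to `json.loads` follows the same pattern.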
Related models
mlc-ai/mlc-llm
Universal LLM Deployment Engine with ML Compilation
skyzh/tiny-llm
A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny...
ServerlessLLM/ServerlessLLM
Serverless LLM Serving for Everyone.
AXERA-TECH/ax-llm
Explore LLM model deployment based on AXera's AI chips
AmpereComputingAI/ampere_model_library
AML's goal is to make benchmarking of various AI architectures on Ampere CPUs a pleasurable experience :)