bentoml/BentoML
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
AI/ML engineers use BentoML to turn trained machine learning models into live prediction services. It packages your model code and dependencies into a standardized format and exposes the model behind a deployable API endpoint. This lets practitioners serve a wide range of AI applications, from language models to computer vision systems, and make them accessible for real-world use.
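A minimal sketch of what a service definition looks like, assuming the Python decorator API introduced in BentoML 1.2; the IrisClassifier name and threshold logic are placeholders standing in for a real trained model:

import bentoml

@bentoml.service
class IrisClassifier:
    def __init__(self) -> None:
        # Placeholder for loading a saved model (e.g. via joblib or the
        # BentoML model store); hardcoded to keep the sketch self-contained.
        self.threshold = 2.5

    @bentoml.api
    def predict(self, petal_length: float) -> str:
        # Stand-in inference logic for a trained classifier.
        return "versicolor" if petal_length > self.threshold else "setosa"

Running bentoml serve against this file starts a local HTTP server (port 3000 by default) that exposes the method as a POST endpoint.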
8,516 stars. Used by 4 other packages. Actively maintained with 14 commits in the last 30 days. Available on PyPI.
Use this if you need to quickly deploy trained AI or machine learning models as production-ready web services that handle real-time inference requests (see the client sketch below).
Not ideal if you are looking for a platform to train your models or perform data preprocessing, as this tool focuses specifically on serving already-trained models.
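For the real-time inference case, a deployed service like the sketch above is called over plain HTTP. The route and JSON shape here assume BentoML's default convention of mapping each API method to a POST endpoint named after it:

import requests

# Hypothetical client call against the locally served sketch above.
resp = requests.post(
    "http://localhost:3000/predict",
    json={"petal_length": 1.4},
    timeout=10,
)
print(resp.json())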
Stars: 8,516
Forks: 927
Language: Python
License: Apache-2.0
Category: MLOps
Last pushed: Mar 13, 2026
Commits (30d): 14
Dependencies: 42
Reverse dependents: 4
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/mlops/bentoml/BentoML"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
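The same endpoint can be queried from Python. Only the URL comes from this page; the JSON response schema, including the field names below, is an assumption:

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/mlops/bentoml/BentoML"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # surface rate-limit or server errors
data = resp.json()       # assumed: JSON body mirroring the stats shown above
print(data.get("stars"), data.get("forks"))  # field names are assumptions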
Related tools
nndeploy/nndeploy
An Easy-to-Use and High-Performance AI Deployment Framework
kubeflow/trainer
Distributed AI Model Training and LLM Fine-Tuning on Kubernetes
cncf/llm-in-action
🤖 Discover how to apply your LLM app skills on Kubernetes!
llmcloud24/de.KCD-Summer-School-2024
Learn how to deploy your own LLM in the de.NBI cloud via a step-by-step guided journey...
ray-project/llms-in-prod-workshop-2023
Deploy and Scale LLM-based applications