deepspeedai/DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Score: 57 / 100 (Established)

This is for machine learning engineers and MLOps specialists who deploy large language models or text-to-image models. You provide a trained model, such as Llama-2 or Stable Diffusion, and MII turns it into a high-performance, low-latency inference service, so applications get faster responses from your AI models.

2,099 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to serve large AI models with extremely high throughput and minimal delay, ensuring your applications can respond quickly to user requests, especially for text generation or image creation tasks.
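As a rough illustration of the serving workflow described above, here is a minimal sketch using MII's non-persistent pipeline API. The checkpoint name is a hypothetical example, and the sketch assumes `deepspeed-mii` is installed (`pip install deepspeed-mii`) and a CUDA GPU is available.

```python
"""Minimal sketch of text generation with DeepSpeed-MII.

Assumptions: `deepspeed-mii` is installed, a CUDA GPU is present,
and the model checkpoint name below is a hypothetical example.
"""

def generate(prompts, max_new_tokens=128):
    # Lazy import so the sketch can be read (and the helper defined)
    # without mii installed.
    import mii
    # mii.pipeline loads the model and applies DeepSpeed's inference
    # optimizations before serving requests in-process.
    pipe = mii.pipeline("meta-llama/Llama-2-7b-hf")  # hypothetical checkpoint
    return pipe(prompts, max_new_tokens=max_new_tokens)

if __name__ == "__main__":
    for response in generate(["DeepSpeed is"]):
        print(response)
```

For a long-lived service rather than an in-process pipeline, MII also offers a persistent deployment mode; consult the repository's README for the current API.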

Not ideal if you are looking for a solution to train AI models, or if your primary concern is not optimizing inference speed and cost for large models.

Large Language Model Deployment · MLOps · AI Inference Optimization · Real-time AI · Generative AI Serving
Stale: 6 months
Maintenance: 2 / 25
Adoption: 10 / 25
Maturity: 25 / 25
Community: 20 / 25


Stars: 2,099
Forks: 190
Language: Python
License: Apache-2.0
Last pushed: Jun 30, 2025
Commits (30d): 0
Dependencies: 18

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/deepspeedai/DeepSpeed-MII"
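The same endpoint can be queried from Python instead of curl. This is a sketch using only the standard library; the URL path shape is taken from the curl example above, and the structure of the JSON response is an assumption.

```python
"""Sketch of fetching repo-quality data from the API shown above.

The endpoint path mirrors the curl example; the JSON response
structure is not documented here, so it is printed as-is.
"""
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    # Mirrors the path shape in the curl example: /quality/<category>/<owner>/<repo>
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    # Plain GET; no API key needed within the free daily quota.
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    data = fetch_quality("ml-frameworks", "deepspeedai", "DeepSpeed-MII")
    print(json.dumps(data, indent=2))
```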

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.