deepspeedai/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
This is for machine learning engineers and MLOps specialists who deploy large language models or text-to-image models. It helps you serve models such as Llama-2 or Stable Diffusion more efficiently: you provide a trained model, and MII stands up a high-performance, low-latency inference service so your applications get faster responses.
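As a sketch of what deployment looks like in code, MII exposes a pipeline-style Python API. The snippet below is a minimal example assuming the package is installed from PyPI (pip install deepspeed-mii) and a GPU is available; the model name and generation parameters are illustrative.

import mii

# Load a Hugging Face model into a DeepSpeed-optimized inference pipeline.
# The model name is illustrative; any supported text-generation model works.
pipe = mii.pipeline("meta-llama/Llama-2-7b-hf")

# Run batched generation; MII handles batching and optimized kernels.
responses = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
for r in responses:
    print(r)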
2,099 stars. No commits in the last 6 months. Available on PyPI.
Use this if you need to serve large AI models with high throughput and low latency, especially for text generation or image creation tasks.
Not ideal if you need to train models, or if optimizing inference speed and cost for large models is not a primary concern.
Stars: 2,099
Forks: 190
Language: Python
License: Apache-2.0
Category:
Last pushed: Jun 30, 2025
Commits (30d): 0
Dependencies: 18
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/deepspeedai/DeepSpeed-MII"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
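For scripted access, the same endpoint can be called from Python. This sketch assumes only that the endpoint returns a JSON body, since the response schema is not documented here; it uses the standard library so no extra dependency is needed.

import json
import urllib.request

# Same endpoint as the curl example above; no key needed for up to 100 requests/day.
URL = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/deepspeedai/DeepSpeed-MII"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)  # assumes a JSON response body

# Field names are not documented here, so pretty-print whatever comes back.
print(json.dumps(data, indent=2))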
Related frameworks
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit...
mlcommons/inference
Reference implementations of MLPerf® inference benchmarks
mlcommons/training
Reference implementations of MLPerf® training benchmarks
datamade/usaddress
A Python library for parsing unstructured United States address strings into address components
GRAAL-Research/deepparse
Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning