deepspeedai/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
This is for machine learning engineers and MLOps specialists who deploy large language models or text-to-image models. It helps you serve models such as Llama-2 or Stable Diffusion more efficiently: you provide a trained model, and MII stands up a high-performance, low-latency inference service so your applications get faster responses.
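As a sketch of what deployment looks like in code, MII exposes a pipeline-style Python API. The snippet below is a minimal example assuming the package is installed from PyPI (pip install deepspeed-mii) and a GPU is available; the model name and generation parameters are illustrative.

import mii

# Load a Hugging Face model into a DeepSpeed-optimized inference pipeline.
# The model name is illustrative; any supported text-generation model works.
pipe = mii.pipeline("meta-llama/Llama-2-7b-hf")

# Run batched generation; MII handles batching and optimized kernels.
responses = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
for r in responses:
    print(r)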
2,099 stars. No commits in the last 6 months. Available on PyPI.
Use this if you need to serve large AI models with high throughput and low latency, especially for text generation or image creation tasks.
Not ideal if you need to train models, or if optimizing inference speed and cost for large models is not a primary concern.
Stars: 2,099
Forks: 190
Language: Python
License: Apache-2.0
Category:
Last pushed: Jun 30, 2025
Commits (30d): 0
Dependencies: 18
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/deepspeedai/DeepSpeed-MII"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
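For scripted access, the same endpoint can be called from Python. This sketch assumes only that the endpoint returns a JSON body, since the response schema is not documented here; it uses the standard library so no extra dependency is needed.

import json
import urllib.request

# Same endpoint as the curl example above; no key needed for up to 100 requests/day.
URL = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/deepspeedai/DeepSpeed-MII"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)  # assumes a JSON response body

# Field names are not documented here, so pretty-print whatever comes back.
print(json.dumps(data, indent=2))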
Related frameworks
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit...
mlcommons/inference
Reference implementations of MLPerf® inference benchmarks
mlcommons/training
Reference implementations of MLPerf® training benchmarks
datamade/usaddress
A Python library for parsing unstructured United States address strings into address components
GRAAL-Research/deepparse
Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning