hpdps-group/ElasticMM
ElasticMM: Elastic and Efficient MLLM Serving System
This system helps organizations efficiently manage the computational resources needed to run large AI models that understand both text and images. It accepts text-only and multimodal (text-and-image) inference requests and serves them faster and at lower cost by adapting GPU allocation to the request mix. It is designed for AI infrastructure engineers and MLOps teams who deploy and maintain AI services.
Use this if you are running AI services that involve large language models and vision models, and you need to serve a high volume of diverse requests (both text-only and text-and-image) while optimizing GPU usage.
Not ideal if you are running small-scale AI models or do not have access to multiple high-end GPUs.
Stars: 41
Forks: 2
Language: Python
License: —
Category: —
Last pushed: Dec 15, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/hpdps-group/ElasticMM"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
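For scripted access, the same endpoint can be called from Python. The sketch below assumes the endpoint returns a JSON body and that over-quota or failed requests come back with a non-2xx status; the response schema and the mechanism for supplying an API key are not documented in this listing, so the example simply fetches and prints the anonymous-tier payload.

import requests

API_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/hpdps-group/ElasticMM"

def fetch_quality_record(url: str = API_URL, timeout: float = 10.0) -> dict:
    # Fetch the quality record for hpdps-group/ElasticMM on the anonymous tier
    # (100 requests/day). Assumes a JSON response; the schema is not documented here.
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # surfaces rate-limit or server errors as exceptions
    return response.json()

if __name__ == "__main__":
    print(fetch_quality_record())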
Higher-rated alternatives
PaddlePaddle/FastDeploy
High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
mlc-ai/mlc-llm
Universal LLM Deployment Engine with ML Compilation
skyzh/tiny-llm
A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny...
ServerlessLLM/ServerlessLLM
Serverless LLM Serving for Everyone.
AXERA-TECH/ax-llm
Explore LLM model deployment based on AXera's AI chips