hpdps-group/ElasticMM
ElasticMM: Elastic and Efficient MLLM Serving System
This system helps organizations efficiently manage the computational resources needed to run large AI models that understand both text and images. It accepts text-only and multimodal (text-and-image) inference requests and serves them faster and at lower cost by adapting GPU allocation to the request mix. It is designed for AI infrastructure engineers and MLOps teams who deploy and maintain AI services.
Use this if you are running AI services that involve large language models and vision models, and you need to serve a high volume of diverse requests (both text-only and text-and-image) while optimizing GPU usage.
Not ideal if you are running small-scale AI models or do not have access to multiple high-end GPUs.
Stars: 41
Forks: 2
Language: Python
License: —
Category: —
Last pushed: Dec 15, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/hpdps-group/ElasticMM"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
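For scripted access, the same endpoint can be called from Python. The sketch below assumes the endpoint returns a JSON body and that over-quota or failed requests come back with a non-2xx status; the response schema and the mechanism for supplying an API key are not documented in this listing, so the example simply fetches and prints the anonymous-tier payload.

import requests

API_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/hpdps-group/ElasticMM"

def fetch_quality_record(url: str = API_URL, timeout: float = 10.0) -> dict:
    # Fetch the quality record for hpdps-group/ElasticMM on the anonymous tier
    # (100 requests/day). Assumes a JSON response; the schema is not documented here.
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # surfaces rate-limit or server errors as exceptions
    return response.json()

if __name__ == "__main__":
    print(fetch_quality_record())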
Higher-rated alternatives
PaddlePaddle/FastDeploy
High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
mlc-ai/mlc-llm
Universal LLM Deployment Engine with ML Compilation
skyzh/tiny-llm
A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny...
ServerlessLLM/ServerlessLLM
Serverless LLM Serving for Everyone.
AXERA-TECH/ax-llm
Explore LLM model deployment based on AXera's AI chips