alibaba/ServeGen
A framework for generating realistic LLM serving workloads
This tool helps engineers and researchers simulate realistic demands on large language model (LLM) serving systems. You input parameters like request rates and model types, and it outputs a sequence of simulated user requests, mirroring the complex, dynamic patterns observed in real production environments. This is for anyone responsible for designing, evaluating, or optimizing the infrastructure that runs large language models.
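As a rough illustration of what "a sequence of simulated user requests" with time-varying demand can look like, here is a minimal sketch. It generates request arrivals from a non-homogeneous Poisson process using thinning, and draws prompt and output lengths from lognormal distributions. This is not ServeGen's API or its modeling method. `generate_workload`, `rate_at`, the sinusoidal rate curve, and the lognormal parameters are all hypothetical choices made for this example.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Request:
    arrival_time: float  # seconds since the start of the simulated trace
    input_tokens: int    # hypothetical prompt length
    output_tokens: int   # hypothetical generation length


def rate_at(t: float, base_rate: float, peak_rate: float, period: float) -> float:
    """Hypothetical load curve (requests/second) that oscillates smoothly
    between base_rate and peak_rate once per `period` seconds."""
    return base_rate + (peak_rate - base_rate) * 0.5 * (1 - math.cos(2 * math.pi * t / period))


def generate_workload(duration: float, base_rate: float, peak_rate: float,
                      period: float, seed: int = 0) -> list[Request]:
    """Sketch of a time-varying workload generator (not ServeGen's API).

    Uses thinning: candidate arrivals are drawn at the maximum rate, and each
    candidate is kept with probability rate_at(t) / max_rate.
    """
    rng = random.Random(seed)  # seeded so traces are reproducible
    max_rate = max(base_rate, peak_rate)  # upper bound on rate_at over all t
    requests: list[Request] = []
    t = 0.0
    while True:
        t += rng.expovariate(max_rate)  # next candidate arrival
        if t >= duration:
            break
        if rng.random() < rate_at(t, base_rate, peak_rate, period) / max_rate:
            # Assumed lognormal length distributions; real traces would be fitted
            input_tokens = max(1, int(rng.lognormvariate(6.0, 1.0)))
            output_tokens = max(1, int(rng.lognormvariate(5.0, 0.8)))
            requests.append(Request(t, input_tokens, output_tokens))
    return requests
```

For example, `generate_workload(600, 2.0, 10.0, 300)` simulates ten minutes of traffic that swings between 2 and 10 requests per second every five minutes. Thinning keeps the generator short: arrivals naturally cluster where the rate curve is high. A realistic generator would fit the rate curve and length distributions to production data instead of hard-coding them.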
106 stars. No commits in the last 6 months.
Use this if you need to test the performance, scalability, or cost-effectiveness of an LLM serving system before deploying it or making changes to your existing setup.
Not ideal if you're looking for a tool to generate synthetic text for training LLMs or to benchmark an LLM's natural language generation quality.
Stars: 106
Forks: 10
Language: Python
License: Apache-2.0
Category:
Last pushed: Oct 09, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/alibaba/ServeGen"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
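The same request can be made from Python with only the standard library. The sketch below builds the URL using the path shape of the curl example and parses the body as JSON. It makes several assumptions. First, the response is JSON, and its fields are not documented here. Second, the `llm-tools` segment is treated as a category that can vary. Third, no API key is sent, because the page does not say how keys are passed. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(owner: str, repo: str, category: str = "llm-tools") -> str:
    """Hypothetical helper: builds the endpoint URL following the curl example's path."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, category: str = "llm-tools",
                  timeout: float = 10.0):
    """Hypothetical helper: GETs the endpoint anonymously and parses the body.

    Assumes a JSON response; the schema is not documented on this page.
    """
    with urllib.request.urlopen(quality_url(owner, repo, category), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("alibaba", "ServeGen")` requests the same URL as the curl command. Anonymous calls count against the 100 requests/day limit.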
Higher-rated alternatives
thu-pacman/chitu
High-performance inference framework for large language models, focusing on efficiency,...
NotPunchnox/rkllama
Ollama alternative for Rockchip NPU: An efficient solution for running AI and Deep learning...
sophgo/LLM-TPU
Run generative AI models in sophgo BM1684X/BM1688
Deep-Spark/DeepSparkHub
DeepSparkHub selects hundreds of application algorithms and models, covering various fields of...
howard-hou/VisualRWKV
VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle...