modal-labs/stopwatch
A tool for benchmarking LLMs on Modal
This tool helps machine learning engineers and researchers evaluate the performance of large language models (LLMs) served on Modal. You provide the LLM (like Llama-3.1-8B-Instruct) and the serving framework (such as vLLM, SGLang, or TensorRT-LLM), and it outputs benchmark results or performance profiles. This allows you to understand how different models and serving configurations perform under various load conditions.
No commits in the last 6 months.
Use this if you need to systematically compare the speed and efficiency of different LLMs and serving backends when deployed on Modal.
Not ideal if you are not deploying LLMs on Modal, or if you need to benchmark machine learning models other than LLMs.
Stars: 50
Forks: 5
Language: Python
License: MIT
Category:
Last pushed: Aug 29, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/modal-labs/stopwatch"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
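The same request can be made from Python. This is a minimal sketch that assumes the endpoint returns JSON; the response schema is not documented here, so the example simply prints whatever comes back.

import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/modal-labs/stopwatch"

# No API key is needed at the free 100-requests/day tier.
resp = requests.get(URL, timeout=10)
resp.raise_for_status()
print(resp.json())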
Higher-rated alternatives
allenai/RL4LMs
A modular RL library to fine-tune language models to human preferences
emredeveloper/Mem-LLM
Mem-LLM is a Python library for building memory-enabled AI assistants that run entirely on local...
cloudguruab/modsysML
Reinforcement learning from human feedback (RLHF) framework for AI models. Evaluate and compare LLM outputs,...
ManasVardhan/bench-my-llm
🏎️ Dead-simple LLM benchmarking CLI - latency, cost, and quality metrics
Mya-Mya/CBF-LLM
"CBF-LLM: Safe Control for LLM Alignment"