modal-labs/stopwatch
A tool for benchmarking LLMs on Modal
This tool helps machine learning engineers and researchers evaluate the performance of large language models (LLMs) served on Modal. You provide the LLM (like Llama-3.1-8B-Instruct) and the serving framework (such as vLLM, SGLang, or TensorRT-LLM), and it outputs benchmark results or performance profiles. This allows you to understand how different models and serving configurations perform under various load conditions.
No commits in the last 6 months.
Use this if you need to systematically compare the speed and efficiency of different LLMs and serving backends when deployed on Modal.
Not ideal if you are not deploying LLMs on Modal, or if you need to benchmark machine learning models other than LLMs.
Stars: 50
Forks: 5
Language: Python
License: MIT
Category:
Last pushed: Aug 29, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/modal-labs/stopwatch"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
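The same request can be made from Python. This is a minimal sketch that assumes the endpoint returns JSON; the response schema is not documented here, so the example simply prints whatever comes back.

import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/modal-labs/stopwatch"

# No API key is needed at the free 100-requests/day tier.
resp = requests.get(URL, timeout=10)
resp.raise_for_status()
print(resp.json())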
Higher-rated alternatives
allenai/RL4LMs
A modular RL library to fine-tune language models to human preferences
emredeveloper/Mem-LLM
Mem-LLM is a Python library for building memory-enabled AI assistants that run entirely on local...
cloudguruab/modsysML
Reinforcement learning from human feedback (RLHF) framework for AI models. Evaluate and compare LLM outputs,...
ManasVardhan/bench-my-llm
🏎️ Dead-simple LLM benchmarking CLI - latency, cost, and quality metrics
Mya-Mya/CBF-LLM
"CBF-LLM: Safe Control for LLM Alignment"