jpreagan/llmnop

A tool for measuring LLM performance metrics.

Score: 35 / 100 (Emerging)

This tool helps AI/ML operations engineers and MLOps professionals evaluate the real-world performance of large language models (LLMs) served via API endpoints. You provide details like the API URL, model name, and desired input/output token counts, and it produces detailed metrics on latency (like time to first token) and throughput. This allows you to compare different LLM providers, validate deployments, or optimize serving configurations.
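To make the two headline metrics concrete, here is a hypothetical sketch (not llmnop's actual implementation) of how time to first token and output throughput can be derived from the arrival times of streamed tokens:

```python
# Hypothetical metric sketch: given the request start time and the
# arrival times of streamed tokens (seconds), compute the two core
# numbers a benchmark like this reports.

def ttft(start: float, token_times: list[float]) -> float:
    """Time to first token: delay from request start to the first token."""
    return token_times[0] - start

def throughput(token_times: list[float]) -> float:
    """Output tokens per second over the generation window."""
    span = token_times[-1] - token_times[0]
    return (len(token_times) - 1) / span if span > 0 else float("inf")

times = [0.25, 0.30, 0.35, 0.40, 0.45]  # token arrival times, seconds
print(ttft(0.0, times))   # 0.25 s to first token
print(throughput(times))  # ~20 tokens/sec
```

Real measurements add detail this sketch omits (tokenization of the response, warm-up requests, concurrency), but the shape of the computation is the same.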

Use this if you need to reliably measure how fast your LLM inference endpoints are responding and generating tokens under various load conditions.

Not ideal if you're a data scientist primarily focused on model accuracy or training, rather than the operational performance of deployed models.

Tags: LLM-operations, MLOps, API-benchmarking, inference-performance, system-tuning
No package · No dependents
Maintenance: 10 / 25
Adoption: 9 / 25
Maturity: 16 / 25
Community: 0 / 25


Stars: 9
Forks:
Language: Rust
License: Apache-2.0
Last pushed: Feb 21, 2026
Monthly downloads: 38
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jpreagan/llmnop"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
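The curl command above can also be issued from code. A minimal Python sketch, assuming the endpoint returns JSON (the response field names are not documented here, so none are assumed):

```python
# Fetch quality data for a repo from the endpoint shown above.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repo quality endpoint URL."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON response (schema not assumed)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.loads(resp.read().decode("utf-8"))

print(quality_url("jpreagan", "llmnop"))
```

Calling `fetch_quality("jpreagan", "llmnop")` counts against the daily rate limit, so batch callers should cache responses.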