JonathanChavezTamales/llm-leaderboard
A comprehensive set of LLM benchmark scores and provider prices. (deprecated; see the README for details)
This repository is a community-driven resource for comparing large language models (LLMs). It compiles model parameters, provider pricing, performance metrics such as throughput and latency, and standardized benchmark results across a range of tests. It is aimed at anyone, from researchers to business strategists, who needs to evaluate and select LLMs for a specific application or understand the competitive landscape.
Use this if you need to compare LLMs on technical specifications, benchmark performance, and pricing to make informed decisions for your projects.
Not ideal if you need real-time API monitoring or analytics beyond aggregated benchmark scores.
Stars
362
Forks
40
Language
JavaScript
License
—
Category
—
Last pushed
Oct 24, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/JonathanChavezTamales/llm-leaderboard"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
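If you prefer Python to curl, the same endpoint can be queried programmatically. A minimal sketch, assuming the endpoint returns JSON; the `api_url` and `fetch_quality` helpers are hypothetical names introduced here for illustration, not part of any official client, and the response schema is not documented on this page:

```python
import json
import urllib.request

# Base path from the curl example above; the {owner}/{repo} suffix
# selects which repository's quality data to fetch.
BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def api_url(owner: str, repo: str) -> str:
    """Build the quality-data endpoint URL for a GitHub repository."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    # Assumes a JSON response; inspect the parsed dict before relying
    # on specific keys, since the schema is not documented here.
    with urllib.request.urlopen(api_url(owner, repo)) as resp:
        return json.load(resp)

print(api_url("JonathanChavezTamales", "llm-leaderboard"))
```

The unauthenticated tier (100 requests/day) needs no extra setup; how a free key is attached (header vs. query parameter) is not specified on this page, so consult the API's own documentation before adding one.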
Higher-rated alternatives
open-compass/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral,...
IBM/unitxt
🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the...
lean-dojo/LeanDojo
Tool for data extraction and interacting with Lean programmatically.
GoodStartLabs/AI_Diplomacy
Frontier Models playing the board game Diplomacy.
google/litmus
Litmus is a comprehensive LLM testing and evaluation tool designed for GenAI Application...