open-compass/opencompass
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, Llama 2, Qwen, GLM, Claude, etc.) across 100+ datasets.
This platform helps you understand how well different large language models (LLMs) perform on various tasks. You point it at specific LLMs and datasets, and it outputs detailed evaluation scores and benchmarks. It's designed for researchers, developers, and anyone building LLM applications who needs to compare models and select the best one for their needs.
6,752 stars. Actively maintained with 12 commits in the last 30 days. Available on PyPI.
Use this if you need to systematically evaluate the performance of different large language models across a wide range of datasets and benchmarks to make informed decisions.
Not ideal if you're looking for a simple tool to fine-tune an LLM or just want to run a quick test on a single model without comprehensive comparison.
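The workflow described above (a set of models run against a set of datasets, producing a score per cell) can be sketched generically. This is an illustration of the model-by-dataset benchmark matrix, not OpenCompass's actual API; the models, dataset, and metric here are hypothetical stand-ins.

```python
# Illustration only: the model x dataset score matrix an evaluation
# platform produces. Models here are plain callables, not real LLMs.

def exact_match(prediction: str, reference: str) -> float:
    """Score 1.0 if the model's answer matches the reference exactly."""
    return 1.0 if prediction.strip() == reference.strip() else 0.0

def evaluate(model, dataset):
    """Average a per-sample metric over (prompt, reference) pairs."""
    scores = [exact_match(model(prompt), ref) for prompt, ref in dataset]
    return sum(scores) / len(scores)

# Hypothetical "models": callables mapping a prompt to an answer.
models = {
    "always-paris": lambda prompt: "Paris",
    "echo": lambda prompt: prompt,
}
# Hypothetical dataset: (prompt, reference answer) pairs.
datasets = {
    "capitals": [("Capital of France?", "Paris"), ("Capital of Japan?", "Tokyo")],
}

# Benchmark table: one score per (model, dataset) cell.
results = {
    (m, d): evaluate(fn, data)
    for m, fn in models.items()
    for d, data in datasets.items()
}
print(results)
# {('always-paris', 'capitals'): 0.5, ('echo', 'capitals'): 0.0}
```

A real harness like OpenCompass adds the parts this sketch omits: prompt templating, batched inference backends, and task-specific metrics beyond exact match.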
Stars: 6,752
Forks: 743
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 12
Dependencies: 49
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/open-compass/opencompass"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
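The same endpoint can be called programmatically. A minimal sketch using only the Python standard library, assuming the URL pattern shown in the curl example above (the response fields are not documented here, so the JSON is returned as-is):

```python
# Fetch quality metadata for a repo from the pt-edge API.
# The endpoint path mirrors the curl example; response shape is an assumption.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the per-repo quality endpoint URL."""
    return f"{API_BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """GET the endpoint and parse the JSON body."""
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)

# Example (requires network; counts against the 100 requests/day limit):
# data = fetch_quality("llm-tools", "open-compass", "opencompass")
print(quality_url("llm-tools", "open-compass", "opencompass"))
```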
Related tools
IBM/unitxt
🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the...
lean-dojo/LeanDojo
Tool for data extraction and interacting with Lean programmatically.
GoodStartLabs/AI_Diplomacy
Frontier Models playing the board game Diplomacy.
google/litmus
Litmus is a comprehensive LLM testing and evaluation tool designed for GenAI Application...
NatLabRockies/COMPASS
INFRA-COMPASS is a tool that leverages Large Language Models (LLMs) to create and maintain an...