LLM Evaluation Benchmarking LLM Tools
Seven LLM evaluation benchmarking tools are tracked. One scores above 50 (Established tier). The highest-rated is jeinlee1991/chinese-llm-benchmark at 52/100 with 5,675 stars. One of the top 10 is actively maintained.
Get all 7 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-evaluation-benchmarking&limit=20"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
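The same request can be made from a script. The following is a minimal Python sketch using only the standard library. The helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not documented on this page, the sketch returns the parsed JSON as-is rather than assuming any field names.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="llm-tools",
              subcategory="llm-evaluation-benchmarking",
              limit=20):
    """Build the query URL (hypothetical helper, mirrors the curl example)."""
    query = urlencode({
        "domain": domain,
        "subcategory": subcategory,
        "limit": limit,
    })
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and return the parsed JSON body.

    The shape of the payload is an assumption left open here:
    inspect the returned object before relying on specific keys.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` performs one request, which counts against the daily limit described above.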
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | jeinlee1991/chinese-llm-benchmark: ReLE Evaluation: capability evaluation of Chinese AI large models (continuously updated): currently covers 359 large models, including chatgpt, gpt-5.2, o4-mini, Google gemini-3-pr... | 52/100 | Established |
| 2 | bvobart/mllint: `mllint` is a command-line utility to evaluate the technical quality of... | | Emerging |
| 3 | ApextheBoss/canary: 🐤 Know when your LLM provider silently degrades. Automated quality testing... | | Experimental |
| 4 | Software-Engineering-Arena/SWE-Chatbot-Arena: Compare chatbots pairwise via multi-round evaluations for SE tasks. | | Experimental |
| 5 | oolong-tea-2026/arena-ai-leaderboards: 📊 Daily auto-updated snapshots of all Arena AI (LMSYS Chatbot Arena)... | | Experimental |
| 6 | abject-milkingmachine273/llm-cost-dashboard: Monitor LLM token costs in real time with a terminal dashboard offering... | | Experimental |
| 7 | Neiwone/ia-project: Here are a program of candidate classification system made with pair... | | Experimental |