LLM Evaluation Benchmarking LLM Tools

There are 7 LLM evaluation benchmarking tools tracked. 1 scores above 50 (Established tier). The highest-rated is jeinlee1991/chinese-llm-benchmark at 52/100 with 5,675 stars. 1 of the top 10 is actively maintained.

Get all 7 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-evaluation-benchmarking&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
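The same request can be made from a script. The sketch below rebuilds the URL from the curl command above using only the Python standard library. The function names `build_quality_url` and `fetch_quality` are hypothetical helpers introduced for illustration, and the shape of the JSON response is not documented here, so the code returns the decoded payload as-is rather than assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain, subcategory, limit=20):
    """Build the query URL (hypothetical helper, not part of any official client)."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_quality(domain, subcategory, limit=20, timeout=10):
    """Fetch and decode the JSON payload; its structure is not assumed here."""
    with urlopen(build_quality_url(domain, subcategory, limit), timeout=timeout) as resp:
        return json.load(resp)


url = build_quality_url("llm-tools", "llm-evaluation-benchmarking")
print(url)
```

Calling `fetch_quality("llm-tools", "llm-evaluation-benchmarking")` performs the network request and counts against the 100 requests/day keyless limit, so cache the result if you poll regularly.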

#	Tool	Score	Tier	Stars	Language
1	jeinlee1991/chinese-llm-benchmark ReLE Evaluation: Chinese AI large-model capability evaluation (continuously updated): currently covers 359 large models, including chatgpt, gpt-5.2, o4-mini, Google gemini-3-pr...	52	Established	5,675	—
2	bvobart/mllint `mllint` is a command-line utility to evaluate the technical quality of...	42	Emerging	80	Go
3	ApextheBoss/canary 🐤 Know when your LLM provider silently degrades. Automated quality testing...	23	Experimental	1	Python
4	Software-Engineering-Arena/SWE-Chatbot-Arena Compare chatbots pairwise via multi‑round evaluations for SE tasks.	23	Experimental	13	Python
5	oolong-tea-2026/arena-ai-leaderboards 📊 Daily auto-updated snapshots of all Arena AI (LMSYS Chatbot Arena)...	22	Experimental	—	Python
6	abject-milkingmachine273/llm-cost-dashboard Monitor LLM token costs in real time with a terminal dashboard offering...	14	Experimental	—	—
7	Neiwone/ia-project Here are a program of candidate classification system made with pair...	11	Experimental	—	Python
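The table above can also be processed directly as tab-separated text. The following is a minimal sketch, assuming the rows are copied with repository names only (descriptions omitted) and that "—" marks a missing value. `parse_rows` is a hypothetical helper. It reproduces the two summary figures stated at the top: how many tools score above 50, and which tool ranks highest.

```python
# Rows copied from the table above (descriptions omitted); fields are tab-separated.
ROWS = """\
1\tjeinlee1991/chinese-llm-benchmark\t52\tEstablished\t5,675\t—
2\tbvobart/mllint\t42\tEmerging\t80\tGo
3\tApextheBoss/canary\t23\tExperimental\t1\tPython
4\tSoftware-Engineering-Arena/SWE-Chatbot-Arena\t23\tExperimental\t13\tPython
5\toolong-tea-2026/arena-ai-leaderboards\t22\tExperimental\t—\tPython
6\tabject-milkingmachine273/llm-cost-dashboard\t14\tExperimental\t—\t—
7\tNeiwone/ia-project\t11\tExperimental\t—\tPython
"""

MISSING = "—"  # assumption: the dash in the table means "no value reported"


def parse_rows(text):
    """Turn tab-separated table rows into dictionaries with typed fields."""
    rows = []
    for line in text.splitlines():
        if not line:
            continue
        rank, tool, score, tier, stars, language = line.split("\t")
        rows.append({
            "rank": int(rank),
            "tool": tool,
            "score": int(score),
            "tier": tier,
            # Star counts use thousands separators, e.g. "5,675".
            "stars": None if stars == MISSING else int(stars.replace(",", "")),
            "language": None if language == MISSING else language,
        })
    return rows


rows = parse_rows(ROWS)
above_50 = [r for r in rows if r["score"] > 50]
top = max(rows, key=lambda r: r["score"])
print(len(above_50), top["tool"], top["stars"])
```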