onejune2018/Awesome-LLM-Eval
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for the evaluation of LLMs, aiming to probe the technical boundaries of generative AI.
This project helps AI researchers, machine learning engineers, and data scientists understand and benchmark the performance of large language models (LLMs). It brings together a comprehensive list of tools, datasets, benchmarks, and leaderboards. Practitioners can use it to find resources for evaluating LLMs, whether for general capabilities, domain-specific tasks, or specific attributes such as inference speed or factuality.
Use this if you need a centralized, up-to-date resource to find tools, datasets, and methodologies for evaluating large language models across various criteria and domains.
Not ideal if you are looking for an off-the-shelf software library to run evaluations directly, as this is primarily a curated list of external resources.
Stars: 616
Forks: 51
Language: —
License: MIT
Category: —
Last pushed: Nov 24, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/onejune2018/Awesome-LLM-Eval"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000 requests/day.
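For programmatic use, here is a minimal Python sketch of calling the endpoint above. The JSON field names in the commented-out line (stars, forks, license) are assumptions, since the response schema is not documented on this page; inspect the actual payload before relying on any key.

# Minimal sketch: fetch this repo's quality data from the API above.
import json
import urllib.request

URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "llm-tools/onejune2018/Awesome-LLM-Eval")

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

# Print the whole payload first, since the schema is undocumented.
print(json.dumps(data, indent=2))

# Hypothetical field access (adjust to the real schema):
# print(data.get("stars"), data.get("forks"), data.get("license"))

Printing the full payload first is deliberate: with no published schema, dumping the response is the safest way to discover which keys actually exist before hard-coding any of them.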
Higher-rated alternatives
SepineTam/stata-mcp
Let an LLM help you run your regressions in Stata. Evolve from reg monkey to causal thinker.
datawhalechina/code-your-own-llm
A full-stack reference guide to large language models, using the most concise code to help you define every detail end to end, from training a model from scratch to engineering deployment.
leonid20000/odin-slides
This is an advanced Python tool that empowers you to effortlessly draft customizable PowerPoint...
axhiao/QuickNote
Capture what you want with LLM
R3gm/InsightSolver-Colab
InsightSolver: Colab notebooks for exploring and solving operational issues using deep learning,...