onejune2018/Awesome-LLM-Eval

Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for the evaluation of LLMs, aiming to explore the technical boundaries of generative AI.

Score: 48 / 100 (Emerging)

This project helps AI researchers, machine learning engineers, and data scientists understand and benchmark the performance of large language models (LLMs). It brings together a comprehensive list of tools, datasets, benchmarks, and leaderboards. Practitioners can use this to identify relevant resources for evaluating LLMs, whether for general capabilities, domain-specific tasks, or specific attributes like inference speed or factuality.

616 stars.

Use this if you need a centralized, up-to-date resource to find tools, datasets, and methodologies for evaluating large language models across various criteria and domains.

Not ideal if you are looking for an off-the-shelf software library to run evaluations directly, as this is primarily a curated list of external resources.

LLM evaluation, Generative AI research, NLP benchmarking, AI model assessment, Machine learning engineering
No Package · No Dependents
Maintenance: 6 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 16 / 25

How are scores calculated? Each of the four categories above is scored out of 25, and the category scores sum to the overall score: 6 + 10 + 16 + 16 = 48 / 100.

Stars: 616
Forks: 51
Language: not specified
License: MIT
Last pushed: Nov 24, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/onejune2018/Awesome-LLM-Eval"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
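For programmatic use, here is a minimal Python sketch of the same request. It assumes the endpoint returns JSON and simply prints the raw payload rather than guessing at undocumented field names; the helper name fetch_quality_report is hypothetical.

import json
import urllib.request

# Endpoint copied from the curl example above; no API key is needed
# for up to 100 requests per day.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/onejune2018/Awesome-LLM-Eval"

def fetch_quality_report(url: str = URL) -> dict:
    """Fetch the quality report and decode it as JSON (assumes a JSON response)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Print the raw payload; the exact field names (e.g. stars, score)
    # are not documented here, so inspect the output before relying on them.
    print(json.dumps(fetch_quality_report(), indent=2))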