MLGroupJLU/LLM-eval-survey
The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".
This resource provides a curated collection of research papers and materials on evaluating Large Language Models (LLMs). It helps researchers and practitioners understand various aspects of LLM performance, covering topics ranging from natural language processing tasks such as sentiment analysis and reasoning to robustness and ethical considerations. The collection is organized so users can quickly find relevant studies on how LLMs are assessed.
1,591 stars. No commits in the last 6 months.
Use this if you are an AI researcher, LLM developer, or academic looking for a comprehensive overview of current research and benchmarks on evaluating Large Language Models.
Not ideal if you are looking for a practical guide on how to evaluate a specific LLM, or if you need code implementations for evaluation metrics.
Stars: 1,591
Forks: 100
Language: —
License: —
Category: —
Last pushed: Jun 03, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/MLGroupJLU/LLM-eval-survey"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
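If you prefer to call the endpoint from code, here is a minimal sketch in Python using only the standard library. It assumes the endpoint returns a JSON body; since the response schema is not documented on this page, the snippet simply prints the full payload rather than guessing field names.

import json
import urllib.request

# Quality-data endpoint for this repository (same URL as the curl command above).
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/MLGroupJLU/LLM-eval-survey"

with urllib.request.urlopen(URL) as resp:  # no API key needed up to 100 requests/day
    data = json.load(resp)                 # assumes the response body is JSON

print(json.dumps(data, indent=2))          # dump the whole payload; schema is assumed, not documented here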
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval
The robust European language model benchmark.
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents