MLGroupJLU/LLM-eval-survey
The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".
This resource provides a curated collection of research papers and materials on evaluating Large Language Models (LLMs). It helps researchers and practitioners understand various aspects of LLM performance, covering topics ranging from natural language processing tasks such as sentiment analysis and reasoning to robustness and ethical considerations. The collection is organized so users can quickly find relevant studies on how LLMs are assessed.
1,591 stars. No commits in the last 6 months.
Use this if you are an AI researcher, LLM developer, or academic looking for a comprehensive overview of current research and benchmarks on evaluating Large Language Models.
Not ideal if you are looking for a practical guide on how to evaluate a specific LLM, or if you need code implementations for evaluation metrics.
Stars: 1,591
Forks: 100
Language: —
License: —
Category: —
Last pushed: Jun 03, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/MLGroupJLU/LLM-eval-survey"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
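If you prefer to call the endpoint from code, here is a minimal sketch in Python using only the standard library. It assumes the endpoint returns a JSON body; since the response schema is not documented on this page, the snippet simply prints the full payload rather than guessing field names.

import json
import urllib.request

# Quality-data endpoint for this repository (same URL as the curl command above).
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/MLGroupJLU/LLM-eval-survey"

with urllib.request.urlopen(URL) as resp:  # no API key needed up to 100 requests/day
    data = json.load(resp)                 # assumes the response body is JSON

print(json.dumps(data, indent=2))          # dump the whole payload; schema is assumed, not documented here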
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval
The robust European language model benchmark.
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents