CSLiJT/awesome-lm-evaluation-methodologies
Frontier papers on evaluation methodologies for language models.
This resource is a curated collection of research papers on evaluating large language models (LLMs). It helps AI researchers, machine learning engineers, and data scientists understand and apply the latest methods for assessing LLM performance, reliability, and safety. Papers are organized by evaluation topic, with links to the relevant academic work.
No commits in the last 6 months.
Use this if you are developing or working with large language models and need to find state-of-the-art methods and benchmarks to evaluate their capabilities.
Not ideal if you are a non-technical user looking for a simple tool to assess an LLM's output without delving into academic research papers.
Stars: 10
Forks: —
Language: —
License: MIT
Category: —
Last pushed: Oct 14, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/CSLiJT/awesome-lm-evaluation-methodologies"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
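Assuming the endpoint returns JSON (the response schema is not documented here), a minimal sketch of fetching the card data from the shell and pretty-printing it; jq is an external dependency used only for formatting:

# Fetch the quality card for this repo. `jq .` pretty-prints whatever
# JSON comes back without assuming any particular field names.
curl -s "https://pt-edge.onrender.com/api/v1/quality/llm-tools/CSLiJT/awesome-lm-evaluation-methodologies" | jq .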
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval: One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas: Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit: Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval: The robust European language model benchmark.
Giskard-AI/giskard-oss: 🐢 Open-Source Evaluation & Testing library for LLM Agents