CSLiJT/awesome-lm-evaluation-methodologies
Frontier papers on evaluation methodologies for language models.
This resource is a curated collection of research papers on evaluating large language models (LLMs). It helps AI researchers, machine learning engineers, and data scientists understand and apply the latest methods for assessing LLM performance, reliability, and safety. Papers are organized by evaluation topic, with links to the relevant academic work.
No commits in the last 6 months.
Use this if you are developing or working with large language models and need to find state-of-the-art methods and benchmarks to evaluate their capabilities.
Not ideal if you are a non-technical user looking for a simple tool to assess an LLM's output without delving into academic research papers.
Stars: 10
Forks: —
Language: —
License: MIT
Category: —
Last pushed: Oct 14, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/CSLiJT/awesome-lm-evaluation-methodologies"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
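Assuming the endpoint returns JSON (the response schema is not documented here), a minimal sketch of fetching the card data from the shell and pretty-printing it; jq is an external dependency used only for formatting:

# Fetch the quality card for this repo. `jq .` pretty-prints whatever
# JSON comes back without assuming any particular field names.
curl -s "https://pt-edge.onrender.com/api/v1/quality/llm-tools/CSLiJT/awesome-lm-evaluation-methodologies" | jq .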
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval: One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas: Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit: Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval: The robust European language model benchmark.
Giskard-AI/giskard-oss: 🐢 Open-Source Evaluation & Testing library for LLM Agents