dependentsign/Awesome-LLM-based-Evaluators

✨✨ Latest Papers about LLM-based Evaluators

Score: 29 / 100 (Experimental)

This is a curated collection of research papers on evaluating large language models (LLMs) with other LLMs rather than human review. It helps researchers and practitioners follow the latest advances in automatic assessment of LLM performance. The repository lists relevant academic papers, often with links to their code, for anyone working on or interested in the quality of AI language models.

Use this if you are a researcher or AI practitioner needing to stay current with academic work on automated LLM evaluation methods.

Not ideal if you are looking for ready-to-use software or a guide on how to implement LLM evaluations yourself, as this primarily lists academic papers.

Tags: AI-evaluation, natural-language-processing-research, large-language-models, AI-benchmarking, machine-learning-research
No license, no package, no dependents.
Maintenance: 10 / 25
Adoption: 7 / 25
Maturity: 8 / 25
Community: 4 / 25

The four category scores, each out of 25, sum to the overall score: 10 + 7 + 8 + 4 = 29.


Stars: 32
Forks: 1
Language: —
License: none
Last pushed: Feb 26, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/dependentsign/Awesome-LLM-based-Evaluators"

The API is open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000 requests/day.