SeekingDream/Static-to-Dynamic-LLMEval
The official GitHub repository of the paper "Recent Advances in Large Language Model Benchmarks Against Data Contamination: From Static to Dynamic Evaluation"
This survey helps AI researchers and practitioners understand and mitigate data contamination when evaluating large language models (LLMs). It analyzes existing static and dynamic benchmarking methods designed to prevent inflated performance scores, and distills them into a guide and proposed design principles for building more reliable LLM evaluations.
Use this if you are developing or evaluating large language models and need to ensure your benchmark results accurately reflect the model's capabilities without bias from contaminated training data.
Not ideal if you are looking for an off-the-shelf tool to directly run benchmarks; this project is a research survey providing insights and guidelines rather than executable code for immediate evaluation.
Stars: 547
Forks: 45
Language: —
License: —
Category:
Last pushed: Mar 03, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/SeekingDream/Static-to-Dynamic-LLMEval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
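For scripted access, the same endpoint can be piped into a JSON pretty-printer. A minimal sketch, assuming the endpoint returns a JSON body (the response schema is not documented on this page):

# Fetch the repo's quality data and pretty-print it.
# Assumes a JSON response; adjust if the API returns another format.
curl -s "https://pt-edge.onrender.com/api/v1/quality/llm-tools/SeekingDream/Static-to-Dynamic-LLMEval" \
  | python3 -m json.tool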
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval: One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas: Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit: Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval: The robust European language model benchmark.
Giskard-AI/giskard-oss: 🐢 Open-Source Evaluation & Testing library for LLM Agents