SeekingDream/Static-to-Dynamic-LLMEval

The official GitHub repository for the paper "Recent Advances in Large Language Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation"

Score: 43 / 100 (Emerging)

This survey helps AI researchers and practitioners understand and mitigate data contamination when evaluating large language models (LLMs). It provides a comprehensive analysis of existing static and dynamic benchmarking methods designed to prevent inflated performance scores. The result is a clear guide, with proposed design principles for creating more reliable LLM evaluations.


Use this if you are developing or evaluating large language models and need to ensure your benchmark results accurately reflect the model's capabilities without bias from contaminated training data.

Not ideal if you are looking for an off-the-shelf tool to directly run benchmarks; this project is a research survey providing insights and guidelines rather than executable code for immediate evaluation.

Tags: LLM evaluation, AI model benchmarking, AI research, Data integrity, Machine learning ethics
No License · No Package · No Dependents

Maintenance: 10 / 25
Adoption: 10 / 25
Maturity: 8 / 25
Community: 15 / 25

How are scores calculated? The four 25-point subscores sum to the headline score: 10 + 10 + 8 + 15 = 43.
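
A minimal sketch of that arithmetic in Python, assuming the headline score is simply the sum of the four 25-point subscores (an inference from the numbers shown above, not a documented rubric):

# Hypothetical reconstruction: the four 25-point subscores listed
# above sum to the 43/100 headline score.
subscores = {"Maintenance": 10, "Adoption": 10, "Maturity": 8, "Community": 15}
overall = sum(subscores.values())
print(f"{overall} / {len(subscores) * 25}")  # prints: 43 / 100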

Stars: 547
Forks: 45
Language: not listed
License: none
Last pushed: Mar 03, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/SeekingDream/Static-to-Dynamic-LLMEval"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.