lyy1994/awesome-data-contamination
The Paper List on Data Contamination for Large Language Models Evaluation.
This resource helps researchers and practitioners evaluate Large Language Models (LLMs) accurately by addressing the problem of "data contamination." It provides a curated list of research papers that analyze, prevent, or detect instances where LLMs might have inadvertently seen evaluation data during their training. Users can consult this list to understand how to ensure their LLM benchmarks reflect true model capabilities, not just memorization.
Use this if you are a researcher, data scientist, or engineer developing or evaluating large language models and need to understand, detect, or prevent data contamination that can skew performance metrics.
Not ideal if you are looking for a general introduction to LLMs or seeking pre-trained models, as this resource focuses specifically on the technical issue of data contamination in evaluation.
Stars: 110
Forks: 5
Language: —
License: MIT
Category:
Last pushed: Jan 29, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/lyy1994/awesome-data-contamination"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
Higher-rated alternatives
rafska/awesome-local-llm
A curated list of awesome platforms, tools, practices, and resources that help run LLMs locally
KalyanKS-NLP/llm-engineer-toolkit
A curated list of 120+ LLM libraries category wise.
yzhao062/anomaly-detection-resources
Anomaly detection related books, papers, videos, and toolboxes. Last update late 2025 for LLM...
llm-jp/awesome-japanese-llm
Overview of Japanese LLMs (日本語LLMまとめ)
InftyAI/Awesome-LLMOps
🎉 An awesome & curated list of best LLMOps tools.