lyy1994/awesome-data-contamination

The Paper List on Data Contamination for Large Language Models Evaluation.

Quality score: 42 / 100 (Emerging)

This resource helps researchers and practitioners evaluate Large Language Models (LLMs) accurately by addressing the problem of "data contamination." It provides a curated list of research papers that analyze, prevent, or detect instances where LLMs might have inadvertently seen evaluation data during their training. Users can consult this list to understand how to ensure their LLM benchmarks reflect true model capabilities, not just memorization.


Use this if you are a researcher, data scientist, or engineer developing or evaluating large language models and need to understand, detect, or prevent data contamination that can skew performance metrics.

Not ideal if you are looking for a general introduction to LLMs or seeking pre-trained models, as this resource focuses specifically on the technical issue of data contamination in evaluation.

Tags: LLM evaluation, model benchmarking, data quality, AI ethics, machine learning, research
No package · No dependents
Maintenance: 10 / 25
Adoption: 9 / 25
Maturity: 16 / 25
Community: 7 / 25


Stars: 110
Forks: 5
Language: (not listed)
License: MIT
Last pushed: Jan 29, 2026
Commits (30d): 0

Get this data via API:

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/lyy1994/awesome-data-contamination"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
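For scripted access, the same endpoint can be called from Python. A minimal sketch, assuming the endpoint returns a JSON body (the response schema is not documented on this page, so no specific fields are relied upon):

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-data endpoint URL for a given GitHub repository."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record for a repository.

    Assumes the response is JSON; the exact schema is not documented here,
    so the result is returned as a plain dict for the caller to inspect.
    """
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

# Example (performs a network request):
# data = fetch_quality("lyy1994", "awesome-data-contamination")
# print(sorted(data.keys()))
```

The anonymous tier (100 requests/day) needs no credentials; how a free key for the 1,000/day tier is attached to requests is not specified on this page, so the sketch omits it.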