koalazf99/Awesome-DataCentric-LLM

Trending projects & awesome papers about data-centric LLM studies.

Score: 23 / 100 (Experimental)

This curated list helps machine learning researchers and engineers explore current techniques for improving large language models (LLMs) through better data. It collects research papers and open-source projects on data collection, quality assessment, and evaluation strategies for LLM training. It is a reference list rather than a tool, so what you gain from it is a better understanding of how to build more effective and efficient LLMs.

No commits in the last 6 months.

Use this if you are developing or fine-tuning large language models and want to discover the latest research and tools for optimizing your training data.

Not ideal if you are looking for ready-to-use LLM applications or a beginner's guide to natural language processing.

LLM training, data curation, AI research, natural language processing, machine learning engineering
No License, Stale 6m, No Package, No Dependents
Maintenance 2 / 25
Adoption 7 / 25
Maturity 8 / 25
Community 6 / 25
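
The four category scores above add up to the overall score shown for this project. The short sketch below checks that arithmetic. It assumes the overall score is a plain, unweighted sum of the categories; that is inferred from the numbers on this page, not from a documented formula, and the variable names are illustrative only.

```python
# Category scores copied from the listing above; each is out of 25.
category_scores = {
    "Maintenance": 2,
    "Adoption": 7,
    "Maturity": 8,
    "Community": 6,
}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is the unweighted sum of the categories.
total = sum(category_scores.values())
maximum = MAX_PER_CATEGORY * len(category_scores)

print(f"{total} / {maximum}")  # prints "23 / 100"
```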


Stars: 40
Forks: 2
Language: (not listed)
License: (not listed)
Last pushed: May 20, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/koalazf99/Awesome-DataCentric-LLM"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
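
For scripted use, the same request can be made from Python with only the standard library. This is a minimal sketch under stated assumptions: the path layout `quality/<category>/<owner>/<repo>` is inferred from the single curl example above, the response is assumed (not confirmed) to be JSON, and the helper names `build_quality_url` and `fetch_quality` are hypothetical. How an API key is passed is not described on this page, so the sketch covers only keyless access.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (hypothetical helper).

    Assumption: the path is <category>/<owner>/<repo>, as in the curl example.
    """
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record (hypothetical helper).

    Assumption: the body is JSON. If it is not, the raw text is returned
    instead so the caller can inspect it.
    """
    url = build_quality_url(category, owner, repo)
    with urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

Calling `fetch_quality("llm-tools", "koalazf99", "Awesome-DataCentric-LLM")` would issue the same GET request as the curl command. At 100 keyless requests per day, it is worth caching the result rather than polling.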