luo-junyu/Awesome-Data-Efficient-LLM

A list of data-efficient and data-centric LLM (Large Language Model) papers. Our Survey Paper: Towards Efficient LLM Post Training: A Data-centric Perspective

Score: 25 / 100 (Experimental)

This resource helps machine learning researchers and engineers discover efficient ways to train Large Language Models (LLMs). It provides a curated list of research papers on techniques for making LLMs better using less data. You'll find methods for selecting the most valuable data or generating synthetic data to improve LLM performance, reducing the time and computational resources needed for development.

No commits in the last 6 months.

Use this if you are a machine learning practitioner working with LLMs and want to find research-backed strategies to improve model performance and reduce training costs by optimizing your data usage.

Not ideal if you are looking for ready-to-use software, code implementations, or a general introduction to LLMs; this is a research paper collection.

Large Language Models, Machine Learning Research, Data Efficiency, LLM Training, AI Development
No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 8 / 25
Community 9 / 25


Stars: 52
Forks: 4
Language: (not listed)
License: (not listed)
Last pushed: Feb 19, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/luo-junyu/Awesome-Data-Efficient-LLM"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
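
The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client. The URL pattern is copied from the curl command above. The helper names `quality_url` and `fetch_quality` and the parameter name `category` (for the `transformers` path segment) are hypothetical. It also assumes the endpoint returns JSON, which this page does not state.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL from its three path segments.

    'category' is a hypothetical name for the segment that is
    'transformers' in the curl example.
    """
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and parse the body.

    Assumption: the response body is JSON; the page does not document it.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("transformers", "luo-junyu", "Awesome-Data-Efficient-LLM")` issues the same request as the curl command. Unauthenticated calls count against the 100 requests/day limit, so cache the result if you poll repeatedly.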