weAIDB/awesome-data-llm
Official Repository of "LLM × DATA" Survey Paper
This resource provides a comprehensive overview of how Large Language Models (LLMs) interact with data across various stages, from initial preparation to analysis and system optimization. It consolidates research papers and projects into a structured collection, offering insights into data characteristics, processing, storage, and serving for LLMs. Data scientists, machine learning engineers, and researchers working with LLMs will find this a valuable guide.
740 stars. Actively maintained with 8 commits in the last 30 days.
Use this if you are developing or working with Large Language Models and need to understand best practices and emerging trends in data handling, quality, and preparation for optimal model performance.
Not ideal if you are looking for a practical tool or library for direct use, as this is primarily a survey and collection of research papers.
Stars: 740
Forks: 66
Language: (none listed)
License: (none listed)
Category: (none listed)
Last pushed: Mar 05, 2026
Commits (30d): 8
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/weAIDB/awesome-data-llm"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
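The same request can be made from a script. The following is a minimal Python sketch rather than an official client: the helper names `quality_url` and `fetch_quality` are hypothetical, the path layout (category, owner, repository) is inferred from the curl example above, and the response is assumed to be JSON, which this page does not confirm. No API key is sent, since the header or parameter for keys is not documented here.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example.

    The (category, owner, repo) layout is inferred from that one example.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body.

    Assumption: the body is JSON. If it is not, json.loads raises ValueError.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to perform the request.
    print(quality_url("llm-tools", "weAIDB", "awesome-data-llm"))
```

Because the keyless tier is limited to 100 requests per day, it is worth caching the result locally instead of calling `fetch_quality` in a loop.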
Related tools
monarch-initiative/ontogpt
LLM-based ontological extraction tools, including SPIRES
AXYZdong/AMchat
AM (Advanced Mathematics) Chat is a large language model that integrates advanced mathematical...
skywalker023/sodaverse
🥤🧑🏻🚀Code and dataset for our EMNLP 2023 paper - "SODA: Million-scale Dialogue Distillation with...
Y-Research-SBU/TimeSeriesScientist
Official Repository for TimeSeriesScientist
open-chinese/poetry-collection
Chinese "Poetry Collection" (诗歌总集): the most comprehensive and systematic Chinese poetry dataset to date, with unified data modeling.