ConardLi/easy-dataset

A powerful tool for creating datasets for LLM fine-tuning, RAG, and Eval

66 / 100 (Established)

This tool helps AI practitioners transform various domain-specific documents (like PDFs, Markdown, Word docs) into high-quality, structured datasets for training and evaluating Large Language Models (LLMs). It takes your raw documents and converts them into fine-tuning datasets, retrieval-augmented generation (RAG) data, or evaluation datasets with intelligent questions and answers. It is ideal for data scientists, AI engineers, or researchers working to build or improve specialized LLMs.

13,613 stars. Actively maintained with 44 commits in the last 30 days.

Use this if you need to create, clean, and enrich custom datasets from your proprietary documents to fine-tune LLMs, enhance RAG systems, or rigorously evaluate model performance.

Not ideal if you are looking for a general-purpose data labeling tool for image classification or traditional NLP tasks that don't involve LLMs.

LLM fine-tuning, RAG data preparation, AI model evaluation, natural language processing, text data management
No Package No Dependents
Maintenance 20 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 20 / 25


Stars: 13,613
Forks: 1,361
Language: JavaScript

License

Last pushed: Mar 11, 2026
Commits (30d): 44

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/ConardLi/easy-dataset"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
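
The same request can be made from a script. The following is a minimal Python sketch, not an official client: the base URL is copied from the curl example above, while the helper names `quality_url` and `fetch_quality` are hypothetical, and the assumption that the endpoint returns a JSON body is not confirmed by this page. The `rag` path segment is kept fixed because its meaning is not documented here.

```python
import json
import urllib.parse
import urllib.request

# Base path copied from the curl example; the "rag" segment is left as-is
# because this page does not say what it selects.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    owner_part = urllib.parse.quote(owner, safe="")
    repo_part = urllib.parse.quote(repo, safe="")
    return f"{BASE_URL}/{owner_part}/{repo_part}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality data for one repository.

    Assumption: the response body is UTF-8 encoded JSON. If it is not,
    json.loads raises an error (json.JSONDecodeError for malformed JSON).
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("ConardLi", "easy-dataset")` issues the same GET request as the curl command. Unauthenticated calls count against the 100 requests/day limit, so caching the result locally is sensible when polling several repositories.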