ConardLi/easy-dataset
A powerful tool for creating datasets for LLM fine-tuning, RAG, and evaluation
This tool helps AI practitioners transform various domain-specific documents (like PDFs, Markdown, Word docs) into high-quality, structured datasets for training and evaluating Large Language Models (LLMs). It takes your raw documents and converts them into fine-tuning datasets, retrieval-augmented generation (RAG) data, or evaluation datasets with intelligent questions and answers. It is ideal for data scientists, AI engineers, or researchers working to build or improve specialized LLMs.
13,613 stars. Actively maintained with 44 commits in the last 30 days.
Use this if you need to create, clean, and enrich custom datasets from your proprietary documents to fine-tune LLMs, enhance RAG systems, or rigorously evaluate model performance.
Not ideal if you are looking for a general-purpose data labeling tool for image classification or traditional NLP tasks that don't involve LLMs.
Stars: 13,613
Forks: 1,361
Language: JavaScript
License: not listed
Category:
Last pushed: Mar 11, 2026
Commits (30d): 44
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/ConardLi/easy-dataset"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
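The same request can be made from a script. The following is a minimal Python sketch using only the standard library. It assumes the endpoint returns a JSON body, which the page does not state; the function names `quality_url` and `fetch_quality` are illustrative and not part of any published client. How an API key is passed is not documented here, so the sketch covers only the keyless case.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual characters stay valid.
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body.

    Assumption: the response is JSON. If it is not, json.loads raises
    a ValueError and the caller should inspect the raw body instead.
    """
    request = urllib.request.Request(
        quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request and counts against the daily limit):
#   data = fetch_quality("ConardLi", "easy-dataset")
```

Because the keyless tier allows 100 requests per day, it is worth caching the result locally rather than calling `fetch_quality` in a loop.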
Related tools
ItzCrazyKns/Vane
Vane is an AI-powered answering engine.
xuwei95/ezdata
A data processing and task scheduling system built with Python and large language models (LLMs)...
ModelEngine-Group/DataMate
DataMate is an enterprise-level data processing platform designed for model fine-tuning and RAG...
DS4SD/deepsearch-toolkit
Interact with the Deep Search platform for new knowledge explorations and discoveries
mithun50/TreeDex
Tree-based, vectorless document RAG framework. Connect any LLM via URL/API key.