ConardLi/easy-dataset

A powerful tool for creating datasets for LLM fine-tuning, RAG, and evaluation

/ 100

Established

This tool helps AI practitioners transform various domain-specific documents (like PDFs, Markdown, Word docs) into high-quality, structured datasets for training and evaluating Large Language Models (LLMs). It takes your raw documents and converts them into fine-tuning datasets, retrieval-augmented generation (RAG) data, or evaluation datasets with intelligent questions and answers. It is ideal for data scientists, AI engineers, or researchers working to build or improve specialized LLMs.

13,613 stars. Actively maintained with 44 commits in the last 30 days.

Use this if you need to create, clean, and enrich custom datasets from your proprietary documents to fine-tune LLMs, enhance RAG systems, or rigorously evaluate model performance.

Not ideal if you are looking for a general-purpose data labeling tool for image classification or traditional NLP tasks that don't involve LLMs.

LLM fine-tuning, RAG data preparation, AI model evaluation, natural language processing, text data management

No Package, No Dependents

Maintenance 20 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 20 / 25

Stars 13,613

Forks 1,361

Language JavaScript

License not listed

Recent Releases

1.7.3 09 Apr 2026

1.7.2 25 Feb 2026

1.7.1 24 Jan 2026

1.7.0 12 Jan 2026

1.6.2 29 Dec 2025

Related tools

ItzCrazyKns/Vane

Vane is an AI-powered answering engine.

xuwei95/ezdata

A data processing and task scheduling system built on Python and large language models (LLMs). ...

ModelEngine-Group/DataMate

DataMate is an enterprise-level data processing platform designed for model fine-tuning and RAG...

DS4SD/deepsearch-toolkit

Interact with the Deep Search platform for new knowledge explorations and discoveries

mithun50/TreeDex

Tree-based, vectorless document RAG framework. Connect any LLM via URL/API key.
