neuml/ragdata

📚 Build knowledge bases for RAG

/ 100

Emerging

This project helps AI developers and researchers build knowledge bases for Retrieval Augmented Generation (RAG) applications. It takes raw data from large public datasets such as arXiv and Wikipedia, processes it, and outputs structured embedding databases. RAG systems then query these databases to retrieve relevant information efficiently.
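The raw-data-to-embedding-database flow described above can be illustrated with a minimal, self-contained sketch. It does not use ragdata's actual API: the function names (`embed`, `build_index`, `search`) and the sample documents are hypothetical, and a toy bag-of-words vector stands in for a real embedding model.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding (assumption): a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Turn (id, text) records into (id, text, vector) rows."""
    return [(doc_id, text, embed(text)) for doc_id, text in documents]


def search(index, query, limit=1):
    """Return the `limit` rows most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return [(doc_id, text) for doc_id, text, _ in ranked[:limit]]


# Hypothetical sample records standing in for Wikipedia and arXiv entries.
documents = [
    ("wiki-1", "Paris is the capital of France"),
    ("arxiv-1", "Transformers use attention to model sequences"),
]

index = build_index(documents)
top = search(index, "attention in transformers")
print(top)
```

In a real RAG setup, the retrieved text would be passed to a language model as context, and the toy vectors would be replaced by embeddings from a trained model stored in a persistent index.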

No commits in the last 6 months. Available on PyPI.

Use this if you are an AI developer or researcher looking to create or utilize pre-built knowledge bases from common public datasets for RAG models.

Not ideal if you need to build knowledge bases from proprietary or highly specialized internal datasets not already supported by this tool.

AI Development, Natural Language Processing, Information Retrieval, Knowledge Base Management, Machine Learning Engineering

Maintenance 2 / 25

Adoption 7 / 25

Maturity 25 / 25

Community 6 / 25

Language: Python

License: Apache-2.0

Higher-rated alternatives

ItzCrazyKns/Vane

Vane is an AI-powered answering engine.

ConardLi/easy-dataset

A powerful tool for creating datasets for LLM fine-tuning, RAG, and Eval

xuwei95/ezdata

A data processing and task scheduling system built on Python and large language models (LLMs)...

ModelEngine-Group/DataMate

DataMate is an enterprise-level data processing platform designed for model fine-tuning and RAG...

DS4SD/deepsearch-toolkit

Interact with the Deep Search platform for new knowledge explorations and discoveries
