neuml/ragdata

📚 Build knowledge bases for RAG

40 / 100 (Emerging)

This project helps AI developers and researchers build knowledge bases for Retrieval Augmented Generation (RAG) applications. It takes raw data from large public datasets such as ArXiv and Wikipedia, processes it, and outputs structured embeddings databases that RAG systems can query to retrieve relevant information efficiently.
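
The flow described above (raw records in, searchable embeddings index out, retrieval at query time) can be sketched in a few lines. This is an illustrative toy only, not ragdata's code or API: it swaps in a bag-of-words vector for a neural embedding model, and every function name and sample record here is hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a neural encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Process raw (id, text) records into (id, text, vector) rows."""
    return [(doc_id, text, embed(text)) for doc_id, text in documents]


def search(index, query, limit=1):
    """Return the ids of the `limit` rows most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda row: cosine(query_vector, row[2]), reverse=True)
    return [doc_id for doc_id, _, _ in ranked[:limit]]


# Hypothetical raw records standing in for ArXiv or Wikipedia entries
documents = [
    ("wiki-1", "Paris is the capital and largest city of France."),
    ("wiki-2", "The mitochondrion is an organelle found in most cells."),
    ("arxiv-1", "We propose a transformer architecture for machine translation."),
]

index = build_index(documents)
top = search(index, "What is the capital of France?")
print(top)
```

In a real RAG system the retrieved rows would be passed to a language model as context; the word-count vectors here only make the retrieval step visible without any external dependencies.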

No commits in the last 6 months. Available on PyPI.

Use this if you are an AI developer or researcher who wants to create or use pre-built knowledge bases from common public datasets for RAG models.

Not ideal if you need to build knowledge bases from proprietary or highly specialized internal datasets not already supported by this tool.

AI Development, Natural Language Processing, Information Retrieval, Knowledge Base Management, Machine Learning Engineering
Stale 6m
Maintenance 2 / 25
Adoption 7 / 25
Maturity 25 / 25
Community 6 / 25
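
The four sub-scores above add up to the overall 40 / 100. The page does not state its scoring formula, so the plain sum below is an assumption that happens to match the listed numbers:

```python
# Sub-scores as listed above; each is out of 25
scores = {"Maintenance": 2, "Adoption": 7, "Maturity": 25, "Community": 6}

# Assumption: the overall score is the unweighted sum of the sub-scores
total = sum(scores.values())
maximum = 25 * len(scores)
print(f"{total} / {maximum}")
```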

Stars: 32
Forks: 2
Language: Python
License: Apache-2.0
Last pushed: Jul 03, 2025
Commits (30d): 0
Dependencies: 5

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/neuml/ragdata"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
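
The same request can be made from Python with the standard library. This is a minimal sketch: it assumes the path segments after `/rag/` are the repository owner and name (the source shows this only for `neuml/ragdata`), and it returns the raw response body because the response format is not documented here. The helper names are hypothetical.

```python
import urllib.request

# Base path taken from the curl example above
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(owner, repo):
    """Build the endpoint URL for a repository, mirroring the curl example."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner, repo, timeout=10):
    """Return the raw response body as text; no response schema is assumed."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as response:
        return response.read().decode("utf-8")


url = quality_url("neuml", "ragdata")
print(url)

# Uncomment to make the network request (counts toward the daily limit):
# print(fetch_quality("neuml", "ragdata"))
```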