TheDataStation/pneuma
LLM-Powered Data Discovery System for Tabular Data
This tool helps data analysts and researchers quickly find relevant datasets within a large collection of tabular data. You provide a natural language question, and it sifts through your registered tables to return the most pertinent ones, considering both the content and descriptive context. It's designed for anyone who regularly needs to locate specific datasets for analysis or reporting.
No commits in the last 6 months. Available on PyPI.
Use this if you have many tabular datasets and frequently struggle to find the right one for a specific question or analysis.
Not ideal if you only work with a few small datasets or need a tool for data transformation and cleaning rather than discovery.
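To make the idea of question-driven table discovery concrete, here is a toy sketch. It is not Pneuma's implementation or API: it swaps the LLM for plain token overlap, and every name in it (`Table`, `tokenize`, `table_tokens`, `rank_tables`) and the sample rows are hypothetical, chosen only for illustration. It scores each registered table by how many question tokens appear in its description, column names, or cell values.

```python
import re
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    description: str   # descriptive context for the table (hypothetical field)
    rows: list[dict]   # table content as a list of records


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def table_tokens(table: Table) -> set[str]:
    """Collect tokens from the description, column names, and cell values."""
    tokens = tokenize(table.description)
    for row in table.rows:
        for column, value in row.items():
            tokens |= tokenize(column) | tokenize(str(value))
    return tokens


def rank_tables(question: str, tables: list[Table], top_k: int = 3) -> list[str]:
    """Return the names of up to top_k tables that share tokens with the question."""
    q = tokenize(question)
    scored = [(len(q & table_tokens(t)), t.name) for t in tables]
    scored = [(score, name) for score, name in scored if score > 0]
    # Highest overlap first; ties broken alphabetically by table name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


# Illustrative sample data only.
tables = [
    Table("city_population", "Population counts per city and year",
          [{"city": "Chicago", "year": 2020, "population": 2746388}]),
    Table("bike_trips", "Bike share trips with start and end stations",
          [{"trip_id": 1, "start_station": "Clark St", "duration_min": 12}]),
]
print(rank_tables("What is the population of Chicago?", tables))
```

A real discovery system would replace the token-overlap score with something semantic, but the shape of the interaction (question in, ranked table names out) is the same.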
Stars: 24
Forks: 8
Language: Python
License: MIT
Category:
Last pushed: Jul 14, 2025
Commits (30d): 0
Dependencies: 15
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/TheDataStation/pneuma"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
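The same request can be made from Python using only the standard library. This is a minimal sketch under two assumptions that the listing does not confirm: that the endpoint returns JSON, and that other repositories follow the same `/<owner>/<repo>` URL pattern. The helper names `build_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Build the endpoint URL (assumes the /<owner>/<repo> pattern generalizes)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the response body is JSON. The response schema is not
    documented here, so the parsed object is returned as-is.
    """
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(build_url("TheDataStation", "pneuma"))
```

Calling `fetch_quality("TheDataStation", "pneuma")` performs the network request; keep the unauthenticated limit of 100 requests/day in mind when scripting it.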
Related tools
NVIDIA-NeMo/Curator
Scalable data preprocessing and curation toolkit for LLMs
MigoXLab/dingo
Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool
data-prep-kit/data-prep-kit
Open source project for data preparation for GenAI applications
cleanlab/cleanlab-studio
Client interface to Cleanlab Studio
jpmorganchase/CodeQuest
CodeQUEST is a generalizable framework which leverages LLMs to iteratively evaluate and enhance...