hikariming/pindata
PinData is a modern, open-source dataset management platform designed specifically for large language model (LLM) training workflows.
PinData helps organizations transform their diverse raw data, like documents and reports, into organized knowledge and high-quality datasets for AI applications. It takes various enterprise files and structured data as input, processing them to produce structured knowledge bases and ready-to-use training datasets. This platform is ideal for data managers, AI solution architects, researchers, and professional service providers who work with large volumes of enterprise information.
No commits in the last 6 months.
Use this if you need to unify, process, and structure large volumes of enterprise data (documents, reports, manuals) into a coherent knowledge base or high-quality training datasets for AI models.
Not ideal if your primary need is simply data storage without extensive processing, AI-driven structuring, or dataset generation for large language models.
Stars: 44
Forks: 6
Language: TypeScript
License: not listed
Category: not listed
Last pushed: Jul 07, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/hikariming/pindata"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
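The same request as the curl command can be issued from a script. The sketch below is a minimal illustration using only the Python standard library; the helper names `build_url` and `fetch_quality` are hypothetical, and the assumption that the endpoint returns JSON is not confirmed by this page, so inspect a real response before relying on any field.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Return the endpoint URL for one repository (hypothetical helper).

    Each path segment is percent-encoded so unusual characters stay safe.
    """
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and parse the body as JSON.

    Assumption: the response body is JSON; the page does not document its shape.
    """
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Printing only the URL keeps this example from spending the daily quota.
    print(build_url("hikariming", "pindata"))
```

Calling `fetch_quality("hikariming", "pindata")` performs the actual request; with the keyless limit of 100 requests/day, caching the result locally is a sensible precaution for repeated runs.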
Higher-rated alternatives
NVIDIA-NeMo/Curator: Scalable data pre processing and curation toolkit for LLMs
MigoXLab/dingo: Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool
data-prep-kit/data-prep-kit: Open source project for data preparation for GenAI applications
TheDataStation/pneuma: LLM-Powered Data Discovery System for Tabular Data
cleanlab/cleanlab-studio: Client interface to Cleanlab Studio