GUNDAM-Labet/GUNDAM
GUNDAM is a data management system that prioritizes data using language models.
This tool helps data scientists and ML engineers manage large collections of text data used to train or fine-tune language models. It takes your existing text corpus and an associated language model, then intelligently identifies the most essential and informative data samples (a "golden plug-in set"). This golden set can then be used by demonstration retrievers to efficiently select high-quality examples for various language model tasks without sifting through all your data.
189 stars. No commits in the last 6 months.
Use this if you need to efficiently identify the most valuable text data samples for training or serving language models, especially when dealing with continually growing datasets.
Not ideal if your primary goal is to manage non-textual data or if you are not working with large language models.
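The workflow described above (score every sample, keep a small "golden" subset, then let a retriever pick demonstrations from that subset) can be illustrated with a toy sketch. This is not GUNDAM's API or its actual algorithm: the function names `select_golden_set`, `retrieve_demonstrations`, and `toy_score` are hypothetical, and the scoring function is a trivial stand-in for whatever language-model-based score the real system computes.

```python
from typing import Callable, List


def select_golden_set(corpus: List[str], score: Callable[[str], float], k: int) -> List[str]:
    """Return the k highest-scoring samples (hypothetical helper, not GUNDAM's API).

    `score` stands in for a language-model-based informativeness score.
    Python's sort is stable, so ties keep their original corpus order.
    """
    ranked = sorted(corpus, key=score, reverse=True)
    return ranked[:k]


def retrieve_demonstrations(query: str, golden_set: List[str], n: int) -> List[str]:
    """Pick the n golden samples sharing the most words with the query.

    Word overlap is a deliberately simple stand-in for a real retriever.
    """
    query_words = set(query.lower().split())

    def overlap(sample: str) -> int:
        return len(query_words & set(sample.lower().split()))

    return sorted(golden_set, key=overlap, reverse=True)[:n]


def toy_score(text: str) -> float:
    """Assumed placeholder score: the number of distinct words in the sample."""
    return float(len(set(text.lower().split())))


corpus = ["the cat sat", "a a a", "dogs chase cats in the park", "hello"]
golden = select_golden_set(corpus, toy_score, k=2)
demos = retrieve_demonstrations("cat sat down", golden, n=1)
```

The point of the two-stage design is that the retriever only searches the small golden set rather than the full corpus, which is where the efficiency claim above comes from.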
Stars: 189
Forks: 32
Language: Python
License: Apache-2.0
Category:
Last pushed: Aug 02, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/GUNDAM-Labet/GUNDAM"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
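For use from a script, the same request can be made with Python's standard library. This is a minimal sketch under two assumptions that the page does not confirm: that the endpoint returns JSON, and that other repositories follow the same `/{owner}/{repo}` URL pattern. The helper names `build_url` and `fetch_quality` are illustrative.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Build the endpoint URL; the owner/repo pattern is assumed to generalize."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """Fetch the record for one repository, assuming the body is JSON."""
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("GUNDAM-Labet", "GUNDAM")` issues the same GET request as the curl command. The fields of the returned object are not documented here, so inspect the response before relying on any particular key.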
Higher-rated alternatives
NVIDIA-NeMo/Curator
Scalable data preprocessing and curation toolkit for LLMs
MigoXLab/dingo
Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool
data-prep-kit/data-prep-kit
Open source project for data preparation for GenAI applications
TheDataStation/pneuma
LLM-Powered Data Discovery System for Tabular Data
cleanlab/cleanlab-studio
Client interface to Cleanlab Studio