diffgram/awesome-training-data
Curated list of Awesome Training Data! (Data Labeling, Annotation, Discovery, Workflow etc)
This resource helps machine learning engineers, data scientists, and project managers discover tools and platforms for preparing high-quality training data for AI models. It curates solutions for tasks such as labeling images, annotating video, and transcribing audio, all of which turn raw data into structured datasets ready for model training.
No commits in the last 6 months.
Use this if you are an AI/ML practitioner looking for a comprehensive list of open-source and commercial tools to efficiently label, annotate, and manage training data for your machine learning models across different data types.
Not ideal if you are looking for a step-by-step tutorial or an actual software tool to perform data labeling directly, as this is a curated list of external resources.
Stars: 12
Forks: 2
Language: —
License: —
Category
Last pushed: May 24, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/diffgram/awesome-training-data"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
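The same request can be made from Python. The sketch below is a minimal, unofficial example using only the standard library. It assumes the URL follows a category/owner/repo layout, which is inferred from the single curl URL above and not from any documentation. The helper names `build_url` and `fetch_raw` are hypothetical. The response format is not stated on this page, so the body is returned as raw text without parsing.

```python
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(owner: str, repo: str, category: str = "ml-frameworks") -> str:
    """Build the endpoint URL.

    Assumption: the path layout is <base>/<category>/<owner>/<repo>,
    inferred from the one example URL; other categories are unverified.
    """
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_raw(owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the response body as text (no key, subject to the daily limit)."""
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        # The response format is not documented here, so return it unparsed.
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(fetch_raw("diffgram", "awesome-training-data"))
```

Keeping URL construction separate from the network call makes it easy to check the URL before spending any of the 100 keyless requests per day.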
Higher-rated alternatives
cvat-ai/cvat
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and...
HumanSignal/label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
wkentaro/labelme
Image annotation with Python. Supports polygon, rectangle, circle, line, point, and AI-assisted...
CVHub520/X-AnyLabeling
Effortless data labeling with AI support from Segment Anything and other awesome models.
doccano/doccano
Open source annotation tool for machine learning practitioners.