open-edge-platform/datumaro
Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage Computer Vision datasets.
This tool helps Computer Vision engineers prepare and manage their image and video datasets for model training. You can combine various datasets, clean up annotations, and split data into training, validation, and test sets. It takes raw image/video datasets in formats like COCO, VOC, or CVAT, and outputs refined, consistent datasets ready for your machine learning models or for further analysis.
661 stars. Actively maintained with 29 commits in the last 30 days. Available on PyPI.
Use this if you need to build, transform, or analyze complex computer vision datasets from various sources before training your models.
Not ideal if your primary goal is real-time data ingestion for streaming analytics or managing non-computer vision datasets.
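The train/validation/test split mentioned in the description can be illustrated with a short, library-independent sketch. This is not Datumaro's API: the function name `split_dataset`, its parameters, and the `img_000`-style item IDs are hypothetical and exist only to show the idea of a seeded, reproducible split by ratio.

```python
import random
from typing import Dict, List, Sequence


def split_dataset(
    item_ids: Sequence[str],
    ratios: Dict[str, float],
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Shuffle item IDs with a fixed seed and cut them into named subsets.

    Hypothetical helper for illustration only; not part of Datumaro.
    """
    if not ratios:
        raise ValueError("ratios must not be empty")
    if any(r < 0 for r in ratios.values()):
        raise ValueError("ratios must be non-negative")
    if abs(sum(ratios.values()) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    shuffled = list(item_ids)
    random.Random(seed).shuffle(shuffled)

    names = list(ratios)
    subsets: Dict[str, List[str]] = {}
    start = 0
    cumulative = 0.0
    for i, name in enumerate(names):
        cumulative += ratios[name]
        # The last subset takes the remainder so rounding never drops an item.
        if i == len(names) - 1:
            end = len(shuffled)
        else:
            end = round(cumulative * len(shuffled))
        subsets[name] = shuffled[start:end]
        start = end
    return subsets


# Example: ten made-up image IDs split 70/20/10.
image_ids = [f"img_{i:03d}" for i in range(10)]
subsets = split_dataset(image_ids, {"train": 0.7, "val": 0.2, "test": 0.1}, seed=42)
for subset_name, members in subsets.items():
    print(subset_name, len(members))
```

Fixing the seed means the same items land in the same subsets on every run, which keeps evaluation results comparable across experiments.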
Stars: 661
Forks: 158
Language: Python
License: MIT
Category:
Last pushed: Mar 13, 2026
Commits (30d): 29
Dependencies: 17
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/open-edge-platform/datumaro"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
Related frameworks
explosion/ml-datasets
🌊 Machine learning dataset loaders for testing and example scripts
webdataset/webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with...
tensorflow/datasets
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
mlcommons/croissant
Croissant is a high-level format for machine learning datasets that brings together four rich layers.
alan-turing-institute/CleverCSV
CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement...