open-edge-platform/datumaro

Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage Computer Vision datasets.

80 / 100 Verified

This tool helps Computer Vision engineers prepare and manage their image and video datasets for model training. You can combine various datasets, clean up annotations, and split data into training, validation, and test sets. It takes raw image/video datasets in formats like COCO, VOC, or CVAT, and outputs refined, consistent datasets ready for your machine learning models or for further analysis.
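To make the splitting step concrete, here is a minimal plain-Python sketch of cutting a set of item ids into training, validation, and test subsets. It is not Datumaro's API: the function name `split_items`, the default 70/15/15 ratios, and the `img_000`-style ids are assumptions chosen purely for illustration.

```python
import random


def split_items(item_ids, ratios=(("train", 0.7), ("val", 0.15), ("test", 0.15)), seed=0):
    """Shuffle ids reproducibly and cut them into named, non-overlapping subsets."""
    if abs(sum(ratio for _, ratio in ratios) - 1.0) > 1e-9:
        raise ValueError("split ratios must sum to 1")

    # Sort first so the result depends only on the ids and the seed,
    # not on the order in which the ids were passed in.
    ids = sorted(item_ids)
    random.Random(seed).shuffle(ids)

    subsets = {}
    start = 0
    cumulative = 0.0
    for index, (name, ratio) in enumerate(ratios):
        cumulative += ratio
        # The last subset takes whatever remains, so no item is dropped to rounding.
        is_last = index == len(ratios) - 1
        end = len(ids) if is_last else round(cumulative * len(ids))
        subsets[name] = ids[start:end]
        start = end
    return subsets


# Hypothetical item ids standing in for image names.
splits = split_items([f"img_{i:03d}" for i in range(20)])
for name, members in splits.items():
    print(name, len(members))
```

Fixing the seed keeps the split reproducible across runs, which matters when a validation set must stay stable while annotations are being cleaned.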

661 stars. Actively maintained with 29 commits in the last 30 days. Available on PyPI.

Use this if you need to build, transform, or analyze complex computer vision datasets from various sources before training your models.

Not ideal if your primary goal is real-time data ingestion for streaming analytics or managing non-computer-vision datasets.

computer-vision dataset-preparation data-labeling model-training machine-learning-operations
Maintenance 20 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 25 / 25

Stars 661
Forks 158
Language Python
License MIT
Last pushed Mar 13, 2026
Commits (30d) 29
Dependencies 17

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/open-edge-platform/datumaro"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
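
The same request can be made from Python with only the standard library. This is a rough sketch under two assumptions that the page does not confirm: that the endpoint returns JSON, and that other projects follow the same `category/owner/repo` path layout as the URL above. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL; the category/owner/repo layout is inferred from the curl example."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the record for one project (assumes the body is JSON)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("ml-frameworks", "open-edge-platform", "datumaro")
```

With the keyless limit of 100 requests/day, caching the result locally is a sensible default for anything that polls more than occasionally.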