mlcommons/croissant

Croissant is a high-level format for machine learning datasets that brings together four rich layers.

Established

Croissant helps machine learning engineers and researchers access and use diverse ML datasets. It takes a standardized description file (Croissant JSON-LD) for a dataset, detailing its metadata, file locations, structure, and intended ML usage. From that description, the dataset can be loaded into popular ML frameworks such as TensorFlow or PyTorch, reducing the time spent on data preparation.
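To make the idea of a description file concrete, here is a minimal sketch using only the Python standard library. It assumes a deliberately simplified, Croissant-style JSON-LD document: the dataset name, URL, and field identifiers are hypothetical, the keys follow the Croissant vocabulary only loosely, and the file has not been validated against the specification. The `summarize` helper is an illustration, not part of any Croissant tooling. It covers three of the aspects mentioned above (metadata, file locations, structure) and leaves out ML usage.

```python
import json

# Simplified, Croissant-style description (hypothetical dataset, not validated
# against the Croissant specification).
CROISSANT_JSONLD = """
{
  "@type": "sc:Dataset",
  "name": "toy_reviews",
  "description": "Hypothetical dataset used only for illustration.",
  "distribution": [
    {
      "@type": "cr:FileObject",
      "@id": "reviews.csv",
      "contentUrl": "https://example.org/reviews.csv",
      "encodingFormat": "text/csv"
    }
  ],
  "recordSet": [
    {
      "@type": "cr:RecordSet",
      "@id": "reviews",
      "field": [
        {"@type": "cr:Field", "@id": "reviews/text", "dataType": "sc:Text"},
        {"@type": "cr:Field", "@id": "reviews/label", "dataType": "sc:Integer"}
      ]
    }
  ]
}
"""


def summarize(jsonld_text):
    """Extract metadata, file locations, and structure from the description."""
    doc = json.loads(jsonld_text)
    return {
        # Metadata: the dataset-level name.
        "name": doc["name"],
        # File locations: map each file id to where it can be downloaded.
        "files": {f["@id"]: f["contentUrl"] for f in doc.get("distribution", [])},
        # Structure: map each record set to the ids of its fields.
        "record_sets": {
            rs["@id"]: [fld["@id"] for fld in rs.get("field", [])]
            for rs in doc.get("recordSet", [])
        },
    }


summary = summarize(CROISSANT_JSONLD)
print(summary["name"])
print(summary["files"])
print(summary["record_sets"])
```

In practice the project's own tooling would handle parsing, validation, and loading into a framework; the sketch stays dependency-free only to show what kind of information the description carries.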

799 stars. Actively maintained with 1 commit in the last 30 days.

Use this if you need a consistent way to describe, discover, and load machine learning datasets from various sources into your ML workflows, regardless of their original file organization.

Not ideal if you primarily work with very small, custom datasets that you manually curate and don't need to share or integrate with standardized tooling.

machine-learning-engineering data-preparation ml-dataset-management model-training research-data

No package, no dependents

Maintenance 13 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 20 / 25

Stars: 799

Forks: 100

Language: Jupyter Notebook

License: Apache-2.0

Related frameworks

open-edge-platform/datumaro

Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage...

explosion/ml-datasets

🌊 Machine learning dataset loaders for testing and example scripts

webdataset/webdataset

A high-performance Python-based I/O system for large (and small) deep learning problems, with...

tensorflow/datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...

alan-turing-institute/CleverCSV

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement...
