mlcommons/croissant
Croissant is a high-level format for machine learning datasets that brings together four layers in a single file: metadata, resource file descriptions, data structure, and default ML semantics.
This tool helps machine learning engineers and researchers access and use diverse ML datasets. It takes a standardized description file (Croissant JSON-LD) that details a dataset's metadata, file locations, structure, and intended ML usage. The output is a dataset that can be loaded directly into popular ML frameworks such as TensorFlow or PyTorch, which cuts down on data preparation work.
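As a rough illustration of the four layers named above, the sketch below writes a small Croissant-style description as a Python dict and reads records from it with the standard library only. The dataset name, file name, columns, and the `load_records` helper are all hypothetical. The description is simplified (no `@context` block, no checksums), the key names follow a reading of the Croissant vocabulary that should be checked against the specification, and the loader is a stand-in for illustration, not how the project's own library works.

```python
import csv
import io
import json

# Hypothetical inline CSV standing in for the file the description points at.
CSV_FILES = {"reviews.csv": "text,label\ngreat movie,1\ndull plot,0\n"}

# Simplified Croissant-style description (assumption: no @context, no checksums).
DESCRIPTION = {
    "@type": "sc:Dataset",
    # Layer 1: dataset-level metadata.
    "name": "toy_reviews",
    "description": "Two labelled movie reviews.",
    # Layer 2: resources, i.e. where the files live.
    "distribution": [
        {
            "@type": "cr:FileObject",
            "@id": "reviews.csv",
            "contentUrl": "reviews.csv",
            "encodingFormat": "text/csv",
        }
    ],
    # Layer 3: structure, i.e. record sets made of typed fields.
    "recordSet": [
        {
            "@type": "cr:RecordSet",
            "@id": "default",
            "field": [
                {
                    "@type": "cr:Field",
                    "@id": "default/text",
                    "dataType": "sc:Text",
                    "source": {
                        "fileObject": {"@id": "reviews.csv"},
                        "extract": {"column": "text"},
                    },
                },
                {
                    "@type": "cr:Field",
                    "@id": "default/label",
                    # Layer 4: ML semantics. The second type is an assumed
                    # marker that flags this field as a label.
                    "dataType": ["sc:Integer", "cr:Label"],
                    "source": {
                        "fileObject": {"@id": "reviews.csv"},
                        "extract": {"column": "label"},
                    },
                },
            ],
        }
    ],
}

# Map the primary data type of a field to a Python constructor.
CASTS = {"sc:Text": str, "sc:Integer": int}


def load_records(description, record_set_id, files):
    """Read one record set into a list of dicts (hypothetical helper)."""
    record_set = next(
        rs for rs in description["recordSet"] if rs["@id"] == record_set_id
    )
    urls = {fo["@id"]: fo["contentUrl"] for fo in description["distribution"]}
    # This sketch assumes every field in the record set reads from one CSV file.
    file_id = record_set["field"][0]["source"]["fileObject"]["@id"]
    rows = csv.DictReader(io.StringIO(files[urls[file_id]]))
    records = []
    for row in rows:
        record = {}
        for field in record_set["field"]:
            types = field["dataType"]
            primary = types[0] if isinstance(types, list) else types
            column = field["source"]["extract"]["column"]
            record[field["@id"]] = CASTS[primary](row[column])
        records.append(record)
    return records


records = load_records(DESCRIPTION, "default", CSV_FILES)
print(json.dumps(records, indent=2))
```

The point of the design is that the loader never hard-codes file names or column types: everything it needs comes from the description, which is what lets one description serve several frameworks.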
799 stars. Actively maintained with 1 commit in the last 30 days.
Use this if you need a consistent way to describe, discover, and load machine learning datasets from various sources into your ML workflows, regardless of their original file organization.
Not ideal if you primarily work with very small, custom datasets that you manually curate and don't need to share or integrate with standardized tooling.
Stars: 799
Forks: 100
Language: Jupyter Notebook
License: Apache-2.0
Category:
Last pushed: Feb 05, 2026
Commits (30d): 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/mlcommons/croissant"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
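The same request can be made from Python with the standard library. This is a minimal sketch: the split of the path into category, owner, and repository is inferred from the single URL shown above, the function names are hypothetical, and the assumption that the response body is JSON is not confirmed by this page.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and parse the body as JSON (assumed response format)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = quality_url("ml-frameworks", "mlcommons", "croissant")
print(url)

# Uncomment to perform the network request (counts against the daily limit):
# data = fetch_quality("ml-frameworks", "mlcommons", "croissant")
# print(data)
```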
Related frameworks
open-edge-platform/datumaro
Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage...
explosion/ml-datasets
🌊 Machine learning dataset loaders for testing and example scripts
webdataset/webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with...
tensorflow/datasets
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
alan-turing-institute/CleverCSV
CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement...