mlcommons/croissant

Croissant is a high-level format for machine learning datasets that brings together four layers: metadata, resource file descriptions, data structure, and default ML semantics.

Quality score: 59 / 100 (Established)

This tool helps machine learning engineers and researchers access and use diverse ML datasets. It takes a standardized description file (Croissant JSON-LD) for a dataset, detailing its metadata, file locations, structure, and intended ML usage. The output is a ready-to-use dataset that loads into popular ML frameworks such as TensorFlow or PyTorch, which cuts down on data-preparation work.
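To make the four parts of a description file concrete, here is a minimal sketch in Python using only the standard library. The dataset name, file URL, and field names are made up for illustration, and the dictionary is a simplified Croissant-style structure that has not been validated against the specification. The `summarize` helper is hypothetical and is not part of any Croissant tooling.

```python
import json

# Hypothetical, simplified Croissant-style description (illustrative only,
# not validated against the Croissant specification).
DESCRIPTION = {
    "@type": "sc:Dataset",
    # 1. Metadata: what the dataset is.
    "name": "toy_reviews",
    "description": "Made-up example dataset of product reviews.",
    # 2. File locations: where the raw files live.
    "distribution": [
        {
            "@type": "cr:FileObject",
            "@id": "reviews.csv",
            "contentUrl": "https://example.org/reviews.csv",
            "encodingFormat": "text/csv",
        }
    ],
    # 3. Structure: how files map to records and fields.
    "recordSet": [
        {
            "@type": "cr:RecordSet",
            "@id": "reviews",
            "field": [
                {"@type": "cr:Field", "@id": "reviews/text", "dataType": "sc:Text"},
                # 4. Intended ML usage: a semantic type marks this field as a label.
                {
                    "@type": "cr:Field",
                    "@id": "reviews/label",
                    "dataType": ["sc:Integer", "cr:Label"],
                },
            ],
        }
    ],
}


def summarize(description):
    """Collect file locations, field ids, and label fields from a description."""
    files = {f["@id"]: f["contentUrl"] for f in description.get("distribution", [])}
    fields = {}
    for record_set in description.get("recordSet", []):
        for field in record_set.get("field", []):
            data_type = field["dataType"]
            # dataType may be a single string or a list of types.
            fields[field["@id"]] = data_type if isinstance(data_type, list) else [data_type]
    labels = [fid for fid, types in fields.items() if "cr:Label" in types]
    return {
        "name": description["name"],
        "files": files,
        "fields": sorted(fields),
        "labels": labels,
    }


# Round-trip through JSON to show the description is plain JSON-serializable data.
summary = summarize(json.loads(json.dumps(DESCRIPTION)))
print(json.dumps(summary, indent=2))
```

Because everything a loader needs sits in one JSON document, the same description could drive discovery (the metadata), download (the file locations), and training input (the fields and label markers).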

799 stars. Actively maintained with 1 commit in the last 30 days.

Use this if you need a consistent way to describe, discover, and load machine learning datasets from various sources into your ML workflows, regardless of their original file organization.

Not ideal if you primarily work with very small, custom datasets that you manually curate and don't need to share or integrate with standardized tooling.

machine-learning-engineering data-preparation ml-dataset-management model-training research-data
No Package. No Dependents.
Maintenance 13 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 20 / 25


Stars: 799
Forks: 100
Language: Jupyter Notebook
License: Apache-2.0
Last pushed: Feb 05, 2026
Commits (30d): 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/mlcommons/croissant"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
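The same request can be made from Python. The sketch below uses only the standard library and rests on two assumptions: that the path after `/quality/` follows a category/owner/repository pattern (inferred from the single URL shown here), and that the endpoint returns JSON. The names `quality_url` and `fetch_quality` are hypothetical helpers, and no response fields are assumed.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example; the URL pattern beyond it is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [urllib.parse.quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the endpoint and decode the body, assuming it is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("ml-frameworks", "mlcommons", "croissant")` issues the same GET request as the curl command. Each call counts against the daily request limit, so results are worth caching.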