mosaicml/streaming
A Data Streaming Library for Efficient Neural Network Training
This tool helps machine learning engineers train large neural networks efficiently on datasets stored in cloud object storage such as AWS S3 or Google Cloud Storage. Raw data (images, text, video) is first written into shard files in a supported format, such as MDS (Mosaic Data Shard), CSV, or JSONL, and the library then streams those shards directly into PyTorch training loops. This makes training faster and more scalable, especially for large models trained across multiple nodes.
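The core idea, splitting a dataset into shard files and letting each training process stream only its own portion, can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the library's API: the helper names `write_jsonl_shards` and `stream_shards` are hypothetical, shards are local JSONL files rather than objects in S3 or GCS, and shards are assigned to ranks round-robin. The real library additionally downloads shards from remote storage into a local cache and handles shuffling and the partitioning of data across nodes, ranks, and dataloader workers.

```python
import json
import os
import tempfile
from typing import Dict, Iterator, List


def write_jsonl_shards(samples: List[Dict], out_dir: str, samples_per_shard: int) -> List[str]:
    """Hypothetical helper: write samples into numbered JSONL shard files.

    Returns the shard paths in write order.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths: List[str] = []
    for start in range(0, len(samples), samples_per_shard):
        path = os.path.join(out_dir, f"shard.{len(paths):05d}.jsonl")
        with open(path, "w", encoding="utf-8") as f:
            # One JSON object per line, as in the JSONL format.
            for sample in samples[start:start + samples_per_shard]:
                f.write(json.dumps(sample) + "\n")
        paths.append(path)
    return paths


def stream_shards(shard_paths: List[str], rank: int, world_size: int) -> Iterator[Dict]:
    """Hypothetical helper: lazily yield the samples owned by one rank.

    Simplifying assumption: shard i belongs to rank i % world_size, so ranks
    read disjoint shards, and each file is read line by line instead of being
    loaded into memory all at once.
    """
    for i, path in enumerate(shard_paths):
        if i % world_size != rank:
            continue  # this shard belongs to another rank
        with open(path, encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)


if __name__ == "__main__":
    samples = [{"id": i, "text": f"sample {i}"} for i in range(10)]
    with tempfile.TemporaryDirectory() as d:
        shards = write_jsonl_shards(samples, d, samples_per_shard=3)
        for rank in range(2):
            ids = [s["id"] for s in stream_shards(shards, rank, world_size=2)]
            print(rank, ids)
            # 0 [0, 1, 2, 6, 7, 8]
            # 1 [3, 4, 5, 9]
```

Because each rank iterates only over the shards it owns, no process ever loads the full dataset, and together the ranks still cover every sample exactly once. Round-robin assignment is just the simplest deterministic choice for this sketch.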
Use this if you are a machine learning engineer training large models with datasets stored in cloud object storage and need to improve training speed and scalability.
Not ideal if you are a data scientist working primarily with small datasets on a local machine, or if you are not using PyTorch to train neural networks.
Stars: 1,472
Forks: 189
Language: Python
License: Apache-2.0
Last pushed: Feb 02, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/mosaicml/streaming"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related frameworks
opentensor/bittensor
Internet-scale Neural Networks
trailofbits/fickling
A Python pickling decompiler and static analyzer
benchopt/benchopt
A framework for reproducible, comparable benchmarks
BiomedSciAI/fuse-med-ml
A Python framework accelerating ML-based discovery in the medical field by encouraging code...
taoshidev/vanta-network
Vanta Network built on Bittensor