0xNaN/edufsdp
A minimal, educational implementation of Fully Sharded Data Parallel (FSDP).
This project offers a clear and simplified look at how Fully Sharded Data Parallel (FSDP) works. It takes a complex distributed training strategy and breaks down its core components like parameter sharding and data gathering. Machine learning practitioners, especially those learning about or implementing large model training, can use this to understand the underlying mechanisms without getting lost in optimization details.
Use this if you are a machine learning engineer or researcher who wants to deeply understand the fundamental concepts of FSDP for distributed model training.
Not ideal if you need a production-ready FSDP implementation with advanced features like mixed precision, communication-computation overlap, or CPU offloading.
Stars
12
Forks
1
Language
Python
License
—
Category
Last pushed
Jan 26, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/0xNaN/edufsdp"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
deepspeedai/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference...
helmholtz-analytics/heat
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
horovod/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
bsc-wdc/dislib
The Distributed Computing library for python implemented using PyCOMPSs programming model for HPC.