liangyuwang/Tiny-DeepSpeed

Tiny-DeepSpeed, a minimalistic re-implementation of the DeepSpeed library

Score: 43/100 (Emerging)

This project helps deep learning developers understand and experiment with techniques for reducing GPU memory usage when training large models such as GPT-2. Starting from ordinary PyTorch training code, it applies distributed parallelism strategies to lower the per-GPU memory footprint. It is aimed at machine learning engineers and researchers who work with large language models or other deep neural networks and are hitting GPU memory limits.

No commits in the last 6 months.

Use this if you are a deep learning developer struggling with GPU memory limitations when training large models and want to understand how distributed training strategies like ZeRO can help.

Not ideal if you are looking for a production-ready, fully-featured distributed training library or if you are not a developer working with deep learning models.
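The note above mentions ZeRO. For readers new to the idea, the following is a minimal, illustrative sketch of ZeRO stage 1 (optimizer-state sharding) written in plain PyTorch with torch.distributed. It is not Tiny-DeepSpeed's or DeepSpeed's API; the toy model, the round-robin partitioning, and the training loop are assumptions made purely to show why sharding optimizer state cuts per-GPU memory.

import os
import torch
import torch.distributed as dist
import torch.nn as nn

def main():
    # Launch with: torchrun --nproc_per_node=<num_gpus> zero1_sketch.py
    dist.init_process_group("nccl" if torch.cuda.is_available() else "gloo")
    rank = dist.get_rank()
    world = dist.get_world_size()
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    if torch.cuda.is_available():
        torch.cuda.set_device(local_rank)
    device = torch.device(f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu")

    # A toy model standing in for GPT-2; every rank holds a full replica.
    model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
    params = list(model.parameters())

    # ZeRO stage 1 idea: each rank keeps Adam's moment buffers for only
    # its own slice of the parameters instead of for all of them.
    owned = [p for i, p in enumerate(params) if i % world == rank]
    optimizer = torch.optim.Adam(owned, lr=1e-3)

    for _ in range(10):
        x = torch.randn(32, 512, device=device)
        y = torch.randint(0, 10, (32,), device=device)
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()

        # Gradients are averaged across ranks, as in ordinary data parallelism.
        for p in params:
            dist.all_reduce(p.grad)
            p.grad /= world

        # Each rank updates only the parameters whose optimizer state it owns...
        optimizer.step()
        for p in params:
            p.grad = None

        # ...then the owners broadcast the updated values so all replicas match.
        for i, p in enumerate(params):
            dist.broadcast(p.data, src=i % world)

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Because Adam keeps two extra buffers per parameter, sharding that state across N GPUs removes most of the optimizer's memory overhead on each device; later ZeRO stages extend the same idea to gradients and the parameters themselves.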

deep-learning-optimization gpu-memory-management distributed-training large-language-models machine-learning-engineering
Stale (6m) · No Package · No Dependents
Maintenance: 2/25
Adoption: 8/25
Maturity: 16/25
Community: 17/25


Stars: 50
Forks: 10
Language: Python
License: Apache-2.0
Last pushed: Aug 20, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/liangyuwang/Tiny-DeepSpeed"

Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000 requests/day.
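If you prefer Python over curl, the same request can be made with the standard library. The URL is taken verbatim from the command above; that the endpoint returns JSON, and which fields it contains, are assumptions, so this sketch simply prints the raw response.

import json
import urllib.request

URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "llm-tools/liangyuwang/Tiny-DeepSpeed")

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)          # assumes the endpoint returns JSON

print(json.dumps(data, indent=2))   # field names are not documented here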