srush/LLM-Training-Puzzles

What would you do with 1000 H100s...

Overall score: 42 / 100 (Emerging)

This is a collection of 8 challenging puzzles about training large language models (or really any NN) on many, many GPUs. Very few people actually get a chance to train on thousands of computers, but it is an interesting challenge and one that is critically important for modern AI. The goal of these puzzles is to get hands-on experience with the key primitives and to understand the goals of memory efficiency and compute pipelining.
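The puzzles run as notebooks (the repository language is Jupyter Notebook). As a taste of the kind of primitive the description refers to, here is a minimal, hypothetical plain-Python sketch of one classic example: the gradient all-reduce behind data parallelism. The names Device and all_reduce are illustrative, not the repository's API.

from dataclasses import dataclass, field

@dataclass
class Device:
    """One simulated worker holding a local gradient."""
    rank: int
    grad: list[float] = field(default_factory=list)

def all_reduce(devices: list[Device]) -> None:
    """Average gradients across all ranks (hypothetical sketch: done
    centrally here; a real cluster would use a ring or tree collective)."""
    n = len(devices)
    dim = len(devices[0].grad)
    # Reduce: element-wise sum across all ranks.
    total = [sum(d.grad[i] for d in devices) for i in range(dim)]
    # Broadcast: every rank ends up with the same averaged gradient.
    for d in devices:
        d.grad = [t / n for t in total]

devices = [Device(rank=r, grad=[float(r), 2.0 * r]) for r in range(4)]
all_reduce(devices)
print([d.grad for d in devices])  # every rank now holds [1.5, 3.0]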

1,157 stars. No commits in the last 6 months.

Use this if you are a machine learning engineer or researcher looking to deepen your practical understanding of large-scale distributed training for deep neural networks.

Not ideal if you are looking for a tool to train models on a single GPU or a small cluster without focusing on extreme memory and compute optimization challenges.

distributed-training large-language-models deep-learning-optimization gpu-programming ai-infrastructure
Flags: Stale (6 months) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 16 / 25

How are scores calculated? Each dimension is scored out of 25, and the four dimension scores sum to the overall rating: 0 + 10 + 16 + 16 = 42 / 100.

Stars: 1,157
Forks: 72
Language: Jupyter Notebook
License: MIT
Last pushed: Jan 10, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/srush/LLM-Training-Puzzles"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
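For scripted access, a minimal stdlib-only Python sketch equivalent to the curl command above (the endpoint URL is taken from this page; the example simply prints the raw response body rather than assuming a schema):

import urllib.request

# Same endpoint as the curl example above.
URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "llm-tools/srush/LLM-Training-Puzzles")

with urllib.request.urlopen(URL) as resp:
    # Print the raw body; the response schema is not documented here.
    print(resp.read().decode("utf-8"))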