srush/LLM-Training-Puzzles
What would you do with 1000 H100s...
This is a collection of 8 challenging puzzles about training large language models (or really any NN) on many, many GPUs. Very few people actually get a chance to train on thousands of computers, but it is an interesting challenge and one that is critically important for modern AI. The goal of these puzzles is to get hands-on experience with the key primitives and to understand the goals of memory efficiency and compute pipelining.
1,157 stars. No commits in the last 6 months.
Use this if you are a machine learning engineer or researcher looking to deepen your practical understanding of large-scale distributed training for deep neural networks.
Not ideal if you are looking for a tool to train models on a single GPU or a small cluster without focusing on extreme memory and compute optimization challenges.
Stars: 1,157
Forks: 72
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Jan 10, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/srush/LLM-Training-Puzzles"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
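The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_url` and `fetch_quality` are hypothetical, and it assumes the endpoint returns a JSON body, which the page does not state.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a repository, escaping each path segment."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for owner/repo.

    Assumption: the response body is JSON. If it is not,
    json.loads raises an error (json.JSONDecodeError).
    """
    request = urllib.request.Request(
        build_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to perform the request.
    print(build_url("srush", "LLM-Training-Puzzles"))
```

Calling `fetch_quality("srush", "LLM-Training-Puzzles")` performs the actual request. With the keyless limit of 100 requests/day, caching the result locally is likely worthwhile if you query the same repository repeatedly.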
Higher-rated alternatives
SepineTam/stata-mcp
Let LLM help you achieve your regression with Stata. Evolve from reg monkey to causal thinker.
datawhalechina/code-your-own-llm
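A full-stack reference guide to large language models that uses the simplest possible code to walk you end to end through every detail, from defining a model and training it from scratch to deploying it in production.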
leonid20000/odin-slides
This is an advanced Python tool that empowers you to effortlessly draft customizable PowerPoint...
onejune2018/Awesome-LLM-Eval
Awesome-LLM-Eval: a curated list of tools, datasets/benchmark, demos, leaderboard, papers, docs...
axhiao/QuickNote
Capture what you want with LLM