microsoft/mup
Maximal Update Parametrization (µP)
When training large neural networks, finding hyperparameters (such as the learning rate) that stay effective as the model grows is often difficult. µP helps deep learning practitioners avoid re-tuning these hyperparameters every time they scale up a model. By modifying how a PyTorch network is initialized and how its parameters are updated, it ensures that good hyperparameters found on a small model remain effective on much larger versions, saving significant time and compute.
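The core idea is that µP rescales initialization and learning rates as a function of layer width, so a hyperparameter search done at small width transfers to large width. A minimal numeric sketch of those scaling rules, assuming Adam and matrix-like hidden weights; the function `mup_multipliers` is illustrative only, not the mup library's actual API:

```python
# Illustrative sketch of µP-style width-scaling rules for hidden weight
# matrices under Adam. The function name and return structure are
# hypothetical; the mup library applies these rules internally via
# set_base_shapes and its Mu* optimizers.

def mup_multipliers(base_width: int, width: int) -> dict:
    """Multipliers to apply, relative to the base (tuned) model, when a
    hidden layer's fan-in grows from base_width to width."""
    ratio = width / base_width
    return {
        # Init std of hidden weights scales as 1/sqrt(fan_in).
        "init_std_mult": ratio ** -0.5,
        # Adam learning rate for hidden weights scales as 1/fan_in.
        "adam_lr_mult": 1.0 / ratio,
    }

# Tune at width 256, then scale to 1024: init std halves, Adam LR drops 4x.
mults = mup_multipliers(256, 1024)
print(mults)  # {'init_std_mult': 0.5, 'adam_lr_mult': 0.25}
```

Under standard parametrization these multipliers are implicitly 1, which is why a learning rate tuned on a small model tends to be too large for a wide one.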
1,689 stars. Used by 1 other package. No commits in the last 6 months. Available on PyPI.
Use this if you are developing large neural network models and want to find optimal hyperparameters once on a smaller model, then confidently transfer those settings to much larger versions without extensive re-tuning.
Not ideal if you are working with small neural networks or are not experiencing issues with hyperparameter stability when scaling up your models.
Stars: 1,689
Forks: 105
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Jul 17, 2024
Commits (30d): 0
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/microsoft/mup"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related projects
openvinotoolkit/nncf
Neural Network Compression Framework for enhanced OpenVINO™ inference
huggingface/optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers...
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
huggingface/optimum-intel
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
eole-nlp/eole
Open language modeling toolkit based on PyTorch