microsoft/mup
Maximal Update Parametrization (µP)
When training large neural networks, finding hyperparameters (such as the learning rate) that stay effective as the model grows is often difficult. µP helps deep learning practitioners avoid re-tuning these hyperparameters every time they scale up a model. By modifying how a PyTorch network is initialized and how its parameters are updated, it ensures that good hyperparameters found on a small model remain effective on much larger versions, saving significant time and compute.
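The core idea is that µP rescales initialization and learning rates as a function of layer width, so a hyperparameter search done at small width transfers to large width. A minimal numeric sketch of those scaling rules, assuming Adam and matrix-like hidden weights; the function `mup_multipliers` is illustrative only, not the mup library's actual API:

```python
# Illustrative sketch of µP-style width-scaling rules for hidden weight
# matrices under Adam. The function name and return structure are
# hypothetical; the mup library applies these rules internally via
# set_base_shapes and its Mu* optimizers.

def mup_multipliers(base_width: int, width: int) -> dict:
    """Multipliers to apply, relative to the base (tuned) model, when a
    hidden layer's fan-in grows from base_width to width."""
    ratio = width / base_width
    return {
        # Init std of hidden weights scales as 1/sqrt(fan_in).
        "init_std_mult": ratio ** -0.5,
        # Adam learning rate for hidden weights scales as 1/fan_in.
        "adam_lr_mult": 1.0 / ratio,
    }

# Tune at width 256, then scale to 1024: init std halves, Adam LR drops 4x.
mults = mup_multipliers(256, 1024)
print(mults)  # {'init_std_mult': 0.5, 'adam_lr_mult': 0.25}
```

Under standard parametrization these multipliers are implicitly 1, which is why a learning rate tuned on a small model tends to be too large for a wide one.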
1,689 stars. Used by 1 other package. No commits in the last 6 months. Available on PyPI.
Use this if you are developing large neural network models and want to find optimal hyperparameters once on a smaller model, then confidently transfer those settings to much larger versions without extensive re-tuning.
Not ideal if you are working with small neural networks or are not experiencing issues with hyperparameter stability when scaling up your models.
Stars: 1,689
Forks: 105
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Jul 17, 2024
Commits (30d): 0
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/microsoft/mup"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related projects
openvinotoolkit/nncf
Neural Network Compression Framework for enhanced OpenVINO™ inference
huggingface/optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers...
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
huggingface/optimum-intel
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
eole-nlp/eole
Open language modeling toolkit based on PyTorch