zyushun/Adam-mini

Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793

Overall score: 40 / 100 (Emerging)

This project helps machine learning practitioners train large language models, especially Transformers, by reducing the memory required for the optimizer state. It is a drop-in replacement for AdamW in PyTorch: rather than keeping a separate learning rate for every parameter, it assigns one learning rate per block of parameters, which cuts GPU memory use significantly. Data scientists and AI researchers working with large neural networks, particularly in natural language processing, are the primary users.
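
For a rough sense of the drop-in swap this describes, here is a minimal sketch that replaces torch.optim.AdamW with the Adam_mini optimizer from the adam-mini package on PyPI. The import path and constructor arguments (named_parameters, dim, n_heads, and so on) follow my reading of the project README and may differ in the release you install, so treat them as assumptions to verify.

import torch
import torch.nn as nn
from adam_mini import Adam_mini  # import path assumed from the repo README

# Tiny stand-in model; in practice this would be your Transformer / LLM.
model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 1))

# Baseline for comparison:
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.1)

# Adam-mini takes named parameters plus model-shape hints so it can group
# parameters into blocks that share a single learning rate. Argument names
# below are assumptions based on the README, not a verified API.
optimizer = Adam_mini(
    named_parameters=model.named_parameters(),
    lr=1e-4,
    betas=(0.9, 0.95),
    weight_decay=0.1,
    dim=128,    # hidden size of the model (assumed argument name)
    n_heads=4,  # number of attention heads (assumed argument name)
)

# The training loop itself is unchanged from AdamW.
x, y = torch.randn(32, 128), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()

The point of the swap is that only the optimizer state shrinks; the forward/backward code and the learning-rate schedule stay as they were.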

453 stars. No commits in the last 6 months. Available on PyPI.

Use this if you are training large deep learning models, such as Transformers or LLMs, and are encountering memory limitations with standard optimizers like AdamW.

Not ideal if you are working with small models or non-deep-learning tasks where memory footprint is not a critical constraint.

large-language-models deep-learning-training natural-language-processing gpu-memory-optimization transformer-architectures
No License · Stale (6 months) · No Dependents
Maintenance: 2 / 25
Adoption: 10 / 25
Maturity: 17 / 25
Community: 11 / 25

Stars: 453
Forks: 17
Language: Python
License: None
Last pushed: May 13, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/zyushun/Adam-mini"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
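
For scripted access, here is a minimal Python sketch using the requests library; it calls the same endpoint as the curl example and just prints the returned JSON, since the response schema is not documented on this page. No key is needed within the free 100-requests/day tier.

import requests

# Same endpoint as the curl example above.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/zyushun/Adam-mini"

resp = requests.get(url, timeout=10)
resp.raise_for_status()  # surface HTTP errors (e.g. rate limiting) instead of parsing them
print(resp.json())       # schema not documented here, so just dump whatever comes back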