zyushun/Adam-mini
Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793
This project helps machine learning practitioners train large language models, especially Transformers, with a smaller optimizer memory footprint. It provides a drop-in PyTorch optimizer that, per the paper's title, uses fewer learning rates (one per parameter block rather than one per coordinate), so the optimizer state consumes significantly less GPU memory. Its primary users are data scientists and AI researchers working with large neural networks, particularly in natural language processing.
453 stars. No commits in the last 6 months. Available on PyPI.
Use this if you are training large deep learning models, such as Transformers or LLMs, and are encountering memory limitations with standard optimizers like AdamW.
Not ideal if you are working with small models or non-deep learning tasks where memory footprint is not a critical constraint.
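To see why the memory saving matters, here is a toy sketch in plain Python of the block-wise idea behind Adam-mini (this is an illustration of the concept from the paper, not the library's actual implementation; the block sizes below are made up):

```python
# Classic Adam stores a first-moment AND a second-moment value for every
# parameter coordinate. Adam-mini keeps the per-coordinate first moment
# but replaces the per-coordinate second moment with a single value per
# parameter block, cutting optimizer state by nearly half.

def adam_state_size(param_blocks):
    # One first-moment and one second-moment entry per coordinate.
    n = sum(len(block) for block in param_blocks)
    return 2 * n

def adam_mini_state_size(param_blocks):
    # Per-coordinate first moment, plus one second-moment scalar per block.
    n = sum(len(block) for block in param_blocks)
    return n + len(param_blocks)

# Two hypothetical parameter blocks (e.g. two weight matrices).
blocks = [[0.0] * 1024, [0.0] * 4096]
print(adam_state_size(blocks))       # 10240
print(adam_mini_state_size(blocks))  # 5122
```

With realistic Transformer block sizes (millions of coordinates per block), the second term is negligible and the optimizer state shrinks toward half of Adam's, which is the saving the project targets.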
Stars
453
Forks
17
Language
Python
License
—
Category
Last pushed
May 13, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/zyushun/Adam-mini"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
ModelTC/LightCompress
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs,...
p-e-w/heretic
Fully automatic censorship removal for language models
Orion-zhen/abliteration
Make abliterated models with transformers, easy and fast
YerbaPage/LongCodeZip
LongCodeZip: Compress Long Context for Code Language Models [ASE2025]
locuslab/wanda
A simple and effective LLM pruning approach.