zyushun/Adam-mini

Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793

Overall score: 40 / 100 (Emerging)

This project helps machine learning practitioners train large language models, especially Transformers, by reducing the memory required for the optimizer state. It is a drop-in replacement for AdamW in PyTorch: rather than keeping a separate learning rate for every parameter, it assigns one learning rate per block of parameters, which cuts GPU memory use significantly. Data scientists and AI researchers working with large neural networks, particularly in natural language processing, are the primary users.
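
For a rough sense of the drop-in swap this describes, here is a minimal sketch that replaces torch.optim.AdamW with the Adam_mini optimizer from the adam-mini package on PyPI. The import path and constructor arguments (named_parameters, dim, n_heads, and so on) follow my reading of the project README and may differ in the release you install, so treat them as assumptions to verify.

import torch
import torch.nn as nn
from adam_mini import Adam_mini  # import path assumed from the repo README

# Tiny stand-in model; in practice this would be your Transformer / LLM.
model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 1))

# Baseline for comparison:
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.1)

# Adam-mini takes named parameters plus model-shape hints so it can group
# parameters into blocks that share a single learning rate. Argument names
# below are assumptions based on the README, not a verified API.
optimizer = Adam_mini(
    named_parameters=model.named_parameters(),
    lr=1e-4,
    betas=(0.9, 0.95),
    weight_decay=0.1,
    dim=128,    # hidden size of the model (assumed argument name)
    n_heads=4,  # number of attention heads (assumed argument name)
)

# The training loop itself is unchanged from AdamW.
x, y = torch.randn(32, 128), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()

The point of the swap is that only the optimizer state shrinks; the forward/backward code and the learning-rate schedule stay as they were.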

453 stars. No commits in the last 6 months. Available on PyPI.

Use this if you are training large deep learning models, such as Transformers or LLMs, and are encountering memory limitations with standard optimizers like AdamW.

Not ideal if you are working with small models or non-deep-learning tasks where memory footprint is not a critical constraint.

large-language-models deep-learning-training natural-language-processing gpu-memory-optimization transformer-architectures
No License · Stale (6 months) · No Dependents
Maintenance: 2 / 25
Adoption: 10 / 25
Maturity: 17 / 25
Community: 11 / 25

Stars: 453
Forks: 17
Language: Python
License: None
Last pushed: May 13, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/zyushun/Adam-mini"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
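
For scripted access, here is a minimal Python sketch using the requests library; it calls the same endpoint as the curl example and just prints the returned JSON, since the response schema is not documented on this page. No key is needed within the free 100-requests/day tier.

import requests

# Same endpoint as the curl example above.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/zyushun/Adam-mini"

resp = requests.get(url, timeout=10)
resp.raise_for_status()  # surface HTTP errors (e.g. rate limiting) instead of parsing them
print(resp.json())       # schema not documented here, so just dump whatever comes back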