zhuhanqing/APOLLO
APOLLO: SGD-like Memory, AdamW-level Performance; MLSys'25 Outstanding Paper Honorable Mention
This project provides a memory-efficient optimizer for training and fine-tuning large language models (LLMs). It lets machine learning engineers and researchers reach AdamW-level training performance while consuming significantly less GPU memory for optimizer states. You plug it into your training setup in place of a standard optimizer, and it drives the learning process to produce a well-trained model with a smaller GPU memory footprint.
Use this if you are pre-training or fine-tuning large language models (LLMs) and are constrained by GPU memory but still need AdamW-level performance.
Not ideal if you are working with smaller models that don't face memory limitations during training, or if you require an optimizer for non-LLM machine learning tasks.
Stars
271
Forks
13
Language
Python
License
—
Category
Last pushed
Nov 29, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/zhuhanqing/APOLLO"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
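For programmatic access, the curl command above can be reproduced from Python. This is a minimal sketch using only the standard library; the endpoint URL is taken from the example above, but the shape of the JSON response is an assumption and should be checked against the actual API.

```python
# Minimal sketch of querying the quality endpoint from Python instead of curl.
# The base URL and path come from the curl example above; the response fields
# are NOT documented here, so the payload is returned as a plain dict.
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Construct the per-repository endpoint URL."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON payload for a repository (requires network)."""
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the endpoint for this repository without issuing a request.
    print(build_url("zhuhanqing", "APOLLO"))
```

With a free API key, the daily limit rises from 100 to 1,000 requests; how the key is passed (header or query parameter) is not shown on this page, so consult the API's own documentation.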
Related tools
zhenye234/xcodec
AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
HITESHLPATEL/Mamba-Papers
Awesome Mamba Papers: A Curated Collection of Research Papers, Tutorials & Blogs
Y-Research-SBU/CSRv2
Official Repository for CSRv2 - ICLR 2026
psychofict/llm-effective-context-length
Investigating Why the Effective Context Length of LLMs Falls Short (Based on STRING, ICLR 2025)
hrlics/CoPE
CoPE: Clipped RoPE as A Scalable Free Lunch for Long Context LLMs