zhuhanqing/APOLLO
APOLLO: SGD-like Memory, AdamW-level Performance; MLSys'25 Outstanding Paper Honorable Mention
This project provides a memory-efficient optimizer for training and fine-tuning large language models (LLMs). It lets machine learning engineers and researchers reach AdamW-level training performance while consuming significantly less GPU memory for optimizer states. You plug it into your training setup in place of a standard optimizer, and it drives the learning process to produce a well-trained model with a smaller GPU memory footprint.
Use this if you are pre-training or fine-tuning large language models (LLMs) and are constrained by GPU memory but still need AdamW-level performance.
Not ideal if you are working with smaller models that don't face memory limitations during training, or if you require an optimizer for non-LLM machine learning tasks.
Stars
271
Forks
13
Language
Python
License
—
Category
Last pushed
Nov 29, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/zhuhanqing/APOLLO"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
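For programmatic access, the curl command above can be reproduced from Python. This is a minimal sketch using only the standard library; the endpoint URL is taken from the example above, but the shape of the JSON response is an assumption and should be checked against the actual API.

```python
# Minimal sketch of querying the quality endpoint from Python instead of curl.
# The base URL and path come from the curl example above; the response fields
# are NOT documented here, so the payload is returned as a plain dict.
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Construct the per-repository endpoint URL."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON payload for a repository (requires network)."""
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the endpoint for this repository without issuing a request.
    print(build_url("zhuhanqing", "APOLLO"))
```

With a free API key, the daily limit rises from 100 to 1,000 requests; how the key is passed (header or query parameter) is not shown on this page, so consult the API's own documentation.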
Related tools
zhenye234/xcodec
AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
HITESHLPATEL/Mamba-Papers
Awesome Mamba Papers: A Curated Collection of Research Papers, Tutorials & Blogs
Y-Research-SBU/CSRv2
Official Repository for CSRv2 - ICLR 2026
psychofict/llm-effective-context-length
Investigating Why the Effective Context Length of LLMs Falls Short (Based on STRING, ICLR 2025)
hrlics/CoPE
CoPE: Clipped RoPE as A Scalable Free Lunch for Long Context LLMs