ScalingOpt/SGG
[ACL 2025 Main] Taming LLMs by Scaling Learning Rates with Gradient Grouping
SGG helps machine learning engineers and researchers train large language models (LLMs) and other large models efficiently. It wraps an existing adaptive optimizer, such as AdamW, and changes how per-parameter learning rates are computed during training: as the name suggests, gradients are grouped and learning rates are scaled per group. The result is more stable, faster training and better final model performance.
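The grouping-and-scaling idea described above can be sketched in plain Python. This is an illustrative approximation of the concept, not the repository's actual API; the `grouped_lr_scales` helper and its crude sort-and-split clustering are assumptions made for the sketch.

```python
import math

def grouped_lr_scales(grad_sq_avgs, num_groups=2, eps=1e-8):
    """Illustrative sketch: compute raw adaptive scales (1/sqrt(v), as in
    Adam-style optimizers), cluster them by magnitude, and replace each
    parameter's scale with its group mean, damping per-parameter noise."""
    raw = [1.0 / (math.sqrt(v) + eps) for v in grad_sq_avgs]
    # Crude 1-D clustering: sort parameters by raw scale, split into
    # equal-size groups (a stand-in for a proper clustering step).
    order = sorted(range(len(raw)), key=lambda i: raw[i])
    out = [0.0] * len(raw)
    size = math.ceil(len(raw) / num_groups)
    for g in range(num_groups):
        idx = order[g * size:(g + 1) * size]
        if not idx:
            continue
        group_mean = sum(raw[i] for i in idx) / len(idx)
        for i in idx:
            out[i] = group_mean  # every member shares the group's scale
    return out
```

Parameters with similar gradient statistics end up sharing one learning-rate scale, which is the stabilizing effect the description attributes to the method.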
No commits in the last 6 months.
Use this if you are training large language models and want to improve training stability, accelerate convergence, and enhance compatibility with parameter-efficient fine-tuning techniques.
Not ideal if you are working with smaller models where basic adaptive optimizers already provide satisfactory training performance and stability.
Stars: 9
Forks: —
Language: JavaScript
License: —
Category: —
Last pushed: Jul 15, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ScalingOpt/SGG"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
Higher-rated alternatives
ModelCloud/GPTQModel
LLM quantization (compression) toolkit with hardware acceleration support for Nvidia CUDA, AMD...
intel/auto-round
🎯An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality...
pytorch/ao
PyTorch native quantization and sparsity for training and inference
bodaay/HuggingFaceModelDownloader
Simple go utility to download HuggingFace Models and Datasets
NVIDIA/kvpress
LLM KV cache compression made easy