ScalingOpt/SGG
[ACL 2025 Main] Taming LLMs by Scaling Learning Rates with Gradient Grouping
SGG helps machine learning engineers and researchers train large language models (LLMs) and other large models efficiently. It wraps an existing adaptive optimizer, such as AdamW, and changes how per-parameter learning rates are computed during training: as the name suggests, gradients are grouped and learning rates are scaled per group. The result is more stable, faster training and better final model performance.
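The grouping-and-scaling idea described above can be sketched in plain Python. This is an illustrative approximation of the concept, not the repository's actual API; the `grouped_lr_scales` helper and its crude sort-and-split clustering are assumptions made for the sketch.

```python
import math

def grouped_lr_scales(grad_sq_avgs, num_groups=2, eps=1e-8):
    """Illustrative sketch: compute raw adaptive scales (1/sqrt(v), as in
    Adam-style optimizers), cluster them by magnitude, and replace each
    parameter's scale with its group mean, damping per-parameter noise."""
    raw = [1.0 / (math.sqrt(v) + eps) for v in grad_sq_avgs]
    # Crude 1-D clustering: sort parameters by raw scale, split into
    # equal-size groups (a stand-in for a proper clustering step).
    order = sorted(range(len(raw)), key=lambda i: raw[i])
    out = [0.0] * len(raw)
    size = math.ceil(len(raw) / num_groups)
    for g in range(num_groups):
        idx = order[g * size:(g + 1) * size]
        if not idx:
            continue
        group_mean = sum(raw[i] for i in idx) / len(idx)
        for i in idx:
            out[i] = group_mean  # every member shares the group's scale
    return out
```

Parameters with similar gradient statistics end up sharing one learning-rate scale, which is the stabilizing effect the description attributes to the method.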
No commits in the last 6 months.
Use this if you are training large language models and want to improve training stability, accelerate convergence, and enhance compatibility with parameter-efficient fine-tuning techniques.
Not ideal if you are working with smaller models where basic adaptive optimizers already provide satisfactory training performance and stability.
Stars: 9
Forks: —
Language: JavaScript
License: —
Category: —
Last pushed: Jul 15, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ScalingOpt/SGG"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
Higher-rated alternatives
ModelCloud/GPTQModel
LLM quantization (compression) toolkit with hardware acceleration support for Nvidia CUDA, AMD...
intel/auto-round
🎯An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality...
pytorch/ao
PyTorch native quantization and sparsity for training and inference
bodaay/HuggingFaceModelDownloader
Simple go utility to download HuggingFace Models and Datasets
NVIDIA/kvpress
LLM KV cache compression made easy