Liuhong99/Sophia
The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training”
Sophia helps deep learning engineers pre-train large language models more efficiently. It is an optimizer, intended as a drop-in replacement for the likes of AdamW: it pre-conditions the gradient with a lightweight estimate of the diagonal Hessian and clips each update element-wise, which the paper reports substantially reduces pre-training time on GPT-2-scale models. This tool is designed for machine learning researchers and engineers focused on building and improving large language models.
985 stars. No commits in the last 6 months.
Use this if you are pre-training large language models and want faster convergence or better final performance than established optimizers such as AdamW or Lion provide.
Not ideal if you are not pre-training large language models or other deep learning models, or if you prefer simpler optimization methods for quick prototyping.
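The core of Sophia, per the paper, is three pieces: an EMA of gradients (momentum), an EMA of a diagonal Hessian estimate refreshed only every k steps, and a per-coordinate pre-conditioned update clipped element-wise to [-1, 1]. The repo itself ships a PyTorch `SophiaG` optimizer; the sketch below is a minimal pure-Python illustration of the update rule on a toy quadratic, where the exact Hessian diagonal stands in for the paper's stochastic estimators. Hyperparameter values here are illustrative, not the repo's defaults.

```python
# Sketch of a Sophia-style update (not the repo's implementation).
def sophia_step(theta, m, h, grad, hess_diag, t, k=10,
                lr=0.1, b1=0.9, b2=0.99, rho=0.5, eps=1e-12):
    """One Sophia-style step over lists of per-parameter scalars."""
    new_theta, new_m, new_h = [], [], []
    for th, mi, hi, g, hd in zip(theta, m, h, grad, hess_diag):
        mi = b1 * mi + (1 - b1) * g            # gradient EMA (momentum)
        if t % k == 0:                         # refresh Hessian EMA every k steps
            hi = b2 * hi + (1 - b2) * hd
        u = mi / max(rho * hi, eps)            # diagonal pre-conditioning
        u = max(-1.0, min(1.0, u))             # element-wise clipping
        new_theta.append(th - lr * u)
        new_m.append(mi)
        new_h.append(hi)
    return new_theta, new_m, new_h

# Toy problem: f(theta) = a1*theta1^2 + a2*theta2^2, so grad_i = 2*a_i*theta_i
# and the exact Hessian diagonal 2*a_i stands in for the paper's stochastic
# Hutchinson / Gauss-Newton-Bartlett estimators.
a = [1.0, 10.0]
theta = [3.0, -2.0]
m = [0.0, 0.0]
h = [2 * ai for ai in a]   # warm-start the Hessian EMA for this demo
for t in range(200):
    grad = [2 * ai * th for ai, th in zip(a, theta)]
    hess = [2 * ai for ai in a]
    theta, m, h = sophia_step(theta, m, h, grad, hess, t)
print(theta)  # both coordinates are driven close to 0
```

Note how clipping bounds each coordinate's step by `lr` regardless of curvature, while the Hessian pre-conditioning lets well-conditioned coordinates take near-Newton steps.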
Stars: 985
Forks: 57
Language: Python
License: MIT
Category:
Last pushed: Jan 30, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Liuhong99/Sophia"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
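The same endpoint can be called from Python. A minimal standard-library sketch is below; the response schema is not documented here, so the JSON is returned as-is, and the only part grounded in this page is the URL shape shown in the curl example above.

```python
import json
import urllib.request

# Base path taken from the curl example; "transformers" appears to be a
# fixed category segment in the endpoint.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a GitHub owner/repo pair."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record; counts against the 100 requests/day limit."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(quality_url("Liuhong99", "Sophia"))
    # data = fetch_quality("Liuhong99", "Sophia")  # uncomment to hit the API
```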
Higher-rated alternatives
scaleapi/llm-engine
Scale LLM Engine public repository
AGI-Arena/MARS
The official implementation of MARS: Unleashing the Power of Variance Reduction for Training Large Models
modelscope/easydistill
a toolkit on knowledge distillation for large language models
AGI-Edgerunners/LLM-Adapters
Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient...
Wang-ML-Lab/bayesian-peft
Bayesian Low-Rank Adaptation of LLMs: BLoB [NeurIPS 2024] and TFB [NeurIPS 2025]