Liuhong99/Sophia
The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training”
Sophia helps deep learning engineers pre-train large language models more efficiently. It is an optimizer, intended as a drop-in replacement for the likes of AdamW: it pre-conditions the gradient with a lightweight estimate of the diagonal Hessian and clips each update element-wise, which the paper reports substantially reduces pre-training time on GPT-2-scale models. This tool is designed for machine learning researchers and engineers focused on building and improving large language models.
985 stars. No commits in the last 6 months.
Use this if you are pre-training large language models and want faster convergence or better final performance than established optimizers such as AdamW or Lion provide.
Not ideal if you are not pre-training large language models or other deep learning models, or if you prefer simpler optimization methods for quick prototyping.
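The core of Sophia, per the paper, is three pieces: an EMA of gradients (momentum), an EMA of a diagonal Hessian estimate refreshed only every k steps, and a per-coordinate pre-conditioned update clipped element-wise to [-1, 1]. The repo itself ships a PyTorch `SophiaG` optimizer; the sketch below is a minimal pure-Python illustration of the update rule on a toy quadratic, where the exact Hessian diagonal stands in for the paper's stochastic estimators. Hyperparameter values here are illustrative, not the repo's defaults.

```python
# Sketch of a Sophia-style update (not the repo's implementation).
def sophia_step(theta, m, h, grad, hess_diag, t, k=10,
                lr=0.1, b1=0.9, b2=0.99, rho=0.5, eps=1e-12):
    """One Sophia-style step over lists of per-parameter scalars."""
    new_theta, new_m, new_h = [], [], []
    for th, mi, hi, g, hd in zip(theta, m, h, grad, hess_diag):
        mi = b1 * mi + (1 - b1) * g            # gradient EMA (momentum)
        if t % k == 0:                         # refresh Hessian EMA every k steps
            hi = b2 * hi + (1 - b2) * hd
        u = mi / max(rho * hi, eps)            # diagonal pre-conditioning
        u = max(-1.0, min(1.0, u))             # element-wise clipping
        new_theta.append(th - lr * u)
        new_m.append(mi)
        new_h.append(hi)
    return new_theta, new_m, new_h

# Toy problem: f(theta) = a1*theta1^2 + a2*theta2^2, so grad_i = 2*a_i*theta_i
# and the exact Hessian diagonal 2*a_i stands in for the paper's stochastic
# Hutchinson / Gauss-Newton-Bartlett estimators.
a = [1.0, 10.0]
theta = [3.0, -2.0]
m = [0.0, 0.0]
h = [2 * ai for ai in a]   # warm-start the Hessian EMA for this demo
for t in range(200):
    grad = [2 * ai * th for ai, th in zip(a, theta)]
    hess = [2 * ai for ai in a]
    theta, m, h = sophia_step(theta, m, h, grad, hess, t)
print(theta)  # both coordinates are driven close to 0
```

Note how clipping bounds each coordinate's step by `lr` regardless of curvature, while the Hessian pre-conditioning lets well-conditioned coordinates take near-Newton steps.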
Stars: 985
Forks: 57
Language: Python
License: MIT
Category:
Last pushed: Jan 30, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Liuhong99/Sophia"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
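The same endpoint can be called from Python. A minimal standard-library sketch is below; the response schema is not documented here, so the JSON is returned as-is, and the only part grounded in this page is the URL shape shown in the curl example above.

```python
import json
import urllib.request

# Base path taken from the curl example; "transformers" appears to be a
# fixed category segment in the endpoint.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a GitHub owner/repo pair."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record; counts against the 100 requests/day limit."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(quality_url("Liuhong99", "Sophia"))
    # data = fetch_quality("Liuhong99", "Sophia")  # uncomment to hit the API
```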
Higher-rated alternatives
scaleapi/llm-engine
Scale LLM Engine public repository
AGI-Arena/MARS
The official implementation of MARS: Unleashing the Power of Variance Reduction for Training Large Models
modelscope/easydistill
a toolkit on knowledge distillation for large language models
AGI-Edgerunners/LLM-Adapters
Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient...
Wang-ML-Lab/bayesian-peft
Bayesian Low-Rank Adaptation of LLMs: BLoB [NeurIPS 2024] and TFB [NeurIPS 2025]