Liuhong99/Sophia

The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training”

Score: 41/100 (Emerging)

Sophia helps deep learning engineers pre-train large language models more efficiently. It is a drop-in optimizer: given an existing model and training data, it speeds convergence by preconditioning each gradient coordinate with a cheap diagonal Hessian estimate and clipping the resulting update. It is aimed at machine learning researchers and engineers building and improving large language models.
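To make "optimizing the training process" concrete: the Sophia paper's per-coordinate rule divides the gradient momentum by a diagonal Hessian estimate and clips the result to a bounded step. Below is a minimal pure-Python sketch of that update, not the repo's actual `SophiaG` implementation; the hyperparameter values are illustrative only.

```python
# Toy illustration of the Sophia-style update from the paper:
#   theta <- theta - lr * clip(m / max(rho * h, eps), -1, 1)
# Hand-rolled sketch on plain lists; NOT the repository's SophiaG code.

def sophia_step(theta, m, h, lr=1e-3, rho=0.04, eps=1e-12):
    """One element-wise Sophia-style update.

    theta: parameters, m: EMA of gradients (momentum),
    h: EMA of non-negative diagonal Hessian estimates.
    """
    new_theta = []
    for p, mi, hi in zip(theta, m, h):
        ratio = mi / max(rho * hi, eps)     # precondition by the Hessian diagonal
        step = max(-1.0, min(1.0, ratio))   # clip to [-1, 1] to bound each update
        new_theta.append(p - lr * step)
    return new_theta

params = [0.5, -0.2, 1.0]
momentum = [0.1, -0.05, 0.3]
hessian = [2.0, 0.0, 10.0]  # the zero-curvature entry triggers the clip
print(sophia_step(params, momentum, hessian, lr=0.01, rho=0.04))
```

The clip is what makes the method robust on flat or noisy coordinates: where the curvature estimate is near zero, the step is capped at the learning rate instead of blowing up.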

985 stars. No commits in the last 6 months.

Use this if you are pre-training large language models and want to achieve faster convergence or better final performance than with traditional optimizers like AdamW or Lion.

Not ideal if you are not working with large language models or deep learning model pre-training, or if you prefer simpler optimization methods for quick prototyping.

large-language-models model-pre-training deep-learning-optimization neural-network-training
Flags: Stale (6 months) · No Package · No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 15 / 25


Stars: 985
Forks: 57
Language: Python
License: MIT
Last pushed: Jan 30, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Liuhong99/Sophia"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.