AGI-Arena/MARS
The official implementation of "MARS: Unleashing the Power of Variance Reduction for Training Large Models"
MARS is an optimizer for training large-scale deep learning models, particularly large language models such as GPT-2. Given a model architecture and training data, it produces a trained model that converges faster and reaches a lower validation loss than traditional optimizers. It is intended for machine learning engineers and researchers who are pretraining or fine-tuning large models.
716 stars. Actively maintained with 2 commits in the last 30 days.
Use this if you are a machine learning engineer or researcher looking to significantly improve the efficiency and final performance of your large model training, especially for natural language processing tasks.
Not ideal if you are working with smaller models or simpler machine learning tasks where traditional optimizers already perform adequately, as the overhead might not be justified.
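For intuition, here is a toy sketch of the variance-reduction idea behind optimizers like MARS: the current stochastic gradient is corrected using the previous gradient before the parameter update. This is an illustrative example only (the function names, constants, and exact update rule below are not taken from the repo), demonstrated on a deterministic 1-D quadratic:

```python
# Toy sketch of a variance-reduced gradient step (STORM-style gradient
# correction). This is NOT the MARS repo's actual API or exact update rule.
def variance_reduced_descent(grad_fn, x0, lr=0.1, gamma=0.5, steps=100):
    """Gradient descent where each step uses g_t + gamma * (g_t - g_{t-1})."""
    x = x0
    prev_grad = None
    for _ in range(steps):
        g = grad_fn(x)
        # Correction term: amplify the change between consecutive gradients,
        # which reduces the variance of the update direction in the
        # stochastic setting.
        c = g if prev_grad is None else g + gamma * (g - prev_grad)
        prev_grad = g
        x -= lr * c
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_star = variance_reduced_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

In the deterministic case above the correction term simply accelerates convergence toward the minimizer at x = 3; the variance-reduction benefit shows up when `grad_fn` returns noisy minibatch gradients.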
Stars: 716
Forks: 49
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 04, 2026
Commits (30d): 2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/AGI-Arena/MARS"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
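If you prefer calling the endpoint from Python rather than curl, here is a minimal sketch using only the standard library. The endpoint URL is taken from the curl example above; the shape of the JSON response is an assumption, so inspect it before relying on specific field names:

```python
# Sketch: query the quality API from Python (standard library only).
# The response's JSON field names are assumptions -- inspect the actual
# payload before depending on them.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for a repository."""
    return f"{API_BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str,
                  timeout: float = 10.0) -> dict:
    """Fetch and decode the quality record for one repository."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

# Usage (performs a network request, counted against the daily limit):
# data = fetch_quality("transformers", "AGI-Arena", "MARS")
```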
Related repositories
scaleapi/llm-engine
Scale LLM Engine public repository
modelscope/easydistill
a toolkit on knowledge distillation for large language models
AGI-Edgerunners/LLM-Adapters
Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient...
Wang-ML-Lab/bayesian-peft
Bayesian Low-Rank Adaptation of LLMs: BLoB [NeurIPS 2024] and TFB [NeurIPS 2025]
sangmichaelxie/doremi
Pytorch implementation of DoReMi, a method for optimizing the data mixture weights in language...