NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
This project offers a GPU-optimized toolkit for building and training very large language models. It provides reusable building blocks (Megatron Core) and parallelism strategies, including tensor, pipeline, sequence, and data parallelism, to scale training efficiently across many GPUs, and it can serve as the foundation for custom large-scale training frameworks.
15,633 stars. Actively maintained with 205 commits in the last 30 days.
Use this if you are an ML engineer or researcher building custom training pipelines for large transformer models and need to scale efficiently across many GPUs.
Not ideal if you are looking for a pre-trained model to use directly or a simple library for smaller, single-GPU model training.
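As a concrete illustration of the multi-GPU scaling described above, here is a minimal sketch of initializing Megatron Core's model-parallel state under torchrun. The 2x2 tensor/pipeline split is illustrative only, and the snippet assumes megatron-core and a CUDA-capable PyTorch build are installed.

import torch
from megatron.core import parallel_state

# Expects to be launched by torchrun so each GPU gets one process, e.g.:
#   torchrun --nproc_per_node=8 this_script.py
torch.distributed.init_process_group(backend="nccl")
torch.cuda.set_device(torch.distributed.get_rank() % torch.cuda.device_count())

# Carve the 8 GPUs into 2-way tensor parallelism x 2-way pipeline
# parallelism; the leftover factor of 2 becomes data parallelism.
parallel_state.initialize_model_parallel(
    tensor_model_parallel_size=2,
    pipeline_model_parallel_size=2,
)

print(
    "TP rank", parallel_state.get_tensor_model_parallel_rank(),
    "PP rank", parallel_state.get_pipeline_model_parallel_rank(),
    "DP rank", parallel_state.get_data_parallel_rank(),
)

In a real run, the actual training entry points in the repo (such as pretrain_gpt.py) perform this setup internally from command-line arguments.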
Stars: 15,633
Forks: 3,689
Language: Python
License: —
Category: transformers
Last pushed: Mar 13, 2026
Commits (30d): 205
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/NVIDIA/Megatron-LM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
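For programmatic access beyond curl, here is a minimal Python sketch against the same endpoint. The response schema is not documented on this page, so the snippet simply pretty-prints the returned JSON; it uses only the standard library and the anonymous (no-key) tier.

import json
import urllib.request

# Anonymous access is limited to 100 requests/day (per the note above).
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/NVIDIA/Megatron-LM"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

# Pretty-print whatever fields the API returns.
print(json.dumps(data, indent=2))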
Related models
openvinotoolkit/nncf
Neural Network Compression Framework for enhanced OpenVINO™ inference
huggingface/optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers...
huggingface/optimum-intel
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
eole-nlp/eole
Open language modeling toolkit based on PyTorch
huggingface/optimum-habana
Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)