NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
This project offers a GPU-optimized toolkit for building and training very large language models. It provides reusable building blocks (Megatron Core) and parallelism strategies, including tensor, pipeline, sequence, and data parallelism, to scale training efficiently across many GPUs, and it can serve as the foundation for custom large-scale training frameworks.
15,633 stars. Actively maintained with 205 commits in the last 30 days.
Use this if you are an ML engineer or researcher building custom training pipelines for large transformer models and need to scale efficiently across many GPUs.
Not ideal if you are looking for a pre-trained model to use directly or a simple library for smaller, single-GPU model training.
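As a concrete illustration of the multi-GPU scaling described above, here is a minimal sketch of initializing Megatron Core's model-parallel state under torchrun. The 2x2 tensor/pipeline split is illustrative only, and the snippet assumes megatron-core and a CUDA-capable PyTorch build are installed.

import torch
from megatron.core import parallel_state

# Expects to be launched by torchrun so each GPU gets one process, e.g.:
#   torchrun --nproc_per_node=8 this_script.py
torch.distributed.init_process_group(backend="nccl")
torch.cuda.set_device(torch.distributed.get_rank() % torch.cuda.device_count())

# Carve the 8 GPUs into 2-way tensor parallelism x 2-way pipeline
# parallelism; the leftover factor of 2 becomes data parallelism.
parallel_state.initialize_model_parallel(
    tensor_model_parallel_size=2,
    pipeline_model_parallel_size=2,
)

print(
    "TP rank", parallel_state.get_tensor_model_parallel_rank(),
    "PP rank", parallel_state.get_pipeline_model_parallel_rank(),
    "DP rank", parallel_state.get_data_parallel_rank(),
)

In a real run, the actual training entry points in the repo (such as pretrain_gpt.py) perform this setup internally from command-line arguments.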
Stars: 15,633
Forks: 3,689
Language: Python
License: —
Category: transformers
Last pushed: Mar 13, 2026
Commits (30d): 205
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/NVIDIA/Megatron-LM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
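For programmatic access beyond curl, here is a minimal Python sketch against the same endpoint. The response schema is not documented on this page, so the snippet simply pretty-prints the returned JSON; it uses only the standard library and the anonymous (no-key) tier.

import json
import urllib.request

# Anonymous access is limited to 100 requests/day (per the note above).
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/NVIDIA/Megatron-LM"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

# Pretty-print whatever fields the API returns.
print(json.dumps(data, indent=2))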
Related models
openvinotoolkit/nncf
Neural Network Compression Framework for enhanced OpenVINO™ inference
huggingface/optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers...
huggingface/optimum-intel
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
eole-nlp/eole
Open language modeling toolkit based on PyTorch
huggingface/optimum-habana
Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)