OpenSQZ/MegatronApp
Toolchain built around Megatron-LM for distributed training
When training large language models with Megatron-LM across many GPUs, this toolchain helps optimize performance and understand what's happening inside the model. It takes your Megatron-LM training configuration and outputs visualizations, performance insights, and diagnostics to pinpoint slowdowns. AI/ML engineers and researchers working with distributed model training are the primary users.
Use this if you are training large language models with Megatron-LM and need to diagnose performance bottlenecks, optimize resource usage, or gain real-time visual insights into the model's internal workings.
Not ideal if you are working with smaller models, single-GPU training, or a different distributed training framework.
Stars: 90
Forks: 5
Language: Python
License: —
Category:
Last pushed: Mar 05, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/OpenSQZ/MegatronApp"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
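The same request can be sketched in Python using only the standard library. This is a minimal sketch under assumptions: the response is assumed to be JSON (the page does not document the schema), and the helper names `build_url` and `fetch_quality` are hypothetical. Only the URL itself comes from the curl command above; that the same path pattern works for other repositories is also an assumption.

```python
import json
import urllib.request

# Base path copied from the curl example; "llm-tools" is the segment used there.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a repository (hypothetical helper)."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """Fetch the listing data for a repository.

    Assumption: the endpoint returns a JSON document. No API key is sent,
    so the keyless daily request limit stated on the page applies.
    """
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("OpenSQZ", "MegatronApp")` issues the same GET request as the curl command. Since the response fields are not documented here, inspect the returned dictionary before relying on any particular key.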
Higher-rated alternatives
AI-Planning/l2p
Library for LLM-driven action model acquisition via natural language
datawhalechina/self-llm
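"Open-Source LLM User Guide": a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLMs) and multimodal large models (MLLMs) in a Linux environment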
microsoft/LMOps
General technology for enabling AI capabilities w/ LLMs and MLLMs
theaniketgiri/create-llm
The fastest way to build and start training your own LLM. CLI tool that scaffolds...
liguodongiot/llm-action
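This project aims to share the technical principles and practical experience behind large models (LLM engineering and real-world application deployment)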