OpenSQZ/MegatronApp

Toolchain built around Megatron-LM for distributed training

/ 100

Emerging

When training large language models with Megatron-LM across many GPUs, this toolchain helps optimize performance and understand what's happening inside the model. It takes your Megatron-LM training configuration and outputs visualizations, performance insights, and diagnostics to pinpoint slowdowns. AI/ML engineers and researchers working with distributed model training are the primary users.

Use this if you are training large language models with Megatron-LM and need to diagnose performance bottlenecks, optimize resource usage, or gain real-time visual insights into the model's internal workings.

Not ideal if you are working with smaller models, single-GPU training, or a different distributed training framework.

large-language-models distributed-training model-performance AI-research deep-learning-operations

No Package No Dependents

Maintenance 10 / 25

Adoption 9 / 25

Maturity 15 / 25

Community 8 / 25

Language

Python

License

—

Higher-rated alternatives

AI-Planning/l2p

Library for LLM-driven action model acquisition via natural language

datawhalechina/self-llm

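"A Guide to Using Open-Source LLMs": a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying open-source large language models (LLMs) and multimodal large language models (MLLMs), from China and abroad, in a Linux environment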

microsoft/LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs

theaniketgiri/create-llm

The fastest way to build and start training your own LLM. CLI tool that scaffolds...

liguodongiot/llm-action

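This project aims to share the technical principles behind large models and hands-on experience with them (large-model engineering and deploying large-model applications in production)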
