JinjieNi/MegaDLMs
GPU-optimized framework for training diffusion language models at any scale. The backend of Quokka, Super Data Learners, and OpenMoE 2 training.
MegaDLMs is a GPU-optimized framework for training large language models (LLMs), and Diffusion Language Models (DLMs) in particular, at any scale. It takes raw text data, tokenizes it, and outputs a fully trained language model that can then be used for generation tasks. It is aimed at AI researchers and engineers who build and train advanced generative AI models.
Use this if you are developing and training state-of-the-art diffusion or autoregressive language models and need a high-performance, scalable solution optimized for GPU clusters.
Not ideal if you are looking for an off-the-shelf model to use directly, or if you need a framework for training models other than large language models.
Stars: 327
Forks: 30
Language: Python
License: —
Category: —
Last pushed: Nov 11, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/JinjieNi/MegaDLMs"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
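The same endpoint can be queried programmatically. Below is a minimal sketch in Python using only the standard library; the URL pattern is taken from the curl example above, but the JSON response schema is not documented here, so the helper simply returns whatever the API sends back.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a given repository."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality data as a dict.

    No API key is needed for up to 100 requests/day; the response
    fields are not documented here, so we parse the JSON as-is.
    """
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

# Example (requires network access):
# data = fetch_quality("JinjieNi", "MegaDLMs")
# print(data)
```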
Higher-rated alternatives
ZHZisZZ/dllm
dLLM: Simple Diffusion Language Modeling
pengzhangzhi/Open-dLLM
Open diffusion language model for code generation — releasing pretraining, evaluation,...
EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. ACM...
THUDM/LongWriter
[ICLR 2025] LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
AIoT-MLSys-Lab/SVD-LLM
[ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2