JinjieNi/MegaDLMs
GPU-optimized framework for training diffusion language models at any scale. The backend of Quokka, Super Data Learners, and OpenMoE 2 training.
MegaDLMs is a GPU-optimized framework for training large language models (LLMs), and Diffusion Language Models (DLMs) in particular, at any scale. It takes raw text data, tokenizes it, and outputs a fully trained language model that can then be used for generation tasks. It is aimed at AI researchers and engineers who build and train advanced generative AI models.
Use this if you are developing and training state-of-the-art diffusion or autoregressive language models and need a high-performance, scalable solution optimized for GPU clusters.
Not ideal if you are looking for an off-the-shelf model to use directly, or if you need a framework for training models other than large language models.
Stars: 327
Forks: 30
Language: Python
License: —
Category: —
Last pushed: Nov 11, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/JinjieNi/MegaDLMs"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
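The same endpoint can be queried programmatically. Below is a minimal sketch in Python using only the standard library; the URL pattern is taken from the curl example above, but the JSON response schema is not documented here, so the helper simply returns whatever the API sends back.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a given repository."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality data as a dict.

    No API key is needed for up to 100 requests/day; the response
    fields are not documented here, so we parse the JSON as-is.
    """
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

# Example (requires network access):
# data = fetch_quality("JinjieNi", "MegaDLMs")
# print(data)
```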
Higher-rated alternatives
ZHZisZZ/dllm
dLLM: Simple Diffusion Language Modeling
pengzhangzhi/Open-dLLM
Open diffusion language model for code generation — releasing pretraining, evaluation,...
EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. ACM...
THUDM/LongWriter
[ICLR 2025] LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
AIoT-MLSys-Lab/SVD-LLM
[ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2