Diffusion Language Models
This list tracks 36 diffusion language model projects. Two score above 50 (the Established tier). The highest-rated is ZHZisZZ/dllm, at 55/100 with 2,193 stars. One of the top 10 is actively maintained.
Get all 36 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=diffusion-language-models&limit=20"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
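The same request can be made from a script. The following is a minimal Python sketch, not an official client: the endpoint and query parameters are copied from the curl command above, while the helper names `build_url` and `fetch_projects` are hypothetical. The shape of the JSON response is not documented here, so the sketch only decodes it. The curl example uses `limit=20`; whether the API accepts a larger value such as 36 is an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command; everything else is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit: int = 20) -> str:
    """Build the dataset URL with the query parameters shown in the curl command."""
    params = {
        "domain": "transformers",
        "subcategory": "diffusion-language-models",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit: int = 20, timeout: float = 30.0):
    """Send the keyless request and decode the JSON body.

    The response structure is not specified in this page, so the decoded
    object is returned unchanged for the caller to inspect.
    """
    with urlopen(build_url(limit), timeout=timeout) as response:
        return json.load(response)


# Example usage (performs a network request and counts against the daily quota):
#   data = fetch_projects(limit=36)  # assumption: limits above 20 are accepted
#   print(json.dumps(data, indent=2)[:500])
```

Keeping URL construction separate from the network call makes it easy to check the query string without spending any of the 100 daily keyless requests.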
| # | Model | Description | Score | Tier |
|---|---|---|---|---|
| 1 | ZHZisZZ/dllm | dLLM: Simple Diffusion Language Modeling | 55/100 | Established |
| 2 | pengzhangzhi/Open-dLLM | Open diffusion language model for code generation — releasing pretraining,... | | Established |
| 3 | EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications | Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications... | | Emerging |
| 4 | THUDM/LongWriter | [ICLR 2025] LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs | | Emerging |
| 5 | AIoT-MLSys-Lab/SVD-LLM | [ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2 | | Emerging |
| 6 | datamllab/LongLM | [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning | | Emerging |
| 7 | jxiw/MambaInLlama | [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and... | | Emerging |
| 8 | DAMO-NLP-SG/CLEX | [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models | | Emerging |
| 9 | czg1225/dParallel | [ICLR 2026] dParallel: Learnable Parallel Decoding for dLLMs | | Emerging |
| 10 | tommyip/mamba2-minimal | Minimal Mamba-2 implementation in PyTorch | | Emerging |
| 11 | JinjieNi/MegaDLMs | GPU-optimized framework for training diffusion language models at any scale.... | | Emerging |
| 12 | hao-ai-lab/DistCA | Efficient Long-context Language Model Training by Core Attention Disaggregation | | Emerging |
| 13 | HKUDS/SepLLM | [ICML 2025] "SepLLM: Accelerate Large Language Models by Compressing One... | | Emerging |
| 14 | sail-sg/Attention-Sink | [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical... | | Emerging |
| 15 | Ereboas/MagiCodec | A single-layer, streaming codec model providing SOTA audio quality and... | | Emerging |
| 16 | zjunlp/ModelKinship | Exploring Model Kinship for Merging Large Language Models | | Emerging |
| 17 | VITA-Group/Ms-PoE | "Found in the Middle: How Language Models Use Long Contexts Better via... | | Emerging |
| 18 | AlgonetLabs/Cable | Context-aware Biases for Length Extrapolation | | Emerging |
| 19 | VITA-Group/TAPE | [ICML'25] "Rethinking Addressing in Language Models via Contextualized... | | Emerging |
| 20 | uiuctml/Localize-and-Stitch | Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic | | Emerging |
| 21 | fvliang/DART | Official Implementation of DART (DART: Diffusion-Inspired Speculative... | | Emerging |
| 22 | hao-ai-lab/d3LLM | d3LLM: Ultra-Fast Diffusion LLM 🚀 | | Emerging |
| 23 | OpenMOSS/LongLLaDA | [AAAI26] LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs | | Emerging |
| 24 | SJTU-DENG-Lab/LightningRL | LightningRL: Breaking the Accuracy–Parallelism Trade-off of Block-wise dLLMs... | | Emerging |
| 25 | JarvisPei/MemDLM | MemDLM: Memory-enhanced Diffusion Language Model | | Experimental |
| 26 | zhiyuanhubj/LongRecipe | LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models | | Experimental |
| 27 | MouxiaoHuang/PPE | [ICLR 2026] Official code of PPE: Positional Preservation Embedding for... | | Experimental |
| 28 | sayhitosandy/Mamba_SSM | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | | Experimental |
| 29 | declare-lab/della | DELLA-Merging: Reducing Interference in Model Merging through... | | Experimental |
| 30 | Anri-Lombard/Mamba-SAFE | Generating Molecules with the Mamba architecture | | Experimental |
| 31 | yophis/decom-renorm-merge | Decom-Renorm-Merge: Merging deep learning models through shared representation space. | | Experimental |
| 32 | aflah02/Partial_RoPE_Analysis | Code accompanying the paper “Fractional Rotation, Full Potential?... | | Experimental |
| 33 | chen-hao-chao/mdm-prime-v2 | MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Compute-optimal... | | Experimental |
| 34 | Ghost---Shadow/diff-rouge | A fully vectorized PyTorch implementation of ROUGE scores optimized for... | | Experimental |
| 35 | kduxin/corrdim | Correlation dimension of autoregressive LLMs | | Experimental |
| 36 | soacker/Mesa-Extrapolation | [NeurIPS 2024] Mesa-Extrapolation: A Weave Position Encoding Method for... | | Experimental |