Diffusion Language Models

This category tracks 36 diffusion language models. Two score 50 or higher (Established tier). The highest-rated is ZHZisZZ/dllm at 55/100 with 2,193 stars. One of the top 10 is actively maintained.

Get all 36 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=diffusion-language-models&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
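
The same request can be made from a script. The following is a minimal sketch using only the Python standard library; it assumes nothing about the API beyond the URL and query parameters shown in the curl command above. The function names `build_url` and `fetch_projects` are hypothetical, the response schema is not documented here, so the parsed JSON is returned as-is, and `limit=20` is copied from the curl command (whether a larger limit is accepted is not stated on this page).

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken verbatim from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers",
              subcategory="diffusion-language-models",
              limit=20):
    """Build the request URL with the same query parameters as the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Send a GET request and return the decoded JSON body.

    The structure of the response is an unknown here, so no fields are
    accessed; inspect the returned object before relying on any key.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` issues one request, which counts against the 100 requests/day keyless quota mentioned above.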

# | Model | Description | Score | Tier
1 | ZHZisZZ/dllm | dLLM: Simple Diffusion Language Modeling | 55 | Established
2 | pengzhangzhi/Open-dLLM | Open diffusion language model for code generation — releasing pretraining,... | 50 | Established
3 | EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications | Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications... | 49 | Emerging
4 | THUDM/LongWriter | [ICLR 2025] LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs | 48 | Emerging
5 | AIoT-MLSys-Lab/SVD-LLM | [ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2 | 46 | Emerging
6 | datamllab/LongLM | [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning | 43 | Emerging
7 | jxiw/MambaInLlama | [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and... | 41 | Emerging
8 | DAMO-NLP-SG/CLEX | [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models | 39 | Emerging
9 | czg1225/dParallel | [ICLR 2026] dParallel: Learnable Parallel Decoding for dLLMs | 39 | Emerging
10 | tommyip/mamba2-minimal | Minimal Mamba-2 implementation in PyTorch | 37 | Emerging
11 | JinjieNi/MegaDLMs | GPU-optimized framework for training diffusion language models at any scale.... | 36 | Emerging
12 | hao-ai-lab/DistCA | Efficient Long-context Language Model Training by Core Attention Disaggregation | 36 | Emerging
13 | HKUDS/SepLLM | [ICML 2025] "SepLLM: Accelerate Large Language Models by Compressing One... | 36 | Emerging
14 | sail-sg/Attention-Sink | [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical... | 35 | Emerging
15 | Ereboas/MagiCodec | A single-layer, streaming codec model providing SOTA audio quality and... | 35 | Emerging
16 | zjunlp/ModelKinship | Exploring Model Kinship for Merging Large Language Models | 35 | Emerging
17 | VITA-Group/Ms-PoE | "Found in the Middle: How Language Models Use Long Contexts Better via... | 34 | Emerging
18 | AlgonetLabs/Cable | Context-aware Biases for Length Extrapolation | 33 | Emerging
19 | VITA-Group/TAPE | [ICML'25] "Rethinking Addressing in Language Models via Contextualized... | 33 | Emerging
20 | uiuctml/Localize-and-Stitch | Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic | 33 | Emerging
21 | fvliang/DART | Official Implementation of DART (DART: Diffusion-Inspired Speculative... | 32 | Emerging
22 | hao-ai-lab/d3LLM | d3LLM: Ultra-Fast Diffusion LLM 🚀 | 31 | Emerging
23 | OpenMOSS/LongLLaDA | [AAAI26] LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs | 31 | Emerging
24 | SJTU-DENG-Lab/LightningRL | LightningRL: Breaking the Accuracy–Parallelism Trade-off of Block-wise dLLMs... | 30 | Emerging
25 | JarvisPei/MemDLM | MemDLM: Memory-enhanced Diffusion Language Model | 27 | Experimental
26 | zhiyuanhubj/LongRecipe | LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models | 25 | Experimental
27 | MouxiaoHuang/PPE | [ICLR 2026] Official code of PPE: Positional Preservation Embedding for... | 24 | Experimental
28 | sayhitosandy/Mamba_SSM | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | 23 | Experimental
29 | declare-lab/della | DELLA-Merging: Reducing Interference in Model Merging through... | 23 | Experimental
30 | Anri-Lombard/Mamba-SAFE | Generating Molecules with the Mamba architecture | 20 | Experimental
31 | yophis/decom-renorm-merge | Decom-Renorm-Merge: Merging deep learning models through shared representation space. | 20 | Experimental
32 | aflah02/Partial_RoPE_Analysis | Code accompanying the paper “Fractional Rotation, Full Potential?... | 17 | Experimental
33 | chen-hao-chao/mdm-prime-v2 | MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Compute-optimal... | 16 | Experimental
34 | Ghost---Shadow/diff-rouge | A fully vectorized PyTorch implementation of ROUGE scores optimized for... | 13 | Experimental
35 | kduxin/corrdim | Correlation dimension of autoregressive LLMs | 12 | Experimental
36 | soacker/Mesa-Extrapolation | [NeurIPS 2024] Mesa-Extrapolation: A Weave Position Encoding Method for... | 11 | Experimental
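
The tier labels in the table above appear to follow score bands. The sketch below encodes one set of cutoffs that is consistent with every listed row; the exact boundaries are an assumption, not something this page states. In particular, no listed score falls between 27 (Experimental) and 30 (Emerging), so the lower cutoff of 30 is a guess, and the function name `tier_for` is hypothetical.

```python
# Assumed cutoffs, inferred only from the rows listed above:
#   50 and 55 are labeled Established, 30 through 49 Emerging,
#   11 through 27 Experimental.
ESTABLISHED_MIN = 50  # lowest score shown as Established
EMERGING_MIN = 30     # lowest score shown as Emerging (true cutoff may be 28 or 29)


def tier_for(score):
    """Map a 0-100 quality score to a tier label under the assumed cutoffs."""
    if score >= ESTABLISHED_MIN:
        return "Established"
    if score >= EMERGING_MIN:
        return "Emerging"
    return "Experimental"
```

Under these assumed cutoffs, the 36 listed projects split into 2 Established, 22 Emerging, and 12 Experimental, which matches the labels shown in the table.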

Comparisons in this category