Diffusion Language Models

This category tracks 36 diffusion language model projects. Two score 50 or higher (Established tier). The highest-rated is ZHZisZZ/dllm at 55/100 with 2,193 stars. One of the top 10 is actively maintained.

Get all 36 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=diffusion-language-models&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
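
The same request can be made from Python. The following is a minimal sketch using only the standard library. It rebuilds the query string from the curl command above (`domain`, `subcategory`, `limit`). The function names `build_quality_url` and `fetch_quality` are hypothetical helpers, not part of any published client. The response schema is not documented on this page, so the sketch only decodes the body as JSON and makes no assumption about its fields. Note that the curl command passes `limit=20` while 36 projects are tracked; whether the endpoint accepts a larger `limit` is an assumption to verify.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain="transformers",
                      subcategory="diffusion-language-models",
                      limit=20):
    """Build the query URL with the same parameters the curl command uses."""
    query = urlencode({
        "domain": domain,
        "subcategory": subcategory,
        "limit": limit,  # assumption: values other than 20 are untested here
    })
    return f"{BASE_URL}?{query}"


def fetch_quality(url, timeout=30):
    """GET the URL and decode the body as JSON.

    The shape of the returned object is not assumed; inspect it before use.
    Each call counts against the daily request quota.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality(build_quality_url())` performs one live request; printing the type and top-level keys of the result is a reasonable first step before relying on any field names.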

#	Model	Score	Tier	Stars	Language
1	ZHZisZZ/dllm dLLM: Simple Diffusion Language Modeling	55	Established	2,193	Python
2	pengzhangzhi/Open-dLLM Open diffusion language model for code generation — releasing pretraining,...	50	Established	549	Python
3	EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications...	49	Emerging	689	—
4	THUDM/LongWriter [ICLR 2025] LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs	48	Emerging	1,839	Python
5	AIoT-MLSys-Lab/SVD-LLM [ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2	46	Emerging	284	Python
6	datamllab/LongLM [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning	43	Emerging	666	Python
7	jxiw/MambaInLlama [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and...	41	Emerging	238	Python
8	DAMO-NLP-SG/CLEX [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models	39	Emerging	78	Python
9	czg1225/dParallel [ICLR 2026] dParallel: Learnable Parallel Decoding for dLLMs	39	Emerging	62	Python
10	tommyip/mamba2-minimal Minimal Mamba-2 implementation in PyTorch	37	Emerging	243	Python
11	JinjieNi/MegaDLMs GPU-optimized framework for training diffusion language models at any scale....	36	Emerging	327	Python
12	hao-ai-lab/DistCA Efficient Long-context Language Model Training by Core Attention Disaggregation	36	Emerging	93	Python
13	HKUDS/SepLLM [ICML 2025] "SepLLM: Accelerate Large Language Models by Compressing One...	36	Emerging	567	Python
14	sail-sg/Attention-Sink [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical...	35	Emerging	159	Python
15	Ereboas/MagiCodec A single-layer, streaming codec model providing SOTA audio quality and...	35	Emerging	113	Python
16	zjunlp/ModelKinship Exploring Model Kinship for Merging Large Language Models	35	Emerging	27	Python
17	VITA-Group/Ms-PoE "Found in the Middle: How Language Models Use Long Contexts Better via...	34	Emerging	31	Python
18	AlgonetLabs/Cable Context-aware Biases for Length Extrapolation	33	Emerging	22	Python
19	VITA-Group/TAPE [ICML'25] "Rethinking Addressing in Language Models via Contextualized...	33	Emerging	14	Python
20	uiuctml/Localize-and-Stitch Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic	33	Emerging	32	Python
21	fvliang/DART Official Implementation of DART (DART: Diffusion-Inspired Speculative...	32	Emerging	45	Python
22	hao-ai-lab/d3LLM d3LLM: Ultra-Fast Diffusion LLM 🚀	31	Emerging	105	Python
23	OpenMOSS/LongLLaDA [AAAI26] LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs	31	Emerging	53	Python
24	SJTU-DENG-Lab/LightningRL LightningRL: Breaking the Accuracy–Parallelism Trade-off of Block-wise dLLMs...	30	Emerging	23	Python
25	JarvisPei/MemDLM MemDLM: Memory-enhanced Diffusion Language Model	27	Experimental	9	Python
26	zhiyuanhubj/LongRecipe LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models	25	Experimental	79	Python
27	MouxiaoHuang/PPE [ICLR 2026] Official code of PPE: Positional Preservation Embedding for...	24	Experimental	3	Python
28	sayhitosandy/Mamba_SSM Mamba: Linear-Time Sequence Modeling with Selective State Spaces	23	Experimental	3	—
29	declare-lab/della DELLA-Merging: Reducing Interference in Model Merging through...	23	Experimental	36	Python
30	Anri-Lombard/Mamba-SAFE Generating Molecules with the Mamba architecture	20	Experimental	5	Jupyter Notebook
31	yophis/decom-renorm-merge Decom-Renorm-Merge: Merging deep learning models through shared representation space.	20	Experimental	1	Python
32	aflah02/Partial_RoPE_Analysis Code accompanying the paper “Fractional Rotation, Full Potential?...	17	Experimental	1	Shell
33	chen-hao-chao/mdm-prime-v2 MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Compute-optimal...	16	Experimental	2	Python
34	Ghost---Shadow/diff-rouge A fully vectorized PyTorch implementation of ROUGE scores optimized for...	13	Experimental	—	Python
35	kduxin/corrdim Correlation dimension of autoregressive LLMs	12	Experimental	1	—
36	soacker/Mesa-Extrapolation [NeurIPS 2024] Mesa-Extrapolation: A Weave Position Encoding Method for...	11	Experimental	3	Python
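
For working with the table offline, the rows above can be parsed as tab-separated text. The sketch below assumes each row has exactly six tab-separated fields (rank, model, score, tier, stars, language) and that the repository name is the first space-delimited token of the model field. The tier cutoffs in `tier_for` are inferred from the scores shown here (50 is Established, 49 and 30 are Emerging, 27 is Experimental); the exact lower boundary of Emerging is not stated on this page, so 30 is an assumption. Both function names are hypothetical.

```python
MISSING = "\u2014"  # the dash the table prints when a value is absent


def parse_row(line):
    """Split one tab-separated table row into a dict of typed fields."""
    rank, model, score, tier, stars, language = line.rstrip("\n").split("\t")
    # The model cell fuses "owner/repo" and a free-text description.
    repo, _, description = model.partition(" ")
    return {
        "rank": int(rank),
        "repo": repo,
        "description": description,
        "score": int(score),
        "tier": tier,
        "stars": None if stars == MISSING else int(stars.replace(",", "")),
        "language": None if language == MISSING else language,
    }


def tier_for(score):
    """Map a score to a tier using cutoffs inferred from the table (assumed)."""
    if score >= 50:
        return "Established"
    if score >= 30:  # assumption: the true cutoff lies somewhere in 28..30
        return "Emerging"
    return "Experimental"
```

Comparing `tier_for(row["score"])` against `row["tier"]` for every parsed row is a quick way to check whether the inferred cutoffs hold for the full list.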

Comparisons in this category

dllm and Open-dLLM (55 vs 50)