Diffusion Models: T2I Evaluation Benchmarks

Benchmarks, datasets, and metrics for evaluating text-to-image generation quality and alignment. Does NOT include tools for generating images, training models, or prompt optimization.

There are 51 T2I evaluation benchmark projects tracked. Four score 50 or higher (Established tier). The highest-rated is Vchitect/VBench at 68/100 with 1,537 stars. One of the top 10 is actively maintained.
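The page does not say where the tier boundaries lie. Judging from the listed scores, Established appears to begin at 50 and Emerging at 30 (VIEScore at 30 is Emerging, while the entries at 29 are Experimental). The minimal Python sketch below works under that inferred assumption. It uses the 51 scores copied from the ranking to reproduce the tier counts. The cut-offs and the `tier` helper are illustrative. They are not the site's documented scoring rules.

```python
from collections import Counter

# Scores as listed in the ranking on this page, ranks 1-51 in order.
SCORES = [
    68, 60, 51, 50, 48, 47, 45, 45, 44, 44, 44, 43, 42, 42, 41, 41, 40, 39, 36, 35,
    34, 34, 34, 34, 32, 31, 31, 30, 29, 29, 29, 27, 27, 27, 27, 26, 26, 26, 25, 25,
    24, 24, 24, 23, 23, 23, 21, 21, 17, 12, 12,
]


def tier(score: int) -> str:
    """Map a score to a tier label.

    ASSUMPTION: the cut-offs (50 and 30) are inferred from the listing,
    not taken from any documentation of the scoring system.
    """
    if score >= 50:
        return "Established"
    if score >= 30:
        return "Emerging"
    return "Experimental"


# Count how many projects fall into each inferred tier.
tier_counts = Counter(tier(s) for s in SCORES)

if __name__ == "__main__":
    print(len(SCORES), dict(tier_counts))
    # → 51 {'Established': 4, 'Emerging': 24, 'Experimental': 23}
```

The counts match the labels shown in the ranking (ranks 1-4 Established, 5-28 Emerging, 29-51 Experimental). Note that only three projects score strictly above 50, because DreamOmni2 sits exactly at the boundary.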

Get the projects as JSON. The example request sets limit=20, so it returns at most 20 of the 51 projects:

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=diffusion&subcategory=t2i-evaluation-benchmarks&limit=20"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
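The curl call above can also be made from Python with only the standard library. The sketch below assumes that the endpoint accepts the same three query parameters shown in the example URL (domain, subcategory, limit). The page does not document the response schema or how an API key is supplied. For that reason, `fetch_projects` only decodes the JSON and does not handle keys. Both `build_query_url` and `fetch_projects` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_query_url(domain: str, subcategory: str, limit: int) -> str:
    """Rebuild the query URL used in the curl example."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(domain: str, subcategory: str, limit: int = 20):
    """Fetch and decode the JSON response (keyless access only).

    ASSUMPTION: the response body is JSON; its structure is not
    described on this page, so it is returned as-is.
    """
    url = build_query_url(domain, subcategory, limit)
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the same URL the curl example requests.
    print(build_query_url("diffusion", "t2i-evaluation-benchmarks", 20))
```

This sketch assumes that raising `limit` (for example to 51) returns the full list in one call. Check the length of the response before relying on it.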

# Project (description) Score Tier
1 Vchitect/VBench

[CVPR2024 Highlight] VBench - We Evaluate Video Generation

68
Established
2 VectorSpaceLab/OmniGen

OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340

60
Established
3 EndlessSora/focal-frequency-loss

[ICCV 2021] Focal Frequency Loss for Image Reconstruction and Synthesis

51
Established
4 JIA-Lab-research/DreamOmni2

This project is the official implementation of 'DreamOmni2: Multimodal...

50
Established
5 SkyworkAI/UniPic

Open-source SOTA multi-image editing model

48
Emerging
6 PKU-YuanGroup/ChronoMagic-Bench

[NeurIPS 2024 D&B Spotlight🔥] ChronoMagic-Bench: A Benchmark for Metamorphic...

47
Emerging
7 nupurkmr9/syncd

SynCD: Generating Multi-Image Synthetic Data for Text-to-Image Customization...

45
Emerging
8 ViStoryBench/vistorybench

[CVPR 2026] ViStoryBench: AI Story Visualization Benchmark

45
Emerging
9 Karine-Huang/T2I-CompBench

[NeurIPS 2023 & TPAMI] T2I-CompBench (++) for Compositional Text-to-image...

44
Emerging
10 uni-medical/UniMedVL

Official implementation of "UniMedVL: Unifying Medical Multimodal...

44
Emerging
11 zai-org/CogView2

Official code repo for the paper "CogView2: Faster and Better Text-to-Image...

44
Emerging
12 zai-org/CogView4

CogView4, CogView3-Plus and CogView3 (ECCV 2024)

43
Emerging
13 tobran/GALIP

[CVPR2023] A faster, smaller, and better text-to-image model for large-scale training

42
Emerging
14 Amshaker/Mobile-O

Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device

42
Emerging
15 OpenGVLab/GenExam

GenExam: A Multidisciplinary Text-to-Image Exam

41
Emerging
16 AIDC-AI/Ovis-U1

A unified model that seamlessly integrates multimodal understanding,...

41
Emerging
17 JustusThies/NeuralTexGen

Image-space texture optimization of 3D meshes using PyTorch

40
Emerging
18 humansensinglab/ITI-GEN

[ICCV 2023 Oral, Best Paper Finalist] ITI-GEN: Inclusive Text-to-Image Generation

39
Emerging
19 360CVGroup/PlanGen

Unified layout planning and image generation, ICCV2025

36
Emerging
20 lxa9867/ImageFolder

High-performance Image Tokenizers for VAR and AR

35
Emerging
21 migs2021/migs

MIGS: Meta Image Generation from Scene Graphs (BMVC 2021)

34
Emerging
22 boomb0om/text2image-benchmark

Benchmark for generative image models

34
Emerging
23 inclusionAI/Ming-UniVision

Code release for Ming-UniVision: Joint Image Understanding and Generation...

34
Emerging
24 FoundationVision/OmniTokenizer

[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint...

34
Emerging
25 roeiherz/CanonicalSg2Im

Code for "Learning Canonical Representations for Scene Graph to Image...

32
Emerging
26 bcmi/F2GAN-Few-Shot-Image-Generation

Fusing-and-Filling GAN (F2GAN) for few-shot image generation, ACM MM2020

31
Emerging
27 KlingAIResearch/IMBA-Loss

[ICCV 2025] Official Implementation of the Paper "Imbalance in Balance:...

31
Emerging
28 TIGER-AI-Lab/VIEScore

Visual Instruction-guided Explainable Metric. Code for "Towards Explainable...

30
Emerging
29 yunqing-me/A-Closer-Look-at-FSIG

The IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR) 2022

29
Experimental
30 ali-vilab/IDEA-Bench

Official repository of IDEA-Bench

29
Experimental
31 GordonChen19/STENCIL

[ICIP2025 Spotlight] Efficient and High-Fidelity Image Generation

29
Experimental
32 hysts/CogView2_demo

Unofficial demo app for CogView2

27
Experimental
33 yongchoooon/stellar

[AAAI'26 Workshops Oral] STELLAR: Scene Text Editor for Low-Resource...

27
Experimental
34 1jsingh/Divide-Evaluate-and-Refine

Repo for our NeurIPS 2023 paper, Divide, Evaluate, and Refine: Evaluating...

27
Experimental
35 microsoft/BizGenEval

Bridging the gap between image generation and real-world design: a benchmark...

27
Experimental
36 wzhlearning/Tex2Sem

Official Implementation of “Tex2Sem: Learning from Textures to Semantics...

26
Experimental
37 zeyofu/Commonsense-T2I

Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models...

26
Experimental
38 bowen-upenn/ControlText

ControlText: Unlocking Controllable Fonts in Multilingual Text Rendering...

26
Experimental
39 EPFL-VILAB/search-over-tokens

SoT is a framework for test-time search in autoregressive (AR) image...

25
Experimental
40 FtmsdtHosseini/IDPL-PFOD

An Image Dataset of Printed Farsi Text for OCR Research

25
Experimental
41 matsuolab/multibanana

[CVPR 2026 Main] MultiBanana: A Challenging Benchmark for Multi-Reference...

24
Experimental
42 hadi-hosseini/T2I-FineEval

[ECCV 2024 Workshop EVAL-FoMo] T2I-FineEval: Fine-Grained Compositional...

24
Experimental
43 360CVGroup/HiCo_T2I

Layout Conditioned Image Generation, NeurIPS2024

24
Experimental
44 AIGCResearch/styleme3d

Official repo for StyleMe3D

23
Experimental
45 j-min/VPGen

Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)

23
Experimental
46 yczhou001/LongBench-T2I

Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex...

23
Experimental
47 pmh9960/GCDP

Official PyTorch implementation of "Learning to Generate Semantic Layouts...

21
Experimental
48 K1nght/T2I-ConBench

T2I-ConBench: Text-to-Image Benchmark for Continual Post-training

21
Experimental
49 HaoyuanYang-2023/ImagineFSL

Official implementation of "ImagineFSL: Self-Supervised Pretraining Matters...

17
Experimental
50 AIGCResearch/Awesome-Story-Visualization

A Survey of Story Visualization

12
Experimental
51 ai-forever/RusCode

Official repository for RusCode benchmark dataset (NAACL 2025)

12
Experimental