zeyofu/Commonsense-T2I
Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024]
This project evaluates how well text-to-image (T2I) models apply real-world common sense when generating images. It works on pairs of similar text prompts that differ in a subtle commonsense detail (e.g., 'a lightbulb without electricity' vs. 'a lightbulb with electricity'): the model under test generates an image for each prompt, and the benchmark scores how faithfully the outputs reflect those everyday distinctions. It is aimed at researchers and practitioners who develop or use AI image-generation models and need to assess their nuanced understanding of the world.
No commits in the last 6 months.
Use this if you are a researcher or developer who wants to benchmark how well your text-to-image models handle everyday common sense compared to other state-of-the-art models.
Not ideal if you are a general user simply looking to generate images, as this tool is focused on evaluating model performance, not casual image creation.
Stars: 24
Forks: 1
Language: Python
License: Apache-2.0
Category: diffusion
Last pushed: Aug 13, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/zeyofu/Commonsense-T2I"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
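If you'd rather query the endpoint from code than from curl, a minimal Python sketch follows. Only the base URL and path shape are taken from the curl command above; the response schema is not documented here, so the actual fetch is left commented out rather than parsed into assumed fields.

```python
from urllib.parse import quote

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repository,
    percent-encoding each path segment."""
    return f"{API_BASE}/{quote(category, safe='')}/{quote(owner, safe='')}/{quote(repo, safe='')}"

url = quality_url("diffusion", "zeyofu", "Commonsense-T2I")

# To actually fetch the JSON payload (100 requests/day without a key):
# import json, urllib.request
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
```

With an API key (1,000 requests/day), you would presumably attach it as a header or query parameter; check the service's docs for the exact mechanism, as it is not specified here.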
Higher-rated alternatives
Vchitect/VBench
[CVPR2024 Highlight] VBench - We Evaluate Video Generation
VectorSpaceLab/OmniGen
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
EndlessSora/focal-frequency-loss
[ICCV 2021] Focal Frequency Loss for Image Reconstruction and Synthesis
JIA-Lab-research/DreamOmni2
This project is the official implementation of 'DreamOmni2: Multimodal Instruction-based Editing...
SkyworkAI/UniPic
Open-source SOTA multi-image editing model