xu-shitong/diffusion-image-captioning

Implementation of the paper at https://arxiv.org/abs/2210.04559

40 / 100 (Emerging)

This project helps AI and machine learning researchers explore a diffusion-based approach to image captioning. It takes an image as input and generates a descriptive text caption using diffusion models, which are more commonly used for image generation. The primary users are researchers interested in alternative text generation techniques for vision tasks.

Use this if you are an AI researcher experimenting with diffusion models for text generation or looking for alternative methods to traditional autoregressive models for image captioning.

Not ideal if you need a production-ready image captioning system for immediate use or are not comfortable with setting up research-grade code and datasets.

AI Research, Image Captioning, Natural Language Processing, Computer Vision, Diffusion Models
No License, No Package, No Dependents
Maintenance 6 / 25
Adoption 8 / 25
Maturity 8 / 25
Community 18 / 25

Stars: 57
Forks: 14
Language: Jupyter Notebook
License: none listed
Last pushed: Nov 26, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/xu-shitong/diffusion-image-captioning"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
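
The same request can be made from Python. The sketch below is illustrative only and rests on two assumptions not confirmed by this page: that the URL path follows a category/owner/repository layout (inferred from the single example URL above), and that the endpoint returns JSON. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL.

    Assumption: the path is <category>/<owner>/<repo>, inferred from the
    one example URL on this page rather than from API documentation.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for a repository.

    Assumption: the response body is JSON. The field names are not
    documented here, so the parsed object is returned unchanged.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("diffusion", "xu-shitong", "diffusion-image-captioning")` issues the same GET request as the curl command. Unauthenticated use is limited to 100 requests/day, so cache results when polling many repositories.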