xu-shitong/diffusion-image-captioning
Implementation of the paper https://arxiv.org/abs/2210.04559
This project offers AI and machine-learning researchers a novel approach to image captioning: it takes an image as input and generates a descriptive text caption using diffusion models, which are more commonly applied to image generation. The primary audience is researchers interested in state-of-the-art text generation techniques for vision tasks.
Use this if you are experimenting with diffusion models for text generation or looking for alternatives to traditional autoregressive image-captioning models.
Not ideal if you need a production-ready captioning system, or if you are not comfortable setting up research-grade code and datasets.
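As a rough orientation before reading the notebooks, the core idea can be sketched as continuous diffusion over caption token embeddings conditioned on an image feature. This is an illustrative toy, not the repository's actual code: all names, dimensions, and the noise schedule are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000                                # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule (assumed)
alpha_bar = np.cumprod(1.0 - betas)     # cumulative signal-retention product

def q_sample(x0, t, eps):
    """Closed-form forward process: x_t = sqrt(a_bar_t)*x0 + sqrt(1-a_bar_t)*eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

seq_len, embed_dim = 16, 32
x0 = rng.standard_normal((seq_len, embed_dim))   # clean caption embeddings
img_feat = rng.standard_normal(embed_dim)        # image conditioning vector
eps = rng.standard_normal(x0.shape)

x_t = q_sample(x0, t=500, eps=eps)
# A denoising network would take (x_t, t, img_feat) and predict eps
# (or x0 directly); training minimises e.g. ||eps_hat - eps||^2.
print(x_t.shape)
```

At small `t` the sample stays close to the clean embeddings; near `T` it is almost pure Gaussian noise, which is what sampling starts from at generation time.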
Stars
57
Forks
14
Language
Jupyter Notebook
License
—
Category
diffusion
Last pushed
Nov 26, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/xu-shitong/diffusion-image-captioning"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
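The same endpoint can be called from Python instead of curl. A minimal sketch, assuming the URL follows the `category/owner/repo` path shown above and the response is JSON (the response schema is not documented here):

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repository (path layout assumed)."""
    return f"{BASE}/{category}/{owner}/{repo}"

url = quality_url("diffusion", "xu-shitong", "diffusion-image-captioning")

if __name__ == "__main__":
    # Performs a real network request; schema of the payload is an assumption.
    with urllib.request.urlopen(url) as resp:
        print(json.dumps(json.load(resp), indent=2))
```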
Higher-rated alternatives
huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
bghira/SimpleTuner
A general fine-tuning kit geared toward image/video/audio diffusion models.
mcmonkeyprojects/SwarmUI
SwarmUI (formerly StableSwarmUI), A Modular Stable Diffusion Web-User-Interface, with an...
nateraw/stable-diffusion-videos
Create 🔥 videos with Stable Diffusion by exploring the latent space and morphing between text prompts
TheDesignFounder/DreamLayer
Benchmark diffusion models faster. Automate evals, seeds, and metrics for reproducible results.