shreydan/VisionGPT2
Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.
Given an input image, the model generates a one-sentence caption describing its content. This is useful for anyone working with large image collections, such as content managers, digital archivists, or e-commerce teams who need text descriptions at scale.
No commits in the last 6 months.
Use this if you need to quickly generate textual descriptions for individual images to improve searchability, accessibility, or content organization.
Not ideal if you require highly nuanced or creative descriptions, as the model generates factual, straightforward captions.
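The core idea described above, a ViT encoder feeding a GPT-2 style decoder, hinges on cross-attention: caption tokens query the image patch embeddings at each decoding step. Below is a minimal single-head sketch of that mechanism in NumPy. The dimensions (196 patches, 64-d embeddings, 5 tokens) and the single-head simplification are illustrative assumptions, not values taken from this repository.

```python
import numpy as np

def cross_attention(text_q, image_kv, d):
    # Queries come from the decoder's caption tokens; keys/values come
    # from the ViT patch embeddings. Scaled dot-product + softmax.
    scores = text_q @ image_kv.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each token gets a weighted mix of image features.
    return weights @ image_kv

np.random.seed(0)
patches = np.random.randn(196, 64)  # e.g. a 14x14 ViT patch grid (assumed)
tokens = np.random.randn(5, 64)     # 5 caption tokens generated so far
out = cross_attention(tokens, patches, 64)
print(out.shape)  # (5, 64): one image-conditioned vector per token
```

In the full model this output would be added back into the decoder stream and passed through feed-forward layers before predicting the next caption token.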
Stars: 49
Forks: 3
Language: Jupyter Notebook
License: —
Category:
Last pushed: Oct 02, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/shreydan/VisionGPT2"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
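The curl command above can also be issued from Python with the standard library. The response schema is not documented here, so this sketch just decodes whatever JSON the endpoint returns; the `fetch` helper and its timeout are illustrative choices, not part of the API's documentation.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"
repo = "shreydan/VisionGPT2"
url = f"{BASE}/transformers/{repo}"

def fetch(endpoint: str) -> dict:
    # Anonymous access allows 100 requests/day; a free key raises
    # the limit to 1,000/day (how the key is passed is not shown here).
    with urllib.request.urlopen(endpoint, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(url)
```

Calling `fetch(url)` returns the same quality data as the curl example, parsed into a Python dict.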
Higher-rated alternatives
zarzouram/image_captioning_with_transformers
Pytorch implementation of image captioning using transformer-based model.
rese1f/aurora
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
senadkurtisi/pytorch-image-captioning
Transformer & CNN Image Captioning model in PyTorch.
tojiboyevf/image_captioning
Deep Learning Final project 2022
Hamtech-ai/Persian-Image-Captioning
A Persian Image Captioning model based on Vision Encoder Decoder Models of the transformers🤗.