sharpsalt/Captionforge-Multimodal-Image-Captioning-System
This PyTorch-based image captioning model uses ResNet-50 encoder and Transformer decoder to generate descriptive captions from Flickr8k images. Features include data augmentation (image transforms, synonym replacement), training with AdamW and early stopping, and inference via greedy, beam search (k=3,5,10), or nucleus sampling.
Stars
—
Forks
—
Language
Python
License
—
Category
Last pushed
Mar 03, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/sharpsalt/Captionforge-Multimodal-Image-Captioning-System"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
zarzouram/image_captioning_with_transformers
Pytorch implementation of image captioning using transformer-based model.
rese1f/aurora
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
senadkurtisi/pytorch-image-captioning
Transformer & CNN Image Captioning model in PyTorch.
tojiboyevf/image_captioning
Deep Learning Final project 2022
Hamtech-ai/Persian-Image-Captioning
A Persian Image Captioning model based on Vision Encoder Decoder Models of the transformers🤗.