sharpsalt/Captionforge-Multimodal-Image-Captioning-System

This PyTorch-based image captioning model uses ResNet-50 encoder and Transformer decoder to generate descriptive captions from Flickr8k images. Features include data augmentation (image transforms, synonym replacement), training with AdamW and early stopping, and inference via greedy, beam search (k=3,5,10), or nucleus sampling.

/ 100

Experimental

No License No Package No Dependents

Maintenance 10 / 25

Adoption 0 / 25

Maturity 3 / 25

Community 0 / 25

How are scores calculated?

Stars

—

Forks

—

Language

Python

License

—

Category

image-captioning-transformers

Last pushed

Mar 03, 2026

Commits (30d)

GitHub

Image Captioning Transformers · 33 models

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/sharpsalt/Captionforge-Multimodal-Image-Captioning-System"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.

Higher-rated alternatives

zarzouram/image_captioning_with_transformers

Pytorch implementation of image captioning using transformer-based model.

rese1f/aurora

[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

senadkurtisi/pytorch-image-captioning

Transformer & CNN Image Captioning model in PyTorch.

tojiboyevf/image_captioning

Deep Learning Final project 2022

Hamtech-ai/Persian-Image-Captioning

A Persian Image Captioning model based on Vision Encoder Decoder Models of the transformers🤗.

Explore Transformer Models

All categories Trending Transformer directory Insights