Image Captioning Transformer Models

Tools for generating textual descriptions from images and videos using transformer-based encoder-decoder architectures. Includes image-to-text, video captioning, and dense captioning systems. Does NOT include general vision-language models for other tasks (VQA, retrieval), text-to-image generation, or vision-only feature extraction.

There are 33 image captioning transformer models tracked. The highest-rated is zarzouram/image_captioning_with_transformers at 38/100 with 68 stars.

Get all 33 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=image-captioning-transformers&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
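The curl request above can also be issued from a script. The following is a minimal sketch using only the Python standard library; the helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not described here, the decoded JSON is returned as-is without assuming any field names. The default `limit=20` mirrors the curl example; whether the API accepts a larger value is an assumption to verify.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example; everything else here is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers",
              subcategory="image-captioning-transformers",
              limit=20):
    """Assemble the query URL with the same parameters as the curl command."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """GET the URL and return the decoded JSON body unchanged.

    The response schema is not documented in the text, so no fields are assumed.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` sends the same request as the curl command. Passing `limit=33` to `build_url` only changes the query string; it does not guarantee the service will return all 33 projects.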

# Model Score Tier

1 zarzouram/image_captioning_with_transformers 38 Emerging
  PyTorch implementation of image captioning using a transformer-based model.

2 rese1f/aurora 36 Emerging
  [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a...

3 senadkurtisi/pytorch-image-captioning 35 Emerging
  Transformer & CNN Image Captioning model in PyTorch.

4 tojiboyevf/image_captioning 34 Emerging
  Deep Learning Final project 2022

5 Hamtech-ai/Persian-Image-Captioning 34 Emerging
  A Persian Image Captioning model based on Vision Encoder Decoder Models of...

6 tanishqgautam/Image-Captioning 33 Emerging
  Implemented 3 different architectures to tackle the Image Caption problem,...

7 ilya16/deephumor 27 Experimental
  DeepHumor: Image-based Meme Generation using Deep Learning

8 slSeanWU/beats-conformer-bart-audio-captioner 26 Experimental
  PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning...

9 nateraw/discord-image-captioning-bot 24 Experimental
  A Discord bot for captioning images

10 Devnetly/image-captioning 24 Experimental
  Image captioning model & application based on transformers.

11 Technolog796/image_captioning 24 Experimental
  Building a Russian-language model for image captioning

12 farukalamai/background-removal-birefnet 24 Experimental
  Background Removal Application using BiRefNet

13 vishaln15/roco-image-captioning 23 Experimental
  Enhanced Image Captioning on ROCO Multimodal dataset using step-by-step distillation

14 shreydan/VisionGPT2 23 Experimental
  Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model...

15 PRITHIVSAKTHIUR/Florence-2-Image-Caption 23 Experimental
  This application utilizes the powerful Florence-2 vision-language model from...

16 therrshan/image-captioning 23 Experimental
  Comparative analysis of image captioning model using RNN, BiLSTM and...

17 Merterm/COSMic 19 Experimental
  Public repo for the paper: "COSMic: A Coherence-Aware Generation Metric for...

18 anto18671/image-to-dense-caption 18 Experimental
  Generate vivid, human-like captions for portrait images using the...

19 AHMEDSANA/Image-Captioning-with-ViT-and-BERT 18 Experimental
  A concise image-captioning pipeline that fine-tunes a ViT encoder with a...

20 theSohamTUmbare/DETR_powered_Image_Captioning 18 Experimental
  The excellent Image captioning model using the DETR inspired architecture

21 suryanshgupta9933/Scene-Script 18 Experimental
  An image to text model/pipeline using VIT and Transformers and deployment...

22 jshwanth/image-captioning 14 Experimental
  Error-centric comparison of CNN-LSTM, attention-based, and transformer...

23 sharpsalt/Captionforge-Multimodal-Image-Captioning-System 13 Experimental
  This PyTorch-based image captioning model uses ResNet-50 encoder and...

24 Akhan521/Snaption 13 Experimental
  📸 My first deep dive into multi-modal ML! Built an end-to-end image...

25 Riya-l209/ImageCaptioning_Segmentation 13 Experimental
  AI-powered Image Captioning & Segmentation | ViT-GPT2 + Mask R-CNN |...

26 batmac/captioner-api 11 Experimental
  API to get captions for images using a transformers pipeline

27 pedrorio/image_caption_augmentation 11 Experimental
  A text generation library to paraphrase image captions using back...

28 MuhammadHadiofficial/urdu_caption_generator 11 Experimental
  This repository contains the implementation of a Transformer-based model for...

29 tanishqgautam/Reddit-RoastMe-Captioning 11 Experimental
  A Reddit RoastMe Image Captioning System using Transformers

30 Mahmood-Anaam/violet 11 Experimental
  Violet: A Vision-Language model for generating Arabic image captions using a...

31 larissasantesso/IA025A_FinalProject_ImageCaptioning 10 Experimental
  Image Captioning using Transformers in PyTorch

32 Abhimanyu08/image_caption 10 Experimental
  Captioning images using a CNN and Transformer architecture

33 Hansha111/NeuroLens 10 Experimental
  🔍NeuroLens - your AI companion for image captioning and visual question...