shreydan/VisionGPT2
Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.
Given an input image, the model generates a one-sentence caption describing its content. This is useful for anyone working with large image collections, such as content managers, digital archivists, or e-commerce teams who need text descriptions at scale.
No commits in the last 6 months.
Use this if you need to quickly generate textual descriptions for individual images to improve searchability, accessibility, or content organization.
Not ideal if you require highly nuanced or creative descriptions, as the model generates factual, straightforward captions.
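The core idea described above, a ViT encoder feeding a GPT-2 style decoder, hinges on cross-attention: caption tokens query the image patch embeddings at each decoding step. Below is a minimal single-head sketch of that mechanism in NumPy. The dimensions (196 patches, 64-d embeddings, 5 tokens) and the single-head simplification are illustrative assumptions, not values taken from this repository.

```python
import numpy as np

def cross_attention(text_q, image_kv, d):
    # Queries come from the decoder's caption tokens; keys/values come
    # from the ViT patch embeddings. Scaled dot-product + softmax.
    scores = text_q @ image_kv.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each token gets a weighted mix of image features.
    return weights @ image_kv

np.random.seed(0)
patches = np.random.randn(196, 64)  # e.g. a 14x14 ViT patch grid (assumed)
tokens = np.random.randn(5, 64)     # 5 caption tokens generated so far
out = cross_attention(tokens, patches, 64)
print(out.shape)  # (5, 64): one image-conditioned vector per token
```

In the full model this output would be added back into the decoder stream and passed through feed-forward layers before predicting the next caption token.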
Stars: 49
Forks: 3
Language: Jupyter Notebook
License: —
Category:
Last pushed: Oct 02, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/shreydan/VisionGPT2"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
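The curl command above can also be issued from Python with the standard library. The response schema is not documented here, so this sketch just decodes whatever JSON the endpoint returns; the `fetch` helper and its timeout are illustrative choices, not part of the API's documentation.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"
repo = "shreydan/VisionGPT2"
url = f"{BASE}/transformers/{repo}"

def fetch(endpoint: str) -> dict:
    # Anonymous access allows 100 requests/day; a free key raises
    # the limit to 1,000/day (how the key is passed is not shown here).
    with urllib.request.urlopen(endpoint, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(url)
```

Calling `fetch(url)` returns the same quality data as the curl example, parsed into a Python dict.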
Higher-rated alternatives
zarzouram/image_captioning_with_transformers
Pytorch implementation of image captioning using transformer-based model.
rese1f/aurora
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
senadkurtisi/pytorch-image-captioning
Transformer & CNN Image Captioning model in PyTorch.
tojiboyevf/image_captioning
Deep Learning Final project 2022
Hamtech-ai/Persian-Image-Captioning
A Persian Image Captioning model based on Vision Encoder Decoder Models of the transformers🤗.