shreydan/VisionGPT2

Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.

23 / 100 (Experimental)

This project helps you automatically generate descriptive text captions for images. You provide an image, and it outputs a sentence describing the content of that image. This is useful for anyone working with large collections of images, such as content managers, digital archivists, or e-commerce professionals.

No commits in the last 6 months.

Use this if you need to quickly generate textual descriptions for individual images to improve searchability, accessibility, or content organization.

Not ideal if you require highly nuanced or creative descriptions, as the model generates factual, straightforward captions.

image-description content-tagging digital-asset-management visual-search accessibility-enhancement
No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 8 / 25
Community 7 / 25
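
The four sub-scores listed here add up to the overall 23 / 100 shown above. The sketch below just reproduces that arithmetic; treating the overall score as a plain sum is an inference from the numbers on this page, not a documented formula, and the names `SUB_SCORES` and `overall_score` are hypothetical.

```python
# Sub-scores copied from this page; each category is out of 25.
SUB_SCORES = {"Maintenance": 0, "Adoption": 8, "Maturity": 8, "Community": 7}
MAX_PER_CATEGORY = 25


def overall_score(sub_scores: dict) -> int:
    """Sum the category scores (assumed aggregation, see note above)."""
    return sum(sub_scores.values())


total = overall_score(SUB_SCORES)
maximum = MAX_PER_CATEGORY * len(SUB_SCORES)
print(f"{total} / {maximum}")  # prints "23 / 100"
```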


Stars: 49
Forks: 3
Language: Jupyter Notebook
License: None
Last pushed: Oct 02, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/shreydan/VisionGPT2"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
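
For scripted use, the same request can be sketched in Python with only the standard library. This is a minimal sketch under two assumptions that the page does not confirm: that the path segments after `/quality/` are a category, an owner, and a repository name, and that the endpoint returns JSON. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (assumed layout: category/owner/repo)."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to perform the request.
    print(build_quality_url("transformers", "shreydan", "VisionGPT2"))
```

Calling `fetch_quality("transformers", "shreydan", "VisionGPT2")` issues the same GET request as the curl command. Since the response schema is not documented here, inspect the returned object before relying on any field names.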