ProGamerGov/VLM-Captioning-Tools
Python scripts to use for captioning images with VLMs
This tool automates describing large collections of images using AI. You give it one or more folders of images, and it outputs a structured file containing both a detailed and a short text description for each image. It suits researchers, content managers, or anyone who needs to categorize or search through many images.
No commits in the last 6 months.
Use this if you need to automatically generate comprehensive captions for thousands or millions of images.
Not ideal if you only have a few images to caption or if you need highly specialized captions requiring human expert knowledge.
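The folder-in, structured-file-out workflow described above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the function names, the JSON Lines output layout, the field names, and the list of image extensions are all assumptions, and `placeholder_captioner` stands in for a real VLM call.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable, Iterator

# Assumed set of extensions; the real scripts may accept a different list.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".gif"}


def find_images(folders: Iterable[str]) -> Iterator[Path]:
    """Yield image files found under each folder (recursively), sorted per folder."""
    for folder in folders:
        for path in sorted(Path(folder).rglob("*")):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                yield path


def caption_folders(
    folders: Iterable[str],
    captioner: Callable[[Path], Dict[str, str]],
    output_path: str,
) -> int:
    """Write one JSON record per image to output_path; return the image count."""
    count = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for image_path in find_images(folders):
            captions = captioner(image_path)
            record = {
                "image": str(image_path),
                "long_caption": captions["long"],    # hypothetical field name
                "short_caption": captions["short"],  # hypothetical field name
            }
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def placeholder_captioner(image_path: Path) -> Dict[str, str]:
    """Stand-in for a real VLM call; returns dummy text derived from the file name."""
    return {
        "long": f"Detailed description of {image_path.name} (placeholder).",
        "short": f"{image_path.stem} (placeholder)",
    }
```

Passing the captioner in as a callable keeps the file handling independent of whichever model is used, so a real VLM wrapper could replace `placeholder_captioner` without touching the rest.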
Stars: 45
Forks: —
Language: Python
License: MIT
Category:
Last pushed: Apr 23, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ProGamerGov/VLM-Captioning-Tools"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
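For scripted access, the same request can be made from Python using only the standard library. This is a sketch under assumptions: the listing does not document the response format, so decoding the body as JSON is a guess, and reading the three trailing path segments as category, owner, and repository is inferred from the curl URL above rather than from any API documentation.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is assumed
# to be <category>/<owner>/<repo>.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    segments = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{API_BASE}/" + "/".join(segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body as JSON (assumed response format)."""
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("transformers", "ProGamerGov", "VLM-Captioning-Tools")` would issue the same request as the curl command. Each call counts against the daily request limit, so results are worth caching when polling many repositories.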
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice