ProGamerGov/VLM-Captioning-Tools

Python scripts to use for captioning images with VLMs

Experimental

This tool automates the captioning of large image collections with vision-language models (VLMs). You point it at one or more folders of images, and it writes a structured file containing both a detailed and a short text description for each image. This suits researchers, content managers, or anyone who needs to categorize or search through many images.
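The folder-in, structured-file-out workflow described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names (`find_images`, `placeholder_captioner`, `caption_folders`), the JSON Lines output format, and the field names are all assumptions made for the example, and the captioner is a stub standing in for a real VLM call.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable, Iterator

# Assumption: which file extensions count as images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def find_images(folders: Iterable[Path]) -> Iterator[Path]:
    """Yield image files found recursively under each folder, in sorted order."""
    for folder in folders:
        for path in sorted(Path(folder).rglob("*")):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                yield path


def placeholder_captioner(path: Path) -> Dict[str, str]:
    """Hypothetical stand-in for a VLM call; returns dummy captions."""
    return {
        "detailed": f"(detailed caption for {path.name})",
        "short": f"(short caption for {path.name})",
    }


def caption_folders(
    folders: Iterable[Path],
    output_file: Path,
    captioner: Callable[[Path], Dict[str, str]] = placeholder_captioner,
) -> int:
    """Write one JSON record per image to output_file; return the image count."""
    count = 0
    with open(output_file, "w", encoding="utf-8") as out:
        for path in find_images(folders):
            captions = captioner(path)
            record = {
                "image": str(path),
                "detailed_caption": captions["detailed"],
                "short_caption": captions["short"],
            }
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Writing one record per line keeps memory use flat for very large collections, since nothing has to be held until the end of the run. A real captioner would replace `placeholder_captioner` with a function that loads the image and queries a model.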

No commits in the last 6 months.

Use this if you need to automatically generate comprehensive captions for thousands or millions of images.

Not ideal if you only have a few images to caption or if you need highly specialized captions requiring human expert knowledge.

image-cataloging digital-asset-management visual-content-analysis data-labeling media-archive

Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 0 / 25

Language: Python

License: MIT

Higher-rated alternatives

KimMeen/Time-LLM

[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...

om-ai-lab/VLM-R1

Solve Visual Understanding with Reinforced VLMs

bytedance/SALMONN

SALMONN family: A suite of advanced multi-modal LLMs

NVlabs/OmniVinci

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

fixie-ai/ultravox

A fast multimodal LLM for real-time voice
