michelecafagna26/VinVL

Original VinVL (and Oscar) repo with an API designed for easy inference


Experimental

This project helps developers integrate image captioning and scene description into their applications. It takes visual features extracted from images, along with optional object labels, and generates natural language captions or detailed scene descriptions. It is aimed at developers who need vision-language understanding in their software, for example for content moderation, accessibility features, or automated content generation.

No commits in the last 6 months.

Use this if you are a developer looking for an easy-to-use API to add state-of-the-art image captioning or scene description generation to your Python application.

Not ideal if you are an end-user without programming experience, as this tool requires coding to implement and use.
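
The input/output contract described above (visual features plus optional object labels in, a caption out) can be sketched as follows. This is only an illustration of the data shape, not the repository's actual API: `CaptionInputs`, `build_tag_string`, and `generate_caption` are hypothetical names, and the model call is replaced by a placeholder that just reports what it received.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class CaptionInputs:
    """Hypothetical container for the inputs the description mentions."""
    # One feature vector per detected image region.
    region_features: List[Sequence[float]]
    # Optional detector labels, e.g. ["dog", "ball"].
    object_labels: Optional[List[str]] = None

    def validate(self) -> None:
        """Check that features exist and share a single dimension."""
        if not self.region_features:
            raise ValueError("at least one region feature vector is required")
        dim = len(self.region_features[0])
        if any(len(vec) != dim for vec in self.region_features):
            raise ValueError("all region feature vectors must share one dimension")


def build_tag_string(inputs: CaptionInputs) -> str:
    """Join the optional object labels into one string (empty if none)."""
    return " ".join(inputs.object_labels or [])


def generate_caption(inputs: CaptionInputs) -> str:
    """Placeholder for the model call; NOT the repository's real function."""
    inputs.validate()
    tags = build_tag_string(inputs)
    # A real model would decode a caption from the features and tags.
    # This stub only summarizes its inputs.
    return f"<caption from {len(inputs.region_features)} regions; tags: {tags or 'none'}>"


if __name__ == "__main__":
    example = CaptionInputs(
        region_features=[[0.1, 0.2], [0.3, 0.4]],
        object_labels=["dog", "ball"],
    )
    print(generate_caption(example))
```

Keeping the labels optional mirrors the description: the features are required, while the object labels are extra context that may be omitted.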

image-captioning computer-vision natural-language-generation application-development AI-integration

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 4 / 25

Maturity 8 / 25

Community 8 / 25


Language: Python

License: none

Higher-rated alternatives

kyegomez/RT-X

Pytorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment:...

kyegomez/PALI3

Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"

chuanyangjin/MMToM-QA

[🏆Outstanding Paper Award at ACL 2024] MMToM-QA: Multimodal Theory of Mind Question Answering

lyuchenyang/Macaw-LLM

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

Muennighoff/vilio

🥶Vilio: State-of-the-art VL models in PyTorch & PaddlePaddle
