michelecafagna26/VinVL
Original VinVL (and Oscar) repo with an API designed for easy inference
This project helps developers add image captioning and scene description to their applications. It takes visual features extracted from images, optionally together with object labels, and generates natural-language captions or detailed scene descriptions. It is aimed at developers who need vision-language understanding in their software, for example for content moderation, accessibility features, or automated content generation.
No commits in the last 6 months.
Use this if you are a developer looking for an easy-to-use API to add state-of-the-art image captioning or scene description generation to your Python application.
Not ideal if you are an end-user without programming experience, as this tool requires coding to implement and use.
Stars: 8
Forks: 1
Language: Python
License: —
Category: —
Last pushed: Jun 27, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/michelecafagna26/VinVL"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
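The same request can be made from Python with only the standard library. This is a minimal sketch, not an official client: it assumes the URL path has the form `/api/v1/quality/<category>/<owner>/<repo>` (inferred from the single curl example above) and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical, and the response schema is not documented here. Authentication with a free key is left out because the page does not say how the key is passed.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the curl example; the path layout below is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL, e.g. quality_url("transformers", "michelecafagna26/VinVL")."""
    # Percent-encode each segment, but keep the owner/name slash intact.
    return f"{BASE_URL}/{urllib.parse.quote(category)}/{urllib.parse.quote(repo, safe='/')}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the endpoint and decode the body as JSON (assumed, not confirmed by the page)."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("transformers", "michelecafagna26/VinVL")` sends the same request as the curl command. Keeping URL construction in a separate function lets you check the path without touching the network.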
Higher-rated alternatives
kyegomez/RT-X
Pytorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment:...
kyegomez/PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
chuanyangjin/MMToM-QA
[🏆Outstanding Paper Award at ACL 2024] MMToM-QA: Multimodal Theory of Mind Question Answering
lyuchenyang/Macaw-LLM
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Muennighoff/vilio
🥶Vilio: State-of-the-art VL models in PyTorch & PaddlePaddle