Muennighoff/vilio
🥶Vilio: State-of-the-art VL models in PyTorch & PaddlePaddle
Vilio helps researchers and machine learning engineers analyze how images and text interact, particularly for tasks such as detecting harmful content. You provide multimodal data (images with associated text, such as memes), and it outputs predictions or classifications from vision-language models. It is aimed at practitioners working on multimodal content understanding.
No commits in the last 6 months.
Use this if you are a researcher or ML engineer developing or evaluating state-of-the-art vision-language models for tasks that combine image and text understanding.
Not ideal if you need a simple, off-the-shelf tool for basic image or text analysis without deep dives into model architectures or multimodal learning.
Stars: 91
Forks: 28
Language: Python
License: MIT
Category
Last pushed: Jun 08, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Muennighoff/vilio"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
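The same request can be made from Python. This is a minimal sketch using only the standard library. The path layout (`/quality/<category>/<owner>/<repo>`) is inferred from the single curl example above, and parsing the response as JSON is an assumption, since the response format is not documented here. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL.

    Assumption: the path is <category>/<owner>/<repo>, as inferred
    from the one example URL shown on this page.
    """
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body.

    Assumption: the endpoint returns JSON; adjust the parsing if it does not.
    """
    url = build_quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("transformers", "Muennighoff/vilio")` would issue the same request as the curl command. Unauthenticated use counts against the 100 requests/day limit, so caching the result locally is worth considering.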
Higher-rated alternatives
kyegomez/RT-X
Pytorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment:...
kyegomez/PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
chuanyangjin/MMToM-QA
[🏆Outstanding Paper Award at ACL 2024] MMToM-QA: Multimodal Theory of Mind Question Answering
lyuchenyang/Macaw-LLM
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
kyegomez/PALM-E
Implementation of "PaLM-E: An Embodied Multimodal Language Model"