Muennighoff/vilio

🥶Vilio: State-of-the-art VL models in PyTorch & PaddlePaddle

Emerging

Vilio helps researchers and machine learning engineers analyze how images and text interact, particularly for tasks like detecting harmful content. You provide it with multimodal data (images with associated text, like memes), and it outputs predictions or classifications based on advanced vision-language models. It's designed for those working with cutting-edge AI for content understanding.

No commits in the last 6 months.

Use this if you are a researcher or ML engineer developing or evaluating state-of-the-art vision-language models for tasks that combine image and text understanding.

Not ideal if you need a simple, off-the-shelf tool for basic image or text analysis without deep dives into model architectures or multimodal learning.

Multimodal AI, Content Moderation, Deep Learning Research, Natural Language Processing, Computer Vision

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 9 / 25

Maturity 16 / 25

Community 20 / 25

Language: Python

License: MIT

Higher-rated alternatives

kyegomez/RT-X

PyTorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment:...

kyegomez/PALI3

Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"

chuanyangjin/MMToM-QA

[🏆Outstanding Paper Award at ACL 2024] MMToM-QA: Multimodal Theory of Mind Question Answering

lyuchenyang/Macaw-LLM

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

kyegomez/PALM-E

Implementation of "PaLM-E: An Embodied Multimodal Language Model"
