zerovl/ZeroVL
[ECCV2022] Contrastive Vision-Language Pre-training with Limited Resources
This project helps machine learning researchers efficiently train models to understand both images and text, even with limited computational resources and data. You input image-text pairs, and it produces a powerful model capable of tasks like image-text retrieval or image classification. It's designed for academic researchers and practitioners who need high-performing vision-language models without access to supercomputers or massive datasets.
No commits in the last 6 months.
Use this if you are a machine learning researcher or practitioner aiming to pre-train vision-language models but are constrained by typical academic or small-scale industry computing environments and data availability.
Not ideal if you already have access to vast computational resources (hundreds of GPUs, specialized TPUs) and billion-scale datasets for model training.
Stars
46
Forks
5
Language
Python
License
MIT
Last pushed
Sep 29, 2022
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/zerovl/ZeroVL"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
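The curl command above can also be issued from Python. A minimal sketch, assuming the endpoint returns JSON (the response format is not documented here); `quality_url` and `fetch_quality` are hypothetical helper names, not part of the API:

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch quality data for one repository.

    Assumes the endpoint returns a JSON object; keyless access is
    rate-limited to 100 requests/day per the note above.
    """
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)
```

Calling `fetch_quality("zerovl", "ZeroVL")` hits the same URL as the curl example; how an API key is attached for the higher 1,000/day limit is not specified here.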
Higher-rated alternatives
kyegomez/RT-X
PyTorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment:...
kyegomez/PALI3
Implementation of PALI3 from the paper "PALI-3 Vision Language Models: Smaller, Faster, Stronger"
chuanyangjin/MMToM-QA
[🏆Outstanding Paper Award at ACL 2024] MMToM-QA: Multimodal Theory of Mind Question Answering
lyuchenyang/Macaw-LLM
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Muennighoff/vilio
🥶Vilio: State-of-the-art VL models in PyTorch & PaddlePaddle