AILab-CVC/M2PT
[CVPR 2024] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
This project helps AI researchers and practitioners improve their machine learning models for tasks such as image, video, point cloud, or audio recognition. It takes an existing Transformer trained on one type of data (e.g., images) and improves its performance by integrating knowledge from a separate Transformer trained on entirely different, unrelated data (e.g., audio). The result is a more robust and accurate model for the original task, without needing new task-specific data or incurring extra processing cost at inference time.
101 stars. No commits in the last 6 months.
Use this if you are a machine learning researcher or engineer looking to boost the accuracy of your Transformer models for specific tasks like image or audio classification by leveraging knowledge from models trained on other data types.
Not ideal if you are looking for a ready-to-use application rather than a method for improving existing deep learning models, or if your tasks do not involve Transformer architectures.
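The description above says the improved model incurs no extra processing cost during use. One way this can work is weight re-parameterization: a layer from the auxiliary-modality model is added to the matching target layer with a scale factor during training, and the two are folded into a single weight matrix afterwards. The sketch below is a minimal illustration under that assumption. It uses plain Python lists instead of a deep learning framework, and the function names, the fixed `scale` value, and the toy weights are all hypothetical, not taken from the M2PT repository.

```python
# Hypothetical illustration; names and values are not from the M2PT codebase.

def matvec(weight, x):
    """Apply a bias-free linear layer: one dot product per output row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def merge_weights(target_w, aux_w, scale):
    """Fold the auxiliary weights into the target weights: W + scale * W_aux."""
    return [[t + scale * a for t, a in zip(t_row, a_row)]
            for t_row, a_row in zip(target_w, aux_w)]

def two_branch_forward(target_w, aux_w, scale, x):
    """Training-time view: run both layers, then add the scaled auxiliary output."""
    target_out = matvec(target_w, x)
    aux_out = matvec(aux_w, x)
    return [t + scale * a for t, a in zip(target_out, aux_out)]

target_w = [[1.0, 2.0], [0.0, -1.0]]  # layer from the target-modality model
aux_w = [[0.5, 0.0], [1.0, 1.0]]      # same-shaped layer from the other modality
scale = 0.5                           # assumed learnable in practice; fixed here
x = [2.0, 4.0]

merged_w = merge_weights(target_w, aux_w, scale)
merged_out = matvec(merged_w, x)                              # single layer at inference
branch_out = two_branch_forward(target_w, aux_w, scale, x)    # two layers at training time

print(merged_out, branch_out)
```

Because a linear layer distributes over addition, the merged layer produces the same output as the two-branch version while running only one matrix product, which is why the auxiliary model can be discarded after merging.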
Stars: 101
Forks: 5
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 13, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/AILab-CVC/M2PT"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
dorarad/gansformer: Generative Adversarial Transformers
j-min/VL-T5: PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)
invictus717/MetaTransformer: Meta-Transformer for Unified Multimodal Learning
rkansal47/MPGAN: The message passing GAN https://arxiv.org/abs/2106.11535 and generative adversarial particle...
Yachay-AI/byt5-geotagging: Confidence and ByT5-based geotagging model predicting coordinates from text alone.