AILab-CVC/M2PT

[CVPR 2024] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

/ 100

Emerging

This project helps AI researchers and practitioners improve Transformer models for tasks such as image, video, point cloud, or audio recognition. It takes a Transformer trained on one modality (e.g., images) and improves its performance by drawing on a second Transformer trained on entirely different, unrelated data (e.g., audio). The result is a more robust and accurate model for the original task, with no new task-specific data and no extra cost at inference time.
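The description above does not say how the second model is integrated without adding inference cost. A minimal sketch of one way this can work, assuming the re-parameterization idea described in the M2PT paper (the auxiliary model's linear weights are added to the target model's weights through a scale factor, then folded into a single matrix after training): all function names and the values of `lam`, `w_target`, and `w_aux` are hypothetical and are not taken from the repository's code.

```python
# Illustrative sketch only: pure-Python linear layers, hypothetical names and values.
# Assumption: the auxiliary-modality weights enter as W_target + lam * W_aux.

def matmul(x, w):
    """Multiply a row vector x (length n) by an n-by-m matrix w."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def two_branch_forward(x, w_target, w_aux, lam):
    """Training-time view: target output plus a scaled auxiliary-branch output."""
    y_target = matmul(x, w_target)
    y_aux = matmul(x, w_aux)
    return [t + lam * a for t, a in zip(y_target, y_aux)]

def merge_weights(w_target, w_aux, lam):
    """Fold the auxiliary weights into the target weights: W_target + lam * W_aux."""
    return [[t + lam * a for t, a in zip(row_t, row_a)]
            for row_t, row_a in zip(w_target, w_aux)]

w_target = [[1.0, 0.0], [0.0, 1.0]]   # stands in for a layer of the target model
w_aux = [[0.5, -1.0], [2.0, 0.25]]    # stands in for the same layer of the other-modality model
lam = 0.5                             # scale factor (learned in practice, fixed here)
x = [2.0, 4.0]

y_two_branch = two_branch_forward(x, w_target, w_aux, lam)
w_merged = merge_weights(w_target, w_aux, lam)
y_merged = matmul(x, w_merged)        # a single matmul, same result as the two-branch form

print(y_two_branch, y_merged)
```

Because a linear layer distributes over addition, the merged matrix reproduces the two-branch output exactly, which is why a model built this way would need no extra computation once the weights are merged.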

101 stars. No commits in the last 6 months.

Use this if you are a machine learning researcher or engineer looking to boost the accuracy of your Transformer models for specific tasks like image or audio classification by leveraging knowledge from models trained on other data types.

Not ideal if you are looking for a ready-to-use application rather than a method for improving existing deep learning models, or if your tasks do not involve Transformer architectures.

deep-learning-research computer-vision audio-analysis 3d-point-cloud-processing multimodal-ai

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 9 / 25

Maturity 16 / 25

Community 7 / 25


Stars: 101

Forks

Language: Python

License: Apache-2.0

Higher-rated alternatives

dorarad/gansformer

Generative Adversarial Transformers

j-min/VL-T5

PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)

invictus717/MetaTransformer

Meta-Transformer for Unified Multimodal Learning

rkansal47/MPGAN

The message passing GAN https://arxiv.org/abs/2106.11535 and generative adversarial particle...

Yachay-AI/byt5-geotagging

Confidence and ByT5-based geotagging model predicting coordinates from text alone.
