Multimodal Fusion Transformer Models

Tools for combining multiple input modalities (text, image, audio, video, tabular data) using transformer architectures to perform unified tasks. This category does NOT include single-modality models, recommendation systems, or domain-specific applications such as robotics or translation, unless multimodal fusion is the primary focus.
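To make "fusion" concrete, here is one common pattern, early fusion. Token embeddings from two modalities are tagged with a modality embedding, concatenated into a single sequence, and passed through self-attention so every token can attend across modalities. This is a minimal NumPy sketch with assumed shapes and random weights. It is not the method of any specific project in this list, and the function and parameter names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def early_fusion_attention(text_tokens, image_tokens, modality_emb, w_q, w_k, w_v):
    """Single-head self-attention over two concatenated token sequences (hypothetical helper).

    text_tokens:   (n_text, d) embeddings from one modality
    image_tokens:  (n_img, d) embeddings from another modality
    modality_emb:  (2, d) vectors marking which modality each token came from
    w_q, w_k, w_v: (d, d) query/key/value projection matrices
    """
    # Tag each token with its modality, then join both sequences into one.
    x = np.concatenate([
        text_tokens + modality_emb[0],
        image_tokens + modality_emb[1],
    ], axis=0)                                   # (n_text + n_img, d)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Scaled dot-product scores: every token is scored against every token,
    # including tokens from the other modality.
    scores = q @ k.T / np.sqrt(x.shape[-1])
    weights = softmax(scores, axis=-1)           # each row sums to 1
    return weights @ v, weights

# Usage with random (assumed) inputs: 5 text tokens, 3 image tokens, width 8.
rng = np.random.default_rng(0)
d = 8
text = rng.normal(size=(5, d))
image = rng.normal(size=(3, d))
mod = rng.normal(size=(2, d))
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
fused, attn = early_fusion_attention(text, image, mod, wq, wk, wv)
print(fused.shape)  # (8, 8)
```

Real systems stack many such layers with multiple heads, feed-forward blocks, and modality-specific encoders; other designs fuse later, for example via cross-attention between separate per-modality streams.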

There are 36 multimodal fusion transformer models tracked. The highest-rated is dorarad/gansformer, with a score of 47/100 and 1,346 stars.

Get the projects as JSON (note that this example request sets limit=20, so it returns at most 20 of the 36 projects):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=multimodal-fusion-transformers&limit=20"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
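For scripted access, the same request can be issued from Python. This is a minimal sketch using only the standard library. The helper names are hypothetical, the response is returned as parsed JSON without assuming any particular schema, and no API key is sent, since this page does not describe how a key is supplied.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_quality_url(domain, subcategory, limit):
    """Build the same query URL as the curl example (hypothetical helper)."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"

def fetch_quality(domain, subcategory, limit=20, timeout=30):
    """GET the URL and decode the JSON body (hypothetical helper).

    Assumes the keyless tier (100 requests/day); the response structure is
    not documented here, so the decoded JSON is returned as-is.
    """
    with urlopen(build_quality_url(domain, subcategory, limit), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

if __name__ == "__main__":
    # Only builds and prints the URL; call fetch_quality(...) to hit the network.
    print(build_quality_url("transformers", "multimodal-fusion-transformers", 20))
```

Building the query with urlencode keeps parameter escaping correct if you change the domain or subcategory values.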

| # | Model | Description | Score (/100) | Tier |
|---|---|---|---|---|
| 1 | dorarad/gansformer | Generative Adversarial Transformers | 47 | Emerging |
| 2 | j-min/VL-T5 | PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021) | 46 | Emerging |
| 3 | invictus717/MetaTransformer | Meta-Transformer for Unified Multimodal Learning | 44 | Emerging |
| 4 | rkansal47/MPGAN | The message passing GAN https://arxiv.org/abs/2106.11535 and generative... | 44 | Emerging |
| 5 | Yachay-AI/byt5-geotagging | Confidence- and ByT5-based geotagging model that predicts coordinates from text alone. | 42 | Emerging |
| 6 | sisinflab/Ducho | Ducho is a Python framework aimed at extracting multimodal features used in... | 40 | Emerging |
| 7 | zinengtang/TVLT | PyTorch code for “TVLT: Textless Vision-Language Transformer” (NeurIPS 2022 Oral) | 39 | Emerging |
| 8 | albrateanu/ModalFormer | [2025] ModalFormer: Multimodal Transformer for Low-Light Image Enhancement | 38 | Emerging |
| 9 | OFA-Sys/OFASys | OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models | 38 | Emerging |
| 10 | GT-RIPL/robo-vln | PyTorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics... | 37 | Emerging |
| 11 | Shanghai-Digital-Brain-Laboratory/BDM-DB1 | A large-scale multi-modal pre-trained model | 37 | Emerging |
| 12 | kyegomez/VortexFusion | Transformers + Mambas + LSTMs, all in one model | 37 | Emerging |
| 13 | devdhananjay14/multim | 🔍 Experiment with neural networks for binary classification on multimodal... | 36 | Emerging |
| 14 | Jathurshan0330/Cross-Modal-Transformer | Official repository of cross-modal transformer for interpretable automatic... | 35 | Emerging |
| 15 | aws-samples/sample-for-multi-modal-document-to-json-with-sagemaker-ai | This open-source project delivers a complete pipeline for converting... | 34 | Emerging |
| 16 | DunnBC22/Vision_Audio_and_Multimodal_Projects | This repository includes all computer vision, audio, document AI, and... | 34 | Emerging |
| 17 | GiorgiaAuroraAdorni/gansformer-reproducibility-challenge | Replication of the novel Generative Adversarial Transformer. | 33 | Emerging |
| 18 | KhoiDOO/vitvqganvae | Benchmark for Evaluating Data Reconstruction using Vector Quantization | 32 | Emerging |
| 19 | wangxiao5791509/MultiModal_BigModels_Survey | [MIR-2023-Survey] A continuously updated paper list for multi-modal... | 32 | Emerging |
| 20 | AILab-CVC/M2PT | [CVPR 2024] Multimodal Pathway: Improve Transformers with Irrelevant Data... | 32 | Emerging |
| 21 | PRITHIVSAKTHIUR/Nvidia-Cosmos-Reason1-Demo | Physical AI models understand physical common sense and generate appropriate... | 31 | Emerging |
| 22 | kyegomez/primus | A multimodal foundation model for humanoid robotics that integrates multiple... | 31 | Emerging |
| 23 | andreaceto/multimodal-crisis-classification | Multimodal classification of crisis-related social media content. | 30 | Emerging |
| 24 | chasemetoyer/visual-internal-reasoning | Investigates causal visual reasoning in transformers by integrating discrete... | 29 | Experimental |
| 25 | IsaacRodgz/multimodal-transformers-movies | Experiments with multimodal deep learning models based on transformers | 28 | Experimental |
| 26 | kyegomez/Multi-Model-Training | An experimental repository on research for training multiple models all at... | 28 | Experimental |
| 27 | mosh98/MMBT | Multimodal BiTransformer (reimplementation) in PyTorch that actually works! | 24 | Experimental |
| 28 | 5seoyoung/lightweight-multimodal-healthcare-ai | [Research] Efficient multimodal transformers for clinical decision support... | 22 | Experimental |
| 29 | jianzhnie/MultimodalTookit | Incorporate Image, Text and Tabular Data with HuggingFace Transformers | 21 | Experimental |
| 30 | Kind-Unes/MultiModal-Model | This project is a multi-modal model that works with multiple models combined... | 21 | Experimental |
| 31 | Manu-Fraile/Multimodal-Human-Robot-Feedback | A novel approach of Transformers and CNNs for Human Feedback classification | 18 | Experimental |
| 32 | ToshikiNakamura0412/docker_lightglue | Docker image for LightGlue | 17 | Experimental |
| 33 | Shreya831/multimodal-ai-visual-analyzer | Multimodal AI system that detects objects in images and answers questions... | 14 | Experimental |
| 34 | muanderson/Multimodal-transformer-product-matching | Repo for multimodal transformer model to product match on the Shopee Product... | 13 | Experimental |
| 35 | koninik/multimodal_machine_translation | A PyTorch implementation of a Transformer Network for Machine Translation... | 11 | Experimental |
| 36 | ShowMeModel/transformers-multimodal-example | Example of a multimodal (end-to-end) deep learning model with transformers... | 11 | Experimental |