3D Vision Transformers: Transformer Models
Tools for 3D computer vision tasks using transformers, including depth estimation, multi-view geometry, structure-from-motion, point cloud processing, 3D pose estimation, and novel view synthesis. Does NOT include general 2D vision tasks, 2D pose estimation, or 3D shape generation without vision inputs.
85 3D vision transformer models are tracked. Four score above 50 (the Established tier). The highest-rated is NVlabs/MambaVision at 63/100, with 2,060 stars.
Get the projects as JSON (the request below sets limit=20, so it may return only the first 20 of the 85):
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=3d-vision-transformers&limit=20"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
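The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_url` and `fetch_projects` are hypothetical, the shape of the JSON response is not documented here, and whether a `limit` of 85 returns every project is an unverified assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers", subcategory="3d-vision-transformers", limit=20):
    """Build the query URL (hypothetical helper; parameter names come from the curl example)."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30):
    """Fetch and decode the JSON payload. The response schema is an assumption, so it is returned as-is."""
    with urlopen(build_url(limit=limit), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print the URL only; call fetch_projects() to actually hit the API.
    print(build_url(limit=85))
```

Each call to `fetch_projects` counts against the 100 requests/day keyless quota, so caching the result locally is worth considering.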
| # | Model | Score | Tier |
|---|---|---|---|
| 1 | NVlabs/MambaVision<br>[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid... | | Established |
| 2 | sign-language-translator/sign-language-translator<br>Python library & framework to build custom translators for the... | | Established |
| 3 | kyegomez/Jamba<br>PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model" | | Established |
| 4 | autonomousvision/transfuser<br>[PAMI'23] TransFuser: Imitation with Transformer-Based Sensor Fusion for... | | Established |
| 5 | kyegomez/MultiModalMamba<br>A novel implementation of fusing ViT with Mamba into a fast, agile, and high... | | Emerging |
| 6 | dali92002/DocEnTR<br>DocEnTr: An end-to-end document image enhancement transformer - ICPR 2022 | | Emerging |
| 7 | fashn-AI/fashn-human-parser<br>Human parsing model for fashion and virtual try-on applications | | Emerging |
| 8 | buaacyw/MeshAnything<br>[ICLR 2025] From anything to mesh like human artists. Official impl. of... | | Emerging |
| 9 | buaacyw/MeshAnythingV2<br>[ICCV 2025] From anything to mesh like human artists. Official impl. of... | | Emerging |
| 10 | linjieli222/HERO<br>Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for... | | Emerging |
| 11 | csiro-robotics/HOTFormerLoc<br>[IEEE/CVF CVPR 2025] Hierarchical Octree Transformer for Versatile Lidar... | | Emerging |
| 12 | wgcban/HyperTransformer<br>[CVPR'22] HyperTransformer: A Textural and Spectral Feature Fusion... | | Emerging |
| 13 | PediaMedAI/AggPose<br>[IJCAI 2022] Official PyTorch implementation of AggPose: Deep Aggregation... | | Emerging |
| 14 | AllenXiangX/SnowflakeNet<br>(TPAMI 2023) Snowflake Point Deconvolution for Point Cloud Completion and... | | Emerging |
| 15 | snktshrma/ngps_flight<br>Global vision positioning system for UAVs in outdoor GNSS-denied environments | | Emerging |
| 16 | jhcho99/GSRTR<br>[BMVC'21] Official PyTorch Implementation of "Grounded Situation Recognition... | | Emerging |
| 17 | ChenRocks/UNITER<br>Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt... | | Emerging |
| 18 | AyushExel/trolo<br>An SDK for Transformers + YOLO and other SSD family models | | Emerging |
| 19 | padeler/PE-former<br>2D Human Pose estimation using transformers. Implementation in Pytorch | | Emerging |
| 20 | xingyizhou/GTR<br>Global Tracking Transformers, CVPR 2022 | | Emerging |
| 21 | hasanirtiza/PedesFormer-Transformer-Networks-For-Pedestrian-Detection<br>Transformer Networks for Pedestrian Detection | | Emerging |
| 22 | icon-lab/SLATER<br>Official implementation of the paper: Unsupervised MRI Reconstruction via... | | Emerging |
| 23 | jhcho99/CoFormer<br>[CVPR'22] Official PyTorch Implementation of "Collaborative Transformers for... | | Emerging |
| 24 | VachanVY/Transfusion.torch<br>PyTorch Implementation of Transfusion: Predict the Next Token and Diffuse... | | Emerging |
| 25 | kyegomez/AudioMamba<br>Implementation of the paper: "Audio Mamba: Bidirectional State Space Model... | | Emerging |
| 26 | desaixie/zeroverse<br>Official code for NeurIPS 2024 paper LRM-Zero: Training Large Reconstruction... | | Emerging |
| 27 | yihongXU/TransCenter<br>This is the official implementation of TransCenter (TPAMI). The code and... | | Emerging |
| 28 | kyegomez/MambaDecoderBlock<br>MambaDecoderBlock is a novel decoder architecture that replaces traditional... | | Emerging |
| 29 | DEV-D-GR8/SignSense<br>This repository contains a transformer-based model for real-time American... | | Emerging |
| 30 | sam575/axial-gan<br>Code for "Simultaneous Face Hallucination and Translation for Thermal to... | | Emerging |
| 31 | AndrewBoessen/PerfectRep<br>PerfectRep is a 3D pose estimation model tailored specifically for... | | Emerging |
| 32 | kyegomez/VLM-Mamba<br>We introduce VLM-Mamba, the first Vision-Language Model built entirely on... | | Emerging |
| 33 | ShengcaiLiao/TransMatcher<br>[NeurIPS 2021] TransMatcher: Deep Image Matching Through Transformers for... | | Emerging |
| 34 | XunshanMan/MVGFormer<br>This is the official implementation of the work presented at CVPR 2024,... | | Emerging |
| 35 | zubair-irshad/NeRF-MAE<br>[ECCV 2024] Pytorch code for our ECCV'24 paper NeRF-MAE: Masked AutoEncoders... | | Emerging |
| 36 | xmartlabs/spoter-embeddings<br>Create embeddings from sign pose videos using Transformers | | Emerging |
| 37 | Merterm/Modeling-Intensification-for-SLG<br>Public repo for the paper: "Modeling Intensification for Sign Language... | | Emerging |
| 38 | NeurAI-Lab/MT-SfMLearner<br>Official code for 'Transformers in Unsupervised Structure-from-Motion' and... | | Emerging |
| 39 | bhanuprathap2000/sign-language-recognition<br>This repo contains the code for sign-language-recognition as part of our... | | Emerging |
| 40 | hukenovs/slovo<br>Slovo: Russian Sign Language Dataset and Models | | Emerging |
| 41 | GregorKobsik/ImageTransformer<br>This notebook shows a basic implementation of a transformer (decoder)... | | Emerging |
| 42 | kyegomez/Simba<br>A simpler Pytorch + Zeta Implementation of the paper: "SiMBA: Simplified... | | Emerging |
| 43 | eslambakr/LAR-Look-Around-and-Refer<br>This is the official implementation for our paper;"LAR:Look Around and Refer". | | Experimental |
| 44 | lamm-mit/FieldCompleter<br>GAN/convolutional and Transformer models to predict missing mechanical... | | Experimental |
| 45 | loubnabnl/Sign-Segmentation-with-Transformers<br>Detection of temporal boundaries in sign language videos, as part of the... | | Experimental |
| 46 | tthinking/MATR<br>[IEEE TIP 2022] Official implementation of MATR: Multimodal Medical Image... | | Experimental |
| 47 | sauradip/STALE<br>[ECCV 2022] Official Pytorch Implementation of the paper : " Zero-Shot... | | Experimental |
| 48 | xiuqhou/DAPE<br>[AAAI2026] Official implementation of the paper "DAPE: Harmonizing... | | Experimental |
| 49 | sauradip/fewshotQAT<br>[BMVC 2021]: Official PyTorch implementation of : "Few Shot Temporal Action... | | Experimental |
| 50 | kyegomez/SimpleMamba<br>Implementation of a modular, high-performance, and simplistic mamba for... | | Experimental |
| 51 | exitudio/GaitMixer<br>Official repository for "GaitMixer: Skeleton-based Gait Representation... | | Experimental |
| 52 | icon-lab/TranSMS<br>Official Implementation of Transformers for System Matrix Super-resolution (TranSMS) | | Experimental |
| 53 | musialski-lab/LayoutEnhancer<br>Source code for the Paper: Layout Enahancer | | Experimental |
| 54 | AshutoshKulkarni4998/AIDTransformer<br>Inference code for "Aerial Image Dehazing with Attentive Deformable... | | Experimental |
| 55 | albrateanu/KANT<br>[Sensors 2025] Enhancing Low-Light Images with Kolmogorov–Arnold Networks in... | | Experimental |
| 56 | mabdn/feasible-interpretable-trajectory-prediction<br>A Transformer neural network for autonomous driving to predict the future... | | Experimental |
| 57 | mustafa1728/Person-Re-ID<br>Experiments on some existing Re-ID methods on a different dataset with... | | Experimental |
| 58 | artem-gorodetskii/TransPix2Pix<br>Rethinking the Pix2Pix architecture with attention mechanisms and transformers. | | Experimental |
| 59 | LookUpMark/dylem-grid<br>DYLEM-GRID is a deep learning project for dynamic hand gesture recognition... | | Experimental |
| 60 | RisabBiswas/T2T-BinFormer<br>SOTA Document Image Enhancement - T2T-BinFormer: Effective Document Image... | | Experimental |
| 61 | arafathosense/Real-Time-Face-Glitch-Effect-Controlled-by-Hand-Gestures<br>A real-time interactive computer vision art project using OpenCV. Control a... | | Experimental |
| 62 | Abdullah-Shah-26/Sign-Cast<br>Real-time AI-powered voice-to-sign language translator. Converts speech to... | | Experimental |
| 63 | HowieMa/PPT<br>[ECCV 2022] "PPT: token-Pruned Pose Transformer for monocular and multi-view... | | Experimental |
| 64 | Microsatellites-and-Space-Microsystems/pose_estimation_domain_gap<br>Two methods for solving domain gap in satellite pose estimation in space... | | Experimental |
| 65 | gmongaras/2Mamba2Furious<br>Code for the paper "2Mamba2Furious: Linear in complexity, competitive in accuracy" | | Experimental |
| 66 | freddxvill/Proyecto_Traductor_de_la_LSB<br>Traductor de Lengua de Señas Boliviana (LSB) a texto utilizando redes... | | Experimental |
| 67 | zwh0527/AGRNet<br>Code for "Mining Global Relativity Consistency without Neighborhood Modeling... | | Experimental |
| 68 | aliebayani/TransGAN-DX<br>A Hybrid Transformer-GAN Approach for Cardiovascular Disease Diagnosis | | Experimental |
| 69 | anupvna/street-view-geolocation<br>Multi-view Deep Learning pipeline using PyTorch to predict global... | | Experimental |
| 70 | GregorKobsik/Octree-Transformer<br>Octree Transformer: Autoregressive 3D Shape Generation on Hierarchically... | | Experimental |
| 71 | junayed-hasan/spontaneous-smile-recognition<br>A deep learning framework for distinguishing spontaneous from posed smiles... | | Experimental |
| 72 | tthinking/SETFusion<br>[PR 2026] Official implementation of SETFusion: A Semantic Transformer for... | | Experimental |
| 73 | rukmini-17/scalable-sequence-modeling<br>Comparative analysis of Mamba vs. Transformers trained from scratch.... | | Experimental |
| 74 | codedmachine111/Image_generation_using_transformers_in_GANs<br>Image Generation using Transformers in GANs | | Experimental |
| 75 | ImKeTT/ReSee<br>[EMNLP'23 Oral] ReSee: Responding through Seeing Fine-grained Visual... | | Experimental |
| 76 | botmahn/slowfast<br>An unofficial pytorch implementation of "Early Anticipation of Driving... | | Experimental |
| 77 | fabiosilva781/top-cvpr-2025-papers<br>🌟 Discover top CVPR 2025 papers for insightful research in computer vision,... | | Experimental |
| 78 | Ricardosc97/T-PIE<br>Pedestrian Intention Estimation using stacked Transformers Encoders | | Experimental |
| 79 | bihani-g/LASeR<br>Code and Analysis for our paper titled 'Low Anisotropy Sense Retrofitting... | | Experimental |
| 80 | tayo4christ/transformer-gesture<br>Real-time gesture recognition system using Vision Transformers, ONNX, and... | | Experimental |
| 81 | aditi184/Person_Re-Identification<br>Person ReIdentification using Locally Aware Transformers | | Experimental |
| 82 | tthinking/EAT<br>[IEEE TMM 2025] Official implementation of EAT: Multi-Exposure Image Fusion... | | Experimental |
| 83 | Geetanshu0410/Gesture-Bridge<br>Sign Language Translator GestureBridge is a cutting-edge AI-driven system... | | Experimental |
| 84 | harshavardhan-patil/where-am-i<br>Transformer backed geo-localizer to find an address in the USA based on... | | Experimental |
| 85 | retkowsky/synthetic_images<br>Synthetic images with Transformers | | Experimental |