3D Vision Transformers Transformer Models

Tools for 3D computer vision tasks using transformers, including depth estimation, multi-view geometry, structure-from-motion, point cloud processing, 3D pose estimation, and novel view synthesis. Does NOT include general 2D vision tasks, 2D pose estimation, or 3D shape generation without vision inputs.

There are 85 3D vision transformer models tracked. Four score above 50 (Established tier). The highest-rated is NVlabs/MambaVision at 63/100 with 2,060 stars.

Get all 85 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=3d-vision-transformers&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
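
For scripted access, the same request can be made from Python. This is a minimal sketch using only the standard library: the endpoint and query parameters are copied from the curl command above, while the helper names `build_url` and `fetch_projects` are hypothetical. The response schema is not documented here, so the sketch returns the parsed JSON without assuming any field names. Whether a `limit` larger than 20 (for example 85) is accepted is an assumption to check against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20, domain="transformers",
              subcategory="3d-vision-transformers"):
    """Build the query URL (hypothetical helper, same parameters as the curl call)."""
    query = urlencode({
        "domain": domain,
        "subcategory": subcategory,
        "limit": limit,
    })
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30):
    """Fetch and parse the JSON response; no field names are assumed."""
    with urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Printing the URL only; call fetch_projects() to hit the network.
    print(build_url())
```

Each call to `fetch_projects()` counts against the daily request limit stated above, so caching the result locally is worth considering.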

#	Model	Score	Tier	Stars	Language
1	NVlabs/MambaVision [CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid...	63	Established	2,060	Python
2	sign-language-translator/sign-language-translator Python library & framework to build custom translators for the...	58	Established	329	Python
3	kyegomez/Jamba PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model"	56	Established	208	Python
4	autonomousvision/transfuser [PAMI'23] TransFuser: Imitation with Transformer-Based Sensor Fusion for...	55	Established	1,516	Python
5	kyegomez/MultiModalMamba A novel implementation of fusing ViT with Mamba into a fast, agile, and high...	49	Emerging	465	Python
6	dali92002/DocEnTR DocEnTr: An end-to-end document image enhancement transformer - ICPR 2022	47	Emerging	186	Jupyter Notebook
7	fashn-AI/fashn-human-parser Human parsing model for fashion and virtual try-on applications	47	Emerging	24	Python
8	buaacyw/MeshAnything [ICLR 2025] From anything to mesh like human artists. Official impl. of...	44	Emerging	2,272	Python
9	buaacyw/MeshAnythingV2 [ICCV 2025] From anything to mesh like human artists. Official impl. of...	44	Emerging	970	Python
10	linjieli222/HERO Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for...	44	Emerging	236	Python
11	csiro-robotics/HOTFormerLoc [IEEE/CVF CVPR 2025] Hierarchical Octree Transformer for Versatile Lidar...	43	Emerging	26	Python
12	wgcban/HyperTransformer [CVPR'22] HyperTransformer: A Textural and Spectral Feature Fusion...	41	Emerging	140	Python
13	PediaMedAI/AggPose [IJCAI 2022] Official PyTorch implementation of AggPose: Deep Aggregation...	40	Emerging	30	Python
14	AllenXiangX/SnowflakeNet (TPAMI 2023) Snowflake Point Deconvolution for Point Cloud Completion and...	40	Emerging	200	Python
15	snktshrma/ngps_flight Global vision positioning system for UAVs in outdoor GNSS-denied environments	40	Emerging	11	C++
16	jhcho99/GSRTR [BMVC'21] Official PyTorch Implementation of "Grounded Situation Recognition...	40	Emerging	27	Python
17	ChenRocks/UNITER Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt...	39	Emerging	800	Python
18	AyushExel/trolo An SDK for Transformers + YOLO and other SSD family models	39	Emerging	64	Jupyter Notebook
19	padeler/PE-former 2D Human Pose estimation using transformers. Implementation in Pytorch	39	Emerging	34	Python
20	xingyizhou/GTR Global Tracking Transformers, CVPR 2022	38	Emerging	379	Python
21	hasanirtiza/PedesFormer-Transformer-Networks-For-Pedestrian-Detection Transformer Networks for Pedestrian Detection	38	Emerging	43	Python
22	icon-lab/SLATER Official implementation of the paper: Unsupervised MRI Reconstruction via...	38	Emerging	41	Python
23	jhcho99/CoFormer [CVPR'22] Official PyTorch Implementation of "Collaborative Transformers for...	37	Emerging	50	Python
24	VachanVY/Transfusion.torch PyTorch Implementation of Transfusion: Predict the Next Token and Diffuse...	37	Emerging	28	Python
25	kyegomez/AudioMamba Implementation of the paper: "Audio Mamba: Bidirectional State Space Model...	37	Emerging	14	Shell
26	desaixie/zeroverse Official code for NeurIPS 2024 paper LRM-Zero: Training Large Reconstruction...	37	Emerging	153	Python
27	yihongXU/TransCenter This is the official implementation of TransCenter (TPAMI). The code and...	36	Emerging	118	—
28	kyegomez/MambaDecoderBlock MambaDecoderBlock is a novel decoder architecture that replaces traditional...	35	Emerging	5	Python
29	DEV-D-GR8/SignSense This repository contains a transformer-based model for real-time American...	35	Emerging	12	Jupyter Notebook
30	sam575/axial-gan Code for "Simultaneous Face Hallucination and Translation for Thermal to...	35	Emerging	13	Python
31	AndrewBoessen/PerfectRep PerfectRep is a 3D pose estimation model tailored specifically for...	33	Emerging	7	Python
32	kyegomez/VLM-Mamba We introduce VLM-Mamba, the first Vision-Language Model built entirely on...	32	Emerging	14	Python
33	ShengcaiLiao/TransMatcher [NeurIPS 2021] TransMatcher: Deep Image Matching Through Transformers for...	32	Emerging	29	Python
34	XunshanMan/MVGFormer This is the official implementation of the work presented at CVPR 2024,...	32	Emerging	68	Python
35	zubair-irshad/NeRF-MAE [ECCV 2024] Pytorch code for our ECCV'24 paper NeRF-MAE: Masked AutoEncoders...	32	Emerging	104	Python
36	xmartlabs/spoter-embeddings Create embeddings from sign pose videos using Transformers	32	Emerging	32	Python
37	Merterm/Modeling-Intensification-for-SLG Public repo for the paper: "Modeling Intensification for Sign Language...	31	Emerging	14	Python
38	NeurAI-Lab/MT-SfMLearner Official code for 'Transformers in Unsupervised Structure-from-Motion' and...	31	Emerging	14	Python
39	bhanuprathap2000/sign-language-recognition This repo contains the code for sign-language-recognition as part of our...	31	Emerging	3	Jupyter Notebook
40	hukenovs/slovo Slovo: Russian Sign Language Dataset and Models	30	Emerging	83	Python
41	GregorKobsik/ImageTransformer This notebook shows a basic implementation of a transformer (decoder)...	30	Emerging	6	Jupyter Notebook
42	kyegomez/Simba A simpler Pytorch + Zeta Implementation of the paper: "SiMBA: Simplified...	30	Emerging	28	Python
43	eslambakr/LAR-Look-Around-and-Refer This is the official implementation for our paper, "LAR: Look Around and Refer".	29	Experimental	30	C++
44	lamm-mit/FieldCompleter GAN/convolutional and Transformer models to predict missing mechanical...	29	Experimental	20	Python
45	loubnabnl/Sign-Segmentation-with-Transformers Detection of temporal boundaries in sign language videos, as part of the...	29	Experimental	9	Python
46	tthinking/MATR [IEEE TIP 2022] Official implementation of MATR: Multimodal Medical Image...	29	Experimental	99	Python
47	sauradip/STALE [ECCV 2022] Official PyTorch implementation of the paper: "Zero-Shot...	29	Experimental	113	Python
48	xiuqhou/DAPE [AAAI2026] Official implementation of the paper "DAPE: Harmonizing...	27	Experimental	6	Python
49	sauradip/fewshotQAT [BMVC 2021] Official PyTorch implementation of: "Few Shot Temporal Action...	26	Experimental	20	Python
50	kyegomez/SimpleMamba Implementation of a modular, high-performance, and simplistic mamba for...	26	Experimental	40	Python
51	exitudio/GaitMixer Official repository for "GaitMixer: Skeleton-based Gait Representation...	25	Experimental	26	Python
52	icon-lab/TranSMS Official Implementation of Transformers for System Matrix Super-resolution (TranSMS)	24	Experimental	4	Python
53	musialski-lab/LayoutEnhancer Source code for the paper: Layout Enhancer	23	Experimental	4	Python
54	AshutoshKulkarni4998/AIDTransformer Inference code for "Aerial Image Dehazing with Attentive Deformable...	22	Experimental	21	Python
55	albrateanu/KANT [Sensors 2025] Enhancing Low-Light Images with Kolmogorov–Arnold Networks in...	22	Experimental	9	Python
56	mabdn/feasible-interpretable-trajectory-prediction A Transformer neural network for autonomous driving to predict the future...	22	Experimental	6	Python
57	mustafa1728/Person-Re-ID Experiments on some existing Re-ID methods on a different dataset with...	22	Experimental	1	Jupyter Notebook
58	artem-gorodetskii/TransPix2Pix Rethinking the Pix2Pix architecture with attention mechanisms and transformers.	22	Experimental	21	Python
59	LookUpMark/dylem-grid DYLEM-GRID is a deep learning project for dynamic hand gesture recognition...	22	Experimental	1	Jupyter Notebook
60	RisabBiswas/T2T-BinFormer SOTA Document Image Enhancement - T2T-BinFormer: Effective Document Image...	21	Experimental	24	Python
61	arafathosense/Real-Time-Face-Glitch-Effect-Controlled-by-Hand-Gestures A real-time interactive computer vision art project using OpenCV. Control a...	21	Experimental	—	Python
62	Abdullah-Shah-26/Sign-Cast Real-time AI-powered voice-to-sign language translator. Converts speech to...	21	Experimental	—	TypeScript
63	HowieMa/PPT [ECCV 2022] "PPT: token-Pruned Pose Transformer for monocular and multi-view...	21	Experimental	63	Python
64	Microsatellites-and-Space-Microsystems/pose_estimation_domain_gap Two methods for solving domain gap in satellite pose estimation in space...	21	Experimental	9	Jupyter Notebook
65	gmongaras/2Mamba2Furious Code for the paper "2Mamba2Furious: Linear in complexity, competitive in accuracy"	20	Experimental	3	Jupyter Notebook
66	freddxvill/Proyecto_Traductor_de_la_LSB Bolivian Sign Language (LSB) to text translator using networks...	20	Experimental	—	Jupyter Notebook
67	zwh0527/AGRNet Code for "Mining Global Relativity Consistency without Neighborhood Modeling...	19	Experimental	3	Python
68	aliebayani/TransGAN-DX A Hybrid Transformer-GAN Approach for Cardiovascular Disease Diagnosis	19	Experimental	3	Python
69	anupvna/street-view-geolocation Multi-view Deep Learning pipeline using PyTorch to predict global...	19	Experimental	—	Jupyter Notebook
70	GregorKobsik/Octree-Transformer Octree Transformer: Autoregressive 3D Shape Generation on Hierarchically...	19	Experimental	18	Python
71	junayed-hasan/spontaneous-smile-recognition A deep learning framework for distinguishing spontaneous from posed smiles...	19	Experimental	3	Python
72	tthinking/SETFusion [PR 2026] Official implementation of SETFusion: A Semantic Transformer for...	18	Experimental	1	—
73	rukmini-17/scalable-sequence-modeling Comparative analysis of Mamba vs. Transformers trained from scratch....	17	Experimental	—	Jupyter Notebook
74	codedmachine111/Image_generation_using_transformers_in_GANs Image Generation using Transformers in GANs	17	Experimental	1	Python
75	ImKeTT/ReSee [EMNLP'23 Oral] ReSee: Responding through Seeing Fine-grained Visual...	13	Experimental	13	Python
76	botmahn/slowfast An unofficial pytorch implementation of "Early Anticipation of Driving...	13	Experimental	—	Python
77	fabiosilva781/top-cvpr-2025-papers 🌟 Discover top CVPR 2025 papers for insightful research in computer vision,...	13	Experimental	—	—
78	Ricardosc97/T-PIE Pedestrian Intention Estimation using stacked Transformers Encoders	12	Experimental	6	Python
79	bihani-g/LASeR Code and Analysis for our paper titled 'Low Anisotropy Sense Retrofitting...	11	Experimental	—	Python
80	tayo4christ/transformer-gesture Real-time gesture recognition system using Vision Transformers, ONNX, and...	11	Experimental	2	Python
81	aditi184/Person_Re-Identification Person ReIdentification using Locally Aware Transformers	11	Experimental	3	Jupyter Notebook
82	tthinking/EAT [IEEE TMM 2025] Official implementation of EAT: Multi-Exposure Image Fusion...	11	Experimental	3	Python
83	Geetanshu0410/Gesture-Bridge Sign Language Translator GestureBridge is a cutting-edge AI-driven system...	10	Experimental	1	Python
84	harshavardhan-patil/where-am-i Transformer backed geo-localizer to find an address in the USA based on...	10	Experimental	2	Jupyter Notebook
85	retkowsky/synthetic_images Synthetic images with Transformers	10	Experimental	2	Jupyter Notebook
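
The Tier column in the table above appears to follow fixed score bands. The sketch below encodes the cut-offs as inferred from the listed rows only: Established rows score 55 or higher, Emerging rows run from 30 to 49, and Experimental rows are 29 or lower. The exact thresholds are an assumption, not a documented rule; in particular, the text says "above 50", and no row scores exactly 50, so how a score of 50 is treated is a guess. The function name `tier_for` is hypothetical.

```python
def tier_for(score: int) -> str:
    """Map a 0-100 score to a tier name using cut-offs inferred from the table."""
    if score >= 50:   # assumed boundary; the page only says "above 50"
        return "Established"
    if score >= 30:   # lowest Emerging row in the table scores 30
        return "Emerging"
    return "Experimental"  # highest Experimental row in the table scores 29


# A few (score, tier) pairs copied from the table, used as a consistency check.
SAMPLE_ROWS = [
    (63, "Established"),
    (55, "Established"),
    (49, "Emerging"),
    (30, "Emerging"),
    (29, "Experimental"),
    (10, "Experimental"),
]

if __name__ == "__main__":
    for score, tier in SAMPLE_ROWS:
        status = "ok" if tier_for(score) == tier else "mismatch"
        print(score, tier, status)
```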

Comparisons in this category

MeshAnything and MeshAnythingV2 (44 vs 44)