prismformore/Multi-Task-Transformer
Code for the ICLR 2023 paper "TaskPrompter: Spatial-Channel Multi-Task Prompting for Dense Scene Understanding" and the ECCV 2022 paper "Inverted Pyramid Multi-task Transformer for Dense Scene Understanding"
This project helps computer vision researchers and AI practitioners extract multiple types of information simultaneously from images, such as identifying objects, segmenting areas, and estimating depth. You input a single image, and it outputs several detailed maps or segmentations, each highlighting a different characteristic of the scene. This is ideal for those developing advanced perception systems for robotics, autonomous vehicles, or surveillance.
327 stars. No commits in the last 6 months.
Use this if you need to run several dense scene understanding tasks (such as object detection, semantic segmentation, and depth estimation) efficiently on a single image.
Not ideal if your focus is on a single, highly specialized image analysis task or if you require an extremely lightweight solution for edge devices.
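The "one image in, several maps out" idea described above can be illustrated with a toy model. The following is a minimal sketch, assuming PyTorch is installed. It is not the repository's architecture: `ToyMultiTaskNet`, the layer sizes, and the task names `semseg` and `depth` are hypothetical and chosen only to show a shared encoder feeding one prediction head per task.

```python
import torch
from torch import nn


class ToyMultiTaskNet(nn.Module):
    """Hypothetical shared-encoder, multi-head network (not the repository's model)."""

    def __init__(self, num_classes: int = 5, width: int = 16):
        super().__init__()
        # Shared encoder: every task reuses the same feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, width, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(width, width, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # One lightweight 1x1 convolution head per dense task (assumed task set).
        self.heads = nn.ModuleDict({
            "semseg": nn.Conv2d(width, num_classes, kernel_size=1),
            "depth": nn.Conv2d(width, 1, kernel_size=1),
        })

    def forward(self, image: torch.Tensor) -> dict:
        features = self.encoder(image)
        # Each head returns a map with the same height and width as the input.
        return {task: head(features) for task, head in self.heads.items()}


model = ToyMultiTaskNet()
image = torch.rand(1, 3, 32, 48)  # one random RGB image, 32x48 pixels
with torch.no_grad():
    outputs = model(image)

for task, prediction in outputs.items():
    print(task, tuple(prediction.shape))
```

Because the encoder runs once per image, adding a task costs only one extra head, which is the efficiency argument for multi-task models over running separate single-task networks.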
Stars: 327
Forks: 25
Language: Python
License: MIT
Category:
Last pushed: Apr 24, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/prismformore/Multi-Task-Transformer"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
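The same request can be made from Python. This is a minimal sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, the interpretation of the `transformers` path segment as a category is an assumption, and the response is assumed (not confirmed by this page) to be JSON.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL; `category` is assumed to be the path segment before the repo."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for a repository (assumes a JSON response body)."""
    url = build_quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = build_quality_url("transformers", "prismformore/Multi-Task-Transformer")
print(url)
```

Calling `fetch_quality("transformers", "prismformore/Multi-Task-Transformer")` performs the actual network request, which counts against the daily request limit.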
Higher-rated alternatives
pairlab/SlotFormer
Code release for ICLR 2023 paper: SlotFormer on object-centric dynamics models
ChristophReich1996/Swin-Transformer-V2
PyTorch reimplementation of the paper "Swin Transformer V2: Scaling Up Capacity and Resolution"...
DirtyHarryLYL/Transformer-in-Vision
Recent Transformer-based CV and related works.
kyegomez/MegaVIT
The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters"
uakarsh/latr
Implementation of LaTr: Layout-aware transformer for scene-text VQA, a novel multimodal...