prismformore/Multi-Task-Transformer

Code of ICLR2023 paper "TaskPrompter: Spatial-Channel Multi-Task Prompting for Dense Scene Understanding" and ECCV2022 paper "Inverted Pyramid Multi-task Transformer for Dense Scene Understanding"

/ 100

Emerging

This project helps computer vision researchers and AI practitioners extract several types of dense, per-pixel information from an image at once, such as semantic segmentation, depth estimation, and surface normal estimation. You input a single image, and the model outputs several prediction maps, each describing a different property of the scene. It is aimed at people developing perception systems for robotics, autonomous vehicles, or surveillance.

327 stars. No commits in the last 6 months.

Use this if you need to perform several dense scene understanding tasks (such as semantic segmentation, depth estimation, and surface normal estimation) efficiently from a single image.

Not ideal if your focus is on a single, highly specialized image analysis task or if you require an extremely lightweight solution for edge devices.

computer-vision autonomous-driving robotics-perception image-analysis scene-understanding

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 14 / 25


Stars

327


Language

Python

License

MIT

Higher-rated alternatives

pairlab/SlotFormer

Code release for ICLR 2023 paper: SlotFormer on object-centric dynamics models

ChristophReich1996/Swin-Transformer-V2

PyTorch reimplementation of the paper "Swin Transformer V2: Scaling Up Capacity and Resolution"...

DirtyHarryLYL/Transformer-in-Vision

Recent Transformer-based CV and related works.

kyegomez/MegaVIT

The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters"

uakarsh/latr

Implementation of LaTr: Layout-aware transformer for scene-text VQA, a novel multimodal...
