jhcho99/CoFormer

[CVPR'22] Official PyTorch Implementation of "Collaborative Transformers for Grounded Situation Recognition"


Emerging

This project helps computer vision researchers and AI developers advance the task of 'grounded situation recognition.' It takes an image as input and identifies the main activity (verb), the entities involved (nouns), and their precise locations (bounding boxes). Researchers focusing on improving visual understanding of complex scenes will find this useful.
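As a rough illustration of the output described above, the following minimal sketch models one image's result as a verb plus a list of nouns, each with an optional bounding box. The class names, the `(x_min, y_min, x_max, y_max)` box layout, and the sample values are assumptions made for this example; they are not taken from the CoFormer codebase.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class GroundedNoun:
    """An entity involved in the activity and, if localized, where it appears."""
    noun: str
    box: Optional[Box] = None  # None means no box was given for this entity


@dataclass
class SituationPrediction:
    """Hypothetical container for one image's result (not CoFormer's API)."""
    verb: str
    entities: List[GroundedNoun] = field(default_factory=list)

    def localized(self) -> List[GroundedNoun]:
        """Return only the entities that carry a bounding box."""
        return [e for e in self.entities if e.box is not None]


# Made-up sample values, purely to show the shape of the data.
prediction = SituationPrediction(
    verb="riding",
    entities=[
        GroundedNoun("person", (120.0, 40.0, 260.0, 310.0)),
        GroundedNoun("bicycle", (100.0, 180.0, 300.0, 400.0)),
        GroundedNoun("street"),  # no box in this made-up sample
    ],
)

print(prediction.verb)
print([e.noun for e in prediction.localized()])
```

Keeping the box optional is a design choice for the sketch: it lets a noun be named even when no location is attached to it.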

No commits in the last 6 months.

Use this if you are developing or evaluating AI models for scene understanding and need to identify actions, objects, and their spatial relationships within images.

Not ideal if you are looking for an out-of-the-box solution for general image classification or object detection, as this is a research-oriented implementation.

computer-vision scene-understanding image-analysis AI-research object-recognition

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 13 / 25


Language: Python

License: Apache-2.0

Higher-rated alternatives

NVlabs/MambaVision

[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone

sign-language-translator/sign-language-translator

Python library & framework to build custom translators for the hearing-impaired and translate...

kyegomez/Jamba

PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model"

autonomousvision/transfuser

[PAMI'23] TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving;...

kyegomez/MultiModalMamba

A novel implementation of fusing ViT with Mamba into a fast, agile, and high performance...
