jhcho99/CoFormer
[CVPR'22] Official PyTorch Implementation of "Collaborative Transformers for Grounded Situation Recognition"
This project helps computer vision researchers and AI developers advance the task of grounded situation recognition. It takes an image as input and predicts the main activity (a verb), the entities that fill the verb's semantic roles (nouns), and the location of each entity (a bounding box). Researchers working on visual understanding of complex scenes will find it useful.
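The sketch below shows what one grounded situation prediction might look like as a data structure, and how a per-role "grounded" check could combine noun correctness with box overlap. It is not CoFormer's output format or evaluation code. The class names `RoleFill` and `Situation`, the `(x1, y1, x2, y2)` box format, the single reference annotation (a simplification), and the 0.5 IoU threshold are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class RoleFill:
    noun: Optional[str]  # None if the role is left empty
    box: Optional[Box]   # None if the entity is not visible / not grounded


@dataclass
class Situation:
    verb: str
    roles: Dict[str, RoleFill] = field(default_factory=dict)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def role_grounded_correct(pred: RoleFill, gold: RoleFill, thresh: float = 0.5) -> bool:
    """A role counts as correct only if the noun matches AND the box matches."""
    if pred.noun != gold.noun:
        return False
    if pred.box is None or gold.box is None:
        # Both must agree that the entity is ungrounded.
        return pred.box is None and gold.box is None
    return iou(pred.box, gold.box) >= thresh


def grounded_role_accuracy(pred: Situation, gold: Situation, thresh: float = 0.5) -> float:
    """Fraction of the gold frame's roles predicted correctly (0 if the verb is wrong)."""
    if pred.verb != gold.verb or not gold.roles:
        return 0.0
    hits = sum(
        role_grounded_correct(pred.roles[r], g, thresh)
        for r, g in gold.roles.items()
        if r in pred.roles  # a missing role counts as wrong
    )
    return hits / len(gold.roles)
```

For example, if a "riding" prediction gets the agent's noun and box right but places the vehicle's box far from the reference, `grounded_role_accuracy` returns 0.5. Gating every role on the verb mirrors the fact that the set of roles depends on which verb was predicted.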
No commits in the last 6 months.
Use this if you are developing or evaluating AI models for scene understanding and need to identify actions, objects, and their spatial relationships within images.
Not ideal if you are looking for an out-of-the-box solution for general image classification or object detection, as this is a research-oriented implementation.
Stars: 50
Forks: 7
Language: Python
License: Apache-2.0
Category:
Last pushed: Apr 09, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/jhcho99/CoFormer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
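The same request can be made from Python using only the standard library. This is a minimal sketch that assumes the endpoint returns JSON. The response fields are not documented here, so the code just decodes and returns whatever comes back. Treating the `transformers` path segment as a category is an inference from the curl example. `quality_url` and `fetch_quality` are hypothetical helper names, and sending an API key is not shown because the key mechanism is not described.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the URL using the path layout from the curl example:
    /quality/<category>/<owner>/<repo> (the 'category' name is an assumption)."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body, assuming it is JSON."""
    req = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("transformers", "jhcho99", "CoFormer")` requests the same URL as the curl command above. It counts toward the same unauthenticated daily limit.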
Higher-rated alternatives
NVlabs/MambaVision
[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
sign-language-translator/sign-language-translator
Python library & framework to build custom translators for the hearing-impaired and translate...
kyegomez/Jamba
PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model"
autonomousvision/transfuser
[PAMI'23] TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving;...
kyegomez/MultiModalMamba
A novel implementation of fusing ViT with Mamba into a fast, agile, and high performance...