jhcho99/GSRTR
[BMVC'21] Official PyTorch Implementation of "Grounded Situation Recognition with Transformers"
This project helps identify and locate specific actions (verbs) and associated objects (nouns) within images. You input an image, and it outputs a description of the primary action, the key entities involved in that action, and their precise locations (bounding boxes). It's designed for researchers or developers working on advanced image understanding and computer vision applications.
No commits in the last 6 months.
Use this if you need to go beyond simple object detection and understand the full 'situation' depicted in an image, including the verb, its semantic roles, and the location of each role filler.
Not ideal if you only need to classify objects, perform basic image categorization, or if you're not comfortable working with machine learning models and datasets.
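To make the described output concrete, here is a minimal sketch of how a grounded situation (one verb, its semantic roles, and a bounding box per role filler) could be represented. The class names, field names, role labels, and coordinates are illustrative assumptions, not the repository's actual API or real model output.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Assumed box convention for this sketch: (x_min, y_min, x_max, y_max) in pixels.
BBox = Tuple[float, float, float, float]


@dataclass
class RoleFiller:
    """Hypothetical container for the noun filling one semantic role."""
    noun: str
    bbox: Optional[BBox] = None  # assumption: None means the filler has no box


@dataclass
class GroundedSituation:
    """Hypothetical container for one verb and its role fillers."""
    verb: str
    roles: Dict[str, RoleFiller] = field(default_factory=dict)

    def grounded_roles(self) -> Dict[str, RoleFiller]:
        # Keep only the roles whose filler carries a bounding box.
        return {r: f for r, f in self.roles.items() if f.bbox is not None}

    def describe(self) -> str:
        # Render the situation as verb(role=noun, ...), in insertion order.
        parts = [f"{role}={filler.noun}" for role, filler in self.roles.items()]
        return f"{self.verb}(" + ", ".join(parts) + ")"


# Hand-written illustrative values, not produced by the model.
example = GroundedSituation(
    verb="riding",
    roles={
        "agent": RoleFiller("person", (34.0, 20.0, 180.0, 240.0)),
        "vehicle": RoleFiller("bicycle", (40.0, 120.0, 210.0, 300.0)),
        "place": RoleFiller("street"),
    },
)
print(example.describe())
```

Plain object detection would return only the boxes and class labels; the difference here is that each box is attached to a role defined by the verb.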
Stars: 27
Forks: 11
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 30, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/jhcho99/GSRTR"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
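The same request can be issued from Python using only the standard library. This is a sketch under two assumptions: that the endpoint returns JSON (the response schema is not documented here), and that other entries follow the same `category/owner/repo` path pattern as the URL shown above. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example; the path pattern below is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repo given as 'owner/name'."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the endpoint and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (performs a network request, counted against the daily limit):
# data = fetch_quality("transformers", "jhcho99/GSRTR")
# print(data)
```

The call is left commented out so that importing or running the snippet does not consume any of the 100 keyless requests per day.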
Higher-rated alternatives
NVlabs/MambaVision
[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
sign-language-translator/sign-language-translator
Python library & framework to build custom translators for the hearing-impaired and translate...
kyegomez/Jamba
PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model"
autonomousvision/transfuser
[PAMI'23] TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving;...
kyegomez/MultiModalMamba
A novel implementation of fusing ViT with Mamba into a fast, agile, and high performance...