kkakkkka/ETRIS
[ICCV-2023] The official code of Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation
ETRIS helps computer vision researchers and practitioners identify and outline specific objects in images from natural language descriptions. You provide an image and a text prompt (e.g., "the red car on the left"), and it outputs a mask highlighting that object. It is aimed at work on automated image analysis and semantic understanding.
138 stars. No commits in the last 6 months.
Use this if you need to precisely segment objects from images using descriptive text without extensive model retraining.
Not ideal if you only need object detection or image classification rather than pixel-level segmentation, or if you don't have programming experience.
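To make the difference between a pixel-level mask and a detection-style box concrete, here is a minimal sketch in plain Python. The toy mask is hand-written for illustration and is not ETRIS output, and the helper names `mask_area` and `mask_bbox` are hypothetical, not part of the repository.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # rows of 0/1 values; 1 marks a pixel of the object


def mask_area(mask: Mask) -> int:
    """Count the pixels that belong to the object."""
    return sum(1 for row in mask for value in row if value)


def mask_bbox(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) of the object, or None if the mask is empty."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols), max(rows))


# Hand-written toy mask (an assumption for illustration only).
toy_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]

area = mask_area(toy_mask)   # pixels actually covered by the object
bbox = mask_bbox(toy_mask)   # smallest box enclosing those pixels
```

The box spans 3 x 3 = 9 pixels while the mask covers only 6 of them. A detector reports the box; a segmentation model reports which pixels inside it belong to the object.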
Stars: 138
Forks: 6
Language: Python
License: MIT
Category:
Last pushed: Jun 26, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/kkakkkka/ETRIS"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
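The same request can be made from Python with the standard library. This is a minimal sketch under two assumptions: the `category/owner/repo` URL pattern is inferred from the single curl example above, and the response format is not documented here, so the body is returned as raw text (decoded as UTF-8) rather than parsed. The function names are hypothetical.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the path layout beyond it is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the endpoint and return the response body as text (format not assumed)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_quality("computer-vision", "kkakkkka", "ETRIS")` issues the same GET request as the curl command. With no key, each call counts against the 100 requests/day limit.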
Higher-rated alternatives
BR-IDL/PaddleViT
PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+
pathak22/unsupervised-video
[CVPR 2017] Unsupervised deep learning using unlabelled videos on the web
IBM/CrossViT
Official implementation of CrossViT. https://arxiv.org/abs/2103.14899
NVlabs/GCVit
[ICML 2023] Official PyTorch implementation of Global Context Vision Transformers
ViTAE-Transformer/ViTDet
Unofficial implementation for [ECCV'22] "Exploring Plain Vision Transformer Backbones for Object...