ViTAE-Transformer/ViTAE-Transformer-Scene-Text-Detection
A comprehensive list [Hi-SAM@TPAMI'24, GoMatching@NeurIPS'24, DeepSolo(++)@CVPR'23, DPText-DETR@AAAI'23, I3CL@IJCV'22] of our research works related to scene text detection, spotting, etc., including papers and code.
This project offers tools to precisely identify and extract text from images and videos, including complex cases such as curved or multilingual text and hierarchical structures (strokes, words, lines, paragraphs). It takes an image or video as input and outputs the detected text, often with bounding boxes or segmentation masks. It is aimed at researchers and developers building computer vision applications that involve optical character recognition (OCR) in real-world environments.
No commits in the last 6 months.
Use this if you need to perform highly accurate scene text detection, spotting, or hierarchical text segmentation from various image and video sources, especially when dealing with challenging text forms.
Not ideal if you're looking for a simple, off-the-shelf OCR solution for document scanning or basic text extraction from clean images, as this focuses on complex scene text research.
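To make the hierarchical output described above more concrete, here is a minimal Python sketch of how detections at the stroke, word, line and paragraph levels might be represented and queried. The `TextRegion` class, the `collect` helper and the level names are hypothetical illustrations, not the project's actual output schema. Regions are stored as polygons rather than four-corner boxes because curved text generally needs more vertices, and an axis-aligned bounding box can always be derived from the polygon.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical level names mirroring the hierarchy above (not the project's own API).
LEVELS = ("stroke", "word", "line", "paragraph")


@dataclass
class TextRegion:
    level: str                 # one of LEVELS
    polygon: List[Point]       # boundary vertices; curved text may need many points
    text: str = ""             # recognized string (empty for detection-only output)
    children: List[TextRegion] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level!r}")

    def bbox(self) -> Tuple[float, float, float, float]:
        """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing the polygon."""
        xs = [x for x, _ in self.polygon]
        ys = [y for _, y in self.polygon]
        return min(xs), min(ys), max(xs), max(ys)


def collect(region: TextRegion, level: str) -> List[TextRegion]:
    """Depth-first gather of every region at the requested level."""
    found = [region] if region.level == level else []
    for child in region.children:
        found.extend(collect(child, level))
    return found


# Usage: a one-line paragraph containing two slightly skewed words.
w1 = TextRegion("word", [(10, 10), (50, 8), (52, 30), (12, 32)], "OPEN")
w2 = TextRegion("word", [(60, 9), (90, 10), (91, 31), (61, 30)], "DAILY")
line = TextRegion("line", [(10, 8), (91, 8), (91, 32), (10, 32)], "OPEN DAILY", [w1, w2])
para = TextRegion("paragraph", line.polygon, line.text, [line])

print([w.text for w in collect(para, "word")])  # ['OPEN', 'DAILY']
print(w1.bbox())                                # (10, 8, 52, 32)
```

Keeping children inside their parent region means a single tree walk answers queries at any granularity, which matches the idea of one model emitting several text levels at once.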
Stars: 93
Forks: 5
Language: TeX
License: —
Category:
Last pushed: Nov 12, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/ViTAE-Transformer/ViTAE-Transformer-Scene-Text-Detection"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
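The same request can be made from Python with only the standard library. This is a minimal sketch under stated assumptions: the base URL is copied from the curl command above, treating the first path segment (`computer-vision`) as a category is an assumption based on the URL's shape, and the response is assumed to be JSON whose fields are not documented here. How a free key is passed is not stated on this page, so the sketch makes anonymous requests only.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

# Base path copied from the curl example; the JSON response format is an assumption.
BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository, percent-encoding each path segment."""
    parts = (category, owner, repo)
    return BASE + "/" + "/".join(urllib.parse.quote(p, safe="") for p in parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> Any:
    """Anonymous GET (no key, so the 100 requests/day limit applies); decodes the body as JSON."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = quality_url("computer-vision", "ViTAE-Transformer", "ViTAE-Transformer-Scene-Text-Detection")
print(url)  # identical to the URL in the curl command above
```

Separating URL construction from the network call makes the URL logic testable offline; calling `fetch_quality(...)` performs the actual request.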
Higher-rated alternatives
BR-IDL/PaddleViT
PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+
pathak22/unsupervised-video
[CVPR 2017] Unsupervised deep learning using unlabelled videos on the web
IBM/CrossViT
Official implementation of CrossViT. https://arxiv.org/abs/2103.14899
NVlabs/GCVit
[ICML 2023] Official PyTorch implementation of Global Context Vision Transformers
ViTAE-Transformer/ViTDet
Unofficial implementation for [ECCV'22] "Exploring Plain Vision Transformer Backbones for Object...