joanrod/ocr-vqgan
OCR-VQGAN is a discrete image encoder (tokenizer and detokenizer) for figure images in the Paper2Fig100k dataset. It implements an OCR perceptual loss for clear text-within-image generation. Forked from VQGAN in CompVis/taming-transformers.
This project helps researchers and technical writers improve the quality of generated images containing text, such as diagrams, charts, and figures from academic papers. It takes an input image containing text and processes it to produce a reconstructed image where the text is much clearer and more readable. This tool is designed for anyone working with synthetic image generation, particularly those focusing on technical illustrations and diagrams where text legibility is critical.
No commits in the last 6 months.
Use this if you are generating images that include text, like diagrams or scientific figures, and need to ensure the text within those images is highly readable and well-defined.
Not ideal if your primary goal is generating natural images without embedded text, as its specialized focus is on text clarity within technical images.
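The description above names an OCR perceptual loss but does not spell out how it is computed. As a rough illustration only, a perceptual loss generally compares feature maps of the original and reconstructed image rather than raw pixels. The sketch below assumes that general form. The two edge filters are toy stand-ins for the layers of a pretrained text-aware network, and every name in it (`horizontal_edges`, `vertical_edges`, `perceptual_loss`) is hypothetical rather than taken from the repository.

```python
from typing import Callable, List, Optional, Sequence

Image = List[List[float]]        # grayscale image as rows of pixel values
FeatureMap = List[List[float]]   # output of one (toy) feature extractor


def horizontal_edges(img: Image) -> FeatureMap:
    """Toy feature layer (assumption): differences between horizontal neighbours."""
    return [[row[x + 1] - row[x] for x in range(len(row) - 1)] for row in img]


def vertical_edges(img: Image) -> FeatureMap:
    """Toy feature layer (assumption): differences between vertical neighbours."""
    return [
        [img[y + 1][x] - img[y][x] for x in range(len(img[0]))]
        for y in range(len(img) - 1)
    ]


def mean_squared_difference(a: FeatureMap, b: FeatureMap) -> float:
    """Average squared element-wise difference between two feature maps."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for value_a, value_b in zip(row_a, row_b):
            total += (value_a - value_b) ** 2
            count += 1
    return total / count if count else 0.0


def perceptual_loss(
    original: Image,
    reconstruction: Image,
    extractors: Sequence[Callable[[Image], FeatureMap]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of feature-space distances, one term per extractor."""
    if weights is None:
        weights = [1.0] * len(extractors)
    return sum(
        w * mean_squared_difference(f(original), f(reconstruction))
        for w, f in zip(weights, extractors)
    )


# A sharp vertical stroke and a blurred reconstruction of it.
sharp = [[0.0, 1.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
blurred = [[0.25, 0.75, 0.75, 0.25], [0.25, 0.75, 0.75, 0.25]]
extractors = [horizontal_edges, vertical_edges]

loss_identical = perceptual_loss(sharp, sharp, extractors)
loss_blurred = perceptual_loss(sharp, blurred, extractors)
```

In this toy setup the loss is zero for a perfect reconstruction and grows when stroke edges are softened, which is the kind of signal a text-focused perceptual loss is meant to provide. A real setup would replace the edge filters with activations from a trained text detection or recognition model.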
Stars: 83
Forks: 2
Language: Python
License: —
Category:
Last pushed: Jan 30, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/joanrod/ocr-vqgan"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
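The same request can be made from Python using only the standard library. This is a minimal sketch: the URL is the one shown in the curl command, while the `category/owner/repo` path pattern is inferred from that single example, and the helper names `quality_url` and `fetch_quality` are hypothetical. The response format is not documented on this page, so the body is returned as raw text, assuming UTF-8 encoding.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the segment layout after it is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    segments = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Perform the GET request and return the response body as text.

    The body is decoded as UTF-8 (assumption) and not parsed, because the
    response schema is not described on this page.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_quality("computer-vision", "joanrod", "ocr-vqgan")` issues the same request as the curl command above. Each call counts against the daily request limit.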
Higher-rated alternatives
OBA-Research/VAAS
VAAS is an inference-first, research-driven library for image integrity analysis. It integrates...
deepmancer/clip-object-detection
Zero-shot object detection with CLIP, utilizing Faster R-CNN for region proposals.
ABaldrati/CLIP4Cir
[ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented...
kyegomez/CLIPQ
A simple implementation of a CLIP that splits up an image into quadrants and then gets the...
IvanAer/G-Universal-CLIP
4th place solution for the Google Universal Image Embedding Kaggle Challenge. Instance-Level...