joanrod/ocr-vqgan

OCR-VQGAN is a discrete image encoder (tokenizer and detokenizer) for figure images in the Paper2Fig100k dataset. It implements an OCR perceptual loss for clear text-within-image generation. Forked from VQGAN in CompVis/taming-transformers.

21 / 100 (Experimental)

This project helps researchers and technical writers improve the quality of generated images that contain text, such as diagrams, charts, and figures from academic papers. It encodes an input image into discrete tokens and decodes them into a reconstructed image in which the embedded text stays clear and readable. It is aimed at anyone working on synthetic image generation, particularly technical illustrations and diagrams where text legibility is critical.

No commits in the last 6 months.

Use this if you are generating images that include text, like diagrams or scientific figures, and need to ensure the text within those images is highly readable and well-defined.

Not ideal if your primary goal is generating natural images without embedded text, as its specialized focus is on text clarity within technical images.

academic publishing, technical illustration, diagram generation, computer vision research, scientific communication
No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 8 / 25
Community 4 / 25
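
The listing does not state how the overall score is derived. The four category scores above happen to add up to the headline 21 / 100, so the sketch below assumes a plain sum of four categories worth 25 points each. The names `subscores`, `MAX_PER_CATEGORY`, and `total_score` are illustrative and not part of any published API.

```python
# Category scores copied from the listing above.
subscores = {"Maintenance": 0, "Adoption": 9, "Maturity": 8, "Community": 4}

# Assumption: every category is scored out of 25, as shown in the listing.
MAX_PER_CATEGORY = 25


def total_score(scores):
    """Sum the per-category scores (assumed aggregation rule)."""
    return sum(scores.values())


total = total_score(subscores)
maximum = MAX_PER_CATEGORY * len(subscores)
print(f"{total} / {maximum}")  # prints "21 / 100"
```

Under that assumption, the Maintenance score of 0 is what keeps the project in the low range: the other three categories contribute all 21 points.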


Stars: 83
Forks: 2
Language: Python
License: None
Last pushed: Jan 30, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/joanrod/ocr-vqgan"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
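
The same request can be made from Python using only the standard library. This is a minimal sketch, not an official client: the URL pattern (category, owner, repository) is inferred from the single curl example above, the response is assumed to be JSON, and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL (pattern inferred from the one example shown)."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """GET the endpoint and parse the body as JSON (assumed response format)."""
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("computer-vision", "joanrod", "ocr-vqgan")
# print(json.dumps(data, indent=2))
```

With the keyless limit of 100 requests/day, it is worth caching the response locally instead of refetching on every run.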