jarobyte91/post_ocr_correction
Source code for the paper "Post-OCR Document Correction with Large Ensembles of Character Sequence-to-Sequence Models"
This project helps improve the accuracy of text extracted from scanned documents by Optical Character Recognition (OCR) systems. It takes the potentially error-filled text output from an OCR system and provides a corrected version, making it more reliable for further use. Researchers and professionals who work with historical documents, digitized archives, or large volumes of scanned text can use this to enhance data quality.
No commits in the last 6 months.
Use this if you need to correct errors in text that has already been processed by an OCR system and want to improve the overall accuracy for better analysis or storage.
Not ideal if you are looking for an OCR system itself, as this tool focuses on refining the output of existing OCR processes rather than performing the initial text recognition.
Stars: 39
Forks: 5
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Dec 02, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/jarobyte91/post_ocr_correction"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
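The same request can be made from Python. The following is a minimal sketch that uses only the standard library. It rests on two assumptions not confirmed by this page: that the endpoint path always follows the category/owner/name pattern seen in the curl example, and that the response body is JSON. The names `build_quality_url` and `fetch_quality` are hypothetical helpers introduced for illustration.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a category and an 'owner/name' repository.

    Assumption: the path layout is <category>/<owner>/<name>, inferred
    from the single curl example on this page.
    """
    owner, name = repo.split("/", 1)
    return f"{BASE_URL}/{quote(category)}/{quote(owner)}/{quote(name)}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for a repository.

    Assumption: the server returns a JSON body; the schema is not
    documented here, so the decoded object is returned unchanged.
    """
    url = build_quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("ml-frameworks", "jarobyte91/post_ocr_correction")
# print(data)
```

Since the keyless tier is limited to 100 requests per day, it may be worth caching the decoded result locally instead of calling `fetch_quality` repeatedly for the same repository.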
Higher-rated alternatives
JaidedAI/EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin,...
breezedeus/CnSTD
CnSTD: PyTorch/MXNet-based Chinese/English Scene Text Detection, Mathematical Formula Detection (Mathematical Formula...
githubharald/SimpleHTR
Handwritten Text Recognition (HTR) system implemented with TensorFlow.
felixdittrich92/OnnxTR
OnnxTR a docTR (Document Text Recognition) library Onnx pipeline wrapper - for seamless,...
mindee/doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for...