jarobyte91/post_ocr_correction

Source code for the paper "Post-OCR Document Correction with Large Ensembles of Character Sequence-to-Sequence Models"

Emerging

This project improves the accuracy of text extracted from scanned documents by Optical Character Recognition (OCR) systems. It takes the potentially error-filled output of an OCR system and produces a corrected version; as the accompanying paper's title indicates, the correction relies on large ensembles of character sequence-to-sequence models. Researchers and professionals who work with historical documents, digitized archives, or large volumes of scanned text can use it to improve data quality.

No commits in the last 6 months.

Use this if you need to correct errors in text that has already been processed by an OCR system and want to improve the overall accuracy for better analysis or storage.

Not ideal if you are looking for an OCR system itself, as this tool focuses on refining the output of existing OCR processes rather than performing the initial text recognition.

document-digitization text-correction data-quality archival-processing digital-humanities

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 7 / 25

Maturity 16 / 25

Community 11 / 25

Language Jupyter Notebook

License MIT

Higher-rated alternatives

JaidedAI/EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin,...

breezedeus/CnSTD

CnSTD: PyTorch/MXNet-based Chinese/English scene text detection (Scene Text Detection), mathematical formula detection (Mathematical Formula...

githubharald/SimpleHTR

Handwritten Text Recognition (HTR) system implemented with TensorFlow.

felixdittrich92/OnnxTR

OnnxTR a docTR (Document Text Recognition) library Onnx pipeline wrapper - for seamless,...

mindee/doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for...
