jarobyte91/post_ocr_correction

Source code for the paper "Post-OCR Document Correction with Large Ensembles of Character Sequence-to-Sequence Models"

Quality score: 34 / 100 (Emerging)

This project improves the accuracy of text extracted from scanned documents by Optical Character Recognition (OCR) systems. It takes the error-prone output of an OCR system and produces a corrected version that is more reliable for downstream use. Researchers and professionals who work with historical documents, digitized archives, or large volumes of scanned text can use it to improve data quality.
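The paper title refers to ensembles of character sequence-to-sequence models. The toy sketch below shows only the generic idea of combining several models' character outputs by per-position majority vote. It is not the method implemented in this repository. It assumes every candidate correction is already aligned to the same length, which real seq2seq outputs generally are not. The function name `vote_characters` and the sample strings are hypothetical.

```python
from collections import Counter


def vote_characters(candidates: list[str]) -> str:
    """Combine equal-length candidate corrections by per-position majority vote.

    Hypothetical toy helper: assumes all candidates are aligned to the same
    length (substitution-only corrections). On a tie, the character from the
    earliest candidate in the list wins, because Counter.most_common orders
    equal counts by first occurrence.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    length = len(candidates[0])
    if any(len(c) != length for c in candidates):
        raise ValueError("candidates must be aligned to the same length")
    voted = []
    for chars in zip(*candidates):
        # chars holds the character each model proposed at this position
        voted.append(Counter(chars).most_common(1)[0][0])
    return "".join(voted)


# Three hypothetical model outputs, each with a different single error
outputs = ["the qnick fox", "tbe quick fox", "the quick f0x"]
print(vote_characters(outputs))  # prints "the quick fox"
```

Each error shows up in only one of the three outputs, so the other two outvote it at that position.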

No commits in the last 6 months.

Use this if you need to correct errors in text that has already been processed by an OCR system and want to improve the overall accuracy for better analysis or storage.

Not ideal if you are looking for an OCR system itself, as this tool focuses on refining the output of existing OCR processes rather than performing the initial text recognition.

document-digitization text-correction data-quality archival-processing digital-humanities
Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 11 / 25
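The four subscores above add up to the overall figure (0 + 7 + 16 + 11 = 34 out of 4 × 25 = 100). The sketch below reproduces that arithmetic. It assumes the overall score is a plain, unweighted sum. That assumption matches these numbers, but the page does not state the formula. The helper `total_score` is hypothetical.

```python
# Subscores as listed on this page; each dimension is scored out of 25.
SUBSCORES = {
    "Maintenance": 0,
    "Adoption": 7,
    "Maturity": 16,
    "Community": 11,
}
MAX_PER_DIMENSION = 25


def total_score(subscores: dict[str, int]) -> int:
    """Return the overall score, assuming it is the plain sum of the subscores."""
    for name, value in subscores.items():
        if not 0 <= value <= MAX_PER_DIMENSION:
            raise ValueError(f"{name} score {value} is outside 0..{MAX_PER_DIMENSION}")
    return sum(subscores.values())


max_total = MAX_PER_DIMENSION * len(SUBSCORES)
print(f"{total_score(SUBSCORES)} / {max_total}")  # prints "34 / 100"
```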


Stars: 39
Forks: 5
Language: Jupyter Notebook
License: MIT
Last pushed: Dec 02, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/jarobyte91/post_ocr_correction"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
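Here is a standard-library Python version of the curl call above. The helper names `quality_url` and `fetch_quality` are hypothetical. Two points are assumptions: that the `ml-frameworks` category and the owner/repo pair can be swapped as path segments, and that the endpoint returns JSON. The page does not say how an API key is passed, so the sketch sends none and stays within the keyless 100 requests/day.

```python
import json
import urllib.request

# Base path taken from the curl example; treating the category and the
# owner/repo pair as swappable path segments is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint without a key and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only builds and prints the URL; call fetch_quality(...) to hit the network.
    print(quality_url("ml-frameworks", "jarobyte91", "post_ocr_correction"))
```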