sparkfish/augraphy

Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

/ 100

Established

This tool helps researchers and engineers working with document image processing by creating realistic, 'dirty' versions of clean digital documents. It takes a pristine digital document image and outputs many variations that look like they've been printed, faxed, scanned, or copied, complete with smudges, low resolution, or paper imperfections. Anyone training AI models to extract information from real-world scanned or photographed documents will find this useful.
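To make the "one clean page in, many degraded pages out" idea concrete, here is a minimal sketch in plain Python. It does not use Augraphy's API; the `degrade` and `make_variants` helpers, their parameters, and the toy image are all hypothetical, and the two effects shown (block downsampling and salt-and-pepper speckle) are simple stand-ins for the far richer print, fax, and scan effects the project provides.

```python
import random

def degrade(image, seed, noise_prob=0.02, scale=2):
    """Return a degraded copy of a grayscale image (rows of 0-255 ints).

    Hypothetical helper for illustration only, not part of Augraphy.
    """
    rng = random.Random(seed)  # seeded so each variant is reproducible
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Low resolution: reuse the top-left pixel of each scale x scale block.
            value = image[(y // scale) * scale][(x // scale) * scale]
            # Speckle: occasionally force a pixel to pure black or pure white.
            if rng.random() < noise_prob:
                value = rng.choice((0, 255))
            row.append(value)
        out.append(row)
    return out

def make_variants(image, count):
    """Produce `count` degraded copies, one per seed, leaving the input untouched."""
    return [degrade(image, seed=i) for i in range(count)]

# Toy "document": a white 16x8 page with a two-row black bar.
clean = [[0 if 3 <= y <= 4 else 255 for x in range(16)] for y in range(8)]
variants = make_variants(clean, count=5)
```

Seeding each variant separately keeps the dataset reproducible: the same clean page and seed always give the same degraded page, which helps when a training run has to be regenerated.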

512 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to generate a vast dataset of realistic, degraded document images from clean originals to train machine learning models for document analysis or restoration.

Not ideal if you're looking for a simple tool to clean up existing noisy documents; this project focuses on generating noise, not removing it.

document-image-analysis ocr-training-data digital-document-archiving image-processing-pipelines

Stale 6m

Maintenance 2 / 25

Adoption 10 / 25

Maturity 25 / 25

Community 18 / 25

Stars

512

Forks

Language

Python

License

MIT
