sparkfish/augraphy

Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

Score: 55 / 100 (Established)

This tool helps researchers and engineers working with document image processing by creating realistic, 'dirty' versions of clean digital documents. It takes a pristine digital document image and outputs many variations that look like they've been printed, faxed, scanned, or copied, complete with smudges, low resolution, or paper imperfections. Anyone training AI models to extract information from real-world scanned or photographed documents will find this useful.
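The idea of turning one clean image into many degraded variants can be sketched in plain Python. This is a toy illustration only and does not use Augraphy's API: the function names `degrade` and `make_variants`, the `noise_rate` and `scale` parameters, and the list-of-rows image representation are all assumptions made for the example. It mimics two of the effects mentioned above, low resolution and random specks.

```python
import random


def degrade(image, seed, noise_rate=0.05, scale=2):
    """Return a degraded copy of a grayscale image (list of rows of 0-255 ints).

    Hypothetical helper for illustration; not part of Augraphy.
    """
    rng = random.Random(seed)  # seeded so each variant is reproducible

    # Simulate low resolution: sample every `scale`-th pixel and repeat it
    # (nearest-neighbour downsample followed by upsample).
    low = [
        [image[(y // scale) * scale][(x // scale) * scale] for x in range(len(image[y]))]
        for y in range(len(image))
    ]

    # Simulate specks: replace a random fraction of pixels with black or white.
    return [
        [rng.choice((0, 255)) if rng.random() < noise_rate else pixel for pixel in row]
        for row in low
    ]


def make_variants(image, count):
    """Produce `count` differently seeded degraded copies of one clean image."""
    return [degrade(image, seed=i) for i in range(count)]


clean = [[255] * 8 for _ in range(8)]  # a blank white 8x8 "page"
variants = make_variants(clean, count=3)
```

A real pipeline would chain many more effects (ink, paper texture, lighting), but the shape is the same: one clean input, a seeded random transform, many outputs.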

512 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to generate a large dataset of realistic, degraded document images from clean originals to train machine learning models for document analysis or restoration.

Not ideal if you're looking for a simple tool to clean up existing noisy documents; this project focuses on generating noise, not removing it.

Tags: document-image-analysis, ocr-training-data, digital-document-archiving, image-processing-pipelines
Stale 6m
Maintenance 2 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 18 / 25
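The four sub-scores above add up to the overall 55 / 100, which suggests (as an inference from the numbers shown, not a documented formula) that the overall score is a plain sum of four categories worth 25 points each. A quick check of that arithmetic:

```python
# Sub-scores as listed on this page; each category is out of 25.
subscores = {"Maintenance": 2, "Adoption": 10, "Maturity": 25, "Community": 18}
MAX_PER_CATEGORY = 25

total = sum(subscores.values())
maximum = MAX_PER_CATEGORY * len(subscores)

print(f"{total} / {maximum}")  # prints "55 / 100"
```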


Stars: 512
Forks: 60
Language: Python
License: MIT
Last pushed: Jul 20, 2025
Commits (30d): 0
Dependencies: 9

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/sparkfish/augraphy"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
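The same request can be made from Python with only the standard library. This is a minimal sketch under two assumptions that the page does not confirm: that the URL path follows a category/owner/repo pattern (inferred from the single curl example), and that the endpoint returns JSON. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL; the category/owner/repo layout is an assumption."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the quality record; assumes the response body is JSON."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("generative-ai", "sparkfish", "augraphy")` would issue the same request as the curl command above. With the keyless limit of 100 requests/day, it is worth caching the result rather than re-fetching on every run.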