sparkfish/augraphy
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
This tool helps researchers and engineers working with document image processing by creating realistic, 'dirty' versions of clean digital documents. It takes a pristine digital document image and outputs many variations that look like they've been printed, faxed, scanned, or copied, complete with smudges, low resolution, or paper imperfections. Anyone training AI models to extract information from real-world scanned or photographed documents will find this useful.
512 stars. No commits in the last 6 months. Available on PyPI.
Use this if you need to generate a vast dataset of realistic, degraded document images from clean originals to train machine learning models for document analysis or restoration.
Not ideal if you're looking for a simple tool to clean up existing noisy documents; this project focuses on generating noise, not removing it.
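The workflow described above (one clean image in, many degraded variants out) can be sketched as follows. This is a minimal sketch, not the project's documented recipe. It assumes a recent Augraphy release in which `default_augraphy_pipeline()` is importable from `augraphy` and calling the pipeline on an image returns the augmented image; it also assumes OpenCV is installed for loading the file. The names `make_variants` and `augraphy_variants` are hypothetical helpers introduced only for illustration.

```python
from typing import Any, Callable, List


def make_variants(image: Any, pipeline: Callable[[Any], Any], count: int) -> List[Any]:
    """Run one clean image through `pipeline` `count` times.

    With a randomized pipeline, each call is expected to yield a
    different degraded variant of the same source image.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return [pipeline(image) for _ in range(count)]


def augraphy_variants(path: str, count: int = 5) -> List[Any]:
    """Hypothetical helper: degrade the image at `path` with Augraphy's default pipeline."""
    # Imported lazily so make_variants stays usable without these packages.
    import cv2
    from augraphy import default_augraphy_pipeline  # assumed available in recent releases

    image = cv2.imread(path)  # returns None if the file cannot be read
    if image is None:
        raise FileNotFoundError(path)

    # Assumption: calling the pipeline returns the augmented image directly.
    pipeline = default_augraphy_pipeline()
    return make_variants(image, pipeline, count)
```

Calling `augraphy_variants("clean_page.png", count=10)` would then give ten degraded copies of a hypothetical `clean_page.png`, which could be written out with `cv2.imwrite` to build a training set.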
Stars: 512
Forks: 60
Language: Python
License: MIT
Category:
Last pushed: Jul 20, 2025
Commits (30d): 0
Dependencies: 9
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/sparkfish/augraphy"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
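The same request can be made from Python using only the standard library. This is a sketch under stated assumptions: the three path segments after `/quality/` are read here as category, owner, and repository, which is inferred from the example URL rather than documented, and the response body is returned as raw text because its format is not described on this page. The function names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the segment meanings are an assumption."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the listing data and return the response body as text."""
    with urlopen(quality_url(category, owner, repo), timeout=timeout) as response:
        # The body format is not specified here, so no parsing is attempted.
        return response.read().decode("utf-8", errors="replace")
```

For this project the call would be `fetch_quality("generative-ai", "sparkfish", "augraphy")`. Keyless use is limited to 100 requests per day, so results are worth caching when polling repeatedly.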