Uli-Z/autoPDFtagger
autoPDFtagger is a Python tool for home-office organization, focused on digitizing and organizing documents whether they originate digitally or on paper. It automates the tagging of PDF files, including image-rich documents and scans of varying quality, to streamline the organization of digital archives.
This tool helps individuals and small businesses organize their digital and scanned PDF documents, such as presentations or paper archives. It takes your existing PDFs, even low-quality scans or image-heavy files, and automatically adds standard metadata like titles, summaries, tags, creation dates, and authors. The output is an organized archive with enriched files and an optional JSON or CSV database for easy review and integration.
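To make the "JSON or CSV database" idea concrete, here is a minimal sketch of what one metadata record and its export could look like. The `PdfRecord` class, its field names, and the `to_json` / `to_csv` helpers are hypothetical and chosen only to mirror the metadata listed above; they are not autoPDFtagger's actual schema or API.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PdfRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    path: str
    title: str = ""
    summary: str = ""
    tags: list = field(default_factory=list)
    creation_date: str = ""
    author: str = ""


FIELDNAMES = ["path", "title", "summary", "tags", "creation_date", "author"]


def to_json(records):
    """Serialize records as a JSON array of objects."""
    return json.dumps([asdict(r) for r in records], indent=2)


def to_csv(records):
    """Serialize records as CSV text, joining tags with ';' in one column."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = asdict(record)
        row["tags"] = ";".join(row["tags"])
        writer.writerow(row)
    return buf.getvalue()
```

A flat CSV is convenient for review in a spreadsheet, while JSON keeps list-valued fields such as tags intact; the real tool's layout may differ.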
Use this if you need to quickly classify, sort, and tag a large collection of diverse PDF documents, including scans, without manually opening each one.
Not ideal if you need a full document management system with advanced features like version control, collaboration, or complex workflow automation.
Stars: 19
Forks: not listed
Language: Python
License: GPL-3.0
Category: not listed
Last pushed: Nov 05, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Uli-Z/autoPDFtagger"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
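The same request can be sketched in Python using only the standard library. The URL pattern is taken from the curl command above; the helper names `quality_url` and `fetch_quality` are hypothetical, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL for one repository (pattern inferred from the curl example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the data for one repository.

    Assumption: the response body is JSON; the page does not document its shape.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("ml-frameworks", "Uli-Z", "autoPDFtagger")
```

Keeping URL construction separate from the network call makes the pattern easy to check without spending any of the daily request quota.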
Higher-rated alternatives
paperless-ngx/paperless-ngx
A community-supported supercharged document management system: scan, index and archive all your documents
GoogleCloudPlatform/document-ai-samples
Sample applications and demos for Document AI, the end-to-end document processing platform on...
aws-solutions/document-understanding-solution
Example of integrating & using Amazon Textract, Amazon Comprehend, Amazon Comprehend Medical,...
naiveHobo/InvoiceNet
Deep neural network to extract intelligent information from invoice documents.
aphp/edspdf
EDS-PDF is a generic, pure-Python framework for text extraction from PDF documents. It provides...