th1nhhdk/local_ai_ocr
A local, offline (after initial setup), portable OCR tool that processes images and PDF files using the DeepSeek-OCR AI model, running directly on your machine.
This tool converts text in images and PDF documents into editable text entirely on your own computer, without needing an internet connection. You supply scanned documents, photos, or multi-page PDFs, and it outputs formatted text that you can paste directly into applications such as Microsoft Word with the layout preserved. It is designed for anyone who needs to digitize information from physical documents or image-based files privately and securely.
713 stars.
Use this if you need to extract text from various image and PDF files, especially if data privacy is critical and you prefer processing everything offline on your own machine.
Not ideal if you want a cloud-based OCR service, regularly process extremely complex layouts (which can occasionally cause the software to get stuck), or have very old hardware that doesn't meet the recommended specifications.
Stars: 713
Forks: 181
Language: Python
License: Apache-2.0
Category: (none listed)
Last pushed: Feb 21, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/th1nhhdk/local_ai_ocr"
Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000 requests/day.
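A minimal Python sketch of calling the same endpoint from code, using only the standard library. It assumes the endpoint returns JSON (the response format is not documented here) and covers only unauthenticated requests, because how a free key is passed is not described. `quality_url` and `fetch_quality` are hypothetical helper names, not part of any published client.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a GitHub owner/repo pair (hypothetical helper)."""
    # Quote each path segment so characters like "/" or spaces cannot break the path.
    owner_part = urllib.parse.quote(owner, safe="")
    repo_part = urllib.parse.quote(repo, safe="")
    return f"{BASE_URL}/{owner_part}/{repo_part}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """GET the endpoint and decode the body, ASSUMING it is JSON (unauthenticated)."""
    req = urllib.request.Request(
        quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only prints the URL; no network request is made here.
    print(quality_url("th1nhhdk", "local_ai_ocr"))
```

Calling `fetch_quality("th1nhhdk", "local_ai_ocr")` makes a live request that counts against the 100 requests/day limit, so caching results is worthwhile if you check several repositories.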
Related tools
NanoNets/docstrange
Extract and convert data from any document, images, pdfs, word doc, ppt or URL into multiple...
Dicklesworthstone/llm_aided_ocr
Enhances Tesseract OCR output using LLMs (local or API) for error correction, smart chunking,...
emcf/thepipe
Get clean data from tricky documents, powered by vision-language models ⚡
langstruct-ai/langstruct
Extract structured data from any content using LLMs.
hashangit/Extract2MD
Extract2MD is a powerful and versatile AI-enabled client-side JavaScript library for extracting...