kotaro-kinoshita/yomitoku

YomiToku is a Python package that provides an AI-powered document image analysis engine designed specifically for the Japanese language.

/ 100

Emerging

This tool helps Japanese businesses and researchers convert scanned or image-based Japanese documents, like reports or forms, into editable text. It takes images of documents, including those with handwriting or complex layouts, and outputs structured text in formats like HTML, Markdown, JSON, or CSV. Anyone needing to extract and organize information from Japanese document images for analysis or record-keeping would find this useful.

1,356 stars. Actively maintained with 10 commits in the last 30 days.

Use this if you need to accurately extract text and layout information from Japanese document images, including specialized layouts like vertical writing or tables, and export it into structured, machine-readable formats.

Not ideal if your primary need is to read text from signs or other non-document images, or if you consistently work with very low-resolution images.
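
To make the "structured, machine-readable formats" point concrete, here is a minimal sketch of exporting one extraction result to JSON, CSV, and Markdown. It does not call YomiToku and does not reproduce its API or output schema. The `document` dictionary, its field names (`paragraphs`, `order`, `direction`, `table`), and the helper functions are all hypothetical, chosen only for illustration.

```python
import csv
import io
import json

# Hypothetical extraction result. This is NOT YomiToku's real output schema;
# the keys below are assumptions made for this illustration only.
document = {
    "paragraphs": [
        {"order": 1, "direction": "vertical", "text": "請求書"},
        {"order": 0, "direction": "horizontal", "text": "株式会社サンプル"},
    ],
    "table": [
        ["品目", "数量", "金額"],
        ["用紙", "2", "1,000"],
    ],
}


def to_json(doc):
    # ensure_ascii=False keeps Japanese characters readable in the output.
    return json.dumps(doc, ensure_ascii=False, indent=2)


def table_to_csv(rows):
    # The csv module quotes any cell that contains a comma, such as "1,000".
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def to_markdown(doc):
    # Emit paragraphs in reading order, then the table as a pipe table.
    ordered = sorted(doc["paragraphs"], key=lambda p: p["order"])
    blocks = [p["text"] for p in ordered]
    header, *body = doc["table"]
    table_lines = ["| " + " | ".join(header) + " |"]
    table_lines.append("| " + " | ".join("---" for _ in header) + " |")
    table_lines.extend("| " + " | ".join(row) + " |" for row in body)
    blocks.append("\n".join(table_lines))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(to_json(document))
    print(table_to_csv(document["table"]))
    print(to_markdown(document))
```

Keeping a reading-order field separate from the text is one plausible way to handle mixed vertical and horizontal layouts: each exporter can then sort paragraphs the same way before writing them out.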

document-management data-extraction japanese-business research-data records-digitization

No License, No Package, No Dependents

Maintenance 17 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 14 / 25


Stars

1,356

Forks

Language

Python

License

—

Higher-rated alternatives

JaidedAI/EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin,...

breezedeus/CnSTD

CnSTD: PyTorch/MXNet-based Chinese/English scene text detection, mathematical formula detection (Mathematical Formula...

githubharald/SimpleHTR

Handwritten Text Recognition (HTR) system implemented with TensorFlow.

felixdittrich92/OnnxTR

OnnxTR a docTR (Document Text Recognition) library Onnx pipeline wrapper - for seamless,...

mindee/doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for...
