magicyuan876/mineru-tianshu

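Tianshu (天枢) - Enterprise-grade all-in-one AI data preprocessing platform | PDF/Office to Markdown | MCP protocol support for AI assistant integration | Vue3 + FastAPI full-stack solution | Document parsing | Multimodal information extraction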

55 / 100 (Established)

This platform helps businesses convert many kinds of raw data, including PDFs, Word documents, video, audio, and even specialized bioinformatics files, into structured, AI-ready formats such as Markdown and JSON. It takes diverse unstructured data as input and produces standardized, searchable content. Data scientists, AI engineers, and knowledge managers can use it to prepare large datasets for AI models and analytics.

535 stars.

Use this if you need to automatically process and extract structured information from a large volume of different document types, media files, or scientific data to feed into AI applications or knowledge bases.

Not ideal if you only need to process simple text files or require highly specialized, niche data extraction capabilities not covered by the supported formats.

data-preprocessing AI-data-preparation document-intelligence multimodal-data-extraction knowledge-management
No package, no dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 19 / 25
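On this page, the four category scores (each out of 25) add up exactly to the headline 55 / 100. The page does not say whether the overall score is always a plain sum of the categories, so treat the following as an observation about these numbers rather than the site's documented formula:

```latex
\underbrace{10}_{\text{Maintenance}}
+ \underbrace{10}_{\text{Adoption}}
+ \underbrace{16}_{\text{Maturity}}
+ \underbrace{19}_{\text{Community}}
= 55,
\qquad
4 \times 25 = 100
```

Read this way, Community (19 / 25) and Maturity (16 / 25) carry most of the score, while Maintenance and Adoption sit at 10 / 25 each.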


Stars: 535
Forks: 66
Language: Python
License: Apache-2.0
Last pushed: Feb 27, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/mcp/magicyuan876/mineru-tianshu"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.