lh0x00/docsifer

Docsifer is a powerful tool for converting various data formats into Markdown for applications such as indexing, text analysis, and more. It supports PDF, PowerPoint, Word, Excel, Images, Audio, HTML, and other text-based formats, and leverages LLMs to enhance performance.

Score: 29 / 100 (Experimental)

This tool helps content managers, researchers, or data analysts transform diverse information from PDFs, presentations, Word documents, Excel sheets, images, or audio files into a standardized Markdown format. It takes your raw files and outputs clean, structured Markdown, making it easy to prepare data for indexing, analysis, or content publishing. You'd use this if you need to unify many different document types into a single, text-based format.

No commits in the last 6 months.

Use this if you regularly work with various document types and need to convert them into a consistent Markdown format for tasks like content management, knowledge base creation, or text-based data preparation.

Not ideal if your primary need is simply to view or edit documents in their original rich-text format, as this tool focuses on converting them to Markdown for other processing.

Topics: content-management, data-preparation, knowledge-indexing, text-analysis, document-conversion
Status: Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 8 / 25

Stars: 9
Forks: 1
Language: Python
License: MIT
Last pushed: Mar 03, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/lh0x00/docsifer"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
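
The same request can be made from Python. This is a minimal sketch using only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, the split of the path into category, owner, and repository is inferred from the curl example, and parsing the response as JSON is an assumption, since the page does not document the response format.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example."""
    return f"{BASE_URL}/{quote(category)}/{quote(owner)}/{quote(repo)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body.

    Assumption: the body is JSON. The response schema is not documented
    on this page, so the decoded value is returned as-is.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to perform the request.
    print(quality_url("vector-db", "lh0x00", "docsifer"))
```

Calling `fetch_quality("vector-db", "lh0x00", "docsifer")` performs the network request and counts against the daily request limit.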