lh0x00/docsifer
Docsifer is a powerful tool for converting various data formats into Markdown for applications such as indexing and text analysis. It supports PDF, PowerPoint, Word, Excel, image, audio, and HTML files, as well as other text-based formats, and uses LLMs to enhance performance.
It helps content managers, researchers, and data analysts turn PDFs, presentations, Word documents, Excel sheets, images, and audio files into a standardized Markdown format. It takes raw files and outputs clean, structured Markdown, ready for indexing, analysis, or content publishing. Use it when you need to unify many different document types into a single text-based format.
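A conversion pipeline of this kind first needs to decide which family each input file belongs to. The sketch below is a hypothetical illustration of that routing step only; it is not Docsifer's actual implementation or API. The extension-to-family mapping and the names `FORMAT_FAMILIES`, `format_family`, and `group_by_family` are assumptions made for the example, and anything not listed falls back to a generic "text" family.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the format families named above.
# Illustration only: this is NOT Docsifer's real routing logic.
FORMAT_FAMILIES = {
    ".pdf": "pdf",
    ".pptx": "powerpoint",
    ".docx": "word",
    ".xlsx": "excel",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
    ".mp3": "audio",
    ".wav": "audio",
    ".html": "html",
    ".htm": "html",
}


def format_family(path: str) -> str:
    """Return the format family for a file; unlisted extensions fall back to 'text'."""
    suffix = Path(path).suffix.lower()  # case-insensitive: ".PDF" -> ".pdf"
    return FORMAT_FAMILIES.get(suffix, "text")


def group_by_family(paths):
    """Group file paths by format family so each group can go to its own converter."""
    groups = {}
    for p in paths:
        groups.setdefault(format_family(p), []).append(p)
    return groups
```

For example, `group_by_family(["report.PDF", "slides.pptx", "notes.md"])` returns `{"pdf": ["report.PDF"], "powerpoint": ["slides.pptx"], "text": ["notes.md"]}`.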
No commits in the last 6 months.
Use this if you regularly work with various document types and need to convert them into a consistent Markdown format for tasks like content management, knowledge base creation, or text-based data preparation.
Not ideal if your primary need is simply to view or edit documents in their original rich-text format, as this tool focuses on converting them to Markdown for other processing.
Stars: 9
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Mar 03, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/lh0x00/docsifer"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
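The same request can be made from Python using only the standard library. This is a minimal sketch, assuming the endpoint returns JSON and that the path follows the pattern `/quality/<category>/<owner>/<repo>`, which is inferred from the single example URL above. The helper names `quality_url` and `fetch_quality` are hypothetical, and the sketch does not handle API keys because the listing does not say how a key is passed.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    # Assumed path pattern /quality/<category>/<owner>/<repo>,
    # inferred from the one example URL in this listing.
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality data and parse it, assuming a JSON response body."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `data = fetch_quality("vector-db", "lh0x00", "docsifer")` requests the same URL as the curl command above. Unauthenticated calls count toward the 100 requests/day limit.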
Higher-rated alternatives
AmadeusITGroup/docs2vecs
CLI that helps with docs splitting, embedding and exposing them in a seamless manner
in-c0/updAPI
Free, open-source collection of latest public API documentations - Update LLM's knowledge base...
AlexisBalayre/RagDocs
An AI-powered search engine to interact with documentation using RAG and local LLMs. Privately...
LikithMeruvu/Framework-Docs-AI
Framework Docs AI is a powerful SaaS solution for managing framework documentation. It...
dhruvkshah75/docstream
Turn static PDF archives into an interactive, searchable AI knowledge base