eellak/glossAPI
Greek Dataset Production from PDF+
This tool helps researchers and institutions convert academic PDFs, especially those in Greek, into clean, structured Markdown and JSON. It takes a collection of PDF documents and outputs well-organized text, making it easier to analyze, index, or use for further research. The primary users are researchers, librarians, and data scientists working with academic literature and requiring high-quality text extraction.
128 stars. Available on PyPI.
Use this if you need to reliably extract content from academic PDFs, including those with complex layouts or in Greek, and transform it into a clean, machine-readable format.
Not ideal if you only need basic text extraction from simple documents, or if your corpus is too small for automated cleaning and structuring to matter.
Stars: 128
Forks: 29
Language: Python
License: not listed
Category:
Last pushed: Mar 10, 2026
Commits (30d): 0
Dependencies: 11
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/eellak/glossAPI"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
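
The same request can be made from Python. The following is a minimal sketch using only the standard library; it assumes the endpoint returns a JSON body, which the listing does not state. The helper names `build_url` and `fetch_quality` are hypothetical and chosen for illustration only.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a given GitHub owner/repo pair."""
    # Percent-encode each path segment so unusual characters stay inside it.
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for a repository.

    Assumption: the response body is JSON. If it is not,
    json.loads raises a ValueError.
    """
    request = urllib.request.Request(
        build_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("eellak", "glossAPI")
# print(json.dumps(data, indent=2, ensure_ascii=False))
```

Without a key, the 100 requests/day limit applies, so a script that polls many repositories should cache responses rather than refetch them.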
Related tools
pymupdf/langchain-pymupdf4llm
An integration package connecting PyMuPDF4LLM to LangChain
KalyanM45/DocGenius-Revolutionizing-PDFs-with-AI
This is a Python application that allows you to load a PDF and ask questions about it using...
mozilla-ai/structured-qa
Blueprint by Mozilla.ai for answering questions about structured documents
alejandro-ao/langchain-ask-pdf
An AI-app that allows you to upload a PDF and ask questions about it. It uses OpenAI's LLMs to...
leehanchung/llm-pdf-qa-workshop
Introduction to LLM App Development Workshop: PDF Q&A App using OpenAI, Langchain, and Chainlit