eellak/glossAPI

Greek Dataset Production from PDF+

Score: 65 / 100 (Established)

This tool helps researchers and institutions convert academic PDFs, especially those in Greek, into clean, structured Markdown and JSON. It takes a collection of PDF documents and outputs well-organized text that is easier to analyze, index, or reuse in further research. Its primary users are researchers, librarians, and data scientists who work with academic literature and need high-quality text extraction.

128 stars. Available on PyPI.

Use this if you need to reliably extract content from academic PDFs, including those with complex layouts or in Greek, and transform it into a clean, machine-readable format.

Not ideal if you only need basic text extraction from simple documents, or if your corpus is too small for automated cleaning and structuring to matter.

academic-research document-processing digital-humanities scientific-publishing corpus-linguistics
Maintenance 10 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 20 / 25


Stars: 128
Forks: 29
Language: Python
License: (none listed)
Category: pdf-qa-systems
Last pushed: Mar 10, 2026
Commits (30d): 0
Dependencies: 11

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/eellak/glossAPI"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
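
The same request can be made from Python. The following is a minimal sketch using only the standard library. It assumes the endpoint returns a JSON object, which the listing does not confirm, and the helper names build_quality_url and fetch_quality are hypothetical, introduced here for illustration only.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is owner/repo.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper).

    Each path segment is percent-encoded so unusual names cannot break the URL.
    """
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """Fetch the quality record for a repository (hypothetical helper).

    Assumption: the response body is UTF-8 encoded JSON. The shape of the
    payload is not documented in the listing, so it is returned unmodified.
    """
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling fetch_quality("eellak", "glossAPI") issues one unauthenticated request, which counts against the 100 requests/day limit, so caching the result locally is advisable when polling several repositories.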