nanxstats/pdf-word-extraction

Extract meaningful words from a collection of PDF documents and count their frequencies

25 / 100
Experimental

This tool helps you quickly understand the main topics and common terms across many PDF documents. You provide a folder of PDF files, and it returns a list of the important words in those documents along with how often each word appears. It suits researchers, analysts, or anyone who needs a quick overview of a large document collection.
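The counting step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the text has already been extracted from each PDF (which would need a separate PDF library), and the function name `count_meaningful_words`, the stopword list, and the `min_length` cutoff are all hypothetical choices for what counts as a "meaningful" word.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only; the project's real
# filtering rules are not described in this listing.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def count_meaningful_words(texts, stopwords=STOPWORDS, min_length=3):
    """Count word frequencies across a collection of already-extracted texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep runs of ASCII letters only.
        words = re.findall(r"[a-z]+", text.lower())
        # Drop very short words and stopwords before counting.
        counts.update(
            w for w in words if len(w) >= min_length and w not in stopwords
        )
    return counts


if __name__ == "__main__":
    docs = [
        "The trial measured the efficacy of the treatment.",
        "Treatment efficacy was measured in a second trial.",
    ]
    # Print the three most frequent remaining words with their counts.
    for word, n in count_meaningful_words(docs).most_common(3):
        print(word, n)
```

A `Counter` keeps the aggregation across documents simple, and `most_common` gives the ranked list that the description refers to.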

No commits in the last 6 months.

Use this if you need to identify key themes or perform content analysis on a large set of PDF documents, such as research papers, reports, or contracts.

Not ideal if you need to extract specific data fields from PDFs, convert PDFs to other formats, or perform detailed layout-sensitive analysis.

document-analysis text-mining content-auditing research-analysis information-discovery
No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 4 / 25
Maturity 8 / 25
Community 13 / 25


Stars 7
Forks 2
Language Python
License None
Last pushed Jun 23, 2024
Commits (30d) 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/nanxstats/pdf-word-extraction"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
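The same request can be made from Python with the standard library. The sketch below rests on assumptions: the path layout (category, owner, repo) is inferred from the single example URL above, the helper names `build_quality_url` and `fetch_quality` are hypothetical, and the response format is not documented here, so the body is returned as raw text rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL for one repository.

    The category/owner/repo layout is an assumption inferred from the
    example URL; it is not a documented contract.
    """
    parts = (quote(p, safe="") for p in (category, owner, repo))
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10):
    """Return the raw response body as text (response format is unknown)."""
    url = build_quality_url(category, owner, repo)
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_quality("nlp", "nanxstats", "pdf-word-extraction")` would issue the same request as the curl command, and each call counts against the daily request limit.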