nanxstats/pdf-word-extraction

Extract meaningful words from a collection of PDF documents and count their frequencies

Experimental

This tool helps you quickly identify the main topics and common terms across many PDF documents. You provide a folder of PDF files, and it returns a list of meaningful words from those documents along with how often each word appears. It suits researchers, analysts, or anyone who needs a quick overview of a large document collection.
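The folder-in, word-counts-out workflow described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function names, the tiny stop-word list, and the `min_length` cutoff are all hypothetical, and using the third-party `pypdf` package for text extraction is an assumption (the project may rely on a different library).

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOPWORDS = {"the", "and", "of", "to", "in", "a", "is", "for", "on", "with"}


def count_words(texts, stopwords=STOPWORDS, min_length=3):
    """Count lowercase alphabetic words across texts, skipping stop words and short tokens."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if len(word) >= min_length and word not in stopwords:
                counts[word] += 1
    return counts


def read_pdf_text(path):
    """Return the text of one PDF. Assumes the third-party pypdf package is installed."""
    from pypdf import PdfReader  # imported lazily so count_words works without pypdf

    reader = PdfReader(str(path))
    # extract_text() can yield nothing for image-only pages, hence the fallback to "".
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def count_words_in_folder(folder):
    """Count words across every *.pdf file directly inside folder."""
    pdf_paths = sorted(Path(folder).glob("*.pdf"))
    return count_words(read_pdf_text(p) for p in pdf_paths)
```

With a folder of PDFs at a hypothetical path such as `papers/`, calling `count_words_in_folder("papers/").most_common(20)` would list the twenty most frequent remaining words. Keeping `count_words` independent of the PDF reader means the counting step can be checked on plain strings.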

No commits in the last 6 months.

Use this if you need to identify key themes or perform content analysis on a large set of PDF documents, such as research papers, reports, or contracts.

Not ideal if you need to extract specific data fields from PDFs, convert PDFs to other formats, or perform detailed layout-sensitive analysis.

document-analysis text-mining content-auditing research-analysis information-discovery

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 4 / 25

Maturity 8 / 25

Community 13 / 25

Language

Python

License

None

Higher-rated alternatives

NatLibFi/Annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.

explosion/displacy

displaCy.js: An open-source NLP visualiser for the modern web

hshindo/react-nlp

Visualization of Natural Language Processing for React

microsoft/browsecloud

A web app to create and browse text visualizations for automated customer listening.

microsoft/VisTalk

A JavaScript toolkit for Natural Language-based Visualization Authoring
