nanxstats/pdf-word-extraction
Extract meaningful words from a collection of PDF documents and count their frequencies
This tool helps you quickly survey the main topics and common terms across many PDF documents. Point it at a folder of PDF files and it returns a list of the meaningful words found in those documents, together with how often each word appears. It suits researchers, analysts, or anyone who needs to grasp the gist of a large document collection.
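The listing does not describe the tool's exact pipeline. The following is therefore only a minimal Python sketch of the general technique: extract text from each PDF, tokenize it, drop stopwords and very short tokens, and tally frequencies. It assumes the third-party `pypdf` package for text extraction and uses a small illustrative stopword list. The function names (`read_pdf_text`, `count_words`, `count_folder`) are hypothetical and are not the repository's API.

```python
"""Hypothetical sketch: count word frequencies across a folder of PDFs.

Assumption: text is extracted with the third-party `pypdf` package
(`pip install pypdf`); the repository may use a different approach.
"""
import re
from collections import Counter
from pathlib import Path
from typing import Iterable

# Small illustrative stopword list (an assumption, not the repository's list).
STOPWORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "in", "is", "it", "of", "on", "or", "that", "the", "this", "to", "with",
})

# Lowercase alphabetic tokens, optionally with one apostrophe part ("don't").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def read_pdf_text(pdf_path: Path) -> str:
    """Return the concatenated text of every page in one PDF."""
    # Imported lazily so the counting code works even without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # Image-only pages may yield no text, so fall back to an empty string.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def count_words(texts: Iterable[str], min_length: int = 3) -> Counter:
    """Tokenize each text, drop stopwords and short tokens, and tally frequencies."""
    counts: Counter = Counter()
    for text in texts:
        for word in WORD_RE.findall(text.lower()):
            if len(word) >= min_length and word not in STOPWORDS:
                counts[word] += 1
    return counts


def count_folder(folder: str) -> Counter:
    """Count word frequencies across all *.pdf files directly inside `folder`."""
    paths = sorted(Path(folder).glob("*.pdf"))
    return count_words(read_pdf_text(p) for p in paths)
```

For example, `count_folder("papers").most_common(20)` would list the 20 most frequent words across the PDFs in a hypothetical `papers` folder. Keeping extraction and counting separate makes it easy to swap in a different PDF library, tokenizer, or stopword list without touching the tallying logic.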
No commits in the last 6 months.
Use this if you need to identify key themes or perform content analysis on a large set of PDF documents, such as research papers, reports, or contracts.
Not ideal if you need to extract specific data fields from PDFs, convert PDFs to other formats, or perform detailed layout-sensitive analysis.
Stars: 7
Forks: 2
Language: Python
License: not specified
Category: not specified
Last pushed: Jun 23, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/nanxstats/pdf-word-extraction"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
Higher-rated alternatives
NatLibFi/Annif
Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
explosion/displacy
displaCy.js: An open-source NLP visualiser for the modern web
hshindo/react-nlp
Visualization of Natural Language Processing for React
microsoft/browsecloud
A web app to create and browse text visualizations for automated customer listening.
microsoft/VisTalk
A JavaScript toolkit for Natural Language-based Visualization Authoring