michaelmml/NLP-Information-Extraction
Automated PDF and text processing with spaCy and NLTK; information extraction from text based on grammatical structure; deployed on extracted raw search data
This tool helps researchers, analysts, or business intelligence professionals automatically process large volumes of text, such as company transcripts, patent documents, or news articles. It takes raw text or PDFs as input and extracts key information like topics, keywords, named entities (like company names), and significant phrases. The output helps you quickly understand content, identify trends, and summarize lengthy documents without manual review.
No commits in the last 6 months.
Use this if you need to quickly extract structured insights and key information from large unstructured text datasets like financial reports, legal documents, or industry news.
Not ideal if you need sentiment analysis, question answering, or text generation rather than extraction of information already present in the text.
Stars: 16
Forks: 1
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Apr 01, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/michaelmml/NLP-Information-Extraction"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
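The same request as the curl command above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions: the function names `build_quality_url` and `fetch_quality` are hypothetical, the path layout (category, then owner, then repository) is inferred from the single URL shown, and the response is assumed to be JSON, which this page does not confirm. How an API key would be passed is not documented here, so the sketch covers only the keyless tier.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (hypothetical helper).

    Assumes the path layout is <category>/<owner>/<repo>, as inferred
    from the one example URL shown above.
    """
    # Percent-encode each segment so unusual characters cannot alter the path.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository (hypothetical helper).

    Assumes the endpoint returns a JSON body; json.load raises
    json.JSONDecodeError if it does not.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("nlp", "michaelmml", "NLP-Information-Extraction")
# print(data)
```

Separating URL construction from the network call keeps the URL logic checkable offline and makes it easier to stay within the 100 requests/day limit while experimenting.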
Higher-rated alternatives
ziqizhang/jate
JATE - Just Automatic Term Extraction (in Python)
mcs07/ChemDataExtractor
Automatically extract chemical information from scientific documents
brucewlee/lftk
[BEA @ ACL 2023] General-purpose tool for linguistic features extraction; Tested on readability...
mmmaurer/elfen
A Python package to efficiently extract linguistic features for text/NLP datasets
strangetom/ingredient-parser
A tool to parse recipe ingredients into structured data