DEK11/MoreNLP
Capabilities of StanfordNLP and OpenNLP on Spark
This project helps data scientists and NLP researchers who need to process large volumes of text efficiently. It takes raw text data and provides processed linguistic features like tokenized words, parts of speech, and recognized entities. You would use this to prepare text for further analysis or machine learning tasks.
No commits in the last 6 months.
Use this if you need a flexible way to apply standard Natural Language Processing (NLP) techniques to large text datasets, leveraging either Stanford NLP or OpenNLP within a Spark environment.
Not ideal if you need to train custom models for advanced NLP tasks, or if you mainly need stop-word removal and do not want to build your own stop-word list.
Stars: 7
Forks: 1
Language: Scala
License: Apache-2.0
Category:
Last pushed: Sep 23, 2018
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/DEK11/MoreNLP"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
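For scripted use, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not a documented client: the path layout `category/owner/repo` is inferred from the single example URL above, the helper names `build_quality_url` and `fetch_quality` are hypothetical, and the response is returned as raw text because its format is not described on this page.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL (hypothetical helper, layout is an assumption)."""
    # Percent-encode each path segment so unusual characters stay inside it.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the listing data and return the response body as text.

    No API key is sent, so the keyless daily request limit applies.
    """
    url = build_quality_url(category, owner, repo)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_quality("nlp", "DEK11", "MoreNLP")` would issue the same GET request as the curl command. How a key is supplied for the higher limit is not stated here, so the sketch leaves it out.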
Higher-rated alternatives
apache/opennlp
Apache OpenNLP
stanfordnlp/CoreNLP
CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing,...
stanfordnlp/python-stanford-corenlp
Python interface to CoreNLP using a bidirectional server-client interface.
dkpro/dkpro-core
Collection of software components for natural language processing (NLP) based on the Apache UIMA...
apache/opennlp-sandbox
Apache OpenNLP Sandbox