DEK11/MoreNLP

Capabilities of StanfordNLP and OpenNLP on Spark

Experimental

This project is aimed at data scientists and NLP researchers who need to process large volumes of text efficiently. It takes raw text and produces linguistic features such as tokens, part-of-speech tags, and recognized entities, which can then feed further analysis or machine learning tasks.

No commits in the last 6 months.

Use this if you need a flexible way to apply standard Natural Language Processing (NLP) techniques to large text datasets, leveraging either Stanford NLP or OpenNLP within a Spark environment.

Not ideal if you need to train custom models for advanced NLP tasks, or if you mainly need stop-word removal and do not want to build a custom stop-word list.

text-analysis natural-language-processing big-data-text-processing linguistic-feature-extraction

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 4 / 25

Maturity 16 / 25

Community 9 / 25


Language: Scala

License: Apache-2.0

Higher-rated alternatives

apache/opennlp

Apache OpenNLP

stanfordnlp/CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing,...

stanfordnlp/python-stanford-corenlp

Python interface to CoreNLP using a bidirectional server-client interface.

dkpro/dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA...

apache/opennlp-sandbox

Apache OpenNLP Sandbox
