master/spark-stemming

Spark MLlib wrapper for the Snowball framework

Score: 42 / 100 (Emerging)

This tool helps data engineers and data scientists normalize text by reducing words to their stems across the languages Snowball supports. It takes in raw text, typically as one stage of a larger Spark processing workflow, and outputs a version in which inflected forms such as "running" and "runs" both become "run." Stemming is rule-based suffix stripping rather than dictionary lookup, so irregular forms such as "ran" are generally left unchanged. This is especially useful for anyone building search engines, recommendation systems, or sentiment analysis tools.
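
To make the idea of stemming concrete, here is a minimal, self-contained Java sketch. It is a toy with two hand-written English suffix rules, not the Snowball algorithm and not this library's API; the class name `ToyStemmer` and its methods are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical toy stemmer for illustration only; NOT the Snowball algorithm.
final class ToyStemmer {

    private static boolean isVowel(char c) {
        return "aeiou".indexOf(c) >= 0;
    }

    // Applies two simple English rules: strip "-ing" or strip a plural-like "-s".
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() > 5 && w.endsWith("ing")) {
            w = w.substring(0, w.length() - 3);
            int n = w.length();
            // Undo consonant doubling ("runn" -> "run"), but keep ll, ss, zz.
            if (n >= 2) {
                char last = w.charAt(n - 1);
                boolean doubled = last == w.charAt(n - 2);
                boolean keep = last == 'l' || last == 's' || last == 'z';
                if (doubled && !isVowel(last) && !keep) {
                    w = w.substring(0, n - 1);
                }
            }
        } else if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            w = w.substring(0, w.length() - 1);
        }
        return w;
    }

    // Stems every token in a list, preserving order.
    static List<String> stemAll(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(stem(t));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [run, run, run]
        System.out.println(stemAll(Arrays.asList("running", "runs", "run")));
    }
}
```

A real Snowball stemmer applies a much larger, language-specific rule set; the point of the sketch is only that different surface forms collapse to one token, which is what improves matching in search and analysis.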

No commits in the last 6 months.

Use this if you are processing large volumes of text data in Apache Spark and need to standardize words to their base form to improve analysis or search accuracy.

Not ideal if you are working with text in a language not supported by the Snowball framework or if you don't use Apache Spark for your data processing.

information-retrieval natural-language-processing text-analytics data-preprocessing search-engine-optimization
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 19 / 25

Stars: 34
Forks: 20
Language: Java
License: BSD-2-Clause
Last pushed: Nov 27, 2018
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/master/spark-stemming"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
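
The same request can be sketched in Java with the standard `java.net.http` client (Java 11+). The path layout `/quality/{category}/{owner}/{repo}` is an assumption inferred from the single URL above, and the class `QualityApiSketch` and its method names are hypothetical. The response format is not documented here, so the sketch simply returns the raw body.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper; the path layout is inferred from the one curl example.
final class QualityApiSketch {

    static final String BASE = "https://pt-edge.onrender.com/api/v1/quality";

    // Builds the endpoint URI, assuming the pattern /{category}/{owner}/{repo}.
    static URI endpoint(String category, String owner, String repo) {
        return URI.create(BASE + "/" + category + "/" + owner + "/" + repo);
    }

    // Sends an unauthenticated GET and returns the raw response body.
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        URI uri = endpoint("nlp", "master", "spark-stemming");
        System.out.println(uri);
        // Only touch the network when explicitly asked, to respect the daily quota.
        if (args.length > 0 && args[0].equals("--fetch")) {
            System.out.println(fetch(uri));
        }
    }
}
```

Running it with `--fetch` performs the actual call; without the flag it only prints the URL it would request.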