Word Stemming NLP Tools

Tools and libraries for reducing words to their root or base form through stemming algorithms across various languages. Includes language-specific stemmers, Porter stemming implementations, and multilingual stemming frameworks. Does NOT include lemmatization, morphological analysis beyond stemming, or general text normalization.
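To make "reducing words to their root or base form" concrete, here is a minimal sketch of suffix stripping in Python. It is a toy illustration only: the rule list `SUFFIXES`, the function name `toy_stem`, and the `min_stem_len` parameter are hypothetical, and this is not the Porter algorithm or the method of any tool listed here.

```python
# Toy suffix-stripping stemmer: illustrative only, NOT the Porter algorithm.
# SUFFIXES is a hypothetical rule list, ordered so longer suffixes are tried first.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the first matching suffix if the remaining stem is long enough."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        stem_len = len(lowered) - len(suffix)
        if lowered.endswith(suffix) and stem_len >= min_stem_len:
            return lowered[:stem_len]
    # No rule applied: return the lowercased word unchanged.
    return lowered


for example in ("jumping", "jumped", "jumps", "bus"):
    print(example, "->", toy_stem(example))
```

Real stemmers such as the Porter and Snowball families apply many more rules and conditions; the length guard here only prevents short words like "bus" from being truncated.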

56 word stemming tools are tracked in this category. Six score 50 or higher (Established tier). The highest-rated is hplt-project/sacremoses at 68/100 with 495 stars.
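Judging only from the scores and tiers shown in the listing on this page (50 is labeled Established, 30 is labeled Emerging, 29 is labeled Experimental), the tier cutoffs appear to fall at 50 and 30. The sketch below encodes that inference in Python; the thresholds are an assumption read off the table, not a documented rule, and `tier_for` is a hypothetical helper name.

```python
def tier_for(score: int) -> str:
    """Map a 0-100 quality score to a tier name.

    The cutoffs (50 and 30) are inferred from the listing on this page
    and are an assumption, not a documented scoring rule.
    """
    if score >= 50:
        return "Established"
    if score >= 30:
        return "Emerging"
    return "Experimental"


# Scores taken from rows of the listing.
for score in (68, 50, 49, 30, 29, 10):
    print(score, tier_for(score))
```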

Get all 56 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=word-stemming-stemmers&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
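The same request can be made from Python using only the standard library. This is a minimal sketch: the endpoint and query parameters are copied from the curl command above, while the helper names `build_url` and `fetch_quality` are hypothetical. The shape of the JSON response is not documented on this page, so the code returns the parsed payload as-is without assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain: str = "nlp",
              subcategory: str = "word-stemming-stemmers",
              limit: int = 20) -> str:
    """Build the request URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_quality(url: str, timeout: float = 10.0):
    """Perform the GET request and return the parsed JSON payload.

    The response schema is an unknown here, so no keys are accessed.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Building the URL needs no network access.
print(build_url())

# To actually call the API (counts against the daily request quota):
# data = fetch_quality(build_url())
```

Whether `limit` accepts values larger than 20 is not stated on this page, so the default mirrors the curl example.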

# Tool Score Tier
1 hplt-project/sacremoses

Python port of Moses tokenizer, truecaser and normalizer

68
Established
2 Blake-Madden/OleanderStemmingLibrary

Porter stemming library (C++)

54
Established
3 adbar/simplemma

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

51
Established
4 htaghizadeh/PersianStemmer-Python

PersianStemmer-Python

51
Established
5 michmech/lemmatization-lists

Machine-readable lists of lemma-token pairs in 23 languages.

50
Established
6 winkjs/wink-porter2-stemmer

JavaScript implementation of the Porter Stemmer Algorithm V2 by Dr. Martin F. Porter

50
Established
7 damzaky/sastrawijs

Indonesian language stemmer. Javascript port of PHP Sastrawi project.

49
Emerging
8 bastienbot/nlp-js-tools-french

POS tagger, lemmatizer and stemmer for the French language in JavaScript

49
Emerging
9 sorenlind/lemmy

🤘 Lemmy is a lemmatizer for Danish 🇩🇰 and Swedish 🇸🇪

47
Emerging
10 WZBSocialScienceCenter/germalemma

A lemmatizer for German language text

47
Emerging
11 donderom/stemerge

A collection of stemmers in Erlang 🌱

46
Emerging
12 winkjs/wink-lemmatizer

English lemmatizer

44
Emerging
13 FinNLP/lemmatizer

📦 English word lemmatizer

44
Emerging
14 deniskyashif/ssfst

Rewrite text in linear time.

44
Emerging
15 master/spark-stemming

Spark MLlib wrapper for the Snowball framework

42
Emerging
16 LeonieWeissweiler/CISTEM

Stemmer for German

41
Emerging
17 yohasebe/lemmatizer

Lemmatizer for text in English. Inspired by Python's...

40
Emerging
18 xiamx/gen_fst

Elixir module that implements a generic finite state transducer with...

39
Emerging
19 xiamx/lemma

A Morphological Parser (Analyser) / Lemmatizer written in Elixir.

39
Emerging
20 luridarmawan/StemmingWord

Web-based StemmingWord tool, written in Pascal using the FastPlaz framework

38
Emerging
21 htaghizadeh/JPersianStemmer

Persian stemmer

37
Emerging
22 writecrow/lemmatizer

A PHP library for getting a lemma from a given word, and getting a list of...

37
Emerging
23 tokenmill/snowball

Snowball version of the Porter stemmer for the Lithuanian language.

34
Emerging
24 dzieciou/pystempel

Python port of Stempel, an algorithmic stemmer for Polish language.

34
Emerging
25 Cirice/Ereina

Language rules for Persian texts

33
Emerging
26 putuwaw/linggapy

Library for Stemming Balinese Text Language

32
Emerging
27 andrianllmm/tagalog-stemmer

A Python library for Tagalog word stemming

31
Emerging
28 andrianllmm/aklanon-stemmer

A Python library for Aklanon word stemming

31
Emerging
29 anishLearnsToCode/porter-stemmer

Python implementation of the famous Porter Stemmer algorithm used in...

30
Emerging
30 zentrum-lexikographie/sfst-transduce

Python bindings for SFST focusing on transducer usage

30
Emerging
31 htaghizadeh/PersianStemmer

A New Rule-Based Persian Stemmer Using Regular Expression

30
Emerging
32 naomilago/pt_lemmatizer

This repo aims to store code for a Portuguese Lemmatizer, a PyPI package.

30
Emerging
33 kampsy/gwizo

Simple Go implementation of the Porter Stemmer algorithm with powerful features.

30
Emerging
34 greenat92/arabicstemmer_frontend

frontend web app for snowball arabic stemmer algorithm

29
Experimental
35 stdlib-js/nlp-porter-stemmer

Extract the stem of a given word.

29
Experimental
36 mshka/farsi_processor

Farsi processor is a Ruby gem to process (stem and normalize) Persian/Farsi text

28
Experimental
37 SeekStorm/snowball-stemmers-rs

snowball_stemmers_rs: a Snowball stemmer in 38 languages, in Rust

26
Experimental
38 joom/Divan.hs

Ottoman Divan poetry vezin checker in Haskell!

26
Experimental
39 Flight-School/lemma

A command-line utility that lemmatizes words in natural language text.

23
Experimental
40 domPatera/stemmer-bundle

This bundle integrates the dompat/stemmer library into Symfony. It provides...

22
Experimental
41 domPatera/stemmer

PHP Library for word stemming. This library helps reduce words to their base...

22
Experimental
42 N8Brooks/snowball

⛄ Snowball stemmers for Deno.

20
Experimental
43 openderocknlp/extract-lemmatized-nonstop-words

Extracts a pure list of stemmed words of a text filtered by stop words

20
Experimental
44 antonbaumann/german-go-stemmer

An efficient implementation of the German porter-stemming algorithm in Golang.

20
Experimental
45 ancatmara/early-irish-lemmatizer

A DIL-based lemmatizer for Early Irish data.

19
Experimental
46 Wollaston/ArabicStemmer

A small web app that uses NLTK's Arabic stemming algorithms to identify the...

17
Experimental
47 olga-black/truecase_german

A program for truecasing German text with incorrect capitalization

17
Experimental
48 golang-nlp/stopwords

Stopwords module for golang

17
Experimental
49 rojvv/rustress

JavaScript library to mark stresses in Russian text.

15
Experimental
50 dariubs/persian.rb

Ruby Persian gem.

15
Experimental
51 ontypehq/libfst

Finite-state transducer library for text normalization

11
Experimental
52 FinNLP/en-stemmer

📦 Porter stemmer implementation

11
Experimental
53 crlwingen/TagalogWordStemmer

Tagalog Word Stemmer made with Java.

11
Experimental
54 renan823/portuguese-stemmer

Go implementation of Snowball Portuguese Stemmer

11
Experimental
55 faisaltareque/Multilingual-Sentence-Tokenizer

This Python package is designed for tokenizing sentences in over 40...

11
Experimental
56 faisaltareque/Multilingual-Stemmer

Python package for stemming words in 15+ different languages. It is a...

10
Experimental

Comparisons in this category