Stopword Lists Datasets NLP Tools

Collections of stopword lists and datasets for removing common words across languages. Includes pre-compiled stopword collections, language-specific stopword resources, and tools for generating stopword lists. Does NOT include general text preprocessing frameworks, stemming/lemmatization tools, or broader NLP preprocessing pipelines.

36 stopword list, dataset, and tool projects are tracked. One scores above 50 (Established tier). The highest-rated is Alir3z4/python-stop-words at 65/100 with 159 stars.

Get all 36 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=stopword-lists-datasets&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
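The same request can be issued from Python. The following is a minimal sketch using only the standard library; the endpoint and query parameters are copied from the curl command above, while the function names `build_url` and `fetch_projects` are hypothetical. The response schema is not described on this page, so the sketch returns the decoded JSON unchanged. Note that the sample request uses `limit=20` although 36 projects are tracked; whether a larger limit is accepted is not stated here, so the value is left as a parameter.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the listing URL with the same query parameters as the curl call."""
    params = {
        "domain": "nlp",
        "subcategory": "stopword-lists-datasets",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=10):
    """Fetch the listing and decode the JSON body.

    The response schema is an unknown here, so the decoded value
    is returned as-is rather than unpacked into fields.
    """
    with urlopen(build_url(limit), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Printing the URL only; call fetch_projects() to hit the network.
    print(build_url())
```

Unauthenticated calls count against the 100 requests/day allowance, so caching the decoded result locally is a sensible default.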

# Tool Score Tier
1 Alir3z4/python-stop-words

Get list of common stop words in various languages in Python

65
Established
2 hklemp/dotnet-stop-words

Get list of common stop words in various languages in dotnet

46
Emerging
3 igorbrigadir/stopwords

Default English stopword lists from many different sources

43
Emerging
4 skupriienko/Ukrainian-Stopwords

A list of ~2000 Ukrainian stopwords (with numbers)

43
Emerging
5 stdlib-js/datasets-savoy-stopwords-fr

A list of French stop words.

42
Emerging
6 eklem/stopword-trainer

A module for creating stopword lists for any language, based on a set of documents.

41
Emerging
7 stdlib-js/datasets-cmudict

The Carnegie Mellon Pronouncing Dictionary (CMUdict).

40
Emerging
8 skupriienko/Ukrainian-Sentiment-Analysis

The list of Ukrainian words for sentiment analysis and NLP

37
Emerging
9 egorsmkv/ukrainian-accentor

Add accents to words in the Ukrainian language

36
Emerging
10 Sashank222222/massive-english-word-list

📚 Explore a comprehensive English word list with over 68,000 entries,...

33
Emerging
11 pharo-ai/stopwords

Load the stopwords that you need in Pharo

33
Emerging
12 stdlib-js/datasets-savoy-stopwords-por

A list of Portuguese stop words.

32
Emerging
13 stdlib-js/datasets-liu-positive-opinion-words-en

A list of positive opinion words.

32
Emerging
14 contactsunny/RemoveStopWordsInJavaPOC

This is a simple Spring Boot project which removes stop words from a text file.

29
Experimental
15 stdlib-js/datasets-stopwords-en

A list of English stop words.

29
Experimental
16 stdlib-js/datasets-savoy-stopwords-ger

A list of German stop words.

29
Experimental
17 stdlib-js/datasets-liu-negative-opinion-words-en

A list of negative opinion words.

29
Experimental
18 stdlib-js/datasets-savoy-stopwords-sp

A list of Spanish stop words.

29
Experimental
19 stdlib-js/datasets-savoy-stopwords-it

A list of Italian stop words.

29
Experimental
20 Rayraegah/adjectives

A data dump of all adjectives in the English language

29
Experimental
21 Helsinki-NLP/UkrainianLT

A collection of links to Ukrainian language tools

29
Experimental
22 vikasing/news-stopwords

A huge list of stopwords collected from millions of news articles

28
Experimental
23 kavgan/stop-words

Stop word lists

28
Experimental
24 lang-uk/ukrainian-word-stress-dictionary

Dictionary of word stresses in the Ukrainian language 🇺🇦

25
Experimental
25 Vidito/norstop

Norstop is a lightweight, zero-dependency Python library to remove Norwegian...

23
Experimental
26 aeleraqi/arabic-stopwords

This repository contains a comprehensive list of Arabic stopwords.

22
Experimental
27 Theodotus1243/ukrainian-accentor-transformer

Add accents to words in the Ukrainian language

20
Experimental
28 olastor/german-word-frequencies

Simple word-to-frequency mappings for the German language based on text...

19
Experimental
29 ynsrc/german-categorized-wordlist

German Categorized Wordlist Project

19
Experimental
30 latincy/verba

verba.txt - A Latin word list in the style of Unix /usr/share/dict/words

19
Experimental
31 AidaLog/Common-Swahili-stopwords

This curated collection brings together a dataset of common Swahili...

17
Experimental
32 bimarakajati/Javanese-and-Sundanese-Stopwords

This project aims to provide stopwords for the Javanese and Sundanese...

16
Experimental
33 raccoon-hero/uk-dictionary

A paradigm-based morphological dictionary of the Ukrainian language. Built...

14
Experimental
34 Jey-37/sum20

Dictionary of the Ukrainian language in 20 volumes, in JSON format

12
Experimental
35 lang-uk/fasttext-vectors-uk

Word representation for Ukrainian with fastText

10
Experimental
36 semnan-university-ai/nlp-stopwords

A comprehensive stopword list for natural language processing and text mining.

10
Experimental
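Several entries above generate stopword lists from a set of documents rather than shipping a fixed list. One common way to do this is by document frequency: a word that appears in most documents carries little distinguishing information. The sketch below illustrates that idea under stated assumptions; it is not the algorithm of any listed project, and the names `tokenize`, `build_stopword_list`, `remove_stopwords`, and the `min_doc_ratio` parameter are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def build_stopword_list(documents, min_doc_ratio=0.8):
    """Return, sorted, the words found in at least min_doc_ratio of documents."""
    doc_freq = Counter()
    for doc in documents:
        # Count each word once per document (document frequency).
        doc_freq.update(set(tokenize(doc)))
    threshold = min_doc_ratio * len(documents)
    return sorted(word for word, n in doc_freq.items() if n >= threshold)


def remove_stopwords(text, stopwords):
    """Tokenize text and drop every token that is in the stopword list."""
    stop = set(stopwords)
    return [token for token in tokenize(text) if token not in stop]


# Toy corpus, purely illustrative.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird sang in the tree",
    "the rain fell on the roof",
]
stopwords = build_stopword_list(docs, min_doc_ratio=0.75)
filtered = remove_stopwords("the cat sat on the mat", stopwords)
```

On this toy corpus only "the" occurs in at least three of the four documents, so it is the single generated stopword. Real corpora need far more documents, and the threshold usually has to be tuned per language and domain.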