All NLP Tools
13,598 tools ranked by quality score · Page 4 of 136
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 301 | gaphex/bert_experimental | Code and supplementary materials for a series of Medium articles about the BERT model | | Established |
| 302 | PetrKorab/Arabica | Python package for text mining of time-series data | | Established |
| 303 | textlint-rule/sentence-splitter | Splits Japanese and English text into sentences. | | Established |
| 304 | kensk8er/chicksexer | A Python package for gender classification. | | Established |
| 305 | OpenPecha/pybo | 🦜 NLP for Tibetan, in Python. | | Established |
| 306 | mpuig/spacy-lookup | Named entity recognition based on dictionaries | | Established |
| 307 | natasha/yargy | Rule-based fact extraction for the Russian language | | Established |
| 308 | guotong1988/BERT-pre-training | Multi-GPU pre-training of BERT on a single machine without Horovod (data parallelism) | | Established |
| 309 | mmmaurer/elfen | A Python package to efficiently extract linguistic features for text/NLP datasets | | Established |
| 310 | Ali-Alameer/NLP | This repository offers NLP resources and tutorials using Keras/TensorFlow... | | Established |
| 311 | fdalvi/NeuroX | A Python library that encapsulates various methods for neuron interpretation... | | Established |
| 312 | JDongian/python-jamo | Hangul syllable decomposition and synthesis using jamo. | | Established |
| 313 | yongzhuo/Pytorch-NLU | Chinese text classification and sequence labeling toolkit (PyTorch). Supports multi-class and multi-label classification of long and short Chinese texts, plus Chinese named entity recognition, part-of-speech tagging, word segmentation, extractive text summarization, and other sequence label... | | Established |
| 314 | jaguarliuu/rookie_text2data | Dify plugin: retrieve database data using natural language | | Established |
| 315 | LSYS/LexicalRichness | A module to compute textual lexical richness... | | Established |
| 316 | PyThaiNLP/attacut | A fast and accurate neural Thai word segmenter | | Established |
| 317 | jalammar/ecco | Explain, analyze, and visualize NLP language models. Ecco creates... | | Established |
| 318 | naver/claf | CLaF: Open-Source Clova Language Framework | | Established |
| 319 | mikahama/uralicNLP | An NLP library for Uralic languages such as Finnish, Skolt Sami, Moksha and... | | Established |
| 320 | Shubxam/Nifty-500-Live-Sentiment-Analysis | Live sentiment analysis dashboard for the NIFTY 500 universe of stocks using... | | Established |
| 321 | soaxelbrooke/python-bpe | Byte Pair Encoding for Python! | | Established |
| 322 | neomatrix369/nlp_profiler | A simple NLP library that allows profiling datasets with one or more text... | | Established |
| 323 | brightertiger/pygarble | Python package to detect garbled, gibberish text in English | | Established |
| 324 | thalesbertaglia/enelvo | A flexible normalizer for user-generated content | | Established |
| 325 | daac-tools/vaporetto | 🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer | | Established |
| 326 | sildar/potara | Multi-document summarization tool relying on ILP and sentence fusion | | Established |
| 327 | stanford-oval/genienlp | GenieNLP: A versatile codebase for any NLP task | | Established |
| 328 | gagan3012/keytotext | Keywords to Sentences | | Established |
| 329 | explosion/spacy-transformers | 🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy | | Established |
| 330 | jiaeyan/Jiayan | Jiayan (甲言), an NLP toolkit focused on processing Classical Chinese (ancient and literary Chinese), supporting classical lexicon construction, word segmentation, part-of-speech tagging, sentence segmentation, and punctuation. Jiayan, the 1st... | | Established |
| 331 | vgrabovets/multi_rake | Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python | | Established |
| 332 | natasha/natasha | Solves basic Russian NLP tasks; API for lower-level Natasha projects | | Established |
| 333 | pysentimiento/pysentimiento | A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks | | Established |
| 334 | azooKey/AzooKeyKanaKanjiConverter | Kana-Kanji conversion module written in Swift, supporting neural Kana-Kanji... | | Established |
| 335 | bjascob/LemmInflect | A Python module for English lemmatization and inflection. | | Established |
| 336 | batzner/tensorlm | Wrapper library for text generation / language models at character and word... | | Established |
| 337 | amirshnll/Persian-Swear-Words | Persian Swear Dataset: you can use it in production to filter unwanted... | | Established |
| 338 | gentaiscool/code-switching-papers | A curated list of research papers and resources on code-switching | | Established |
| 339 | andifunke/topic-labeling | The project proposes a framework to apply topic models to a text corpus and... | | Established |
| 340 | hamelsmu/ktext | Utilities for preprocessing text for deep learning with Keras | | Established |
| 341 | jfilter/clean-text | 🧹 Python package for text cleaning | | Established |
| 342 | ku-nlp/rhoknp | Yet another Python binding for Juman++/KNP/KWJA | | Established |
| 343 | google-research/turkish-morphology | A two-level morphological analyzer for Turkish. | | Established |
| 344 | proycon/colibri-core | Colibri Core is an NLP tool as well as a C++ and Python library for working... | | Established |
| 345 | chakki-works/sumeval | Well-tested, multi-language evaluation framework for text summarization. | | Established |
| 346 | massimoaria/tall | Text Analysis for aLL | | Established |
| 347 | polm/unidic-py | UniDic packaged for installation via pip. | | Established |
| 348 | mouseart2025/AI-Reader-V2 | AI novel analysis and visualization tool: character relationship graphs, geographic maps, timelines, encyclopedia; supports local Ollama plus 10 major cloud LLMs... | | Established |
| 349 | boat-group/fancy-nlp | NLP for humans. A fast and easy-to-use natural language processing (NLP)... | | Established |
| 350 | urduhack/urduhack | An NLP library for the Urdu language. It comes with a lot of battery... | | Established |
| 351 | vzhong/embeddings | Fast, DB-backed pretrained word embeddings for natural language processing. | | Established |
| 352 | keyATM/keyATM | An R package for Keyword Assisted Topic Models | | Established |
| 353 | explosion/spacy-stanza | 💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy | | Established |
| 354 | zhang17173/Event-Extraction | Event extraction from legal judgment documents and its applications, including word segmentation, part-of-speech tagging, named entity recognition, event element extraction, and judgment outcome prediction | | Established |
| 355 | CLUEbenchmark/CLUECorpus2020 | Large-scale pre-training corpus for Chinese (100G Chinese pre-training corpus) | | Established |
| 356 | bab2min/tomotopy | Python package of Tomoto, the Topic Modeling Tool | | Established |
| 357 | Cyberbolt/Cemotion | A Chinese NLP library based on BERT for sentiment analysis and... | | Established |
| 358 | affjljoo3581/canrevan | A library for collecting large volumes of Naver News articles. | | Established |
| 359 | lefterisloukas/edgar-crawler | The only open-source toolkit that can download SEC EDGAR financial reports... | | Established |
| 360 | thepushkarp/nalcos | Search Git commits in natural language | | Established |
| 361 | LanguageMachines/ucto | Unicode tokeniser. Ucto tokenizes text files: it separates words from... | | Established |
| 362 | howl-anderson/seq2annotation | General-purpose sequence labeling algorithm library based on TensorFlow & PaddlePaddle (currently includes BiLSTM+CRF, Stacked-BiLSTM+CRF... | | Established |
| 363 | FreeDiscovery/FreeDiscovery | Web service for e-discovery analytics | | Established |
| 364 | syuoni/eznlp | Easy Natural Language Processing | | Established |
| 365 | google-research/fool-me-twice | Game code and data for Fool Me Twice: Entailment from Wikipedia Gamification... | | Established |
| 366 | dccuchile/wefe | WEFE: The Word Embeddings Fairness Evaluation Framework. WEFE is a framework... | | Established |
| 367 | darija-open-dataset/dataset | Darija <-> English dataset | | Established |
| 368 | ikegami-yukino/oseti | Dictionary-based sentiment analysis for Japanese | | Established |
| 369 | NorskRegnesentral/skweak | skweak: A software toolkit for weak supervision applied to NLP tasks | | Established |
| 370 | giacbrd/ShallowLearn | An experiment about re-implementing supervised learning models based on... | | Established |
| 371 | dnanhkhoa/python-vncorenlp | A Python wrapper for VnCoreNLP using a bidirectional communication channel. | | Established |
| 372 | strangetom/ingredient-parser | A tool to parse recipe ingredients into structured data | | Established |
| 373 | lunarwhite/tan-division | Chinese corpus sentiment analysis: sentiment analysis of Chinese-language hotel reviews (Tan Songbo corpus) | | Established |
| 374 | omicsNLP/Auto-CORPus | Auto-CORPus pipeline developed by a University of Nottingham and Imperial... | | Established |
| 375 | Ricardokevins/Kevinpro-NLP-demo | All NLP you Need Here. Currently contains PyTorch implementations of 15 NLP demos (much of the code is borrowed from other open-source projects; it started as a personal project and was later open-sourced) | | Established |
| 376 | rosette-api/java | Babel Street Analytics client library for Java | | Established |
| 377 | tmalsburg/txl.el | Emacs extension providing direct access to DeepL's machine translation API. | | Established |
| 378 | asahi417/tner | Language model fine-tuning on NER with an easy interface and cross-domain... | | Established |
| 379 | Ars-Linguistica/mlconjug3 | A Python library to conjugate verbs in French, English, Spanish, Italian,... | | Established |
| 380 | changwookjun/nlp-paper | NLP Paper | | Established |
| 381 | gerardobort/node-corenlp | CoreNLP @ NodeJS | | Established |
| 382 | winkjs/wink-pos-tagger | English part-of-speech (POS) tagger | | Established |
| 383 | SergeyShk/ruTS | A library for extracting statistics from Russian-language texts. | | Established |
| 384 | SlapBot/sounder | An intent-recognition algorithm to predict the intent of a given text. | | Established |
| 385 | sillsdev/machine | Machine is a natural language processing library for .NET that is focused on... | | Established |
| 386 | bobxwu/TopMost | A Topic Modeling System Toolkit (ACL 2024 Demo) | | Established |
| 387 | jblake1965/eluciDoc | Screens legal text and extracts sentences containing user input party... | | Established |
| 388 | pd3f/pd3f | 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based | | Established |
| 389 | gagan3012/PolyDeDupe | PolyDeDupe: Multi-Lingual Data Deduplication | | Established |
| 390 | xv44586/toolkit4nlp | Transformers implementation (architectures, task examples, serving, and more) | | Established |
| 391 | fastdatascience/faststylometry | Stylometry library for Burrows' Delta method | | Established |
| 392 | Yale-LILY/SummerTime | An open-source text summarization toolkit for non-experts. EMNLP 2021 Demo | | Established |
| 393 | ysenarath/sinling | A collection of NLP tools for Sinhalese (සිංහල). | | Established |
| 394 | mbejda/Node-OpenNLP | Apache OpenNLP wrapper for Node.js | | Established |
| 395 | uoneway/KoBertSum | KoBertSum is a Korean summarization model that modifies the BertSum model so it can be applied to Korean data. | | Established |
| 396 | LHNCBC/metamaplite | A near real-time named-entity recognizer | | Established |
| 397 | nitotm/efficient-language-detector-js | Fast and accurate natural language detection. Detector written in... | | Established |
| 398 | Lilykos/pyphonetics | A Python 3 phonetics library. | | Established |
| 399 | gpsyrou/tube-virality | The YouTube Virality project collects and analyzes trending video data from... | | Established |
| 400 | yongzhuo/Macadam | Macadam is a natural language processing toolkit built on TensorFlow (Keras) and bert4keras, focused on text classification, sequence labeling, and relation extraction. Supports RAND... | | Established |
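Entry #312 (python-jamo) covers Hangul syllable decomposition and synthesis. The arithmetic behind that task is fixed by the Unicode Standard: each precomposed syllable in U+AC00..U+D7A3 encodes a leading consonant, a vowel, and an optional trailing consonant. The following is a minimal, standard-library-only sketch of that arithmetic; the names `decompose` and `compose` are hypothetical and are not python-jamo's API.

```python
# Unicode Hangul syllable arithmetic (conjoining jamo <-> precomposed syllable).
S_BASE = 0xAC00  # first precomposed syllable
L_BASE = 0x1100  # leading consonants (choseong)
V_BASE = 0x1161  # vowels (jungseong)
T_BASE = 0x11A7  # trailing consonants (jongseong); index 0 means "no tail"
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables in total


def decompose(syllable: str) -> str:
    """Split one precomposed Hangul syllable into conjoining jamo."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return syllable  # not a precomposed Hangul syllable: leave unchanged
    l_index, rest = divmod(s_index, N_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    jamo = chr(L_BASE + l_index) + chr(V_BASE + v_index)
    if t_index:
        jamo += chr(T_BASE + t_index)
    return jamo


def compose(lead: str, vowel: str, tail: str = "") -> str:
    """Combine a leading consonant, a vowel, and an optional tail into one syllable."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(tail) - T_BASE if tail else 0
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        raise ValueError("invalid leading consonant or vowel jamo")
    if tail and not 1 <= t_index < T_COUNT:
        raise ValueError("invalid trailing consonant jamo")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)
```

For example, `decompose("한")` gives the three jamo U+1112, U+1161, U+11AB, and `compose` reverses it. For precomposed syllables this matches Unicode canonical decomposition, so `unicodedata.normalize("NFD", text)` is a convenient cross-check.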
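Entry #321 (python-bpe) implements Byte Pair Encoding. A minimal sketch of the usual greedy merge-learning loop follows: start from single characters, count adjacent symbol pairs weighted by word frequency, merge the most frequent pair, and repeat. It assumes a pre-tokenized word-frequency table, omits the end-of-word marker that many implementations add, and breaks ties by first occurrence; `learn_bpe`, `merge_pair`, and `apply_bpe` are hypothetical names, not python-bpe's API.

```python
from collections import Counter


def merge_pair(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace every non-overlapping occurrence of `pair` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_counts: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge operations from a word-frequency table."""
    # Start with every word split into single characters.
    vocab = {tuple(word): count for word, count in word_counts.items()}
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pair_counts: Counter = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += count
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)  # ties: first pair seen
        merges.append(best)
        vocab = {merge_pair(symbols, best): count for symbols, count in vocab.items()}
    return merges


def apply_bpe(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

With the toy counts `{"low": 5, "lower": 2, "newest": 6, "widest": 3}` and three merges, this learns `("e", "s")`, `("es", "t")`, `("l", "o")`, and `apply_bpe("lowest", ...)` segments the unseen word as `["lo", "w", "est"]`.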
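Entry #391 (faststylometry) targets Burrows' Delta. Roughly, Delta takes the most frequent words of a reference corpus, converts each text's relative frequencies for those words into z-scores, and averages the absolute z-score differences between a disputed text and each candidate; the smallest Delta marks the stylistically closest candidate. The sketch below makes simplifying assumptions (texts already tokenized and non-empty, z-score statistics computed over the candidate texts only, no smoothing), and `burrows_delta` with its parameters is hypothetical, not faststylometry's API.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_freqs(tokens: list[str], vocab: list[str]) -> list[float]:
    """Relative frequency of each vocabulary word in one tokenized text."""
    counts = Counter(tokens)
    total = len(tokens)  # assumes a non-empty text
    return [counts[word] / total for word in vocab]


def burrows_delta(candidates: dict[str, list[str]],
                  disputed: list[str],
                  n_words: int = 30) -> dict[str, float]:
    """Return Burrows' Delta between the disputed text and each candidate text."""
    # 1. Most frequent words across all candidate texts.
    corpus: Counter = Counter()
    for tokens in candidates.values():
        corpus.update(tokens)
    vocab = [word for word, _ in corpus.most_common(n_words)]

    # 2. Per-word mean and population standard deviation over the candidates.
    profiles = {name: relative_freqs(toks, vocab) for name, toks in candidates.items()}
    columns = list(zip(*profiles.values()))
    means = [mean(col) for col in columns]
    stdevs = [pstdev(col) for col in columns]

    def z_scores(freqs: list[float]) -> list[float]:
        # Words with zero spread carry no signal; map them to 0.
        return [(f - m) / s if s > 0 else 0.0 for f, m, s in zip(freqs, means, stdevs)]

    # 3. Delta = mean absolute difference of z-scores against the disputed text.
    target = z_scores(relative_freqs(disputed, vocab))
    return {
        name: mean(abs(a - b) for a, b in zip(z_scores(freqs), target))
        for name, freqs in profiles.items()
    }
```

In practice Delta is computed over a few hundred most-frequent words and many candidate texts; with only two candidates, as in a toy check, every z-score collapses to ±1, so the result is only a rough illustration of the mechanics.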