All NLP Tools

13,598 tools ranked by quality score · Page 4 of 136

Showing 301–400 of 13,598
# Tool Score Tier
301 gaphex/bert_experimental

code and supplementary materials for a series of Medium articles about the BERT model

54
Established
302 PetrKorab/Arabica

Python package for text mining of time-series data

54
Established
303 textlint-rule/sentence-splitter

Split {Japanese, English} text into sentences.

54
Established
304 kensk8er/chicksexer

A Python package for gender classification.

54
Established
305 OpenPecha/pybo

🦜 NLP for Tibetan, in Python.

54
Established
306 mpuig/spacy-lookup

Named Entity Recognition based on dictionaries

54
Established
307 natasha/yargy

Rule-based facts extraction for Russian language

54
Established
308 guotong1988/BERT-pre-training

multi-gpu pre-training in one machine for BERT without horovod (Data Parallelism)

54
Established
309 mmmaurer/elfen

A python package to efficiently extract linguistic features for text/NLP datasets

54
Established
310 Ali-Alameer/NLP

This repository offers NLP resources & tutorials using keras/tensorflow....

54
Established
311 fdalvi/NeuroX

A Python library that encapsulates various methods for neuron interpretation...

54
Established
312 JDongian/python-jamo

Hangul syllable decomposition and synthesis using jamo.

54
Established
313 yongzhuo/Pytorch-NLU

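Chinese text classification and sequence labeling toolkit (PyTorch); supports multi-class and multi-label classification of long and short Chinese texts, and sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, word segmentation, and extractive text summarization...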

54
Established
314 jaguarliuu/rookie_text2data

Dify plugin: retrieve database data using natural language

54
Established
315 LSYS/LexicalRichness

😸 💬 A module to compute textual lexical richness...

54
Established
316 PyThaiNLP/attacut

A Fast and Accurate Neural Thai Word Segmenter

54
Established
317 jalammar/ecco

Explain, analyze, and visualize NLP language models. Ecco creates...

54
Established
318 naver/claf

CLaF: Open-Source Clova Language Framework

54
Established
319 mikahama/uralicNLP

An NLP library for Uralic languages such as Finnish, Skolt Sami, Moksha and...

54
Established
320 Shubxam/Nifty-500-Live-Sentiment-Analysis

Live Sentiment Analysis dashboard of NIFTY 500 universe of stocks using...

54
Established
321 soaxelbrooke/python-bpe

Byte Pair Encoding for Python!

54
Established
322 neomatrix369/nlp_profiler

A simple NLP library that allows profiling datasets with one or more text...

54
Established
323 brightertiger/pygarble

Python Package to detect garbled, gibberish text for EN

54
Established
324 thalesbertaglia/enelvo

A flexible normalizer for user-generated content

54
Established
325 daac-tools/vaporetto

🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer

54
Established
326 sildar/potara

Multi-document summarization tool relying on ILP and sentence fusion

54
Established
327 stanford-oval/genienlp

GenieNLP: A versatile codebase for any NLP task

54
Established
328 gagan3012/keytotext

Keywords to Sentences

53
Established
329 explosion/spacy-transformers

🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

53
Established
330 jiaeyan/Jiayan

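Jiayan (甲言), an NLP toolkit focused on processing Classical Chinese; supports Classical Chinese lexicon construction, word segmentation, part-of-speech tagging, sentence segmentation, and punctuation. Jiayan, the 1st...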

53
Established
331 vgrabovets/multi_rake

Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python

53
Established
332 natasha/natasha

Solves basic Russian NLP tasks, API for lower level Natasha projects

53
Established
333 pysentimiento/pysentimiento

A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks

53
Established
334 azooKey/AzooKeyKanaKanjiConverter

Kana-Kanji Conversion Module written in Swift, supporting Neural Kana-Kanji...

53
Established
335 bjascob/LemmInflect

A python module for English lemmatization and inflection.

53
Established
336 batzner/tensorlm

Wrapper library for text generation / language models at character and word...

53
Established
337 amirshnll/Persian-Swear-Words

Persian Swear Dataset - you can use in your production to filter unwanted...

53
Established
338 gentaiscool/code-switching-papers

A curated list of research papers and resources on code-switching

53
Established
339 andifunke/topic-labeling

The project proposes a framework to apply topic models on a text-corpus and...

53
Established
340 hamelsmu/ktext

Utilities for preprocessing text for deep learning with Keras

53
Established
341 jfilter/clean-text

🧹 Python package for text cleaning

53
Established
342 ku-nlp/rhoknp

Yet another Python binding for Juman++/KNP/KWJA

53
Established
343 google-research/turkish-morphology

A two-level morphological analyzer for Turkish.

53
Established
344 proycon/colibri-core

Colibri core is an NLP tool as well as a C++ and Python library for working...

53
Established
345 chakki-works/sumeval

Well tested & Multi-language evaluation framework for text summarization.

53
Established
346 massimoaria/tall

Text Analysis for aLL

53
Established
347 polm/unidic-py

Unidic packaged for installation via pip.

53
Established
348 mouseart2025/AI-Reader-V2

AI novel analysis and visualization tool: character relationship graphs · geographic maps · timelines · encyclopedia | supports local Ollama + 10 major cloud LLMs |...

53
Established
349 boat-group/fancy-nlp

NLP for human. A fast and easy-to-use natural language processing (NLP)...

53
Established
350 urduhack/urduhack

An NLP library for the Urdu language. It comes with a lot of battery...

53
Established
351 vzhong/embeddings

Fast, DB Backed pretrained word embeddings for natural language processing.

53
Established
352 keyATM/keyATM

An R package for Keyword Assisted Topic Models

53
Established
353 explosion/spacy-stanza

💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy

53
Established
354 zhang17173/Event-Extraction

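Event extraction from legal judgment documents and its applications, including word segmentation, part-of-speech tagging, named entity recognition, event element extraction, and verdict prediction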

53
Established
355 CLUEbenchmark/CLUECorpus2020

Large-scale Pre-training Corpus for Chinese (100 GB Chinese pre-training corpus)

53
Established
356 bab2min/tomotopy

Python package of Tomoto, the Topic Modeling Tool

53
Established
357 Cyberbolt/Cemotion

A Chinese NLP library based on BERT for sentiment analysis and...

53
Established
358 affjljoo3581/canrevan

A library for collecting large volumes of Naver news articles.

53
Established
359 lefterisloukas/edgar-crawler

The only open-source toolkit that can download SEC EDGAR financial reports...

53
Established
360 thepushkarp/nalcos

Search Git commits in natural language

53
Established
361 LanguageMachines/ucto

Unicode tokeniser. Ucto tokenizes text files: it separates words from...

53
Established
362 howl-anderson/seq2annotation

A general-purpose sequence labeling algorithm library based on TensorFlow & PaddlePaddle (currently includes BiLSTM+CRF, Stacked-BiLSTM+CRF...

53
Established
363 FreeDiscovery/FreeDiscovery

Web Service for E-Discovery Analytics

53
Established
364 syuoni/eznlp

Easy Natural Language Processing

53
Established
365 google-research/fool-me-twice

Game code and data for Fool Me Twice: Entailment from Wikipedia Gamification...

53
Established
366 dccuchile/wefe

WEFE: The Word Embeddings Fairness Evaluation Framework. WEFE is a framework...

53
Established
367 darija-open-dataset/dataset

darija <-> english dataset

52
Established
368 ikegami-yukino/oseti

Dictionary based Sentiment Analysis for Japanese

52
Established
369 NorskRegnesentral/skweak

skweak: A software toolkit for weak supervision applied to NLP tasks

52
Established
370 giacbrd/ShallowLearn

An experiment about re-implementing supervised learning models based on...

52
Established
371 dnanhkhoa/python-vncorenlp

A Python wrapper for VnCoreNLP using a bidirectional communication channel.

52
Established
372 strangetom/ingredient-parser

A tool to parse recipe ingredients into structured data

52
Established
373 lunarwhite/tan-division

Chinese corpus sentiment analysis. Sentiment analysis of Chinese text from the Tan Songbo hotel review corpus

52
Established
374 omicsNLP/Auto-CORPus

Auto-CORPus pipeline developed by a University of Nottingham and Imperial...

52
Established
375 Ricardokevins/Kevinpro-NLP-demo

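All NLP you Need Here. Currently contains PyTorch implementations of 15 NLP demos (much of the code is borrowed from other open-source projects; originally a personal project, later open-sourced)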

52
Established
376 rosette-api/java

Babel Street Analytics Client Library for Java

52
Established
377 tmalsburg/txl.el

Emacs extension providing direct access to DeepL's machine translation API.

52
Established
378 asahi417/tner

Language model fine-tuning on NER with an easy interface and cross-domain...

52
Established
379 Ars-Linguistica/mlconjug3

A Python library to conjugate verbs in French, English, Spanish, Italian,...

52
Established
380 changwookjun/nlp-paper

NLP Paper

52
Established
381 gerardobort/node-corenlp

CoreNLP @ NodeJS

52
Established
382 winkjs/wink-pos-tagger

English Part-of-speech (POS) tagger

52
Established
383 SergeyShk/ruTS

A library for extracting statistics from Russian-language texts.

52
Established
384 SlapBot/sounder

An intent recognizing algorithm to predict the intent of a given text.

52
Established
385 sillsdev/machine

Machine is a natural language processing library for .NET that is focused on...

52
Established
386 bobxwu/TopMost

A Topic Modeling System Toolkit (ACL 2024 Demo)

52
Established
387 jblake1965/eluciDoc

Screens legal text and extracts sentences containing user input party...

52
Established
388 pd3f/pd3f

🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based

52
Established
389 gagan3012/PolyDeDupe

PolyDeDupe: Multi-Lingual Data Deduplication

52
Established
390 xv44586/toolkit4nlp

transformers implement (architecture, task example, serving and more)

52
Established
391 fastdatascience/faststylometry

Stylometry library for Burrows' Delta method

52
Established
392 Yale-LILY/SummerTime

An open-source text summarization toolkit for non-experts. EMNLP'2021 Demo

52
Established
393 ysenarath/sinling

A collection of NLP tools for Sinhalese (සිංහල).

52
Established
394 mbejda/Node-OpenNLP

Apache OpenNLP wrapper for Nodejs

52
Established
395 uoneway/KoBertSum

KoBertSum is a Korean summarization model that adapts the BertSum model so it can be applied to Korean data.

52
Established
396 LHNCBC/metamaplite

A near real-time named-entity recognizer

52
Established
397 nitotm/efficient-language-detector-js

Fast and accurate natural language detection. Detector written in...

52
Established
398 Lilykos/pyphonetics

A Python 3 phonetics library.

52
Established
399 gpsyrou/tube-virality

The YouTube Virality project collects and analyzes trending video data from...

52
Established
400 yongzhuo/Macadam

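Macadam is a natural language processing toolkit built on TensorFlow (Keras) and bert4keras, focused on text classification, sequence labeling, and relation extraction. Supports RAND...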

52
Established