Japanese Text Processing NLP Tools
Tools for Japanese-specific morphological analysis, text normalization, kana-kanji conversion, and character processing. Does NOT include general multilingual NLP, machine translation systems, or language learning applications (unless text processing is the primary focus).
There are 97 Japanese text processing tools tracked. One scores above 70 (verified tier). The highest-rated is EmilStenstrom/conllu at 73/100 with 320 stars.
Get all 97 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=japanese-text-processing&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
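The same request can be made from a script. The following is a minimal sketch using only the Python standard library; the helper names `build_url` and `fetch_projects` are hypothetical. The shape of the JSON response is not documented on this page, so the sketch returns the parsed payload without assuming any field names. Whether the endpoint accepts a `limit` larger than the 20 shown in the curl command is also an assumption to verify.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20, domain="nlp", subcategory="japanese-text-processing"):
    """Build the dataset query URL with the same parameters as the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30):
    """Fetch and parse the JSON payload (hypothetical helper).

    The response schema is not described here, so the parsed JSON is
    returned as-is for the caller to inspect.
    """
    with urllib.request.urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)


# Usage (performs a network request and counts against the daily quota):
#   data = fetch_projects(limit=20)
#   print(type(data))
```

Keeping URL construction separate from the network call makes it easy to check the query string before spending any of the 100 keyless requests per day.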
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | EmilStenstrom/conllu: A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a... | 73 | Verified |
| 2 | OpenPecha/Botok: 🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python | | Established |
| 3 | zaemyung/sentsplit: A flexible sentence segmentation library using CRF model and regex rules | | Established |
| 4 | taishi-i/nagisa: A Japanese tokenizer based on recurrent neural networks | | Established |
| 5 | natasha/razdel: Rule-based token, sentence segmentation for Russian language | | Established |
| 6 | polm/cutlet: Japanese to romaji converter in Python | | Established |
| 7 | textlint-rule/sentence-splitter: Split {Japanese, English} text into sentences. | | Established |
| 8 | azooKey/AzooKeyKanaKanjiConverter: Kana-Kanji Conversion Module written in Swift, supporting Neural Kana-Kanji... | | Established |
| 9 | ku-nlp/rhoknp: Yet another Python binding for Juman++/KNP/KWJA | | Established |
| 10 | himkt/konoha: 🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to... | | Established |
| 11 | gold-silver-copper/english: World's most accurate and fast procedural English conjugation library | | Emerging |
| 12 | akaza-im/akaza: Yet another Japanese IME for IBus/Linux | | Emerging |
| 13 | PKSHATechnology-Research/tdmelodic: A Japanese accent dictionary generator | | Emerging |
| 14 | togatoga/karukan: Japanese Input Method System for Linux, Neural Kana-Kanji Conversion Engine... | | Emerging |
| 15 | polm/fugashi: A Cython MeCab wrapper for fast, pythonic Japanese tokenization and... | | Emerging |
| 16 | KoichiYasuoka/SuPar-UniDic: Tokenizer POS-tagger Lemmatizer and Dependency-parser for modern and... | | Emerging |
| 17 | kevincobain2000/jProcessing: Japanese Natural Language Processing Libraries | | Emerging |
| 18 | LibreTranslate/MiniSBD: Free and open source library for fast sentence boundary detection | | Emerging |
| 19 | SOMJANG/Mecab-ko-for-Google-Colab: Use Mecab Library (NLP Library) in Google Colab | | Emerging |
| 20 | fnl/syntok: Text tokenization and sentence segmentation (segtok v2) | | Emerging |
| 21 | rabbit19981023/yomigana-ebook: The fastest converter to add furigana (readings) to Japanese epub eBooks | | Emerging |
| 22 | ku-nlp/jumanpp: Juman++ (a Morphological Analyzer Toolkit) | | Emerging |
| 23 | hamanlp/hama: 🦛 Hangul Morphological Analyzer | | Emerging |
| 24 | miurahr/pykakasi: Lightweight converter from Japanese Kana-kanji sentences into Kana-Roman. | | Emerging |
| 25 | ikegami-yukino/neologdn: Japanese text normalizer for mecab-neologd | | Emerging |
| 26 | mediacloud/sentence-splitter: Text to sentence splitter using heuristic algorithm by Philipp Koehn and... | | Emerging |
| 27 | andreihar/taibun: Taiwanese Hokkien Transliterator and Tokeniser | | Emerging |
| 28 | Kensuke-Mitsuzawa/JapaneseTokenizers: Aims to make using JapaneseTokenizer as easy as possible | | Emerging |
| 29 | javierarce/silabea: Node package that splits Spanish words into syllables. | | Emerging |
| 30 | ikegami-yukino/neologdn-java: Japanese text normalizer for mecab-neologd | | Emerging |
| 31 | koshort/pyeunjeon: (deprecated) Standalone Python interface to the Korean morphological analyzer based on the Eunjeon (은전한닢) project and MeCab | | Emerging |
| 32 | mkartawijaya/dango: An easy to use tokenizer for Japanese text, aimed at language learners and... | | Emerging |
| 33 | ikegami-yukino/mozcpy: Kana-Kanji converter using Mozc dictionary | | Emerging |
| 34 | alinear-corp/kuzukiri: Japanese Text Segmenter for Python written in Rust | | Emerging |
| 35 | loomchild/segment: Program used to split text into segments | | Emerging |
| 36 | thammin/juman-bin: A user-extensible morphological analyzer for Japanese (Japanese morphological analysis system) | | Emerging |
| 37 | andreihar/taibun.js: Taiwanese Hokkien Transliterator and Tokeniser | | Emerging |
| 38 | neelguha/legal-segmenter: A simple library for segmenting legal texts | | Emerging |
| 39 | wwwcojp/ja_sentence_segmenter: Japanese sentence segmentation library for Python | | Emerging |
| 40 | LanguageMachines/mbt: MBT: Memory-based tagger generation and tagging MBT is a memory-based... | | Emerging |
| 41 | craigtrim/fast-sentence-segment: Fast and Efficient Sentence Segmentation | | Emerging |
| 42 | medspacy/sectionizer: A rule-based Python module for splitting documents into sections. | | Emerging |
| 43 | LR-POR/cl-conllu: Tool for working with conllu files in CL | | Emerging |
| 44 | typedgrammar/typed-japanese: 🌸 Learn Japanese grammar with TypeScript | | Emerging |
| 45 | azu/morpheme-match: Match function that matches tokens (from morphological analysis) with a sentence. | | Emerging |
| 46 | gpizzorno/conllu_tools: A Python toolkit for working with CoNLL-U files, Universal Dependencies... | | Emerging |
| 47 | ABTdomain/dksplit: DKSplit, fast word segmentation for Python. Split domain names and... | | Emerging |
| 48 | tokuhirom/jawiki-kana-kanji-dict: Generate SKK/MeCab dictionary from Wikipedia (Japanese edition) | | Emerging |
| 49 | ejossev/hypherator-java: Java Hyphenation Iterator | | Emerging |
| 50 | ku-nlp/knp: A Japanese Parser | | Emerging |
| 51 | cronokirby/ginkou: Japanese sentence bank program. Add and find sentences for language learning. | | Emerging |
| 52 | yoshoku/suika: Suika 🍉 is a Japanese morphological analyzer written in pure Ruby | | Emerging |
| 53 | luxiant/sentence_segmentation: A rule-based sentence_segmenter, inspired by ruby pragmatic segmenter by... | | Emerging |
| 54 | retarfi/jptranstokenizer: Japanese Tokenizer for transformers library | | Emerging |
| 55 | tasukuigarashi/j-liwc2015: Japanese version of LIWC2015 | | Emerging |
| 56 | junhewk/RcppMeCab: RcppMeCab: Rcpp Interface of CJK Morpheme Analyzer MeCab | | Emerging |
| 57 | ArthurDevNL/CoNLL-U: A lightweight NuGet package for parsing CoNLL-U files in C# | | Emerging |
| 58 | agatan/yoin: A Japanese Morphological Analyzer written in pure Rust | | Emerging |
| 59 | uribo/sudachir: R Interface to 'Sudachi' | | Emerging |
| 60 | KOLANICH-libs/WordSplitAbs.py: An abstraction layer around word splitters for Python | | Experimental |
| 61 | NonJishoKei/NonJishoKei: [WIP] This is a lightweight morphological analyzer designed for Japanese... | | Experimental |
| 62 | azagniotov/solr-lucene-analyzer-sudachi: A Japanese morphological analyzer Sudachi as a Solr plugin. | | Experimental |
| 63 | hephaex/mecab-ko: MeCab-Ko: a Korean morphological analyzer implemented in Rust. Sejong corpus compatible, 97% accuracy. | | Experimental |
| 64 | MiguelNecoechea/Complexa: Yet another Chrome extension for learning Japanese | | Experimental |
| 65 | junhewk/RmecabKo: RmecabKo: R wrapper for eunjeon project (mecab-ko) | | Experimental |
| 66 | apakabarlabs/syllabreak-swift: Multilingual library for accurate and deterministic hyphenation and syllable... | | Experimental |
| 67 | taipalogy/taipa: Taiwanese morphological parsing | | Experimental |
| 68 | tchin25/japanese-dependency-visualizer: A dependency visualizer for Japanese to help beginners deconstruct complex... | | Experimental |
| 69 | rmalouf/treesearch: High-performance toolkit for querying linguistic dependency parses | | Experimental |
| 70 | cryshin22/Cutlet-Japan: Japanese to romaji converter in Python | | Experimental |
| 71 | hppRC/jawiki-cleaner: 🧹 Japanese Wikipedia Cleaner 🧹 | | Experimental |
| 72 | GINK03/boosting-tree-tokenizer: Gradient Boosting Decision... | | Experimental |
| 73 | atsumari-io/mecab-service: Web app for tokenizing Japanese text using MeCab | | Experimental |
| 74 | btrkeks/jp-deinflector: A high-performance Rust crate for deinflecting Japanese words using perfect... | | Experimental |
| 75 | tetutaro/mecab_dictionaries: Create various dictionaries for MeCab and MeCab CLI using fugashi | | Experimental |
| 76 | megagonlabs/desuwa: Feature annotator to morphemes and phrases based on KNP rule files (pure-Python) | | Experimental |
| 77 | rakutentech/pisah: Sentence Splitter Library (C++ port of pySBD) | | Experimental |
| 78 | Shusei-E/RcppJagger: RcppJagger is a wrapper package for Jagger | | Experimental |
| 79 | bureaucratic-labs/conllu: CoNLL-U format parser | | Experimental |
| 80 | jeffhuen/plurality: Fast English plural and singular noun inflection for Elixir. Convert plural... | | Experimental |
| 81 | akiomik/vibrato-dict-ipa-neologd: A compiled mecab-ipadic-neologd dictionary for vibrato | | Experimental |
| 82 | cronokirby/nicer-mecab: Japanese morphological analysis. Wrapper over mecab. | | Experimental |
| 83 | BrambleXu/jp-stopword-filter: A lightweight Python library designed to filter stopwords from Japanese text... | | Experimental |
| 84 | proycon/hyphertool: Command-line tool for syllabification and hyphenisation for multiple languages | | Experimental |
| 85 | milovatjp/hazuki: Japanese complexity analysis app within JLPT framework. | | Experimental |
| 86 | QuyAnh2005/StyleTTS-VC-Japanese: StyleTTS Voice Conversions for Japanese | | Experimental |
| 87 | d108/Samazama: Save keystrokes for iOS and macOS users by comparing shorthand input against... | | Experimental |
| 88 | TaygaHoshi/japanese-i-plus-one-filter: Finds i+1 sentences for a specific word from Jisho.org. | | Experimental |
| 89 | luckasRanarison/kaiseki: A Japanese tokenizer and morphological analyzer | | Experimental |
| 90 | mkpoli/ainu-wiktionary: Input assistance tool for the Ainu-language Wiktionary | | Experimental |
| 91 | PenguinCabinet/Aiueo-sort-with-Kanji-reading: Tool for sorting in aiueo (Japanese syllabary) order, taking kanji readings into account | | Experimental |
| 92 | whelk-io/hy-phen-a-tion: Java OSS library for calculating syllables and hyphenation based on Frank... | | Experimental |
| 93 | ku-nlp/jumanpp-jumandic: Scripts for training Jumandic Juman++ model | | Experimental |
| 94 | ensan-hcl/KanaKanjiConversionSamples: KanaKanjiConversionSamples is a sample package which implements very simple... | | Experimental |
| 95 | ru-ka/syllable-divider: A WebAssembly library for syllable division in XML/DOM trees | | Experimental |
| 96 | evamaxfield/cue-queue: Transcript segmentation using the average semantic encodings of cue sentences. | | Experimental |
| 97 | apakabarlabs/syllabreak-kotlin: Kotlin library for multilingual syllabification and hyphenation | | Experimental |