evalphobia/go-jp-text-ripper
Tokenizes Japanese text and splits it into words
This tool helps analysts and researchers who work with Japanese text prepare data for further analysis. It reads a CSV or TSV file containing Japanese text, splits the text into individual words (tokenization), and adds new columns holding the segmented words and the word counts. This is useful for processing large volumes of Japanese text for tasks such as sentiment analysis, keyword extraction, or linguistic research.
No commits in the last 6 months.
Use this if you need to quickly and accurately break down Japanese sentences into individual words from structured data files like spreadsheets.
Not ideal if you're working with languages other than Japanese, or if you need advanced natural language processing features beyond basic word segmentation and frequency analysis.
Stars: 11
Forks: 1
Language: Go
License: —
Category: —
Last pushed: Jan 05, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/evalphobia/go-jp-text-ripper"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
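The same request can be made from Go. The sketch below is a rough equivalent of the curl command, with some assumptions: the helper `qualityURL` is hypothetical, the path layout (category, then owner/repo) is inferred from the single example URL, and the response format is not documented here, so the body is printed as-is rather than parsed.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// baseURL is taken from the curl example; the path layout after it is an
// assumption inferred from that one URL.
const baseURL = "https://pt-edge.onrender.com/api/v1/quality"

// qualityURL is a hypothetical helper that builds the endpoint for a
// category and an "owner/repo" pair.
func qualityURL(category, repo string) string {
	return fmt.Sprintf("%s/%s/%s", baseURL, category, repo)
}

func main() {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(qualityURL("nlp", "evalphobia/go-jp-text-ripper"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read failed:", err)
		return
	}
	// The response schema is unknown here, so print the raw body.
	fmt.Println(resp.Status)
	fmt.Println(string(body))
}
```

Each run counts against the keyless daily request limit mentioned above.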
Higher-rated alternatives
ikawaha/kagome-dict
Dictionary Library for Kagome v2
aaaton/golem
A lemmatizer implemented in Go
habeanf/yap
Yet Another (natural language) Parser
clipperhouse/uax29
A tokenizer based on Unicode text segmentation (UAX #29), for Go. Split graphemes, words, sentences.
jdkato/prose
A Golang library for text processing, including tokenization, part-of-speech tagging, and...