evalphobia/go-jp-text-ripper

Tokenize Japanese text and split it into words

Experimental

This tool helps Japanese language data analysts and researchers prepare text data for further analysis. It takes a CSV or TSV file containing Japanese text, breaks down the text into individual words (tokenization), and adds new columns with the segmented words and word counts. This is useful for anyone who needs to process large volumes of Japanese text for tasks like sentiment analysis, keyword extraction, or linguistic research.

No commits in the last 6 months.

Use this if you need to quickly and accurately break down Japanese sentences into individual words from structured data files like spreadsheets.

Not ideal if you're working with languages other than Japanese, or if you need advanced natural language processing features beyond basic word segmentation and frequency analysis.

Japanese-linguistics text-analysis data-preparation market-research content-analysis

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 5 / 25

Maturity 8 / 25

Community 7 / 25

Higher-rated alternatives

ikawaha/kagome-dict

Dictionary Library for Kagome v2

aaaton/golem

A lemmatizer implemented in Go

habeanf/yap

Yet Another (natural language) Parser

clipperhouse/uax29

A tokenizer based on Unicode text segmentation (UAX #29), for Go. Split graphemes, words, sentences.

jdkato/prose

A Golang library for text processing, including tokenization, part-of-speech tagging, and...
