JuliaStrings/TinySegmenter.jl
Julia port of TinySegmenter, a compact Japanese tokenizer.
Japanese text is written without spaces between words, so it must be segmented before most text processing can happen. TinySegmenter.jl takes a Japanese string and returns a list of word-level tokens, which is useful for analysis, search indexing, and other language-dependent tasks.
No commits in the last 6 months.
Use this if you need to quickly and accurately segment Japanese text into its constituent words or tokens for further processing.
Not ideal if you need a full-fledged natural language processing (NLP) library with capabilities beyond basic tokenization.
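A minimal usage sketch, assuming the package has been added via Pkg; `tokenize` is the function TinySegmenter.jl exports:

```julia
using TinySegmenter

# Segment a Japanese sentence ("My name is Nakano") into words.
# tokenize returns a vector of substrings of the input string.
words = tokenize("私の名前は中野です")
# → ["私", "の", "名前", "は", "中野", "です"]
```

Because the segmenter is a small trained model rather than a dictionary, it needs no external data files, which is what keeps the package compact.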
Stars: 21
Forks: 8
Language: Julia
License: —
Category: nlp
Last pushed: Nov 24, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/JuliaStrings/TinySegmenter.jl"
Open to everyone: 100 requests/day with no key; a free key raises the limit to 1,000/day.
Higher-rated alternatives
google/sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
daac-tools/vibrato
🎤 vibrato: Viterbi-based accelerated tokenizer
OpenNMT/Tokenizer
Fast and customizable text tokenization library with BPE and SentencePiece support
Systemcluster/kitoken
Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers,...
daac-tools/vaporetto
🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer