daac-tools/python-vaporetto
🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto.
This tool breaks Japanese text into individual words or meaningful units, much as spaces separate words in written English. It takes a block of Japanese text as input and outputs a list of tokens, optionally with part-of-speech tags and pronunciations. It's designed for natural language processing engineers and researchers working on Japanese text analysis.
No commits in the last 6 months. Available on PyPI.
Use this if you need a fast and lightweight way to segment Japanese sentences into words for tasks like text mining, sentiment analysis, or machine translation.
Not ideal if you're not a developer and are looking for a ready-to-use application with a graphical interface for Japanese text segmentation.
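A minimal usage sketch, based on the API shown in the project's README: the `Vaporetto` constructor takes raw (zstd-decompressed) model bytes, and `tokenize()` returns tokens exposing `surface()` and `tag()` accessors. The model filename below is a placeholder; pretrained models are distributed separately and this will not run without one.

```python
# Sketch only: assumes python-vaporetto's README API and a locally
# downloaded pretrained model (the .zst path here is a placeholder).
import zstandard  # pretrained models are distributed zstd-compressed
import vaporetto

with open("path/to/model.zst", "rb") as fp:
    model = zstandard.ZstdDecompressor().stream_reader(fp).read()

# predict_tags=True also returns POS tags / pronunciations
# when the model supports them.
tokenizer = vaporetto.Vaporetto(model, predict_tags=True)

tokens = tokenizer.tokenize("まぁ社長は火星猫だ")
for token in tokens:
    print(token.surface(), token.tag(0))
```

Tokenization itself is pointwise classification over character positions, which is what makes it fast relative to lattice-based tokenizers.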
Stars: 21
Forks: 1
Language: Rust
License: Apache-2.0
Category:
Last pushed: Jun 01, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/daac-tools/python-vaporetto"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
google/sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
daac-tools/vibrato
🎤 vibrato: Viterbi-based accelerated tokenizer
OpenNMT/Tokenizer
Fast and customizable text tokenization library with BPE and SentencePiece support
Systemcluster/kitoken
Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers,...
daac-tools/vaporetto
🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer