daac-tools/python-vaporetto

🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto.

Score: 37 / 100 (Emerging)

This tool segments Japanese text into individual words or other meaningful units, a step that is needed because Japanese is written without spaces between words. It takes Japanese text as input and outputs a list of tokens, optionally with their part-of-speech tags and pronunciations. It is designed for engineers and researchers doing natural language processing on Japanese text.
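As a rough illustration of the input and output described above, the following sketch loads a model and returns the surface form of each token. The `vaporetto.Vaporetto` constructor, its `predict_tags` argument, `tokenize`, and `surface()` are written as recalled from the project's README and should be treated as assumptions to check against the installed version; `load_tokenizer`, `segment`, and the model path are hypothetical names used only for this example.

```python
from typing import Any, List


def load_tokenizer(model_path: str, predict_tags: bool = False) -> Any:
    """Build a tokenizer from an uncompressed model file.

    The model path is supplied by the caller; no model ships with this sketch.
    """
    # Imported lazily so that segment() below can be used without the package.
    import vaporetto

    with open(model_path, "rb") as fp:
        model = fp.read()
    # Assumption: the constructor takes the raw model bytes and a
    # predict_tags flag, as recalled from the project's README.
    return vaporetto.Vaporetto(model, predict_tags=predict_tags)


def segment(tokenizer: Any, text: str) -> List[str]:
    """Return the surface string of each token found in `text`.

    Works with any object whose tokenize() returns items exposing surface().
    """
    return [token.surface() for token in tokenizer.tokenize(text)]
```

With a trained model on disk, a call would look like `segment(load_tokenizer("path/to/model"), "まぁ社長は火星猫だ")`. The actual token boundaries depend on the model used.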

No commits in the last 6 months. Available on PyPI.

Use this if you need a fast and lightweight way to segment Japanese sentences into words for tasks like text mining, sentiment analysis, or machine translation.

Not ideal if you are a non-developer looking for a ready-to-use application with a graphical interface for Japanese text segmentation.

Tags: Japanese NLP, text segmentation, natural language processing, computational linguistics, text analysis
Status: Stale (6 months), no dependents
Maintenance 2 / 25
Adoption 6 / 25
Maturity 25 / 25
Community 4 / 25


Stars: 21
Forks: 1
Language: Rust
License: Apache-2.0
Last pushed: Jun 01, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/daac-tools/python-vaporetto"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
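The same request can be made from Python with the standard library alone. This is a minimal sketch: the category/owner/repo URL layout is inferred from the single example URL above, and the JSON response body is an assumption, since the listing does not document the response format. `quality_url` and `fetch_quality` are hypothetical helper names.

```python
import json
import urllib.request

# Base path taken from the curl example in this listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (layout inferred from the one documented example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality data for one repository.

    Assumption: the endpoint returns a JSON body; json.loads raises
    an error if it does not.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "daac-tools", "python-vaporetto")` issues the same GET request as the curl command and counts against the daily request limit.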