howl-anderson/PaddleTokenizer
A Chinese word-segmentation engine based on a deep neural network, implemented with PaddlePaddle (DNN Chinese tokenizer)
This tool helps split Chinese text into individual words, which is essential for many language processing tasks like search, analysis, or translation. You input a Chinese sentence or document, and it outputs a list of segmented words. Anyone working with Chinese text data, such as linguists, data analysts, or content managers, would find this useful.
No commits in the last 6 months.
Use this if you need accurate word segmentation for Chinese text, especially for tasks requiring a deep understanding of natural language.
Not ideal if you are working with languages other than Chinese or require a simple, rule-based tokenizer without deep learning capabilities.
Stars: 15
Forks: 2
Language: JavaScript
License: AGPL-3.0
Category:
Last pushed: Jul 27, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/howl-anderson/PaddleTokenizer"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000 requests/day.
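For scripted access, the endpoint above can be wrapped in a small client. This is a minimal sketch: the URL pattern comes from the curl example, but the response schema and the `quality_url` helper are assumptions, not documented API details.

```python
# Hypothetical sketch of calling the quality endpoint shown above.
# Only the URL pattern is taken from the page; everything else is assumed.
from urllib.parse import quote

BASE = "https://pt-edge.onrender.com/api/v1/quality/nlp"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL (assumed pattern)."""
    return f"{BASE}/{quote(owner)}/{quote(repo)}"

url = quality_url("howl-anderson", "PaddleTokenizer")
print(url)

# To actually fetch the data (requires network access):
# import json, urllib.request
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)  # response structure is undocumented here
```

Quoting the path segments with `urllib.parse.quote` keeps the helper safe for owner or repository names containing characters that need URL encoding.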
Higher-rated alternatives
guillaume-be/rust-tokenizers
Rust-tokenizer offers high-performance tokenizers for modern language models, including...
sugarme/tokenizer
NLP tokenizers written in Go language
elixir-nx/tokenizers
Elixir bindings for 🤗 Tokenizers
openscilab/tocount
ToCount: Lightweight Token Estimator
reinfer/blingfire-rs
Rust wrapper for the BlingFire tokenization library