howl-anderson/PaddleTokenizer
A Chinese word-segmentation engine based on a deep neural network, implemented with PaddlePaddle (DNN Chinese tokenizer)
This tool helps split Chinese text into individual words, which is essential for many language processing tasks like search, analysis, or translation. You input a Chinese sentence or document, and it outputs a list of segmented words. Anyone working with Chinese text data, such as linguists, data analysts, or content managers, would find this useful.
No commits in the last 6 months.
Use this if you need accurate word segmentation for Chinese text, especially for tasks requiring a deep understanding of natural language.
Not ideal if you are working with languages other than Chinese or require a simple, rule-based tokenizer without deep learning capabilities.
Stars: 15
Forks: 2
Language: JavaScript
License: AGPL-3.0
Category:
Last pushed: Jul 27, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/howl-anderson/PaddleTokenizer"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000 requests/day.
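For scripted access, the endpoint above can be wrapped in a small client. This is a minimal sketch: the URL pattern comes from the curl example, but the response schema and the `quality_url` helper are assumptions, not documented API details.

```python
# Hypothetical sketch of calling the quality endpoint shown above.
# Only the URL pattern is taken from the page; everything else is assumed.
from urllib.parse import quote

BASE = "https://pt-edge.onrender.com/api/v1/quality/nlp"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL (assumed pattern)."""
    return f"{BASE}/{quote(owner)}/{quote(repo)}"

url = quality_url("howl-anderson", "PaddleTokenizer")
print(url)

# To actually fetch the data (requires network access):
# import json, urllib.request
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)  # response structure is undocumented here
```

Quoting the path segments with `urllib.parse.quote` keeps the helper safe for owner or repository names containing characters that need URL encoding.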
Higher-rated alternatives
guillaume-be/rust-tokenizers
Rust-tokenizer offers high-performance tokenizers for modern language models, including...
sugarme/tokenizer
NLP tokenizers written in Go language
elixir-nx/tokenizers
Elixir bindings for 🤗 Tokenizers
openscilab/tocount
ToCount: Lightweight Token Estimator
reinfer/blingfire-rs
Rust wrapper for the BlingFire tokenization library