phongnt570/UETsegmenter
A toolkit for Vietnamese word segmentation
This tool segments raw Vietnamese text into words. Written Vietnamese separates syllables with spaces, not words, so multi-syllable words and multi-word expressions have to be identified and grouped before analysis. You provide raw Vietnamese text, and it outputs the same text with word boundaries marked, which simplifies downstream text processing. It is designed for researchers, linguists, and data analysts who need precise word boundaries in Vietnamese.
No commits in the last 6 months.
Use this if you need to accurately segment Vietnamese text into individual words and multi-word units for linguistic analysis, natural language processing, or information retrieval.
Not ideal if you need a broader Vietnamese text processing toolkit that also covers tasks such as part-of-speech tagging; the related project UETnlp may be more suitable.
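To make "segmented output" concrete, here is a toy sketch of dictionary-based longest matching over syllables. It is not UETsegmenter's algorithm or API. The function `segment`, the lexicon `TOY_LEXICON`, and the underscore joining of syllables are assumptions for illustration only (underscores are a common convention in Vietnamese NLP corpora; check the project's own documentation for its actual output format).

```python
def segment(text, lexicon, max_len=4):
    """Greedy longest-match segmentation over whitespace-separated syllables.

    Hypothetical helper for illustration; not part of UETsegmenter.
    """
    syllables = text.split()
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, falling back to a single syllable.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            if n == 1 or candidate.lower() in lexicon:
                # Join the syllables of one word with underscores (assumed convention).
                words.append("_".join(syllables[i:i + n]))
                i += n
                break
    return " ".join(words)


# Tiny made-up lexicon of multi-syllable words, lowercased.
TOY_LEXICON = {"sinh viên", "đại học", "công nghệ"}

print(segment("Tôi là sinh viên đại học", TOY_LEXICON))
# prints: Tôi là sinh_viên đại_học
```

A real segmenter has to resolve ambiguous groupings that a fixed lexicon cannot, which is why a dedicated tool is used instead of a lookup like this.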
Stars: 74
Forks: 13
Language: Java
License: —
Category:
Last pushed: Oct 20, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/phongnt570/UETsegmenter"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
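The same request can be made from a script. The sketch below uses only the Python standard library. It assumes the endpoint returns JSON and that the URL path follows a category/owner/repository pattern, both inferred from the single URL above and not from documented behavior; `quality_url` and `fetch_quality` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL (path pattern assumed from the curl example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the record for one repository, assuming a JSON response body."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("nlp", "phongnt570", "UETsegmenter")
# print(data)
```

With the keyless limit of 100 requests/day, it is worth caching the result locally when polling more than a handful of repositories.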
Higher-rated alternatives
vunb/vntk
Vietnamese NLP Toolkit for Node
vncorenlp/VnCoreNLP
A Vietnamese natural language processing toolkit (NAACL 2018)
VinAIResearch/PhoNLP
PhoNLP: A BERT-based multi-task learning model for part-of-speech tagging, named entity...
IBM/transition-amr-parser
SoTA Abstract Meaning Representation (AMR) parsing with word-node alignments in Pytorch....
duyvuleo/VNTC
A Large-scale Vietnamese News Text Classification Corpus