ngxtnhi/ViLexNorm
A Lexical Normalization Corpus for Vietnamese Social Media Text
ViLexNorm is a collection of Vietnamese social media comments for lexical normalization research. Each entry pairs an original, informal piece of social media text with its cleaned, standard Vietnamese equivalent. Computational linguists and NLP engineers use it to build systems that better understand casual online conversation.
No commits in the last 6 months.
Use this if you need a dataset to train or evaluate models for cleaning up informal Vietnamese text found on social media.
Not ideal if you're looking for a tool that performs the normalization itself, rather than data to build such a tool, or if your focus is on formal Vietnamese text.
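As a rough illustration of evaluating a normalizer against informal/standard pairs, the sketch below scores a toy lookup-based normalizer by word-level accuracy. Everything in it is an assumption made for illustration: the pair layout, the names `word_accuracy`, `lookup_normalize`, and `TOY_LEXICON`, and the two toy sentences, which are common Vietnamese chat abbreviations and not taken from the corpus. It also assumes tokens align one-to-one by position, which real data may not satisfy; the corpus's actual file format and official metric are not described on this page.

```python
from typing import Callable, List, Tuple

# Hypothetical pair layout: (informal source text, normalized target text).
Pair = Tuple[str, str]


def word_accuracy(pairs: List[Pair], normalize: Callable[[str], str]) -> float:
    """Fraction of gold tokens matched at the same position by the prediction.

    Assumes whitespace tokenization and one-to-one positional alignment.
    """
    correct = 0
    total = 0
    for source, target in pairs:
        predicted = normalize(source).split()
        gold = target.split()
        total += len(gold)
        # zip stops at the shorter sequence, so missing tokens count as wrong.
        correct += sum(p == g for p, g in zip(predicted, gold))
    return correct / total if total else 0.0


# Toy lexicon of common chat abbreviations (illustrative, not from the corpus).
TOY_LEXICON = {"ko": "không", "dc": "được"}


def lookup_normalize(text: str) -> str:
    """Replace each token found in the toy lexicon; leave other tokens as-is."""
    return " ".join(TOY_LEXICON.get(token, token) for token in text.split())


# Hypothetical evaluation pairs in the assumed (source, target) layout.
toy_pairs: List[Pair] = [
    ("ko dc", "không được"),
    ("mình ko biết", "mình không biết"),
]
```

Comparing `word_accuracy(toy_pairs, lookup_normalize)` with an identity baseline such as `word_accuracy(toy_pairs, lambda s: s)` shows how much of the score comes from tokens that needed no change in the first place.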
Stars: 20
Forks: 1
Language: not listed
License: not listed
Category:
Last pushed: Mar 20, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/ngxtnhi/ViLexNorm"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
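The same request can be made from Python. The sketch below uses only the standard library and makes two assumptions that this page does not confirm: that the endpoint returns JSON, and that the path segments after `quality/` are a category, an owner, and a repository name. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Assemble the endpoint URL in the same shape as the curl example."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumes the response body is JSON; the schema is not documented here,
    so the parsed value is returned unchanged.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "ngxtnhi", "ViLexNorm")` requests the same URL as the curl command; with a limit of 100 keyless requests per day, caching the result locally is a sensible precaution.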
Higher-rated alternatives
vunb/vntk
Vietnamese NLP Toolkit for Node
vncorenlp/VnCoreNLP
A Vietnamese natural language processing toolkit (NAACL 2018)
VinAIResearch/PhoNLP
PhoNLP: A BERT-based multi-task learning model for part-of-speech tagging, named entity...
IBM/transition-amr-parser
SoTA Abstract Meaning Representation (AMR) parsing with word-node alignments in Pytorch....
duyvuleo/VNTC
A Large-scale Vietnamese News Text Classification Corpus