ngxtnhi/ViLexNorm

A Lexical Normalization Corpus for Vietnamese Social Media Text

Score: 19 / 100 (Experimental)

ViLexNorm is a corpus of Vietnamese social media comments for lexical normalization. It pairs original, informal social media text with its cleaned, standard Vietnamese equivalent. Computational linguists and NLP engineers use it to build systems that better understand casual online conversations.

No commits in the last 6 months.

Use this if you need a dataset to train or evaluate models for cleaning up informal Vietnamese text found on social media.

Not ideal if you're looking for a tool that performs the normalization itself, rather than data to build such a tool, or if your focus is on formal Vietnamese text.
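
To make the intended use concrete, here is a minimal sketch of scoring a normalizer against (informal, standard) pairs. The example pairs, the lookup table, and the function names are illustrative assumptions; they are not drawn from ViLexNorm, and the corpus's actual file format is not described here.

```python
# Illustrative pairs only: NOT taken from ViLexNorm.
# Each tuple is (informal text, standard Vietnamese text).
PAIRS = [
    ("ko biết", "không biết"),
    ("dc rồi", "được rồi"),
    ("đẹp wá", "đẹp quá"),
]

# Hypothetical lookup-table baseline; a real system would learn this
# mapping from the corpus instead of hard-coding it.
LEXICON = {"ko": "không", "dc": "được"}


def normalize(text, lexicon=LEXICON):
    """Replace each whitespace-separated token found in the lexicon."""
    return " ".join(lexicon.get(token, token) for token in text.split())


def sentence_accuracy(pairs, normalizer=normalize):
    """Fraction of pairs whose normalized source exactly matches the target."""
    correct = sum(normalizer(source) == target for source, target in pairs)
    return correct / len(pairs)


print(sentence_accuracy(PAIRS))
```

Exact-match accuracy is only one possible metric; it is used here because it needs no token alignment.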

Vietnamese-language-processing social-media-analysis text-normalization computational-linguistics NLP-research
No License · Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 8 / 25
Community 5 / 25

Stars: 20
Forks: 1
Language: not listed
License: none
Last pushed: Mar 20, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/ngxtnhi/ViLexNorm"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
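
The same request can be sketched in Python with only the standard library. The response format is not documented here, so this sketch assumes JSON and falls back to raw text; the `category/owner/repo` path layout is inferred from the single URL above, and the function names are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL (path layout is an assumption from one example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the quality record; return parsed JSON if possible, else raw text."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        # Assumption failed: the endpoint did not return JSON.
        return body


# Example call (performs a network request and counts against the daily limit):
# record = fetch_quality("nlp", "ngxtnhi", "ViLexNorm")
```

The network call is left commented out so the snippet can be imported without spending any of the 100 keyless requests per day.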