ngxtnhi/ViLexNorm

A Lexical Normalization Corpus for Vietnamese Social Media Text

19 / 100

Experimental

ViLexNorm is a corpus of Vietnamese social media comments built for lexical normalization. Each entry pairs an original, informal comment with its cleaned, standard Vietnamese equivalent. Computational linguists and NLP engineers use it to build systems that better understand casual online conversations.

No commits in the last 6 months.

Use this if you need a dataset to train or evaluate models for cleaning up informal Vietnamese text found on social media.
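A minimal sketch of how such pairs might be used for evaluation. The two-column CSV layout, the column names `original` and `normalized`, the sample rows, and the helpers `load_pairs`, `exact_match_accuracy`, and `dictionary_normalize` are all assumptions made for illustration; they are not taken from the repository, so check the actual file format before adapting this.

```python
import csv
import io

# Hypothetical sample in an assumed two-column CSV layout. These rows are
# invented for illustration and are not taken from the corpus.
SAMPLE_CSV = """original,normalized
mình ko biết,mình không biết
bạn làm j đó,bạn làm gì đó
hôm nay trời đẹp,hôm nay trời đẹp
"""


def load_pairs(handle):
    """Read (informal, normalized) sentence pairs from a CSV file handle."""
    reader = csv.DictReader(handle)
    return [(row["original"], row["normalized"]) for row in reader]


def exact_match_accuracy(pairs, normalize):
    """Fraction of sentences whose normalized output equals the reference."""
    if not pairs:
        return 0.0
    hits = sum(1 for source, target in pairs if normalize(source) == target)
    return hits / len(pairs)


# Toy lookup-table baseline: replace known informal tokens, keep the rest.
LOOKUP = {"ko": "không", "j": "gì"}


def dictionary_normalize(text):
    return " ".join(LOOKUP.get(token, token) for token in text.split())


pairs = load_pairs(io.StringIO(SAMPLE_CSV))
# Leaving the text unchanged is a useful floor for any normalizer.
identity_score = exact_match_accuracy(pairs, lambda text: text)
baseline_score = exact_match_accuracy(pairs, dictionary_normalize)
print(f"identity: {identity_score:.2f}, dictionary: {baseline_score:.2f}")
```

Comparing against the identity function shows how much of the score comes from sentences that needed no change at all. A trained model would replace `dictionary_normalize` behind the same one-argument interface.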

Not ideal if you're looking for a tool that performs the normalization itself, rather than data to build such a tool, or if your focus is on formal Vietnamese text.

Vietnamese-language-processing, social-media-analysis, text-normalization, computational-linguistics, NLP-research

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 6 / 25

Maturity 8 / 25

Community 5 / 25

Higher-rated alternatives

vunb/vntk

Vietnamese NLP Toolkit for Node

vncorenlp/VnCoreNLP

A Vietnamese natural language processing toolkit (NAACL 2018)

VinAIResearch/PhoNLP

PhoNLP: A BERT-based multi-task learning model for part-of-speech tagging, named entity...

IBM/transition-amr-parser

SoTA Abstract Meaning Representation (AMR) parsing with word-node alignments in Pytorch....

duyvuleo/VNTC

A Large-scale Vietnamese News Text Classification Corpus
