duongntbk/restore_vietnamese_diacritics
A Transformer-based NLP solution that restores diacritics in Vietnamese text, with 94.05% accuracy on the test dataset. You don't need to understand Vietnamese to use this, I promise :).
This tool restores diacritics (accent marks) to Vietnamese text that has had them removed, which is common in older documents and in quickly typed text. You input Vietnamese text without diacritics, and it outputs the same text with the diacritics added back. This is useful for anyone working with Vietnamese content, such as linguists, content creators, or researchers, who need to ensure text accuracy and readability.
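To make the input side concrete, here is a minimal sketch of how diacritics get stripped from Vietnamese text in the first place. It is not code from this repository; the function name `strip_diacritics` is hypothetical, and it uses only Python's standard `unicodedata` module. The model described above tackles the reverse, much harder direction.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove Vietnamese diacritics (hypothetical helper, not from the repo)."""
    # "đ"/"Đ" have no canonical decomposition, so map them by hand.
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep everything else.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)
```

Text produced this way is the kind of input the tool expects. Stripping is lossy, since several distinct Vietnamese words collapse to the same unaccented spelling, which is why restoration needs a model that looks at context rather than a lookup table.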
No commits in the last 6 months.
Use this if you need to quickly and accurately add diacritics to large volumes of Vietnamese text that currently lacks them.
Not ideal if you primarily work with other languages or if you only occasionally deal with very short snippets of Vietnamese text that you can easily correct manually.
Stars: 8
Forks: 2
Language: Python
License: MIT
Category
Last pushed: Jan 27, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/duongntbk/restore_vietnamese_diacritics"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
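The same request can be sketched in Python using only the standard library. Two assumptions are baked in and not confirmed by this page: that the path follows a `category/owner/repo` pattern (inferred from the single curl example above), and that the endpoint returns JSON. The names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL, assuming a category/owner/repo path layout."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record and parse it, assuming the response body is JSON."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "duongntbk", "restore_vietnamese_diacritics")` would issue the same GET request as the curl command. Keep the 100 requests/day keyless limit in mind when looping over many repositories.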
Higher-rated alternatives
vunb/vntk
Vietnamese NLP Toolkit for Node
vncorenlp/VnCoreNLP
A Vietnamese natural language processing toolkit (NAACL 2018)
VinAIResearch/PhoNLP
PhoNLP: A BERT-based multi-task learning model for part-of-speech tagging, named entity...
IBM/transition-amr-parser
SoTA Abstract Meaning Representation (AMR) parsing with word-node alignments in Pytorch....
duyvuleo/VNTC
A Large-scale Vietnamese News Text Classification Corpus