andreihar/taibun
Taiwanese Hokkien Transliterator and Tokeniser
Transliterates Taiwanese Hokkien text written in Chinese characters into various Romanization systems or phonetic notations such as Tâi-uân Lô-má-jī (Tailo) or Pe̍h-ōe-jī (POJ). It converts your input text into a chosen transcription system, helping linguists, language learners, and cultural heritage professionals accurately represent and study the sounds of Taiwanese Hokkien.
No commits in the last 6 months. Available on PyPI.
Use this if you need to convert Taiwanese Hokkien text from Chinese characters into a standard phonetic or Romanized script for academic work, language learning, or digital display.
Not ideal if you need to translate Taiwanese Hokkien into a different language, as this tool only transliterates the pronunciation within the language itself.
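To illustrate what "transliteration" means here, the sketch below maps each Chinese character to its Tailo syllable with a plain dictionary lookup. This is a hypothetical toy, not taibun's actual implementation: the `TAILO` table, `transliterate` function, and separator are all made up for this example, and real transliteration also has to handle multi-character words, heteronyms, and tone sandhi.

```python
# Toy character-to-Tailo lookup (illustrative only, NOT taibun's real data
# or API). Each entry maps one Chinese character to a Tailo syllable.
TAILO = {
    "台": "tâi",
    "灣": "uân",
    "話": "uē",
}

def transliterate(text, table=TAILO, sep="-"):
    """Map each character to its Tailo syllable; pass unknowns through."""
    return sep.join(table.get(ch, ch) for ch in text)

print(transliterate("台灣話"))  # tâi-uân-uē
```

A per-character lookup like this only works for the simplest cases; the library's value lies in the word segmentation and pronunciation data that a toy table cannot capture.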
Stars: 44
Forks: 4
Language: Python
License: MIT
Category:
Last pushed: Aug 31, 2024
Commits (30d): 0
Dependencies: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/andreihar/taibun"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
EmilStenstrom/conllu
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested Python dictionary.
OpenPecha/Botok
🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python
taishi-i/nagisa
A Japanese tokenizer based on recurrent neural networks
zaemyung/sentsplit
A flexible sentence segmentation library using a CRF model and regex rules
natasha/razdel
Rule-based token and sentence segmentation for Russian