andreihar/taibun
Taiwanese Hokkien Transliterator and Tokeniser
Transliterates Taiwanese Hokkien text written in Chinese characters into various Romanization systems or phonetic notations such as Tâi-uân Lô-má-jī (Tailo) or Pe̍h-ōe-jī (POJ). It converts your input text into a chosen transcription system, helping linguists, language learners, and cultural heritage professionals accurately represent and study the sounds of Taiwanese Hokkien.
No commits in the last 6 months. Available on PyPI.
Use this if you need to convert Taiwanese Hokkien text from Chinese characters into a standard phonetic or Romanized script for academic work, language learning, or digital display.
Not ideal if you need to translate Taiwanese Hokkien into a different language, as this tool only transliterates the pronunciation within the language itself.
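To illustrate what "transliteration" means here, the sketch below maps each Chinese character to its Tailo syllable with a plain dictionary lookup. This is a hypothetical toy, not taibun's actual implementation: the `TAILO` table, `transliterate` function, and separator are all made up for this example, and real transliteration also has to handle multi-character words, heteronyms, and tone sandhi.

```python
# Toy character-to-Tailo lookup (illustrative only, NOT taibun's real data
# or API). Each entry maps one Chinese character to a Tailo syllable.
TAILO = {
    "台": "tâi",
    "灣": "uân",
    "話": "uē",
}

def transliterate(text, table=TAILO, sep="-"):
    """Map each character to its Tailo syllable; pass unknowns through."""
    return sep.join(table.get(ch, ch) for ch in text)

print(transliterate("台灣話"))  # tâi-uân-uē
```

A per-character lookup like this only works for the simplest cases; the library's value lies in the word segmentation and pronunciation data that a toy table cannot capture.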
Stars: 44
Forks: 4
Language: Python
License: MIT
Category:
Last pushed: Aug 31, 2024
Commits (30d): 0
Dependencies: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/andreihar/taibun"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
EmilStenstrom/conllu
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested Python dictionary.
OpenPecha/Botok
🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python
taishi-i/nagisa
A Japanese tokenizer based on recurrent neural networks
zaemyung/sentsplit
A flexible sentence segmentation library using a CRF model and regex rules
natasha/razdel
Rule-based token and sentence segmentation for Russian