scarletcho/KoLM

Korean text normalization and language preparation package for LM in Kaldi-based ASR system

Established

This package prepares Korean text data for building speech recognition systems. It takes raw Korean text files and cleans them by normalizing the text, removing unwanted characters, and splitting it into sentences. The output is a refined text corpus and a pronunciation dictionary, ready for language model development in Kaldi-based Automatic Speech Recognition (ASR) systems. The tool is designed for speech scientists, linguists, and ASR engineers working with Korean language models.
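The cleaning steps described above (normalizing, removing unwanted characters, splitting into sentences) can be illustrated with a minimal sketch using only the Python standard library. This is not KoLM's API: the function names `clean_line` and `split_sentences`, the choice of NFC normalization, and the set of characters kept are all assumptions made for illustration.

```python
import re
import unicodedata
from typing import List

# Assumption: keep Hangul syllables, ASCII letters and digits, whitespace,
# and sentence-final punctuation; everything else is treated as unwanted.
_UNWANTED = re.compile(r"[^\uac00-\ud7a3a-zA-Z0-9\s.?!]")
# Split on whitespace that follows sentence-final punctuation.
_SENTENCE_BREAK = re.compile(r"(?<=[.?!])\s+")


def clean_line(text: str) -> str:
    """Normalize to NFC, replace unwanted characters, collapse whitespace."""
    text = unicodedata.normalize("NFC", text)  # compose decomposed jamo
    text = _UNWANTED.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> List[str]:
    """Clean the text, then split it into sentences (hypothetical helper)."""
    return [s for s in _SENTENCE_BREAK.split(clean_line(text)) if s]


if __name__ == "__main__":
    for sentence in split_sentences("오늘은 날씨가 좋다. (내일은) 비가 온다!"):
        print(sentence)
```

A real ASR pipeline would also need to spell out digits, symbols, and foreign words as Korean words before building the pronunciation dictionary; this sketch leaves those steps out.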

No commits in the last 6 months. Available on PyPI.

Use this if you need to meticulously clean, normalize, and preprocess large Korean text corpora to create a robust language model and pronunciation dictionary for an ASR system.

Not ideal if your primary goal is general natural language processing tasks that do not require detailed grapheme-to-phone conversion or specific language model file formats for ASR.

speech-recognition korean-linguistics language-modeling text-preprocessing phonetics

Stale 6m No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 25 / 25

Community 19 / 25

Language: Python

License: GPL-3.0

Related tools

daanzu/kaldi-active-grammar

Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time

gooofy/py-kaldi-asr

Some simple wrappers around kaldi-asr intended to make using kaldi's (online) decoders as...

nttcslab-sp/kaldiio

A pure python module for reading and writing kaldi ark files

pykaldi/pykaldi

A Python wrapper for Kaldi

kaldi-asr/kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
