kmario23/KenLM-training

Training an n-gram-based language model using the KenLM toolkit for Deep Speech 2


Emerging

This project helps speech recognition and natural language processing engineers train an n-gram-based language model with the KenLM toolkit. You provide a large text corpus, and it produces a language model file that assigns each sentence a likelihood score, so that more probable word sequences score higher. It is aimed primarily at engineers building speech-to-text systems or other text prediction tools.
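To make "scoring a sentence by its likelihood" concrete, here is a minimal sketch. It assumes whitespace tokenization and a toy bigram model with add-one (Laplace) smoothing, and the names `train_bigram` and `score` are hypothetical, not part of KenLM. KenLM itself estimates models with modified Kneser-Ney smoothing (via its `lmplz` program) and is far more scalable, so treat this only as an illustration of the idea: pad each sentence with `<s>`/`</s>` markers and sum the log10 probability of each word given the previous one. The KenLM Python binding's `kenlm.Model(path).score(sentence)` likewise returns a log10 probability.

```python
import math
from collections import Counter


def train_bigram(corpus_lines):
    """Count unigrams and bigrams (toy model, not KenLM), padding sentences with <s> and </s>."""
    unigrams = Counter()
    bigrams = Counter()
    for line in corpus_lines:
        tokens = ["<s>"] + line.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def score(sentence, unigrams, bigrams):
    """Return the log10 probability of a sentence under add-one smoothing.

    The vocabulary of predictable tokens is every seen token except <s>,
    which only ever appears as a history, never as a prediction.
    """
    vocab_size = len(unigrams) - 1
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # P(word | prev) = (count(prev, word) + 1) / (count(prev) + V)
        prob = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        total += math.log10(prob)
    return total


if __name__ == "__main__":
    uni, bi = train_bigram(["the cat sat", "the dog sat", "the cat ran"])
    # A fluent word order should score higher (less negative) than a scrambled one.
    print(score("the cat sat", uni, bi) > score("sat cat the", uni, bi))
```

Working in log space avoids numeric underflow when multiplying many small probabilities over long sentences, which is also why KenLM reports log10 scores.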

116 stars. No commits in the last 6 months.

Use this if you need to create a custom language model from your own domain-specific text data for applications like speech recognition.

Not ideal if you are not a developer and are looking for a ready-to-use, pre-trained language model without any coding or command-line interaction.

Speech Recognition, Natural Language Processing, Text Analysis, Machine Translation, Computational Linguistics

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 18 / 25


Stars: 116

Forks:

Language: not specified

License: none

Higher-rated alternatives

yeyupiaoling/PunctuationModel

Chinese punctuation model that adds punctuation marks to text.

mpoyraz/ngram-lm-wiki

Scripts to train n-gram language models on Wikipedia articles
