gladiaio/normalization

A lightweight library for normalizing speech transcripts before computing WER

Emerging

This tool helps speech-to-text (STT) practitioners evaluate their systems accurately. It takes the reference transcript of the spoken content and the STT system's output, standardizes the formatting of both, and produces clean, comparable text ready for Word Error Rate (WER) calculation. As a result, only genuine recognition errors, not formatting differences, affect the score. It is useful to anyone evaluating or comparing speech recognition systems, such as data scientists, AI researchers, or product managers.
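To illustrate why normalization matters, here is a minimal sketch of the general idea in plain Python. It does not use this library's API: the `normalize` and `wer` functions below are hypothetical helpers written for this example, and the normalization rules (lowercasing, stripping ASCII punctuation, collapsing whitespace) are assumptions that are far simpler than what a real normalizer would apply.

```python
import string


def normalize(text: str) -> str:
    """Illustrative normalizer (an assumption, not the library's rules):
    lowercase, drop ASCII punctuation, collapse runs of whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "Hello, world! It's nice to meet you."
hypothesis = "hello world its nice to meet you"

raw_score = wer(reference, hypothesis)
normalized_score = wer(normalize(reference), normalize(hypothesis))
print(f"raw WER: {raw_score:.3f}, normalized WER: {normalized_score:.3f}")
```

In this example the recognizer got every word right, yet the raw comparison counts four of seven words as errors because of capitalization and punctuation alone. After normalization the two strings match and the WER drops to zero.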

Use this if you need to reliably measure the accuracy of speech-to-text systems by normalizing transcripts to a consistent format before computing Word Error Rate.

Not ideal if you need to analyze the raw, unformatted output of a speech-to-text system, including punctuation, capitalization, and numbers in their original forms.

speech-to-text ASR-evaluation transcription-quality natural-language-processing AI-benchmarking

No Package · No Dependents

Maintenance 13 / 25

Adoption 5 / 25

Maturity 11 / 25

Community 14 / 25

Language: Python

License: MIT

Higher-rated alternatives

speechio/chinese_text_normalization

Chinese text normalization for speech processing

NickZaitsev/ru-normalizr

ru-normalizr: the best open-source normalizer for Russian text. Converts numbers, dates, times,...

34j/mecab-text-cleaner

Simple Python package (CLI/Python API) for getting Japanese readings (yomigana) and accents using MeCab.

repodiac/german_transliterate

Python module to clean and transliterate (i.e. normalize) German text including abbreviations,...

google-research-datasets/TextNormalizationCoveringGrammars

Covering grammars for English and Russian text normalization