gladiaio/normalization
A lightweight library for normalizing speech transcripts before computing WER
This tool helps speech-to-text (STT) professionals evaluate their systems accurately. It takes two inputs, the reference transcript of what was actually said and the STT system's output, normalizes both to a consistent format, and produces clean, comparable text ready for calculating Word Error Rate (WER). As a result, only genuine recognition errors, not formatting differences, affect performance scores. It is useful for anyone evaluating or comparing speech recognition technologies, such as data scientists, AI researchers, and product managers.
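To show why normalizing before scoring matters, here is a minimal sketch in plain Python. It is not the gladiaio/normalization API: the `normalize` and `wer` functions are hypothetical helpers written for illustration. `normalize` only lowercases, strips ASCII punctuation, and splits on whitespace (unlike a dedicated library, it does not touch number formats), and `wer` computes (substitutions + deletions + insertions) / reference length with a word-level Levenshtein distance.

```python
import string

# Hypothetical normalizer for illustration only; NOT the gladiaio/normalization API.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> list[str]:
    """Lowercase, remove ASCII punctuation, and split into words."""
    return text.lower().translate(_PUNCT_TABLE).split()


def wer(reference: list[str], hypothesis: list[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference),
    computed with a word-level Levenshtein (edit) distance."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and first j hypothesis words
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[-1] / len(reference)


reference = "Hello, world. How are you?"
hypothesis = "hello world how are you"

raw = wer(reference.split(), hypothesis.split())
clean = wer(normalize(reference), normalize(hypothesis))
print(f"raw WER: {raw:.2f}, normalized WER: {clean:.2f}")
# prints "raw WER: 0.80, normalized WER: 0.00"
```

Without normalization, four of the five words count as substitutions purely because of capitalization and punctuation; after normalization the transcripts match exactly.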
Use this if you need to reliably measure the accuracy of speech-to-text systems by normalizing transcripts to a consistent format before computing Word Error Rate.
Not ideal if you need to analyze the raw output of a speech-to-text system as-is, including punctuation, capitalization, and numbers in their original forms.
Stars: 10
Forks: 3
Language: Python
License: MIT
Last pushed: Mar 23, 2026
Commits (last 30 days): 0
Higher-rated alternatives
speechio/chinese_text_normalization
Chinese text normalization for speech processing
NickZaitsev/ru-normalizr
ru-normalizr: the best open-source normalizer for Russian text. Converts numbers, dates, times,...
34j/mecab-text-cleaner
Simple Python package (CLI/Python API) for getting japanese readings (yomigana) and accents using MeCab.
repodiac/german_transliterate
Python module to clean and transliterate (i.e. normalize) German text including abbreviations,...
google-research-datasets/TextNormalizationCoveringGrammars
Covering grammars for English and Russian text normalization