TeaPoly/CE-OptimizedLoss

Optimized loss functions based on cross-entropy (CE), such as MWER (minimum word error rate) loss with beam search and a negative sampling strategy, and smoothed max pooling loss.


Emerging

This project provides loss functions for training speech recognition models toward lower transcription error. The losses take the raw output (logits) of a speech model and give it a training signal tied more closely to transcription quality, for example by weighting beam-search hypotheses by their word errors. It is aimed at machine learning engineers and researchers who are building or improving speech-to-text systems.
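To make the MWER idea concrete, here is a minimal sketch in plain Python of the usual N-best formulation: hypothesis scores are renormalized over the N-best list, and each hypothesis contributes its word-error count relative to the list average. This is an illustration under stated assumptions, not the repository's code: the function names `edit_distance` and `mwer_loss` are hypothetical, the N-best list is supplied by hand rather than by beam search, and there is no automatic differentiation (a real training loss would compute the same quantity on tensors so gradients reach the logits).

```python
import math


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # delete a reference word
                curr[j - 1] + 1,          # insert a hypothesis word
                prev[j - 1] + (r != h),   # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def mwer_loss(hyp_log_probs, hyps, ref):
    """Expected word errors over an N-best list, relative to the list mean.

    hyp_log_probs: one sequence-level log-probability per hypothesis
                   (assumed to come from the model; hypothetical input here).
    hyps:          list of hypotheses, each a list of words.
    ref:           reference transcript as a list of words.
    """
    # Softmax over the N-best list, shifted by the max for numerical stability.
    top = max(hyp_log_probs)
    weights = [math.exp(lp - top) for lp in hyp_log_probs]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Word errors per hypothesis, centered on the average over the list.
    errors = [edit_distance(ref, h) for h in hyps]
    mean_err = sum(errors) / len(errors)
    return sum(p * (e - mean_err) for p, e in zip(probs, errors))


ref = "the cat sat".split()
hyps = ["the cat sat".split(), "the cat sad".split(), "a cat".split()]
good = mwer_loss([0.0, -5.0, -5.0], hyps, ref)  # mass on the correct hypothesis
bad = mwer_loss([-5.0, -5.0, 0.0], hyps, ref)   # mass on the worst hypothesis
```

In this sketch the loss is lower when probability mass sits on hypotheses with fewer word errors, which is the behavior a trainer would want to encourage; subtracting the mean error is a common variance-reduction choice and is an assumption here rather than something the listing states.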

No commits in the last 6 months.

Use this if you are training speech recognition models and want to optimize their performance to minimize word error rates.

Not ideal if you are looking for a pre-trained speech recognition model or a tool for basic speech transcription.

speech-to-text automatic-speech-recognition model-training natural-language-processing audio-transcription

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 6 / 25

Maturity 8 / 25

Community 16 / 25


Language: Python

License: none

Higher-rated alternatives

githubharald/CTCDecoder

Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon...

githubharald/CTCWordBeamSearch

Connectionist Temporal Classification (CTC) decoder with dictionary and language model.

nl8590687/ASRT_SpeechRecognition

A Deep-Learning-Based Chinese Speech Recognition System

athena-team/athena

An open-source implementation of a sequence-to-sequence based speech processing engine

hirofumi0810/tensorflow_end2end_speech_recognition

End-to-end speech recognition implementation based on TensorFlow (CTC, attention, and MTL training)
