TeaPoly/CE-OptimizedLoss
Optimized losses based on cross-entropy (CE), such as MWER (minimum word error rate) loss with beam search and a negative sampling strategy, and Smoothed Max Pooling loss.
This project provides loss functions for training speech recognition models when the goal is high transcription accuracy. Each loss takes the raw output (logits) of a speech model and gives the model more accurate feedback during training. It is aimed at machine learning engineers and researchers building or improving speech-to-text systems.
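To make the MWER idea concrete, here is a minimal pure-Python sketch of the commonly used N-best formulation: hypothesis probabilities are renormalized over the N-best list, and each hypothesis is weighted by how far its word-error count sits from the list average. This is not taken from the repository's code. The function names `edit_distance` and `mwer_loss` and their signatures are assumptions made for illustration, and the sketch leaves out beam search, negative sampling, and gradient computation.

```python
import math


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mwer_loss(nbest_log_probs, nbest_hyps, reference):
    """Expected relative word-error count over an N-best list.

    Hypothetical helper: nbest_log_probs holds one sequence-level
    log-probability per hypothesis in nbest_hyps.
    """
    # Softmax over the N-best list (shifted by the max for stability).
    top = max(nbest_log_probs)
    exps = [math.exp(lp - top) for lp in nbest_log_probs]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Word errors per hypothesis, centered on the N-best average.
    errors = [edit_distance(reference, hyp) for hyp in nbest_hyps]
    mean_error = sum(errors) / len(errors)

    return sum(p * (e - mean_error) for p, e in zip(probs, errors))


reference = ["the", "cat", "sat"]
hyps = [
    ["the", "cat", "sat"],          # 0 errors
    ["the", "cat", "sat", "down"],  # 1 insertion
    ["a", "cat", "sat"],            # 1 substitution
]
loss_uniform = mwer_loss([0.0, 0.0, 0.0], hyps, reference)
loss_confident = mwer_loss([0.0, -5.0, -5.0], hyps, reference)
```

In this sketch the loss is zero when all hypotheses are equally likely and becomes negative as probability mass moves toward the hypothesis with the fewest word errors, which is the direction training would push the model.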
No commits in the last 6 months.
Use this if you are training speech recognition models and want to optimize their performance to minimize word error rates.
Not ideal if you are looking for a pre-trained speech recognition model or a tool for basic speech transcription.
Stars: 24
Forks: 6
Language: Python
License: none listed
Category: (not shown)
Last pushed: Oct 11, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/TeaPoly/CE-OptimizedLoss"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
githubharald/CTCDecoder
Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon...
githubharald/CTCWordBeamSearch
Connectionist Temporal Classification (CTC) decoder with dictionary and language model.
nl8590687/ASRT_SpeechRecognition
A Deep-Learning-Based Chinese Speech Recognition System
athena-team/athena
an open-source implementation of sequence-to-sequence based speech processing engine
hirofumi0810/tensorflow_end2end_speech_recognition
End-to-End speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)