georgesterpu/avsr-tf1
Audio-Visual Speech Recognition using Sequence to Sequence Models
This research system helps scientists and engineers working on speech recognition develop and test models that interpret speech from both audio and visual cues. It takes audio and video files as input and produces trained speech recognition models, along with evaluation metrics such as Character Error Rate and Word Error Rate. It is designed for academic researchers and advanced students in speech technology.
No commits in the last 6 months.
Use this if you are a researcher developing new audio-visual speech recognition models and need a flexible system to experiment with different architectures and data modalities.
Not ideal if you are looking for a ready-to-use, production-grade speech recognition system for immediate deployment.
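For readers unfamiliar with the two metrics named in the description, here is a minimal sketch of how Character Error Rate and Word Error Rate are commonly computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length. This is a generic illustration, not code taken from avsr-tf1; the function names `edit_distance`, `error_rate`, `cer`, and `wer` are hypothetical, and the project may apply its own text normalization or tokenization.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences with unit costs."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance normalized by the reference length."""
    if len(ref) == 0:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: compare character by character."""
    return error_rate(list(reference), list(hypothesis))


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: compare whitespace-separated tokens (an assumption)."""
    return error_rate(reference.split(), hypothesis.split())
```

Because the distance is normalized by the reference length only, both rates can exceed 1.0 when the hypothesis contains many insertions.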
Stars: 83
Forks: 28
Language: Python
License: GPL-3.0
Category:
Last pushed: Jul 10, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/georgesterpu/avsr-tf1"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
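The same request can be sketched in Python using only the standard library. The URL is the one shown in the curl command; everything else is an assumption: the helper names `quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path pattern is inferred from that single URL, and the response is assumed to be JSON, which the page does not state.

```python
import json
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (path pattern inferred from the single example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the listing data, assuming the endpoint returns a JSON body."""
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network call; counts against the daily request limit.
    print(fetch_quality("voice-ai", "georgesterpu", "avsr-tf1"))
```

No API key is sent here, which matches the keyless tier described above; how a key would be passed is not documented on this page, so it is left out.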
Higher-rated alternatives
githubharald/CTCDecoder
Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon...
githubharald/CTCWordBeamSearch
Connectionist Temporal Classification (CTC) decoder with dictionary and language model.
nl8590687/ASRT_SpeechRecognition
A Deep-Learning-Based Chinese Speech Recognition System
athena-team/athena
An open-source implementation of a sequence-to-sequence based speech processing engine
hirofumi0810/tensorflow_end2end_speech_recognition
End-to-End speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)