georgesterpu/avsr-tf1

Audio-Visual Speech Recognition using Sequence to Sequence Models

Emerging

This research system helps scientists and engineers develop and test speech recognition models that interpret speech from both audio and visual cues. It takes audio and video files as input and produces trained models along with evaluation metrics such as Character Error Rate (CER) and Word Error Rate (WER). It is designed for academic researchers and advanced students in speech technology.
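For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how Character Error Rate and Word Error Rate are commonly computed: the edit distance between a reference transcript and a hypothesis, divided by the reference length. It is not taken from this repository; the function names `edit_distance` and `error_rate` are hypothetical and chosen only for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(reference, hypothesis, unit="word"):
    """Return WER (unit="word") or CER (unit="char") as a fraction."""
    if unit == "word":
        ref, hyp = reference.split(), hypothesis.split()
    else:
        ref, hyp = list(reference), list(hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    reference = "the cat sat"
    hypothesis = "the hat sat"
    print("WER:", error_rate(reference, hypothesis, unit="word"))
    print("CER:", error_rate(reference, hypothesis, unit="char"))
```

Because the numerator counts insertions as well as substitutions and deletions, either rate can exceed 1.0 when the hypothesis is much longer than the reference.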

No commits in the last 6 months.

Use this if you are a researcher developing new audio-visual speech recognition models and need a flexible system to experiment with different architectures and data modalities.

Not ideal if you are looking for a ready-to-use, production-grade speech recognition system for immediate deployment.

speech-recognition computational-linguistics machine-learning-research audio-processing video-analysis

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 9 / 25

Maturity 16 / 25

Community 20 / 25

Language: Python

License: GPL-3.0

Higher-rated alternatives

githubharald/CTCDecoder

Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon...

githubharald/CTCWordBeamSearch

Connectionist Temporal Classification (CTC) decoder with dictionary and language model.

nl8590687/ASRT_SpeechRecognition

A Deep-Learning-Based Chinese Speech Recognition System

athena-team/athena

An open-source implementation of a sequence-to-sequence based speech processing engine

hirofumi0810/tensorflow_end2end_speech_recognition

End-to-End speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)
