georgesterpu/avsr-tf1

Audio-Visual Speech Recognition using Sequence to Sequence Models

45 / 100 (Emerging)

This research system helps scientists and engineers develop and test speech recognition models that interpret speech from both audio and visual cues. It takes audio and video files as input and produces trained speech recognition models, along with evaluation metrics such as Character Error Rate (CER) and Word Error Rate (WER). It is designed for academic researchers and advanced students in speech technology.
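For readers unfamiliar with the two metrics, both are commonly defined as the edit distance between the reference transcript and the recognized transcript, divided by the reference length (in characters for CER, in words for WER). The following is a minimal sketch of that common definition, not the repository's own evaluation code; the function names `edit_distance`, `cer`, and `wer` are hypothetical and chosen for illustration only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of tokens)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference token missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra token in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits / reference length in characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits / reference length in words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

With the reference "the cat sat" and the hypothesis "the cat sit", one of three words is wrong, so `wer` gives 1/3, while `cer` gives 1/11 because only one of eleven characters differs. Both rates can exceed 1.0 when the hypothesis contains many insertions.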

No commits in the last 6 months.

Use this if you are a researcher developing new audio-visual speech recognition models and need a flexible system to experiment with different architectures and data modalities.

Not ideal if you are looking for a ready-to-use, production-grade speech recognition system for immediate deployment.

speech-recognition computational-linguistics machine-learning-research audio-processing video-analysis
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 20 / 25


Stars: 83
Forks: 28
Language: Python
License: GPL-3.0
Last pushed: Jul 10, 2020
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/georgesterpu/avsr-tf1"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
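The same request can be made from Python using only the standard library. This is a sketch under two assumptions that the listing does not confirm: that the endpoint returns a JSON body, and that the URL follows a category/owner/repository pattern, which is inferred from the single curl example above. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example; the path layout below is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the request URL, assuming a <category>/<owner>/<repo> layout."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the quality record for a repository.

    Assumes the response body is JSON; the field names it contains are not
    documented in this listing, so the parsed object is returned unchanged.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "georgesterpu", "avsr-tf1")` requests the same URL as the curl command. Unauthenticated calls count against the 100 requests/day limit, so caching the result locally is advisable when polling.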