smeetrs/deep_avsr

A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.

46 / 100 (Emerging)

This project offers a speech-to-text transcription solution that can process audio, video (lip movements), or a combination of both. It takes audio or video recordings of speech as input and outputs the spoken words as text, even in noisy environments where audio alone might be unclear. It is aimed at researchers and engineers working on automated transcription or speech-related assistive technologies.

243 stars. No commits in the last 6 months.

Use this if you need to transcribe spoken language from video footage or audio recordings, especially when the audio quality is poor or when the video is silent and only lip movements are available.

Not ideal if you need a simple, out-of-the-box transcription service: using this project involves model training, setup, and some technical configuration.

speech-to-text, audio-analysis, video-analysis, lip-reading, transcription-services
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 20 / 25
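
The four category scores listed above add up to the headline figure of 46 / 100. The page does not say how scores are calculated, so the short sketch below only checks that arithmetic under the assumption that the overall score is a plain sum of the four categories; the variable names are illustrative.

```python
# Category scores as listed on this page, each out of 25.
categories = {
    "Maintenance": 0,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 20,
}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is the unweighted sum of the categories.
total = sum(categories.values())
max_total = MAX_PER_CATEGORY * len(categories)

print(f"{total} / {max_total}")
```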


Stars: 243
Forks: 42
Language: Python
License: MIT
Last pushed: Feb 15, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/smeetrs/deep_avsr"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
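
The same request can be made from Python using only the standard library. This is a minimal sketch: the URL pattern is taken from the curl command on this page, while the helper names `quality_url` and `fetch_quality` are hypothetical, and the assumption that the endpoint returns JSON is not confirmed here.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality data for one repository.

    Assumption: the response body is JSON. The response schema is not
    documented on this page, so the parsed object is returned as-is.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("voice-ai", "smeetrs", "deep_avsr")` requests the same URL as the curl command. Unauthenticated calls count against the 100 requests/day limit, so it is worth caching the result rather than polling.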