SMIL-SPCRAS/DAVIS
Official repo for "Audio-Visual Speech Recognition In-the-Wild: Multi-Angle Vehicle Cabin Corpus and Attention-based Method" in ICASSP 2024
This project provides a multi-angle audio-visual speech corpus recorded in vehicle cabins, together with an attention-based method for recognizing speech in noisy in-car conditions. The method takes audio and video recordings of people speaking in cars and transcribes their voice commands. It is aimed at researchers and developers building robust voice control systems for in-car applications, especially for languages other than English.
No commits in the last 6 months.
Use this if you are developing or testing speech recognition systems for cars and need realistic, 'in-the-wild' data with varied angles and background noise.
Not ideal if your focus is on general-purpose speech recognition outside of vehicle environments or if you require a simple, ready-to-use API.
Stars: 9
Forks: —
Language: JavaScript
License: —
Category:
Last pushed: Apr 08, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/SMIL-SPCRAS/DAVIS"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
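The same request can be made from a script. The following is a minimal Python sketch, not an official client. It assumes the endpoint path has the shape `/quality/<category>/<owner>/<repo>` (inferred from the curl URL above, where `voice-ai` is taken to be the category) and that the response body is JSON, which this page does not confirm. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example; the segment layout is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Assemble the endpoint URL from its three path segments (hypothetical helper)."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record and decode it, assuming the API returns JSON (unverified)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to perform the network request.
    print(build_quality_url("voice-ai", "SMIL-SPCRAS", "DAVIS"))
```

With the keyless limit of 100 requests/day, it is worth caching results locally rather than calling `fetch_quality` repeatedly for the same repository.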
Higher-rated alternatives
Uberi/speech_recognition
Speech recognition module for Python, supporting several engines and APIs, online and offline.
cmusphinx/pocketsphinx
A small speech recognizer
tensorflow/lingvo
Lingvo
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models,...
PyThaiNLP/pythaiasr
Python Thai Automatic Speech Recognition