Sreyan88/LipGER
Code for the Interspeech 2024 paper "LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition"
This project helps improve the accuracy of Automatic Speech Recognition (ASR) systems, especially in noisy conditions. It takes your speech audio files, their corresponding video, and an initial ASR transcription, then generates a more accurate transcription by using both the sound and the speaker's lip movements. This is ideal for anyone working with audio-visual data who needs highly reliable speech-to-text conversion.
No commits in the last 6 months.
Use this if you have existing audio-visual recordings and need to refine or correct errors in their automatically generated speech transcripts.
Not ideal if you only have audio recordings without corresponding video or if you need an ASR system from scratch rather than an error correction tool.
Stars: 18
Forks: 1
Language: Python
License: Apache-2.0
Category:
Last pushed: Jul 16, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/Sreyan88/LipGER"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
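The same request can be made from Python, the repository's own language. The sketch below uses only the standard library and the keyless access described above. The helper names `quality_url` and `fetch_quality` are hypothetical, the path layout (category, owner, repository) is inferred from the single curl example, and the response is assumed to be JSON because its format is not documented here.

```python
import json
import urllib.request

# Base URL taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository.

    Assumption: the path is <category>/<owner>/<repo>, as in the curl example.
    """
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a keyless GET request and decode the body.

    Assumption: the endpoint returns JSON; the schema is not specified here,
    so the decoded object is returned as-is.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality("generative-ai", "Sreyan88", "LipGER")` performs the request shown in the curl command. Each call counts against the daily request limit, so caching the result locally is a reasonable choice when polling several repositories.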
Higher-rated alternatives
Mrkomiljon/awesome-generative-ai
Multimodal generative AI resources : talking heads, STT, TTS, image & video generation, and more.
NVIDIA/Maya-ACE
Maya-ACE: A Reference Client Implementation for NVIDIA ACE Audio2Face Service
OpenVGLab/OmniLottie
[CVPR 2026🔥] 🧑🎨 OmniLottie, an open-sourced multi-modal instructed vector animation generator...
jdh-algo/JoyHallo
JoyHallo: Digital human model for Mandarin
michaelzhang-ai/Speech2Video
ACCV 2020 "Speech2Video Synthesis with 3D Skeleton Regularization and Expressive Body Poses"