Sreyan88/LipGER

Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition

Experimental

This project corrects errors in Automatic Speech Recognition (ASR) output, especially for speech recorded in noisy conditions. It takes speech audio, the corresponding video of the speaker, and an initial ASR transcription, then generates a corrected transcription using both the audio and the speaker's lip movements. It suits anyone working with audio-visual data who needs more accurate speech-to-text output.

No commits in the last 6 months.

Use this if you have existing audio-visual recordings and need to refine or correct errors in their automatically generated speech transcripts.

Not ideal if you only have audio recordings without corresponding video or if you need an ASR system from scratch rather than an error correction tool.
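
The suitability criteria above can be sketched as a small pre-check over your data. This is only an illustration: the `Sample` record, its field names, and `is_correctable` are hypothetical and are not part of LipGER's code or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """Hypothetical record for one utterance; field names are illustrative only."""
    audio_path: Optional[str]      # path to the speech audio file
    video_path: Optional[str]      # path to the matching video of the speaker
    asr_hypothesis: Optional[str]  # initial transcription from an existing ASR system


def is_correctable(sample: Sample) -> bool:
    """Return True only when audio, video, and an initial ASR transcript are all present."""
    return all(
        value is not None and value.strip() != ""
        for value in (sample.audio_path, sample.video_path, sample.asr_hypothesis)
    )
```

A sample with no video, or with no initial transcript, fails this check, which matches the "not ideal" cases described above.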

speech-to-text audio-transcription video-analysis multimodal-data noise-reduction

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 6 / 25

Maturity 16 / 25

Community 5 / 25

Language: Python

License: Apache-2.0

Higher-rated alternatives

Mrkomiljon/awesome-generative-ai

Multimodal generative AI resources: talking heads, STT, TTS, image & video generation, and more.

NVIDIA/Maya-ACE

Maya-ACE: A Reference Client Implementation for NVIDIA ACE Audio2Face Service

OpenVGLab/OmniLottie

[CVPR 2026🔥] 🧑‍🎨 OmniLottie, an open-sourced multi-modal instructed vector animation generator...

jdh-algo/JoyHallo

JoyHallo: Digital human model for Mandarin

michaelzhang-ai/Speech2Video

ACCV 2020 "Speech2Video Synthesis with 3D Skeleton Regularization and Expressive Body Poses"
