Sreyan88/LipGER

Code for the InterSpeech 2024 paper "LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition"

27 / 100 Experimental

This project improves the accuracy of Automatic Speech Recognition (ASR) output, especially in noisy conditions. It takes speech audio, the corresponding video, and an initial ASR transcription, then generates a corrected transcription using both the sound and the speaker's lip movements. It suits anyone working with audio-visual data who needs more reliable speech-to-text output than audio-only ASR provides.

No commits in the last 6 months.

Use this if you have existing audio-visual recordings and need to refine or correct errors in their automatically generated speech transcripts.

Not ideal if you only have audio recordings without corresponding video or if you need an ASR system from scratch rather than an error correction tool.

speech-to-text audio-transcription video-analysis multimodal-data noise-reduction
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 16 / 25
Community 5 / 25
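
The four category scores above are each out of 25, and they add up to the 27 / 100 overall figure. The page does not state the formula, so the sketch below only checks that a plain sum is consistent with the numbers shown; the dictionary and variable names are illustrative.

```python
# Category scores as listed above (each out of 25).
category_scores = {
    "Maintenance": 0,
    "Adoption": 6,
    "Maturity": 16,
    "Community": 5,
}

# Assumption: the overall score is the plain sum of the four categories.
overall = sum(category_scores.values())
max_overall = 25 * len(category_scores)

print(f"{overall} / {max_overall}")  # 27 / 100
```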


Stars 18
Forks 1
Language Python
License Apache-2.0
Last pushed Jul 16, 2024
Commits (30d) 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/Sreyan88/LipGER"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
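
For use from Python, the curl call above might be sketched with the standard library alone. The URL pattern is inferred from the single example shown. The function names, the `category` parameter, and the assumption that the endpoint returns JSON are hypothetical and not confirmed by this page.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (pattern inferred from the one example shown)."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record; assumes the response body is JSON (not confirmed)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Building the URL needs no network access.
    print(build_quality_url("generative-ai", "Sreyan88", "LipGER"))
```

Calling `fetch_quality("generative-ai", "Sreyan88", "LipGER")` performs a real request, so each call counts against the daily limit noted above.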