JaesungHuh/look-listen-recognise
Dataset page for Look, Listen and Recognise: character-aware audio-visual subtitling (ICASSP 2024)
This project provides a dataset for producing subtitles that identify who is speaking. Given raw audio and video plus character and actor names, it yields subtitle files with speaker labels and precise timings. It is aimed at researchers and developers building advanced subtitling systems for film, television, or other multimedia content.
Use this if you are developing or evaluating systems that automatically generate subtitles and need to include specific speaker identification, not just spoken dialogue.
Not ideal if you're looking for a tool to generate basic subtitles without advanced speaker attribution, or if you need a dataset for simple speech-to-text transcription.
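To make the output concrete: a character-aware subtitle pairs each timed cue with a speaker name. The field layout below is a hypothetical sketch, not the dataset's actual schema, showing how such a cue might be represented and rendered in an SRT-style block.

```python
from dataclasses import dataclass

@dataclass
class SubtitleCue:
    """One character-aware subtitle cue: who speaks, when, and what.

    Hypothetical field layout for illustration; the dataset's own
    file format may differ.
    """
    speaker: str   # character name
    start: float   # start time, seconds
    end: float     # end time, seconds
    text: str      # spoken dialogue

def to_srt_block(index: int, cue: SubtitleCue) -> str:
    """Render a cue as an SRT-style block with the speaker name prefixed."""
    def ts(t: float) -> str:
        h, rem = divmod(t, 3600)
        m, s = divmod(rem, 60)
        ms = int(round((s % 1) * 1000))
        return f"{int(h):02d}:{int(m):02d}:{int(s):02d},{ms:03d}"
    return f"{index}\n{ts(cue.start)} --> {ts(cue.end)}\n{cue.speaker}: {cue.text}\n"
```

For example, `to_srt_block(1, SubtitleCue("ALICE", 1.5, 3.0, "Hello."))` produces a block whose timing line reads `00:00:01,500 --> 00:00:03,000` and whose text line is `ALICE: Hello.`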
Stars
7
Forks
—
Language
Python
License
Apache-2.0
Category
Voice AI
Last pushed
Oct 30, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/JaesungHuh/look-listen-recognise"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
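For scripted access, the endpoint URL from the curl example above follows a `{base}/{owner}/{repo}` pattern. A minimal sketch of building and fetching it in Python, assuming only that pattern (the response schema is not documented here):

```python
import urllib.parse

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/voice-ai"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repo quality endpoint URL, quoting each path segment."""
    return f"{API_BASE}/{urllib.parse.quote(owner)}/{urllib.parse.quote(repo)}"

# To fetch (response fields are an assumption; inspect the JSON yourself):
# import json, urllib.request
# with urllib.request.urlopen(quality_url("JaesungHuh", "look-listen-recognise")) as r:
#     data = json.load(r)
```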
Featured in
Higher-rated alternatives
Uberi/speech_recognition
Speech recognition module for Python, supporting several engines and APIs, online and offline.
cmusphinx/pocketsphinx
A small speech recognizer
tensorflow/lingvo
Lingvo
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models,...
PyThaiNLP/pythaiasr
Python Thai Automatic Speech Recognition