JaesungHuh/look-listen-recognise

Dataset page for Look, Listen and Recognise: character-aware audio-visual subtitling (ICASSP 2024)

Quality score: 26 / 100 (Experimental)

This project provides a dataset for building subtitles that identify who is speaking. From raw audio and video, plus character and actor names, it yields subtitle files with speaker labels and precise timings. It is aimed at researchers and developers working on advanced subtitling for film, television, or other multimedia content.

Use this if you are developing or evaluating systems that automatically generate subtitles and need to include specific speaker identification, not just spoken dialogue.

Not ideal if you're looking for a tool to generate basic subtitles without advanced speaker attribution, or if you need a dataset for simple speech-to-text transcription.

Tags: subtitling, media accessibility, speech processing, video analysis, linguistics
No package · No dependents
Maintenance: 6 / 25
Adoption: 4 / 25
Maturity: 16 / 25
Community: 0 / 25


Stars: 7
Forks:
Language: Python
License: Apache-2.0
Last pushed: Oct 30, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/JaesungHuh/look-listen-recognise"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
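The same endpoint can also be called from Python using only the standard library. A minimal sketch, assuming the endpoint returns a JSON body (the response schema is not documented here, so the helper simply returns the parsed payload):

```python
import json
import urllib.request

# Base path of the quality API, taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/voice-ai"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-score endpoint URL for a given repository."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record and return it as parsed JSON.

    The shape of the returned dict is an assumption; inspect it before
    relying on specific fields.
    """
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the same URL used in the curl example above.
    print(quality_url("JaesungHuh", "look-listen-recognise"))
```

Without an API key this shares the 100 requests/day anonymous quota, so cache responses rather than refetching on every run.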