JaesungHuh/look-listen-recognise
Dataset page for Look, Listen and Recognise: character-aware audio-visual subtitling (ICASSP 2024)
This project provides a dataset for producing subtitles that identify who is speaking. Given raw audio and video plus character and actor names, it yields subtitle files with speaker labels and precise timings. It is aimed at researchers and developers building advanced subtitling systems for film, television, or other multimedia content.
Use this if you are developing or evaluating systems that automatically generate subtitles and need to include specific speaker identification, not just spoken dialogue.
Not ideal if you're looking for a tool to generate basic subtitles without advanced speaker attribution, or if you need a dataset for simple speech-to-text transcription.
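To make the output concrete: a character-aware subtitle pairs each timed cue with a speaker name. The field layout below is a hypothetical sketch, not the dataset's actual schema, showing how such a cue might be represented and rendered in an SRT-style block.

```python
from dataclasses import dataclass

@dataclass
class SubtitleCue:
    """One character-aware subtitle cue: who speaks, when, and what.

    Hypothetical field layout for illustration; the dataset's own
    file format may differ.
    """
    speaker: str   # character name
    start: float   # start time, seconds
    end: float     # end time, seconds
    text: str      # spoken dialogue

def to_srt_block(index: int, cue: SubtitleCue) -> str:
    """Render a cue as an SRT-style block with the speaker name prefixed."""
    def ts(t: float) -> str:
        h, rem = divmod(t, 3600)
        m, s = divmod(rem, 60)
        ms = int(round((s % 1) * 1000))
        return f"{int(h):02d}:{int(m):02d}:{int(s):02d},{ms:03d}"
    return f"{index}\n{ts(cue.start)} --> {ts(cue.end)}\n{cue.speaker}: {cue.text}\n"
```

For example, `to_srt_block(1, SubtitleCue("ALICE", 1.5, 3.0, "Hello."))` produces a block whose timing line reads `00:00:01,500 --> 00:00:03,000` and whose text line is `ALICE: Hello.`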
Stars
7
Forks
—
Language
Python
License
Apache-2.0
Category
Voice AI
Last pushed
Oct 30, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/JaesungHuh/look-listen-recognise"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
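For scripted access, the endpoint URL from the curl example above follows a `{base}/{owner}/{repo}` pattern. A minimal sketch of building and fetching it in Python, assuming only that pattern (the response schema is not documented here):

```python
import urllib.parse

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/voice-ai"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repo quality endpoint URL, quoting each path segment."""
    return f"{API_BASE}/{urllib.parse.quote(owner)}/{urllib.parse.quote(repo)}"

# To fetch (response fields are an assumption; inspect the JSON yourself):
# import json, urllib.request
# with urllib.request.urlopen(quality_url("JaesungHuh", "look-listen-recognise")) as r:
#     data = json.load(r)
```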
Featured in
Higher-rated alternatives
Uberi/speech_recognition
Speech recognition module for Python, supporting several engines and APIs, online and offline.
cmusphinx/pocketsphinx
A small speech recognizer
tensorflow/lingvo
Lingvo
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models,...
PyThaiNLP/pythaiasr
Python Thai Automatic Speech Recognition