DragonLiu1995/video-to-audio-through-text

[NeurIPS 2024] Code, Dataset, Samples for the VATT paper “Tell What You Hear From What You See - Video to Audio Generation Through Text”

Emerging

This project helps video creators, educators, and content strategists automatically add realistic audio to silent videos or generate audio descriptions. It takes a video and an optional text prompt, then produces matching audio and, if desired, a text description of that audio. It suits anyone who needs context-aware sound for video content without manual sound design.

No commits in the last 6 months.

Use this if you need to generate high-quality, context-relevant audio or audio descriptions for videos, especially when you want to guide the audio generation with specific text prompts.

Not ideal if you're looking for a simple drag-and-drop tool for quick, casual video edits, as it requires some technical setup and a GPU.

video-editing content-creation media-production digital-storytelling accessibility

Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 7 / 25

Maturity 16 / 25

Community 6 / 25

Language: Python

License: not listed

Higher-rated alternatives

Mrkomiljon/awesome-generative-ai

Multimodal generative AI resources: talking heads, STT, TTS, image & video generation, and more.

NVIDIA/Maya-ACE

Maya-ACE: A Reference Client Implementation for NVIDIA ACE Audio2Face Service

OpenVGLab/OmniLottie

[CVPR 2026🔥] 🧑‍🎨 OmniLottie, an open-sourced multi-modal instructed vector animation generator...

jdh-algo/JoyHallo

JoyHallo: Digital human model for Mandarin

michaelzhang-ai/Speech2Video

ACCV 2020 "Speech2Video Synthesis with 3D Skeleton Regularization and Expressive Body Poses"
