harlanhong/ACTalker

ICCV 2025 ACTalker: an end-to-end video diffusion framework for talking head synthesis that supports both single and multi-signal control (e.g., audio, expression).

Score: 38 / 100 (Emerging)

ACTalker helps you create realistic talking head videos from just a still image and audio, or even more complex controls like facial expressions. It takes a reference image and an audio file (or a video for expression control) and outputs a video of the person in the image speaking the audio, with synchronized lip movements and natural expressions. This is ideal for content creators, marketers, educators, or anyone needing to generate dynamic video presentations from static visuals and sound.

447 stars. No commits in the last 6 months.

Use this if you need to generate high-quality, natural-looking talking head videos for presentations, marketing, or digital content using an image and audio.

Not ideal if you need a quick, low-resource solution, as it requires significant GPU power (24GB+ VRAM) and specific software environments for optimal performance.

video-generation digital-avatar content-creation synthetic-media virtual-presenter
No License · Stale (6 months) · No Package · No Dependents
Maintenance: 2 / 25 · Adoption: 10 / 25 · Maturity: 8 / 25 · Community: 18 / 25


Stars: 447
Forks: 53
Language: Python
License: None
Last pushed: Aug 20, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/harlanhong/ACTalker"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
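If you'd rather consume the endpoint programmatically than via curl, a minimal Python sketch follows. Only the URL pattern comes from the curl example above; the helper names (`quality_url`, `fetch_quality`) are mine, and the shape of the returned JSON is an assumption you should verify against a live response.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-score endpoint URL for a repository.

    Mirrors the documented curl example:
    https://pt-edge.onrender.com/api/v1/quality/diffusion/harlanhong/ACTalker
    """
    return f"{BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch the quality report as parsed JSON.

    The response field names are not documented here, so inspect the
    returned dict to learn the actual schema before relying on keys.
    """
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=10) as resp:
        return json.load(resp)
```

A call like `fetch_quality("diffusion", "harlanhong", "ACTalker")` counts against the same 100-requests/day anonymous quota as the curl example.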