harlanhong/ACTalker

ICCV 2025 ACTalker: an end-to-end video diffusion framework for talking head synthesis that supports both single and multi-signal control (e.g., audio, expression).

Score: 38 / 100 (Emerging)

ACTalker helps you create realistic talking head videos from just a still image and audio, or even more complex controls like facial expressions. It takes a reference image and an audio file (or a video for expression control) and outputs a video of the person in the image speaking the audio, with synchronized lip movements and natural expressions. This is ideal for content creators, marketers, educators, or anyone needing to generate dynamic video presentations from static visuals and sound.

447 stars. No commits in the last 6 months.

Use this if you need to generate high-quality, natural-looking talking head videos for presentations, marketing, or digital content using an image and audio.

Not ideal if you need a quick, low-resource solution, as it requires significant GPU power (24GB+ VRAM) and specific software environments for optimal performance.

video-generation digital-avatar content-creation synthetic-media virtual-presenter
No License · Stale (6 months) · No Package · No Dependents
Maintenance: 2 / 25 · Adoption: 10 / 25 · Maturity: 8 / 25 · Community: 18 / 25


Stars: 447
Forks: 53
Language: Python
License: None
Last pushed: Aug 20, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/harlanhong/ACTalker"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
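If you'd rather consume the endpoint programmatically than via curl, a minimal Python sketch follows. Only the URL pattern comes from the curl example above; the helper names (`quality_url`, `fetch_quality`) are mine, and the shape of the returned JSON is an assumption you should verify against a live response.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-score endpoint URL for a repository.

    Mirrors the documented curl example:
    https://pt-edge.onrender.com/api/v1/quality/diffusion/harlanhong/ACTalker
    """
    return f"{BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch the quality report as parsed JSON.

    The response field names are not documented here, so inspect the
    returned dict to learn the actual schema before relying on keys.
    """
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=10) as resp:
        return json.load(resp)
```

A call like `fetch_quality("diffusion", "harlanhong", "ACTalker")` counts against the same 100-requests/day anonymous quota as the curl example.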