harlanhong/ACTalker
ICCV 2025 ACTalker: an end-to-end video diffusion framework for talking head synthesis that supports both single and multi-signal control (e.g., audio, expression).
ACTalker helps you create realistic talking head videos from just a still image and audio, or even more complex controls like facial expressions. It takes a reference image and an audio file (or a video for expression control) and outputs a video of the person in the image speaking the audio, with synchronized lip movements and natural expressions. This is ideal for content creators, marketers, educators, or anyone needing to generate dynamic video presentations from static visuals and sound.
447 stars. No commits in the last 6 months.
Use this if you need to generate high-quality, natural-looking talking head videos for presentations, marketing, or digital content using an image and audio.
Not ideal if you need a quick, low-resource solution, as it requires significant GPU power (24GB+ VRAM) and specific software environments for optimal performance.
Stars: 447
Forks: 53
Language: Python
License: —
Category: —
Last pushed: Aug 20, 2025
Commits (last 30 days): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/harlanhong/ACTalker"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
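The same request can be scripted. Below is a minimal Python sketch using only the standard library; it assumes the endpoint returns JSON, and the response's field names are not documented on this page, so the result is treated as an opaque dict:

```python
import json
from urllib.request import urlopen

# Endpoint taken from the curl command above.
API_URL = "https://pt-edge.onrender.com/api/v1/quality/diffusion/harlanhong/ACTalker"

def fetch_quality(url: str = API_URL) -> dict:
    """Fetch the repo-quality record and parse it as JSON."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Within the free tier this can be called up to 100 times per day without any authentication.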
Higher-rated alternatives
hao-ai-lab/FastVideo
A unified inference and post-training framework for accelerated video generation.
ModelTC/LightX2V
Light Image Video Generation Inference Framework
thu-ml/TurboDiffusion
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
PKU-YuanGroup/Helios
Helios: Real-Time Long Video Generation Model
PKU-YuanGroup/MagicTime
[TPAMI 2025🔥] MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators