A new AI research paper from Microsoft promises a future where you can upload a photo and a sample of your voice to create a live, animated talking head of your own face.
VASA-1 takes a single portrait photo and an audio file and converts them into a hyper-realistic talking-face video, complete with lip sync, natural facial expressions and head movement.
The model is currently a research preview only and not available to anyone outside the Microsoft Research team, but the demo videos look impressive.
Similar lip sync and head movement technology is already available from Runway and Nvidia, but VASA-1 appears to offer noticeably higher quality and realism, with fewer artifacts around the mouth. The approach to audio-driven animation is also similar to the recent VLOGGER AI model from Google Research.
https://www.tomsguide.com/ai/ai-ima...ew-ai-model-to-turn-images-into-talking-faces