Realistic Face Animation for Speech
Gregor A. Kalberer
Computer Vision Group ETH Zürich, Switzerland
kalberer@vision.ee.ethz.ch
Luc Van Gool
Computer Vision Group ETH Zürich, Switzerland
ESAT / VISICS, Kath. Univ. Leuven, Belgium
vangool@vision.ee.ethz.ch
Keywords:
face animation, speech, visemes, eigenspace, realism
Abstract
Realistic face animation is especially hard as we are all experts in the perception and interpretation
of face dynamics. One approach is to simulate facial anatomy. Alternatively, animation can be based on
first observing the visible 3D dynamics, extracting the basic modes, and putting these together according
to the required performance. This is the strategy followed by the paper, which focuses on speech. The
approach follows a kind of bootstrap procedure. First, 3D shape statistics are learned from a talking
face carrying a relatively small number of markers. A 3D reconstruction is produced at intervals of
1/25 s, i.e. at video rate. A topological mask of the lower half of the face is fitted to the motion. Principal
component analysis (PCA) of the mask shapes reduces the dimension of the mask shape space. The
result is twofold. On the one hand, the face can be animated; in our case, it can be made to speak new
sentences. On the other hand, face dynamics can be tracked in 3D without markers, for performance
capture.
Introduction
Realistic face animation is a hard problem. Humans typically focus on faces and are incredibly
good at spotting the slightest glitch in an animation. On the other hand, there is probably no shape more
important for animation than the human face. Several applications come immediately to mind, such as
games, special effects for movies, avatars, virtual assistants for information kiosks, etc. This paper
focuses on the realistic animation of the mouth area for speech.
Face animation research dates back to the early 1970s. Since then, the level of sophistication has
increased dramatically. For example, the human face models used in Pixar’s Toy Story had several
thousan