Submission ID: papers_0435
When creating virtual
actors that move in complex environments, it can be difficult to reconcile
proper foot placement with the full range of body postures needed to convey
emotion and attitude, without compromising quality of movement. Ideally, both
posture and foot placement should anticipate, and vary with, the actor's
higher-level goals and intentions.
Recasting movement as a
function of a path of values through time (rather than simply through space)
provides a way to construct an approximate solution to the foot/body-posture
problem that simplifies and unifies these two sub-problems.
The key insight is to look
at all body parameters along this time-varying path, and then to
consider the actor's support foot and lifted foot as though they are travelling along the same time-varying path as the rest of
the body, only at variable rates. For the supporting foot planted on the
ground, time stops and posture is frozen. For the lifted foot, time accelerates
into the future and posture changes rapidly.
Such a framework allows all
aspects of body posture to be constructed as efficiently computed procedural shaders which can be evaluated at a single point in time,
in much the way that shaders used for procedural
texture synthesis can be evaluated at a single point in space.
The result is not biomechanically perfect, but it is extraordinarily fast and
flexible. We demonstrate that this approach allows the artist to exert
tremendous control over character attitude and emotion via the use of simple
and intuitive parameters. Hundreds of actors can be interactively directed in
real-time on consumer level hardware, with each displaying emotively convincing
body language and proper foot placement. These actors can be made to precisely
follow arbitrary paths, including uneven terrain and stairs, with equal
facility in all walk styles.
Even when more
biophysically correct actors are required, this technique can serve as a highly
controllable and expressive instrument to create approximate solutions that
feed into more computationally heavy constraint optimization and energy
minimization techniques.
Related work:
The first computer graphic
system to do 3D keyframe based human joint animation
was a pure kinematic system introduced by [Stern
1978], in which hand-made static poses were interpolated by smooth splines. Since then, there has been much work on automating
the computer simulation of 3D walking characters. This work has variously
focused on kinematics and dynamics.
The first work to do high-level, parameterized, kinematic-based automation of
walking was that of [Zeltzer
1982]. Hierarchical concurrent state machines were
used to control the gait of a synthetic skeleton. A pre-defined key posture was
associated with each FSM state. Transitions between states produced linear
interpolations between key-postures, producing joint angles that drove the
animation.
Bruderlin and Calvert [1989] provided non-slip
foot placement by making the contact foot the root of the kinematic
chain and treating it as an inverted pendulum. They later combined this approach
with parameterizable styles of walking [Bruderlin 1993].
Boulic [1990] used forward kinematics
followed by an inverse kinematics post-processing step to modify foot position
when feet penetrated the ground, and later modified this approach to keep the
body properly balanced and centered over the feet [Boulic 1996].
Ko and Badler [1996] applied
post-processing inverse dynamics to kinematic
walking, adjusting the results of forward kinematics to ensure that movement
embodied appropriate forces. Spacetime constraints on
the body center were introduced by [Witkin 1988] to
compute these correcting dynamics, specified as the minimization of a cost
function, over the entire course of an animation, rather than sequentially in
time. Cohen [1992] refined this approach to allow space-time windows to be
specified, so that these dynamic adjustments could be applied interactively.
Gleicher [1997] extended space-time constraints to handle constraints on all
kinematic trajectories of the walking figure.
The computational
framework
The computational framework
of the technique is based on space-time paths.
Let space-time path P(time) → V be a function that
maps time to a vector of parameter values V. In
practice, V will contain values for the root position x,y, and z of the actor, as well as values for several
dozen postural parameters such as how far apart the feet are, the degree of
bending at the knees, how far back the shoulders are set, and how much the
hips sway from side to side when walking. We define
an actor A(P,M,T,time) → J
as a function which takes as input a space-time path P, a body model M, a set
of target points T, and a time value, and produces as its
output a set of joint matrices J.
In practice, the space-time
path P(time) is implemented as a set of
key-frames, interpolated by cubic curves. The actor converts these key frame
values to joint values. For any given time, the actor examines only a
small window of successive key frames.
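As a minimal sketch of such a path evaluation (the paper specifies only "cubic curves"; the choice of Catmull-Rom splines and all function names here are our assumptions), P(time) can be evaluated by examining only a four-keyframe window around the query time:

```python
import bisect

def eval_path(times, values, t):
    """Evaluate a space-time path P(time) -> V at time t.

    times  : sorted list of keyframe times
    values : list of parameter vectors (lists of floats), one per keyframe
    Uses Catmull-Rom interpolation over a 4-keyframe window, so only a
    small neighborhood of keys is ever examined.
    """
    i = bisect.bisect_right(times, t) - 1
    i = max(0, min(i, len(times) - 2))
    # Clamp the window indices at the ends of the key list.
    i0, i1, i2, i3 = max(i - 1, 0), i, i + 1, min(i + 2, len(times) - 1)
    u = (t - times[i1]) / (times[i2] - times[i1])
    def cr(p0, p1, p2, p3):
        # Standard Catmull-Rom basis; passes through p1 at u=0, p2 at u=1.
        return 0.5 * ((2 * p1) + (-p0 + p2) * u
                      + (2 * p0 - 5 * p1 + 4 * p2 - p3) * u * u
                      + (-p0 + 3 * p1 - 3 * p2 + p3) * u ** 3)
    return [cr(values[i0][k], values[i1][k], values[i2][k], values[i3][k])
            for k in range(len(values[0]))]
```

Because the interpolant passes through each key, the actor reproduces keyframed parameter values exactly at keyframe times.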
For a human figure, the
actor function computes joint values by evaluating six modules in a fixed
order chosen to accommodate dependencies between computations.
Note that the hands are
visited in two different modules. Reaching and grasping are computed before
the pelvis and spine, whereas hand position for arm swinging is computed after.
In order for the body to be
emotionally expressive, the spine needs to be sufficiently flexible. In our
articulation model the spine is represented by the fifth and first lumbar
joints (the lower back), the sixth thoracic joint (the chest cage) and the
fifth and first cervical joints (the base and top of the neck).
After the pelvis, torso and
feet have been animated, the pelvis is connected to the feet, and the torso is
connected to the hands, via a simple closed-form two link inverse kinematics
computation which positions the knees and elbows. Both knee and elbow turn-out
default to values that appear natural. Deviations from these values are then
parametrically adjustable by the artist.
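A minimal planar version of such a closed-form two-link solve, via the law of cosines, might look like the following; the function name and angle conventions are our own illustrative choices:

```python
import math

def two_link_ik(l1, l2, dist):
    """Closed-form two-link IK (law of cosines), a planar sketch.

    Given upper-limb length l1, lower-limb length l2, and the
    root-to-end distance, return (root_angle, mid_angle) in radians:
    root_angle is measured from the root-to-end axis, and mid_angle
    is the interior bend at the knee or elbow (pi = fully straight).
    """
    # Clamp so an out-of-reach target straightens the limb
    # instead of raising a math domain error.
    dist = max(abs(l1 - l2) + 1e-9, min(dist, l1 + l2 - 1e-9))
    cos_mid = (l1 * l1 + l2 * l2 - dist * dist) / (2 * l1 * l2)
    mid = math.acos(max(-1.0, min(1.0, cos_mid)))
    cos_root = (l1 * l1 + dist * dist - l2 * l2) / (2 * l1 * dist)
    root = math.acos(max(-1.0, min(1.0, cos_root)))
    return root, mid
```

Because the limb lengths are ordinary inputs, the same solve works unchanged for unequal femur and tibia lengths, or even for lengths that change during an animation.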
All of the modules receive a value of time as their input, except for the
module that computes the foot positions and ankle joints. The left foot is
given an altered time input value of time + saw(time/pace), whereas
the right foot is given an altered time input value of time +
saw(time/pace + ½), where pace
is in units of walk cycles per second, and saw(t) is defined by:

saw(t) = (t mod 1) − ¼,  if (t mod 1) < ½
saw(t) = ¾ − (t mod 1),  otherwise
The
effect of the saw displacement is to jog time back and forth: in the
first half of the walk cycle the left foot travels forward in time one full
cycle, while the right foot remains fixed at the start of the cycle. In the
second half of the walk cycle, the right foot travels forward in time one
full cycle, while the left foot remains fixed at the end of the cycle, as in
the accompanying figure. The one exception to the
above is the continuing heel-to-toe roll of the support foot while it is in
contact with the floor. Even though the
values of all body parameters are fixed in time while the support foot is in
contact with the floor (because time is not changing), we nonetheless
continue the rotation of the foot that produces the rolling floor contact for
first the heel, then the ball of the foot, and finally the toe. The bending
at the metatarsal during the last stages of floor contact is handled as a
simple mechanical constraint. When the actor is walking backwards, we proceed
similarly, but reverse the direction of roll: the toe contacts the floor
first, and then the foot rolls onto the heel.
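This time displacement can be sketched in a few lines of Python. The fragment below is our own illustration: the helper names are hypothetical, and we scale saw by the cycle period so that each foot's effective time alternates between advancing at twice the normal rate and standing still, matching the described effect (the text itself writes simply time + saw(time/pace)):

```python
def saw(t):
    """Triangular wave: (t mod 1) - 1/4 if (t mod 1) < 1/2, else 3/4 - (t mod 1)."""
    f = t % 1.0
    return f - 0.25 if f < 0.5 else 0.75 - f

def foot_time(time, period, offset=0.0):
    """Effective time seen by one foot (use offset=0.5 for the right foot).

    Scaling saw() by the walk-cycle period makes the derivative of
    foot_time alternate between 2 (the foot swings forward through a
    full cycle) and 0 (the foot is frozen in support).  The amplitude
    scaling is our assumption, added so the freeze/advance behavior
    is exact for any cycle period.
    """
    return time + period * saw(time / period + offset)
```

For example, with a period of 1, the left foot's effective time advances from −¼ to ¾ during the first half of the cycle, then holds at ¾ for the second half while the right foot advances.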
Because the entire system
is a function of a single input value of time, any given setting of
target points T and input parameters V at that time
will produce a unique and well-defined pose for the collection of joints J
at that time.
This framework confers several
advantages. Because reasonable foot placement is guaranteed, the artist is free
to implement any desired body language shaders.
Expressive movements can be blended in the artist's preferred parameter space,
providing significant freedom to create controls which are intuitive and easy
to work with.
An illustrated example
In the accompanying
sequence a walking actor has been directed first to crouch backwards, and then
to turn sideways with feet spread, before continuing on his way. To create such
a behavior, the artist simply specifies a keyframe
(corresponding to the fourth image below) in which the actor's pelvis should be
lowered and rotated by 180°, and then a keyframe
(corresponding to the eighth image below) in which the actor's feet should be spread
and the pelvis should be rotated 90°. In the course of executing
this transition, the actor produces a trajectory for the feet and for all body
joints (seen in images two and three, and images five through seven) that
results in reasonable foot placement and no foot sliding. This will be true for
any walking pace and for all settings of other body parameter values.









Terrain following and
going up or down steps
Reasonable foot placement
when walking along uneven terrain is automatically guaranteed by footstep
temporal displacement. The pelvis and the two feet exist in three different
timeframes. Within its own respective timeframe, each of these three body parts
simply maintains the proper distance above whatever is the current terrain
height at that moment. The result is a walk in which the pelvis maintains a
proper distance from the ground, and the feet land at ground level of the
varying terrain at each foot-fall. As in [Rose 1998] we blend in adverbial
parameter settings to convey change in stance when climbing or descending.
These settings are described below in more detail. In contrast, walking up or
down stairs requires some intervention, so that the actor will walk upon the
stair steps, rather than upon the height discontinuities that separate them.
In order to go up and down
steps, the key modification to the basic algorithm is to allow the pace of the
actor to vary. In the following images the actor is portraying a slithering
creature skulking down stairs. The height of the actor's pelvis above the floor
is computed by blurring the terrain height function (in this case, via a mip-map [Williams 1983]). The higher the pelvis, the
"blurrier" is the effective height. This results in smooth movement
of the pelvis over stair steps and other discontinuities in floor height.
The floor height at each
foot is only evaluated at places where that foot touches down; that is, where
the phase of the walk cycle is a multiple of ½.
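A one-dimensional stand-in for this blurred height lookup is sketched below, using a simple box filter in place of the mip-map, with the filter radius standing in for the pelvis height; all names here are hypothetical:

```python
def blurred_height(terrain, x, radius):
    """Terrain height blurred over a window that widens with height
    above the floor -- a 1-D box-filter stand-in for the mip-map
    lookup described in the text.

    terrain : function mapping integer x to floor height
    radius  : blur radius, increasing with pelvis height
    """
    r = max(0, int(radius))
    samples = [terrain(x + d) for d in range(-r, r + 1)]
    return sum(samples) / len(samples)
```

With radius 0 this returns the exact floor height (appropriate for a foot at touchdown), while larger radii smooth the pelvis trajectory over stair edges and other height discontinuities.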


We
guarantee that the actor's feet land on the steps, not on the rise between
steps, by shifting the phase slightly in the keyframes.
Because we look two keyframes ahead, we have enough
time to modify footstep phase before the edge of a stair step [Chung 1999]. The
algorithm is illustrated in the accompanying diagram. The keyframes
themselves (shown in the diagram as circular rings) are not moved, but the
walk-cycle phase at each keyframe is adjusted
slightly, so that each place where the left foot touches down (where
phase is a multiple of 1.0) or the right foot touches down (where
phase is an odd multiple of 0.5) is shifted to the center of a stair step.
This process is independent of style parameters: the actor can go up and down
stairs equally well in any walking style.
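One way to sketch the per-footfall phase adjustment, simplified to one dimension with hypothetical names (the actual system spreads the adjustment over the upcoming keyframes):

```python
def phase_shift_for_step(x_land, step_width, stride):
    """Phase adjustment moving a predicted footfall to its step center.

    x_land     : predicted touchdown position along the path
    step_width : horizontal depth of one stair step
    stride     : distance traveled per half walk cycle (one footfall)

    Returns the signed phase delta, in walk cycles, to add at the
    upcoming keyframes.  Moving the footfall by dx shifts phase by
    dx/stride half-cycles, i.e. dx / (2 * stride) cycles.
    """
    step_index = x_land // step_width
    x_center = (step_index + 0.5) * step_width
    return (x_center - x_land) / (2.0 * stride)
```

A footfall already at a step center yields a zero shift, so the adjustment vanishes on flat ground and for well-placed steps.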
In the context of producing
a linear animation, it is important to be able to effect
such phase shifts locally, without changing the phase of the walk cycle at all
later points in the animation. In this way the animator can be sure that foot
placement does not change unexpectedly in later scenes due to edits in earlier
scenes. To this end, the system always takes care to localize any imposed phase
shifts, such as those caused by negotiating stairs, in which case the original
phase is restored after a small number of additional keyframes.
Staggered gaits
This
approach to walking movement facilitates the
creation of staggered walks or gaits, by providing control over the amount of
temporal phase displacement between the two feet. For example, a character can
transition from a walk to a broad jump by bringing the
temporal phase difference between the two feet (normally half a cycle) gradually
to zero. The example below shows a transition first into, and then out of, a
broad jump. By setting the phase displacement to
intermediate values, the actor can be made to perform staggered or skipping
gaits.






Similarly,
the "weight" of the step, as defined by the percentage of time that
each foot remains in contact with the ground, can be modulated by
sinusoidally varying each foot's temporal phase
displacement throughout the period of the walk cycle. In this way, footsteps can be made
"heavy" (more time spent in fixed support, as would be found if an actor were wearing lead shoes), or
"light" (both feet off the ground simultaneously, as would occur
while running).
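Both the stagger and weight controls can be sketched as modulations of the per-foot phase displacement. The fragment below is illustrative only: the names are hypothetical, and the exact waveform of the "weight" modulation in the system may differ:

```python
import math

def foot_phase(phase, stagger=0.5, weight=0.0):
    """Phase input for the 'right' foot, generalizing the fixed
    half-cycle displacement between the feet.

    phase   : walk-cycle phase, in cycles
    stagger : temporal displacement between feet, in cycles.
              0.5 = normal walk, 0.0 = feet together (broad jump),
              intermediate values = staggered or skipping gaits.
    weight  : sinusoidal modulation of the displacement over the
              cycle; positive values lengthen support ('heavy'),
              negative values shorten it ('light').
    """
    return phase + stagger + weight * math.sin(2 * math.pi * phase)
```

Animating the stagger parameter from 0.5 toward 0.0 over a few keyframes reproduces the walk-to-broad-jump transition shown above.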
Variation in body
skeleton
Because the behavior of
body language parameters is not constrained by any special requirement to accommodate foot placement, we have found it easy in
practice to implement the algorithms that vary these parameters in a way that
is insensitive to the lengths of the various limbs. An actor can have very long
or short legs, unequal lengths between the femur and tibia, long or short
spine, or very wide or narrow pelvis. The knees can just as easily bend
backwards, which is useful for actors playing avian characters. Limb length can
change during an animation, which can be useful in certain exaggerated cartoon
contexts. This allows a result similar to the motion retargeting of [Gleicher 1998], at smaller computational cost. Below are
several examples of highly divergent actors performing the same movement.




The foot animation system
is part of a larger framework. In this framework, animation shaders
add attitude to the entire body posture and motion.
Motion idioms
The use of intuitive body
attitude parameters facilitates the simple creation of "idioms" -
combinations of static and rhythmic posture that create a characteristic higher
level mood or attitude. For example, one parameter sinusoidally
varies the pelvis height at twice the frequency of the walk cycle, so that the
pelvis rises each time a support foot strikes the ground. This creates a
"bouncy" or carefree gait. If this same parameter is set to a
negative value, then the pelvis descends each time a support foot strikes the
ground, which produces a jaunty, "bopping" style of walk. In this
latter case, the impression of bopping is greatly enhanced by making the
footsteps "heavy," as described above, so that the actor appears to
be deliberately placing each footstep. A negative "bounce" together
with a heavy step can be merged into a single combination "bop"
parameter, in which these two degrees of freedom are varied together.
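A sketch of the bounce parameter follows; the exact waveform and phase alignment are our assumptions. With support-foot strikes at phases 0 and ½, a cosine at twice the walk-cycle frequency peaks at each strike for positive bounce ("bouncy") and dips at each strike for negative bounce ("bopping"):

```python
import math

def pelvis_height_offset(phase, bounce):
    """Vertical pelvis offset at twice the walk-cycle frequency.

    phase  : walk-cycle phase, in cycles (strikes at 0.0 and 0.5)
    bounce : positive raises the pelvis at each support-foot strike,
             negative lowers it at each strike.
    """
    return bounce * math.cos(4 * math.pi * phase)
```

Pairing a negative bounce with the "heavy" step modulation described earlier yields the combined "bop" parameter.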
Similarly, it is easy to
create parameter combinations that suggest such attitudes as
"fearful", "sexy", "careful", or
"slithering". The system allows such combinations to be saved and
then blended with various weights during the course of an animation or
simulation, to suggest subtle momentary moods and attitudes. This is similar in
spirit to the interpolation approaches described by [Perlin
1995; Wiley 1997] and [Rose 1998]. One difference is that our framework
automatically gives proper foot placement for the interpolated movement.
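The blending of saved idioms can be sketched as a weighted average over parameter dictionaries. This is an illustrative fragment; the parameter names and the zero default are hypothetical:

```python
def blend_idioms(idioms, weights):
    """Blend saved idioms with the given weights.

    idioms  : list of dicts mapping parameter name -> value
    weights : list of blend weights, one per idiom

    Parameters absent from an idiom default to 0.  Returns the
    normalized weighted mix, which can then drive the actor at a
    keyframe to suggest a momentary mood.
    """
    keys = set().union(*(i.keys() for i in idioms))
    total = sum(weights)
    return {k: sum(w * i.get(k, 0.0) for i, w in zip(idioms, weights)) / total
            for k in keys}
```

Because foot placement is computed downstream of the blended parameters, any such mix automatically retains proper footfalls.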
For example, we used the
system to define a sultry walk by setting a few parameters: bring the elbows
in, feet together, heels up, pelvis and shoulders forward, increase the arm
swing and vertical pelvis bounce, and increase both the sway and axial rotation
of hips and shoulders. The resulting walk automatically finds proper footfall
positions for any path the actor is directed to follow. In practice, fractional
quantities of body language shadings like "sexiness" can be mixed
into an actor momentarily, as when a character is acting in a seductive manner
in order to catch the eye of another character.
In the sequence below, the
actor transitions between several idiomatic behaviors in
the span of a few steps: first from a shuffling old man to a feral
Gollum-like creature; then to a sultry walk. To create this sequence it was
necessary only to specify the appropriate idiom at successive key frames. The
original creation of each of the three component idioms required only a few
minutes of parameter adjustment.








Types of
posture variation
Posture parameters have
been selected to provide intuitive isolated controls over those attributes of
posture or body language that a trained actor or dancer might wish to
independently modulate. A parameter can be either static or rhythmic.
Static parameters include things like spacing between the feet, knee bend,
pointing of toes, axial twist of the pelvis, and how far forward or back the
pelvis is thrust. Rhythmic parameters include amplitude of rolling shoulders or
hips, the vertical bounce and thrust of the pelvis during the walk cycle, and
the amplitude of arm swing.
Blending in body
language to express effort
As in [Rose 1998], we mix
in higher level parameter settings as adverbs to convey a temporary change in
an actor's attitude when performing a difficult or interesting task. For
example, when the actor is climbing or descending stairs, we fractionally blend
in the idiomatic parameter settings for more careful and deliberate balancing
(arms slightly raised, elbows out, feet apart and turned out, pelvis slightly
lowered, shoulders hunched and forward, spine curved forward, heavy steps,
slightly wandering gaze, slight wandering of path, slow pace, slightly
staggered foot rhythm, high shakiness, very little movement in the hips). The
resulting performance is surprisingly compelling.
Production advantages:
frame-independent evaluation and path following
As in any system based on a closed-form kinematic solution, the
complete pose of the actor is a function of time which can be evaluated
efficiently at any single input value. This means that there is no dependency
on forward differencing, or variation in behavior due to variable frame rate.
Any individual frame in an animation can be computed and examined
independently. Body posture and foot placement are completely determined and
repeatable, a highly desirable property in a
production animation setting.
In addition, the path
following is precise: The actor is guaranteed to follow the exact path
laid out by the specified keyframes. This is
extremely important in situations containing tight physical constraints such as
narrow doors and hallways or tightly packed crowds. It is also essential for
the use of virtual actors in implementing dance choreography, as well as in
dramatic scenes in which the actor needs to accurately hit its mark so that it
will be properly framed in the camera shot. In effect, the actor is constrained
to function as pure cerebellum - an implementer of external cognitive choices.
All actual cerebral cognitive decisions are directed by the animator or A.I.
engine, and conveyed to the actor within the parameter settings of successive keyframes.
Linear animation versus
real-time simulation
There are two distinct
types of environments in which this approach can be used: a) the design of
linear animation with a fixed set of key frames, and b) interactive
simulations, in which streams of key frames are continually provided by an A.I.
program.
The animation system itself
is identical in both environments. The system is always tasked with converting
data from a small time window of keyframes into joint
matrices. The difference lies entirely in the source of the keyframe
data. For hand-tuned linear animation, parameters within a fixed set of keyframes are altered by a skilled animator/director. For
interactive simulations, a moving window of temporary keyframes
is generated by programs written by the AI designer of a game or simulation
engine.
Interactive crowd simulation
We can compute the AI for a
crowd simulation without any knowledge of the body animation system. Because
the positions of the actors' feet are well determined,
it is possible to articulate large numbers of actors without collisions and
with repeatable results. We can also guarantee that each walking actor will
precisely follow the output position of the corresponding agent in the AI that
directs the crowd.
In the example below, we
ran an interactive scene of 1000 dancing Imps. The key frame parameters for
each Imp are randomized, so that it dances with a unique personality and
character. In the first image the camera is flying through the crowd. As the
camera approaches, the individual Imps in the camera's path turn toward us and
dance out of the way, creating an aisle for the camera to travel through. In
the second image the crowd has made space for a lone dancer at center stage.


All of the behavior for the
imp scene is computed via a separate 2D crowd simulation. Below, side by side,
are a visualization of just the AI for the crowd simulation,
and then the actors who take their position cues from this AI. For
each actor, the individual body language, limb positions and foot placement are
computed in our joint synthesis system.


Performance:
Because the system provides
a closed-form mapping from parameters and target points to joint matrices,
computation speed does not depend upon input values. We have successfully
animated crowds consisting of hundreds of interactive responsive actors
simultaneously, with each actor expressing a unique set of body language
parameters. In benchmark tests on a 1.6 GHz
Pentium-based notebook PC, the system was able to evaluate 14,000 complete body
poses per second. When displayed with OpenGL rendering of each actor, we
measure animation rates of 1000 actors at 9 frames per second, and 500 actors
at 18 frames per second.
Use together with
other work
This work would be highly
complementary to a run-time animation engine that blends and overlays movement
parameters through time with various degrees of "transparency", such
as the Improv System described by [Perlin 1996].
We also envision combining it
with a post-process refinement or energy minimization algorithm, such
as those of [Ko 1994; Cohen 1992; Gleicher
1997] and their successors, as well as adjustment by model-based
terrain-following foot-step placement algorithms such as that of [Chung 1999].
However, such post-processing would compromise the extreme speed of the current
method. We envision the method as presented here to be used for interactive
placement and choreography of figures, with complementary post-processing
techniques being applied off-line in final production.
Conclusions:
We have shown a simple way
to unify foot placement with kinematic control of an
ambulatory human figure. The advantage of the approach is that the unification
of foot placement and body pose allows artists to focus on creating good
artistic shaders for expressing body language. The
result will always produce foot placement that is well matched to body posture.
This approach also allows all aspects of body posture to be computed cheaply
and independently for any point in time. As we have demonstrated, this approach
is well suited to real-time simulations containing hundreds of actors, in which
each actor possesses expressive and responsive body posture customized for its
identity, personality and situation.
These techniques can be
very effectively combined with behavioral level of detail, since beyond the
first several hundred actors, members of a crowd who appear in a wide shot are
effectively just blobs. In this way the approach described in this paper could
support real-time responsive emotively expressive behavior for crowds of
essentially unlimited size, on commodity-level computers.
In future work we will be
looking at enforcing physical realism through constraints on parameter
settings. One potential deficiency of the current technique is that it gives
its user complete freedom to set parameters in ways that are not physically or biomechanically plausible. This is acceptable for a purely
emotional or "cartoon" level of expressiveness, but less than
acceptable for applications that require the virtual actor to convey physical
realism. We are investigating an approach to imposing such constraints that
treats the existing system as a black box. Energy costs of various transitions
can be computed off-line, and used to train a corpus-based method, such as
convolutional networks that recognize time-series patterns [LeCun
1995], to identify desired correlations between parameter values in successive
keyframes. For example, just before an actor leaps high in
the air, the previous keyframe can be constrained to
brace for the jump by crouching. Adding such pre-filters would not compromise
the computational speed of the approach, and yet could make it more relevant
for applications that require greater physical realism.
BOULIC, R., MAGNENAT-THALMANN, N., and MAGNENAT-THALMANN, D., 1990. A global
human walking model with real-time kinematic personification. Visual Computer,
6(6):344-358.
BOULIC, R., MAS, R., and MAGNENAT-THALMANN, D., 1996. A robust approach for
the center of mass position control with inverse kinetics. Journal of Computer
and Graphics, 20(5).
BRUDERLIN, A., and CALVERT, T., 1989. Goal-directed, dynamic animation of
human walking. In Computer Graphics (Proceedings of ACM SIGGRAPH 89), 23, 4,
ACM.
BRUDERLIN
A., and CALVERT, T., 1993. Interactive Animation of Personalized
Human Locomotion. In Proc. of Graphics Interface 93, pages 17-23.
BRUDERLIN
A., and WILLIAMS, L., 1995. Motion Signal Processing, In Proceedings of ACM SIGGRAPH
1995 ACM Press / ACM SIGGRAPH, New York. Computer Graphics Proceedings,
Annual Conference Series, ACM, 97-104.
CHUNG,
S., and HAHN J., 1999. Animation of Human Walking in Virtual Environments.
In Proc. Int. Conf. on Computer Animation, pages 4-15. IEEE.
COHEN, M., 1992. Interactive spacetime control for animation. In Computer
Graphics (Proceedings of ACM SIGGRAPH 92), 26, 4, ACM, 293-302.
GLEICHER, M., 1997. Motion editing with spacetime constraints. In Proc. of
Symposium on Interactive 3D Graphics.
GLEICHER, M., 1998. Retargetting Motion to New Characters. In Proceedings of
ACM SIGGRAPH 1998, ACM Press / ACM SIGGRAPH, New York.
KO, H., and BADLER, N., 1996. Animating Human Locomotion in Real-time using
Inverse Dynamics, Balance and Comfort Control. IEEE Computer Graphics and
Applications, 16(2):50-59.
LECUN, Y., and BENGIO, Y., 1995. Convolutional Networks for Images, Speech,
and Time-Series. In The Handbook of Brain Theory and Neural Networks
(M. A. Arbib, ed.).
PERLIN, K., and GOLDBERG,
A., 1996. Improv: A System for Scripting Interactive
Actors in Virtual Worlds, In Proceedings of ACM SIGGRAPH 1996 ACM Press
/ ACM SIGGRAPH, New York. Computer Graphics Proceedings, Annual Conference
Series, ACM.
PERLIN, K., 1995. Real Time Responsive Animation with Personality. IEEE
Transactions on Visualization and Computer Graphics, 1(1).
ROSE, C., COHEN, M., and BODENHEIMER, B., 1998. Verbs and Adverbs:
Multidimensional Motion Interpolation. IEEE Computer Graphics and
Applications, 18(5):32-40.
STERN, G., 1978.
SUN, H., and METAXAS, D., 2001. Automating gait generation. In Proceedings of
ACM SIGGRAPH 2001, ACM Press / ACM SIGGRAPH, New York.
WILEY, D.
and HAHN, J., 1997. Interpolation Synthesis of Articulated Figure Motion, IEEE Computer
Graphics and Applications, 17(6):39-45.
WILLIAMS, L., 1983. Pyramidal Parametrics. In Computer Graphics (Proceedings of ACM SIGGRAPH 83), 17,
4, ACM.
WITKIN, A., and KASS, M., 1988. Spacetime Constraints. In Computer Graphics
(Proceedings of ACM SIGGRAPH 88), 22, 4, ACM.
ZELTZER, D., 1982. Motor Control Techniques for Figure Animation. IEEE
Computer Graphics and Applications, 2(9):53-59.