Submission ID: papers_0435
When creating virtual actors that move in complex environments, it can be difficult to reconcile proper foot placement with the full range of body postures needed to convey emotion and attitude, without compromising quality of movement. Both posture and foot placement should ideally anticipate the higher level goals and intentions of the actor, and should vary according to what the actor's goals are at higher cognitive levels.
Recasting movement as a function of a path of values through time (rather than simply through space) provides a way to construct an approximate solution to the foot/body-posture problem, one that simplifies and unifies these two sub-problems.
The key insight is to look at all body parameters along this time-varying path, and then to consider the actor's support foot and lifted foot as though they are travelling along the same time-varying path as the rest of the body, only at variable rates. For the supporting foot planted on the ground, time stops and posture is frozen. For the lifted foot, time accelerates into the future and posture changes rapidly.
Such a framework allows all aspects of body posture to be constructed as efficiently computed procedural shaders which can be evaluated at a single point in time, in much the way that shaders used for procedural texture synthesis can be evaluated at a single point in space.
The result is not biomechanically perfect, but it is extraordinarily fast and flexible. We demonstrate that this approach allows the artist to exert tremendous control over character attitude and emotion via the use of simple and intuitive parameters. Hundreds of actors can be interactively directed in real-time on consumer level hardware, with each displaying emotively convincing body language and proper foot placement. These actors can be made to precisely follow arbitrary paths, including uneven terrain and stairs, with equal facility in all walk styles.
Even when more biophysically correct actors are required, this technique can serve as a highly controllable and expressive instrument to create approximate solutions that feed into more computationally heavy constraint optimization and energy minimization techniques.
The first computer graphic system to do 3D keyframe based human joint animation was a pure kinematic system introduced by [Stern 1978], in which hand-made static poses were interpolated by smooth splines. Since then, there has been much work on automating the computer simulation of 3D walking characters. This work has variously focused on kinematics and dynamics.
The first work to do high level parameterized kinematic based automation of walking was that of [Zeltzer 1982]. Hierarchical concurrent state machines were used to control the gait of a synthetic skeleton. A pre-defined key posture was associated with each FSM state. Transitions between states produced linear interpolations between key postures, yielding joint angles that drove the animation.
Bruderlin and Calvert  provide non-slip foot placement by making the contact foot the root of the kinematic chain, and treat it as an inverted pendulum. They later combined this approach with parameterizable styles of walking [Bruderlin 1993].
Boulic  used forward kinematics followed by an inverse kinematics post-processing step to modify foot position when feet penetrated the ground, and then later modified this approach to maintain proper balance centering of the body over the feet .
Ko and Badler  did post-processing inverse dynamics on kinematic walking, adjusting the results of forward kinematics to ensure that movement embodied appropriate forces. Spacetime constraints on the body center were introduced by [Witkin 1988] to compute these correcting dynamics, specified as the minimization of a cost function, over the entire course of an animation, rather than sequentially in time. Cohen  refined this approach to allow space-time windows to be specified, so that these dynamic adjustments could be applied interactively. Gleicher  extended space-time constraints to handling of constraints on all kinematic trajectories of the walking figure.
The computational framework
The computational framework of the technique is based on space-time paths.
Let space-time path P(time) → V be a function that maps time to a vector of parameter values V. In practice, V will contain values for the root position x, y, and z of the actor, as well as values for several dozen postural parameters such as how far apart the feet are, the degree of bending at the knees, how far back the shoulders are set, and how much the hips sway from side to side when walking.
We define an actor A(P,M,T,time) → J as a function which takes as input:
and produces as its output a set of joint matrices J.
In practice, the space-time path P(time) is implemented as a set of key-frames, interpolated by cubic curves. The actor converts these key frame values to joint values. For any given time, the actor examines only a small window of successive key frames.
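As a minimal sketch of how such a key-framed path might be evaluated, the following interpolates each parameter with a uniform Catmull-Rom cubic over a four-key window. The choice of Catmull-Rom, the window size of four, and the names catmull_rom and eval_path are assumptions made for illustration; the text specifies only cubic interpolation over a small window of successive key frames.

```python
from bisect import bisect_right

def catmull_rom(p0, p1, p2, p3, u):
    """Cubic Catmull-Rom interpolation between p1 and p2 for u in [0, 1]."""
    return 0.5 * (2.0 * p1
                  + (p2 - p0) * u
                  + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * u * u
                  + (3.0 * p1 - p0 - 3.0 * p2 + p3) * u ** 3)

def eval_path(key_times, key_values, time):
    """Evaluate P(time); key_values[i] is the parameter vector at key_times[i]."""
    n = len(key_times)
    # Index of the key frame at or before `time`, clamped so that i + 1 is valid.
    i = min(max(bisect_right(key_times, time) - 1, 0), n - 2)
    u = (time - key_times[i]) / (key_times[i + 1] - key_times[i])
    u = min(max(u, 0.0), 1.0)
    # Four-key window around the current segment; end keys repeat at the boundaries.
    window = [key_values[min(max(j, 0), n - 1)] for j in (i - 1, i, i + 1, i + 2)]
    # Interpolate every parameter of the vector independently.
    return [catmull_rom(a, b, c, d, u) for a, b, c, d in zip(*window)]
```

Because the result depends only on the requested time and the nearby keys, any single frame can be evaluated without stepping through earlier frames.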
For a human figure, the actor function computes joint values by evaluating six modules in the following order, to accommodate dependencies between computations:
Note that the hands are visited in two different modules. Reaching and grasping are computed before the pelvis and spine, whereas hand position for arm swinging is computed after.
In order for the body to be emotionally expressive, the spine needs to be sufficiently flexible. In our articulation model the spine is represented by the fifth and first lumbar joints (the lower back), the sixth thoracic joint (the chest cage) and the fifth and first cervical joints (the base and top of the neck).
After the pelvis, torso and feet have been animated, the pelvis is connected to the feet, and the torso is connected to the hands, via a simple closed-form two link inverse kinematics computation which positions the knees and elbows. Both knee and elbow turn-out default to values that appear natural. Deviations from these values are then parametrically adjustable by the artist.
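A closed-form two-link computation of this kind can be sketched with the law of cosines. The function below is an illustration under stated assumptions rather than the system's actual code: two_link_ik and its bend_hint argument (a rough direction standing in for the turn-out parameter) are hypothetical names, and degenerate inputs (coincident root and target, or a hint parallel to the limb axis) are not handled.

```python
import math

def two_link_ik(root, target, len_a, len_b, bend_hint):
    """Return the middle joint (knee or elbow) position of a two-link limb.

    root, target and bend_hint are 3-tuples. bend_hint is a rough direction
    in which the joint should bend (hypothetical stand-in for turn-out).
    """
    d = [t - r for r, t in zip(root, target)]
    dist = math.sqrt(sum(c * c for c in d))
    # Clamp the reach so that the two links can always form a triangle.
    reach = min(max(dist, abs(len_a - len_b) + 1e-9), len_a + len_b)
    axis = [c / dist for c in d]
    # Distance along the root-to-target axis to the joint's projection.
    along = (len_a ** 2 - len_b ** 2 + reach ** 2) / (2.0 * reach)
    # Perpendicular offset of the joint from that axis.
    out = math.sqrt(max(len_a ** 2 - along ** 2, 0.0))
    # Remove the axial component of the hint to get the bend direction.
    dot = sum(h * a for h, a in zip(bend_hint, axis))
    perp = [h - dot * a for h, a in zip(bend_hint, axis)]
    norm = math.sqrt(sum(c * c for c in perp))
    perp = [c / norm for c in perp]
    return tuple(r + along * a + out * p for r, a, p in zip(root, axis, perp))
```

For a hip at height 0.6 above an ankle at the origin, with both links of length 0.5 and a forward hint of (0, 0, 1), the knee lands at approximately (0, 0.3, 0.4). When the target is out of reach the limb simply straightens.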
All of the modules receive a value of time as their input except for the module that computes the foot positions and ankle joints. The left foot is given an altered time input value of:
time + saw(time/pace)
whereas the right foot is given an altered time input value of:
time + saw(time/pace + ½)
where pace is in units of walk cycles per second, and saw(t) is defined by:
saw(t) = if (t mod 1) < ½ then (t mod 1) - ¼ else ¾ - (t mod 1)
The effect of the saw displacement is to jog time back and forth: In the first half of the walk cycle the left foot travels forward in time one full cycle, while the right foot remains fixed at the start of the cycle. In the second half of the walk cycle, the right foot travels forward in time one full cycle, while the left foot remains fixed at the end of the cycle, as in the accompanying figure.
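The displacement can be written directly from the definitions above. This is a small sketch: the function name foot_times is hypothetical, and the sample values use pace = 1 so that one walk cycle spans one unit of time.

```python
def saw(t):
    """Triangle wave from the text: rises from -1/4 to +1/4, then falls back."""
    f = t % 1.0
    return f - 0.25 if f < 0.5 else 0.75 - f

def foot_times(time, pace=1.0):
    """Altered time inputs for the (left, right) foot, as given in the text."""
    left = time + saw(time / pace)
    right = time + saw(time / pace + 0.5)
    return left, right

# Sample the first walk cycle at quarter-cycle intervals.
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, foot_times(t))
```

With pace = 1, the left foot's input runs from -¼ to ¾ during the first half cycle while the right foot's input stays at ¼. In the second half the left foot's input holds at ¾ while the right foot's runs from ¼ to 1¼.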
The one exception to the above is the continuation of the heel-to-toe foot roll while a support foot is in contact with the floor. Even though the values of all body parameters are fixed in time while the support foot is in contact with the floor (because time is not changing), we nonetheless continue the rotation of the foot that produces the rolling floor contact for first the heel, then the ball of the foot, and finally the toe. The bending at the metatarsal during the last stages of floor contact is handled as a simple mechanical constraint. When the actor is walking backwards, we proceed similarly, but reverse the direction of roll: the toe contacts the floor first, and then the foot rolls onto the heel.
Because the entire system is a function of a single input value of time, any given setting of target points T and input parameters V at that time will produce a unique and well defined pose for the collection of joints J at that time.
This framework confers several advantages. Because reasonable foot placement is guaranteed, the artist is free to implement any desired body language shaders. Expressive movements can be blended in the artist's preferred parameter space, providing significant freedom to create controls which are intuitive and easy to work with.
An illustrated example
In the accompanying sequence a walking actor has been directed first to crouch backwards, and then to turn sideways with feet spread, before continuing on his way. To create such a behavior, the artist simply specifies a keyframe (corresponding to the fourth image below) in which the actor's pelvis should be lowered and rotated by 180°, and then a keyframe (corresponding to the eighth image below) in which the actor's feet should be spread and the pelvis should be rotated 90°. In the course of executing this transition, the actor produces a trajectory for the feet and for all body joints (seen in images two and three, and images five through seven) that results in reasonable foot placement and no foot sliding. This will be true for any walking pace and for all settings of other body parameter values.
Terrain following and going up or down steps
Reasonable foot placement when walking along uneven terrain is automatically guaranteed by footstep temporal displacement. The pelvis and the two feet exist in three different timeframes. Within its own respective timeframe, each of these three body parts simply maintains the proper distance above whatever is the current terrain height at that moment. The result is a walk in which the pelvis maintains a proper distance from the ground, and the feet land at ground level of the varying terrain at each foot-fall. As in [Rose 1998] we blend in adverbial parameter settings to convey change in stance when climbing or descending. These settings are described below in more detail. In contrast, walking up or down stairs requires some intervention, so that the actor will walk upon the stair steps, rather than upon the height discontinuities that separate them.
In order to go up and down steps, the key modification to the basic algorithm is to allow the pace of the actor to vary. In the following images the actor is portraying a slithering creature skulking down stairs. The height of the actor's pelvis above the floor is computed by blurring the terrain height function (in this case, via a Mip-Map [Williams 1983]). The higher the pelvis, the "blurrier" is the effective height. This results in smooth movement of the pelvis over stair steps and other discontinuities in floor height.
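The effect of height-dependent blurring can be illustrated without a Mip-Map by averaging the terrain directly. In the sketch below, a plain box average stands in for the Mip-Map lookup, the window width is assumed equal to the pelvis height, and stairs, tread, rise and samples are hypothetical names and values chosen for the example.

```python
def stairs(x, tread=0.3, rise=0.18):
    """Hypothetical terrain: a staircase that descends as x increases."""
    return -rise * (x // tread)

def blurred_height(height_fn, x, pelvis_height, samples=33):
    """Average height_fn over a window whose width grows with pelvis height."""
    half = 0.5 * pelvis_height
    step = 2.0 * half / (samples - 1)
    # Box filter: evenly spaced samples across [x - half, x + half].
    return sum(height_fn(x - half + i * step) for i in range(samples)) / samples

# Raw terrain jumps by a full rise across a stair edge; the blurred value barely moves.
print(stairs(0.299) - stairs(0.301))
print(blurred_height(stairs, 0.299, 1.0) - blurred_height(stairs, 0.301, 1.0))
```

A taller pelvis widens the window, so the effective floor under the pelvis approaches a smooth ramp, while a pelvis height of zero reduces to the raw terrain height.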
The floor height at each foot is only evaluated at places where that foot touches down; that is, where the phase of the walk cycle is a multiple of ½.
We guarantee that the actor's feet land on the steps, not on the rise between steps, by shifting the phase slightly in the keyframes. Because we are two keyframes ahead, we have enough time to modify footstep phase before the edge of a stair-step [Chung 1999]. The algorithm is illustrated in the accompanying diagram. The keyframes themselves (shown in the diagram as circular rings) are not moved, but the walk-cycle phase at each keyframe is adjusted slightly, so that each of the places where the left foot touches down (where phase is a multiple of 1.0), and where the right foot touches down (where phase is an odd multiple of 0.5) is shifted to the center of a stair step. This process is independent of style parameters: the actor can go up and down stairs equally well in any walking style.
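One way to picture the phase adjustment is the simplified one-dimensional model below. It is a hypothetical sketch, not the algorithm of the diagram: it assumes a constant distance per half cycle (stride), stair steps of equal depth (tread) measured along the path, and a locally linear relation between phase and distance. The name snap_phase_to_tread is invented for this example.

```python
import math

def snap_phase_to_tread(s, phase, stride, tread):
    """Adjust the walk-cycle phase at a key frame located at path distance s.

    Footfalls occur where phase is a multiple of 1/2. `stride` is the assumed
    distance covered per half cycle and `tread` is the depth of one stair step.
    Returns the adjusted phase; the key frame position s itself is unchanged.
    """
    footfall_phase = round(2.0 * phase) / 2.0
    # Where the nearest footfall currently lands along the path.
    footfall_s = s + (footfall_phase - phase) * 2.0 * stride
    # Center of the stair step that contains that landing point.
    center = (math.floor(footfall_s / tread) + 0.5) * tread
    # Shift the phase so that the same footfall lands on the center instead.
    return phase - (center - footfall_s) / (2.0 * stride)
```

For a key frame at s = 1.0 with phase 2.1, stride 0.3 and tread 0.3, the nearest footfall moves from 0.94 to the step center at 1.05 while the key frame stays where it is.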
In the context of producing a linear animation, it is important to be able to effect such phase shifts locally, without changing the phase of the walk cycle at all later points in the animation. In this way the animator can be sure that foot placement does not change unexpectedly in later scenes due to edits in earlier scenes. To this end, the system always takes care to localize any imposed phase shifts, such as those caused by negotiating stairs: the original phase is restored after a small number of additional keyframes.
This approach to walking movement facilitates the creation of staggered walks or gaits, by providing control over the amount of temporal phase displacement between the two feet. For example, a character can transition from a walk to a broad jump by bringing the temporal phase difference between the two feet (normally half a cycle) gradually to zero. The example below shows a transition first into, and then out of, a broad jump. By setting the phase displacement to intermediate values, the actor can be made to perform staggered or skipping gaits.
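In terms of the saw displacement defined earlier, this amounts to replacing the fixed ½ offset of the right foot with an adjustable one. The sketch below works in walk-cycle phase units; foot_phases, transition_offset and the blend parameter are hypothetical names introduced for illustration.

```python
def saw(t):
    """Triangle wave defined earlier in the text."""
    f = t % 1.0
    return f - 0.25 if f < 0.5 else 0.75 - f

def foot_phases(phase, offset=0.5):
    """Displaced phase for the (left, right) foot with an adjustable offset."""
    left = phase + saw(phase)
    right = phase + saw(phase + offset)
    return left, right

def transition_offset(blend):
    """Foot offset for blend in [0, 1]: 0 gives a walk, 1 gives a broad jump."""
    return 0.5 * (1.0 - blend)

# Halfway through the transition the feet are a quarter cycle apart.
print(foot_phases(0.0, transition_offset(0.5)))
```

At an offset of zero both feet receive the same displaced phase and therefore leave and strike the ground together. Intermediate offsets give the staggered gaits.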
Similarly, the "weight" of the step, as defined by the percentage of time that each foot remains in contact with the ground, can be modulated by sinusoidally modulating each foot's temporal phase displacement throughout the period of the walk cycle. In this way, footsteps can be made "heavy" (more time spent in fixed support, as would be found if an actor were wearing lead shoes), or "light" (both feet off the ground simultaneously, as would occur while running).
Variation in body skeleton
Because the behavior of body language parameters is not constrained by any special requirement to accommodate foot placement, we have found it easy in practice to implement the algorithms that vary these parameters in a way that is insensitive to the lengths of the various limbs. An actor can have very long or short legs, unequal lengths between the femur and tibia, long or short spine, or very wide or narrow pelvis. The knees can just as easily bend backwards, which is useful for actors playing avian characters. Limb length can change during an animation, which can be useful in certain exaggerated cartoon contexts. This allows a result similar to the motion retargeting of [Gleicher 1998], at smaller computational cost. Below are several examples of highly divergent actors performing the same movement.
The foot animation system is part of a larger framework. In this framework, animation shaders add attitude to the entire body posture and motion.
The use of intuitive body attitude parameters facilitates the simple creation of "idioms" - combinations of static and rhythmic posture that create a characteristic higher level mood or attitude. For example, one parameter sinusoidally varies the pelvis height at twice the frequency of the walk cycle, so that the pelvis rises each time a support foot strikes the ground. This creates a "bouncy" or carefree gait. If this same parameter is set to a negative value, then the pelvis descends each time a support foot strikes the ground, which produces a jaunty, "bopping" style of walk. In this latter case, the impression of bopping is greatly enhanced by making the footsteps "heavy," as described above, so that the actor appears to be deliberately placing each footstep. A negative "bounce" together with a heavy step can be merged into a single combination "bop" parameter, in which these two degrees of freedom are varied together.
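The bounce parameter described above can be sketched as a single sinusoid. The cosine form, the assumption that foot strikes fall where the walk-cycle phase is a multiple of ½, and the name pelvis_height are illustrative choices, not details taken from the system.

```python
import math

def pelvis_height(base_height, bounce, phase):
    """Pelvis height with a sinusoid at twice the walk-cycle frequency.

    Foot strikes are assumed to occur where phase is a multiple of 1/2, so the
    cosine peaks (or, for negative bounce, dips) at each strike.
    """
    return base_height + bounce * math.cos(2.0 * math.pi * 2.0 * phase)

# Positive bounce: pelvis is highest at each foot strike ("bouncy").
print(pelvis_height(1.0, 0.05, 0.0), pelvis_height(1.0, 0.05, 0.25))
# Negative bounce: pelvis is lowest at each foot strike ("bopping").
print(pelvis_height(1.0, -0.05, 0.0), pelvis_height(1.0, -0.05, 0.25))
```

Flipping the sign of the one parameter is all that separates the two styles, which is why it combines naturally with the step weight into a single "bop" control.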
Similarly, it is easy to create parameter combinations that suggest such attitudes as "fearful", "sexy", "careful", or "slithering". The system allows such combinations to be saved and then blended with various weights during the course of an animation or simulation, to suggest subtle momentary moods and attitudes. This is similar in spirit to the interpolation approaches described by [Perlin 1995; Wiley 1997] and [Rose 1998]. One difference is that our framework automatically gives proper foot placement for the interpolated movement.
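Saving and blending idioms can be pictured as weighted mixing of parameter sets. The sketch below assumes each idiom is stored as a dictionary of parameter values and is blended as a weighted displacement from a neutral setting. The function blend_idioms, the parameter names and the numeric values are hypothetical.

```python
def blend_idioms(neutral, idioms, weights):
    """Blend named idioms into a neutral parameter set.

    neutral: dict of parameter name -> value
    idioms:  dict of idiom name -> dict of parameter name -> value
    weights: dict of idiom name -> blend weight (typically in [0, 1])
    """
    result = dict(neutral)
    for name, weight in weights.items():
        for param, value in idioms[name].items():
            base = neutral.get(param, 0.0)
            # Add this idiom's weighted displacement from the neutral value.
            result[param] = result.get(param, base) + weight * (value - base)
    return result

neutral = {"knee_bend": 0.1, "foot_spacing": 0.2}
idioms = {"careful": {"knee_bend": 0.5, "foot_spacing": 0.4}}
print(blend_idioms(neutral, idioms, {"careful": 0.5}))
```

Since the blend operates purely on posture parameters, the blended setting can be passed to the actor like any other key frame value.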
For example, we used the system to define a sultry walk by setting a few parameters: bring the elbows in, feet together, heels up, pelvis and shoulders forward, increase the arm swing and vertical pelvis bounce, and increase both the sway and axial rotation of hips and shoulders. The resulting walk automatically finds proper footfall positions for any path the actor is directed to follow. In practice, fractional quantities of body language shadings like "sexiness" can be mixed into an actor momentarily, as when a character is acting in a seductive manner in order to catch the eye of another character.
In the sequence below, the actor transitions between several idiomatic behaviors in the span of a few steps: first from a shuffling old man to a feral Gollum-like creature; then to a sultry walk. To create this sequence it was necessary only to specify the appropriate idiom at successive key frames. The original creation of each of the three component idioms required only a few minutes of parameter adjustment.
Types of posture variation
Posture parameters have been selected to provide intuitive isolated controls over those attributes of posture or body language that a trained actor or dancer might wish to independently modulate. A parameter can be either static or rhythmic. Static parameters include things like spacing between the feet, knee bend, pointing of toes, axial twist of the pelvis, and how far forward or back the pelvis is thrust. Rhythmic parameters include amplitude of rolling shoulders or hips, the vertical bounce and thrust of the pelvis during the walk cycle, and the amplitude of arm swing.
Blending in body language to express effort
As in [Rose 1998], we mix in higher level parameter settings as adverbs to convey a temporary change in an actor's attitude when performing a difficult or interesting task. For example, when the actor is climbing or descending stairs, we fractionally blend in the idiomatic parameter settings for more careful and deliberate balancing (arms slightly raised, elbows out, feet apart and turned out, pelvis slightly lowered, shoulders hunched and forward, spine curved forward, heavy steps, slightly wandering gaze, slight wandering of path, slow pace, slightly staggered foot rhythm, high shakiness, very little movement in the hips). The resulting performance is surprisingly compelling.
Production advantages: frame-independent evaluation and path following
Like any system that is based on a closed-form kinematic solution, the complete pose of the actor is a function of time which can be evaluated efficiently at any single input value. This means that there is no dependency on forward differencing, or variation in behavior due to variable frame rate. Any individual frame in an animation can be computed and examined independently. Body posture and foot placement are completely determined and repeatable, a highly desirable property in a production animation setting.
In addition, the path following is precise: The actor is guaranteed to follow the exact path laid out by the specified keyframes. This is extremely important in situations containing tight physical constraints such as narrow doors and hallways or tightly packed crowds. It is also essential for the use of virtual actors in implementing dance choreography, as well as in dramatic scenes in which the actor needs to accurately hit its mark so that it will be properly framed in the camera shot. In effect, the actor is constrained to function as pure cerebellum - an implementer of external cognitive choices. All actual cerebral cognitive decisions are directed by the animator or A.I. engine, and conveyed to the actor within the parameter settings of successive keyframes.
Linear animation versus real-time simulation
There are two distinct types of environments in which this approach can be used: a) the design of linear animation with a fixed set of key frames, and b) interactive simulations, in which streams of key frames are continually provided by an A.I. program.
The animation system itself is identical in both environments. The system is always tasked with converting data from a small time window of keyframes into joint matrices. The difference lies entirely in the source of the keyframe data. For hand-tuned linear animation, parameters within a fixed set of keyframes are altered by a skilled animator/director. For interactive simulations, a moving window of temporary keyframes is generated by programs written by the AI designer of a game or simulation engine.
Interactive crowd simulation
We can compute the AI for a crowd simulation without any knowledge of the body animation system. Because the positions of the actors' feet are well determined, it is possible to articulate large numbers of actors without collisions and with repeatable results. We can also guarantee that each walking actor will precisely follow the output position of the corresponding agent in the AI that directs the crowd.
In the example below, we ran an interactive scene of 1000 dancing Imps. The key frame parameters for each Imp are randomized, so that it dances with a unique personality and character. In the first image the camera is flying through the crowd. As the camera approaches, the individual Imps in the camera's path turn toward us and dance out of the way, creating an aisle for the camera to travel through. In the second image the crowd has made space for a lone dancer at center stage.
All of the behavior for the imp scene is computed via a separate 2D crowd simulation. Below, side by side, are a visualization of just the AI for the crowd simulation, and then of the actors who are taking their position cues from this AI. For each actor, the individual body language, limb positions and foot placement are computed in our joint synthesis system.
Because the system provides a closed-form mapping from parameters and target points to joint matrices, computation speed does not depend upon input values. We have successfully animated crowds consisting of hundreds of interactive responsive actors simultaneously, with each actor expressing a unique set of body language parameters. In benchmark tests on a 1.6 GHz Pentium-based notebook PC, the system was able to evaluate 14,000 complete body poses per second. When displayed with OpenGL rendering of each actor, we measure animation rates of 1000 actors at 9 frames per second, and 500 actors at 18 frames per second.
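The quoted rates can be compared with a few lines of arithmetic. The helper name poses_per_second is hypothetical, and the figures are the ones reported above.

```python
POSE_EVALS_PER_SECOND = 14_000  # pose-only benchmark figure quoted in the text

def poses_per_second(actors, fps):
    """Number of complete body poses displayed per second."""
    return actors * fps

for actors, fps in [(1000, 9), (500, 18)]:
    rate = poses_per_second(actors, fps)
    print(actors, fps, rate, round(rate / POSE_EVALS_PER_SECOND, 2))
```

Both rendered configurations come to 9,000 poses per second, about 64% of the pose-only rate, which suggests that the displayed frame rate scales inversely with crowd size and that rendering accounts for the remaining cost.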
Use together with other work
This work would be highly complementary to a run-time animation engine that blends and overlays movement parameters through time with various degrees of "transparency", such as the Improv System described by [Perlin 1996].
We also see it being combined with a post-process refinement or energy minimization algorithm, such as those of [Ko 1994; Cohen 1992; Gleicher 1997] and their successors, as well as adjustment by model-based terrain-following foot-step placement algorithms such as that of [Chung 1999]. However, such post-processing would compromise the extreme speed of the current method. We envision the method as presented here to be used for interactive placement and choreography of figures, with complementary post-processing techniques being applied off-line in final production.
We have shown a simple way to unify foot placement with kinematic control of an ambulatory human figure. The advantage of the approach is that the unification of foot placement and body pose allows artists to focus on creating good artistic shaders for expressing body language. The result will always produce foot placement that is well matched to body posture. This approach also allows all aspects of body posture to be computed cheaply and independently for any point in time. As we have demonstrated, this approach is well suited to real-time simulations containing hundreds of actors, in which each actor possesses expressive and responsive body posture customized for its identity, personality and situation.
These techniques can be very effectively combined with behavioral level of detail, since beyond the first several hundred actors, members of a crowd who appear in a wide shot are effectively just blobs. In this way the approach described in this paper could support real-time responsive emotively expressive behavior for crowds of essentially unlimited size, on commodity-level computers.
In future work we will be looking at enforcing physical realism through constraints on parameter settings. One potential deficiency of the current technique is that it gives its user complete freedom to set parameters in ways that are not physically or biomechanically plausible. This is acceptable for a purely emotional or "cartoon" level of expressiveness, but less than acceptable for applications that require the virtual actor to convey physical realism. We are investigating an approach to imposing such constraints that treats the existing system as a black box. Energy costs of various transitions can be computed off-line, and used to train a corpus-based method such as Convolutional Networks that recognize time-series patterns [LeCun 1995], to identify desired correlations between parameter values in successive keyframes. For example, just before an actor leaps high in the air, the previous keyframe can be constrained to brace for the jump by crouching. Adding such pre-filters would not compromise the computational speed of the approach, and yet could make it more relevant for applications that require greater physical realism.
BOULIC, R., MAGNENAT-THALMANN, N., and THALMANN, D., 1990. A global human walking model with real-time kinematic personification. Visual Computer, 6(6):344-358.
BOULIC, R., MAS, R., and THALMANN, D., 1996. A robust approach for the center of mass position control with inverse kinetics. Computers & Graphics, 20(5).
BRUDERLIN, A., and CALVERT, T., 1989. Goal directed, dynamic animation of human walking. In Computer Graphics (Proceedings of ACM SIGGRAPH 89), 23, 4, ACM.
BRUDERLIN, A., and CALVERT, T., 1993. Interactive Animation of Personalized Human Locomotion. In Proc. of Graphics Interface 93, pages 17-23.
BRUDERLIN, A., and WILLIAMS, L., 1995. Motion Signal Processing. In Proceedings of ACM SIGGRAPH 1995, ACM Press / ACM SIGGRAPH, New York. Computer Graphics Proceedings, Annual Conference Series, ACM, 97-104.
CHUNG, S., and HAHN, J., 1999. Animation of Human Walking in Virtual Environments. In Proc. Int. Conf. on Computer Animation, pages 4-15. IEEE.
COHEN, M., 1992. Interactive spacetime control for animation. In Computer Graphics (Proceedings of ACM SIGGRAPH 92), 26, 4, ACM, 293-302.
GLEICHER, M., 1997. Motion editing with spacetime constraints. In Proc. of Symposium on Interactive 3D Graphics.
GLEICHER, M., 1998. Retargetting Motion to New Characters. In Proceedings of ACM SIGGRAPH 1998, ACM Press / ACM SIGGRAPH.
KO, H. and BADLER, N., 1996. Animating Human Locomotion in Real-time using Inverse Dynamics, Balance and Comfort Control. IEEE Computer Graphics and Applications, 16(2):50-59.
LECUN, Y. and BENGIO, Y., 1995. Convolutional Networks for Images, Speech, and Time-Series. In The Handbook of Brain Theory and Neural Networks (M. A. Arbib, ed.).
PERLIN, K., and GOLDBERG, A., 1996. Improv: A System for Scripting Interactive Actors in Virtual Worlds, In Proceedings of ACM SIGGRAPH 1996 ACM Press / ACM SIGGRAPH, New York. Computer Graphics Proceedings, Annual Conference Series, ACM.
PERLIN, K., 1995. Real Time Responsive Animation with Personality, IEEE Transactions on Visualization and Computer Graphics; Vol 1 No. 1.
ROSE, C., COHEN, M., and BODENHEIMER, B., 1998. Verbs and Adverbs: Multidimensional Motion Interpolation. IEEE Computer Graphics and Applications, 18(5):32-40.
STERN, G., 1978.
SUN, H., and METAXAS, D., 2001. Automating gait generation. In Proceedings of ACM SIGGRAPH 2001, ACM Press / ACM SIGGRAPH.
WILEY, D. and HAHN, J., 1997. Interpolation Synthesis of Articulated Figure Motion, IEEE Computer Graphics and Applications, 17(6):39-45.
WILLIAMS, L., 1983. Pyramidal Parametrics. In Computer Graphics (Proceedings of ACM SIGGRAPH 83), 17, 4, ACM.
WITKIN, A. and KASS, M., 1988. In Computer Graphics (Proceedings of ACM SIGGRAPH 88), 22, 4, ACM.
ZELTZER, D., 1982. Motor Control Techniques for Figure Animation. IEEE Computer Graphics and Applications, 2(9):53-59.