A long exegesis on the nature of the animation system, specifically going through how to make a Hierarchical Fuzzy State Machine (HFuSM, pronounced "Hi-Fuzzy" for simplicity's sake) that would procedurally create body language from basic primitives.
The approach is to isolate behavior into compound states and primitives.
The events that control the gesture/animation system can come from a number of different places:
Movement
When the user moves, the animation system is required to keep track of the user's physical state and issue the relevant events. Looking, walking forward, turning around, and ducking all cause either animation data to play or a targeted real-time IK solution to be calculated.
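A rough sketch of that dispatch, in Python purely for illustration (the event names and return values are invented, not tied to any engine):

```python
# Hypothetical sketch: routing movement events to either canned animation
# data or a real-time IK request. All names are illustrative only.
from enum import Enum, auto

class MovementEvent(Enum):
    LOOK = auto()
    WALK_FORWARD = auto()
    TURN_AROUND = auto()
    DUCK = auto()

# Events that are satisfied by playing stored animation data.
CANNED = {MovementEvent.WALK_FORWARD, MovementEvent.TURN_AROUND, MovementEvent.DUCK}

def handle_movement(event, target_position=None):
    """Issue either an animation playback or a targeted IK solve."""
    if event in CANNED:
        return ("play_animation", event.name.lower())
    # Looking (and similar aimed actions) need a targeted real-time IK solution.
    return ("solve_ik", target_position)

print(handle_movement(MovementEvent.DUCK))
print(handle_movement(MovementEvent.LOOK, target_position=(1.0, 2.0, 0.5)))
```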
Replication
Another obvious place where gestures can come
from is the replication of events from other users via the server.
See the replication design document for further details on replication
issues and methodology.
Items, Scripts and other triggers
Depending on the rules of a space, an item or script should be able to induce another avatar to change state. This seems most interesting in the case of emotional state. For example, fear might be as simple as having an "if state == fear do x" clause in the finite state machine, which could be activated either by someone clicking fear or by an event from the event system that triggers a fearful attitude. These could eventually be used to establish more than just a facial change: the fear state could last x amount of time and change y variables during that time. By doing this we facilitate role-playing, or force role-playing, in appropriate environments.
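A minimal sketch of that clause, assuming hypothetical state names and a couple of invented avatar variables:

```python
# Hypothetical sketch of an item/script-induced emotional state.
import time

class Avatar:
    def __init__(self):
        self.state = "neutral"
        self.state_expires = None   # when the induced state wears off
        self.walk_speed = 1.0
        self.posture = "upright"

    def trigger(self, event):
        # "if state == fear do x": an item, script, or UI click can induce fear.
        if event == "fear":
            self.state = "fear"
            self.state_expires = time.time() + 10.0   # lasts x seconds
            self.walk_speed = 1.5                      # change y variables...
            self.posture = "cowering"                  # ...for the duration

    def update(self):
        if self.state_expires and time.time() >= self.state_expires:
            self.state, self.state_expires = "neutral", None
            self.walk_speed, self.posture = 1.0, "upright"

a = Avatar()
a.trigger("fear")
print(a.state, a.walk_speed, a.posture)
```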
Mirror
Shows users the face of their character avatar, allowing the user to see as well as change their current emotional state. The
Mirror actually consists of two components. One is a docked UI plug-in that
shows the user’s face and current facial expression. The UI plug-in will also
have a menu system that allows the user to change their current mood. The second
component is an autonomous agent that automates some of the expressions of an
avatar. If an individual specifies their character as being “happy”, this
agent would then attempt to combine (over time) the basic primitives that would
constitute a “happy” avatar. This both simplifies the user’s interface in
expressing emotions and makes the real-life simulation much more elaborate.
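A sketch of how that agent might drift primitive weights toward a "happy" mix over time; the primitive names and target weights here are placeholders, not a real tuning:

```python
# Hypothetical: the Mirror's autonomous agent nudging primitive weights
# toward a target mix for the declared mood.

HAPPY_TARGET = {"smile": 0.8, "raise_eyebrows": 0.3, "openmouth": 0.1}

def step_toward_mood(current, target, rate=0.1):
    """Move each primitive weight a fraction of the way toward its target."""
    out = dict(current)
    for primitive, goal in target.items():
        value = out.get(primitive, 0.0)
        out[primitive] = value + rate * (goal - value)
    return out

weights = {"smile": 0.0}
for _ in range(5):                      # called once per agent tick
    weights = step_toward_mood(weights, HAPPY_TARGET)
print(weights)
```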
In the first column, the first row represents the various moods as well as the ability to change them. The second row represents specific gestures; these could represent facial expressions, body expressions, or a combination of the two. The last two rows are fairly arbitrary at the moment: the first icon represents an attack mode, while the second icon represents a personal teleporter.
Flow chart for a few of the mood states
Obviously the logic for the change-state machinery can be simpler for a user avatar than for an AI, since the mirror has a person sitting behind the state machine controlling it. We can rely on their memory and their attitude to control the avatar in a logical manner; this is based on the Eliza principle. However, we will still need a FuSM for the lower-level moods, i.e. happy, sad, or fear, to create variety and automate away the tedium.
(GP Note: this is the autonomous agent part, and this is effectively what Juanita was studying in Snow Crash)
There are a lot of hidden cues that an individual is not consciously aware of: things like how often an individual blinks when at rest versus talking, and at what point while talking an individual is more likely to blink. These will probably turn into a fairly fixed probabilistic formula for most avatars.
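For example, blink timing could collapse into a per-tick probability, with a higher rate while talking; the rates below are rough placeholders, not measured values:

```python
# Hypothetical fixed probabilistic blink formula; rates are placeholders.
import random

BLINKS_PER_SECOND = {"at_rest": 15 / 60.0, "talking": 25 / 60.0}

def should_blink(state, dt):
    """Return True if the avatar blinks during this dt-second tick."""
    return random.random() < BLINKS_PER_SECOND[state] * dt

blinks = sum(should_blink("talking", 1 / 30.0) for _ in range(30 * 60))
print("blinks in one simulated minute of talking:", blinks)
```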
Lower Level State Machines
Altiverse will probably need to hire an expert on human facial expressions and emotional state for this section of the project. These variables will probably be used as weights in the HFuSM. A simulation that this would be appropriate for is flirting, which raises the question: what do people do while flirting? Some of the states would be:
make eye contact
bat eyelashes
smile
toss hair
fidget
footsie (this would require a bit more of an immersive hardware setup than we currently foresee)
When a user creates their avatar, we will be able to give the user the ability to customize some of the weights or turn them off completely.
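A sketch of how those per-user weights might drive a fuzzy choice of sub-action; the default weights are invented, and any of them could be edited or zeroed at avatar creation:

```python
# Hypothetical fuzzy-weighted selection of flirting sub-actions; weights
# are per-user and can be edited or zeroed out at avatar creation.
import random

DEFAULT_FLIRT_WEIGHTS = {
    "make_eye_contact": 0.35,
    "bat_eyelashes": 0.10,
    "smile": 0.30,
    "toss_hair": 0.15,
    "fidget": 0.10,
}

def pick_sub_action(weights):
    """Pick one sub-action with probability proportional to its weight."""
    actions = [a for a, w in weights.items() if w > 0.0]
    return random.choices(actions, weights=[weights[a] for a in actions])[0]

user_weights = dict(DEFAULT_FLIRT_WEIGHTS)
user_weights["bat_eyelashes"] = 0.0        # user turned this one off
print(pick_sub_action(user_weights))
```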
Task-based animations are another way of establishing a logical hierarchy for animations to play within. For example, picking up an object could be broken down into its component sub-tasks/animations (a rough sketch follows the list):
Walk to proximity of the object.
Get into the right position to pick up the object.
Pick up the object.
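Here is the rough sketch: a task expands into an ordered list of sub-tasks, each of which the animation layer turns into a canned animation or an IK request (names invented):

```python
# Hypothetical task-based decomposition: a high-level task expands into
# ordered sub-tasks, each of which maps onto an animation or IK request.

PICK_UP = [
    ("walk_to", "proximity of object"),
    ("position", "align for grasp"),
    ("grasp", "play pick-up animation / IK the hand onto the handle"),
]

def run_task(subtasks):
    for name, description in subtasks:
        # In the real system each step would wait for the animation or
        # IK solve to finish before the planner moves on.
        print(f"executing {name}: {description}")

run_task(PICK_UP)
```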
Motivate actually excels at this type of task-based animation and planning; UnrealScript also has facilities for doing this, but probably not as advanced. Task-based animations are actually more necessary for the automation of AIs than they are for a human-driven actor. The actual animation is just the visual representation of the artificial intelligence planning and executing a task.
The traditional paradigm for working with objects in three dimensions is most often seen in 3D shooters. The controls are not extremely intuitive but can be accessed by the keyboard and any traditional input device. Interacting with objects is usually accomplished by running over them; the object is then either immediately used or, in more advanced programs, stored in an inventory that can be accessed later.
More recently we've seen Trespasser by DreamWorks Interactive, which was DWI's attempt to be much more elaborate in its object interaction. They failed miserably. (See the postmortem in Gamasutra for a complete rundown.) What they attempted to do was make a visible avatar with a user-operated hand and a visible upper body; the character was female, so that amounted to a pair of breasts obstructing the view of the ground. The hand's IK was done very badly. It flailed about and was difficult to control even while standing still. While moving, it acted as an inverse kinematic chain attached to the player, so the motions of the player propagated in a fashion that was aesthetically and gameplay-wise unappealing. Emulating Trespasser's mistakes does not really strike me as being the best thing to do.
However, in the case of picking up an object off a table or examining something on a bookshelf, what would be acceptable inside an FPS (running over the object) would not look right. Beyond that, I do not know how evolved graphics engines are at this point. Would it be possible to have someone sit down or open a door? Obviously collision has been done, but we will probably need something much more sophisticated for our purposes: being able to take a target and play an animation based on the location of your avatar relative to the target. This animation would have the avatar interact with the target based upon that information.
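A sketch of what that relative-location lookup might amount to, with made-up distance and height thresholds:

```python
# Hypothetical: choose an interaction animation from the avatar's position
# relative to the target (distance and height), then hand off to playback.
import math

def choose_interaction(avatar_pos, target_pos):
    dx, dy, dz = (t - a for a, t in zip(avatar_pos, target_pos))
    distance = math.sqrt(dx * dx + dy * dy)
    if distance > 1.0:
        return "walk_to_target"          # not close enough to interact yet
    if dz > 1.2:
        return "reach_up"                # object on a bookshelf
    if dz < -0.5:
        return "bend_down"               # object on the floor
    return "reach_forward"               # object at table height

print(choose_interaction((0.0, 0.0, 0.0), (0.3, 0.2, 1.5)))
```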
For actions such as grasping, sitting, picking up, etc., a product named Motivate, made by Motion Factory, seemed to promise a certain amount of these facilities. It couples adding basic information to objects that individuals will interact with, such as the shape of a handle, with a real-time inverse kinematics solution. In all honesty this is really Motivate's domain; I have not really seen any other program do this in a general manner. The main reason for this is that it is a relatively simple problem to solve with a robust skeletal system, which Motivate has. However, the industry has only just started moving to skeletal systems, and until that happens the solution will be more akin to just allowing a weapon to be held and not much else. (Note: quite a bit has changed since this document was started. Unreal now has a skeletal animation system, and Intel and RadGameTools are both selling skeletal system SDKs. Rune has a Softimage-based one. Jeff Lander's articles in Game Developer are also a good resource to look at. Additionally, the 3DSMax open source website has an app called dejaview that is a stand-alone 3D skeletal exporter.)
Another issue with targeting is that you are typically interacting with someone, hence you would need a target for eye contact or smiling. (Note: this is not necessarily true, for example in the case of a very drunk individual. For now, however, the simulation of extremely drunk individuals will be left out of the system.)
A person could wind up flirting with the wall, which in my opinion would be extremely funny but would definitely hurt the verisimilitude of the simulation. (Janak: Why prevent this? Hell, we can do it in real life. Hell, we do it in real life – in front of a glass mirror! To rehearse! In fact, we should have glass mirrors in the system in the "practice" environment so people become more familiarized with the mirror interface and its effects.) This once again brings up the features of Motivate, or at least the need for a more robust animation system.
To steal Motivate's categorization:
Locomotion skills – move from one place to another
Manipulation skills – manipulating an object via an abstract handle. I've generalized this even further into a target-based animation, where the target might just be the general location the action is directed towards. I should point out that in a situation such as climbing a ladder or a rope, the distinction between manipulation and locomotion starts getting blurry. (A small data-layout sketch follows this list.)
Basic skill - what we are calling a gesture
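A possible data layout for these three categories, with manipulation generalized to an optional target location as described above (everything here is illustrative):

```python
# Hypothetical data layout for the three skill categories, with the
# manipulation case generalized to an arbitrary target location.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Skill:
    name: str
    category: str                      # "locomotion", "manipulation", "gesture"
    target: Optional[Tuple[float, float, float]] = None  # where the action is aimed

walk = Skill("walk", "locomotion")
open_door = Skill("open_door", "manipulation", target=(2.0, 0.0, 1.1))
wave = Skill("wave", "gesture")
print(open_door)
```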
Approx. 22 muscles for the face
Approx. 17 joints for the body, not counting fingers and toes
The fact that we are going to be using a hierarchical skeletal system should facilitate our ability to make more complex animations at runtime. We should be able to superpose data both in a numeric sense, by literally adding one set of contraction/angle vectors to another, and in a joint sense, allowing the lower-level bones to play different data than the main skeleton. In addition, we have structured animation data, so it should be possible to store an animation such as "wave" not just as a full-body animation but as just an arm animation. This would allow not exactly superposition but the replacement of one animation's arm data with another animation's arm data at runtime. For example, we could combine two dissimilar animations, achieving a combined "walk" with a "wave" as well as a "smile" at the same time. A few problems with this: what if the person isn't walking? We don't really want them to be perfectly still except for the wave. Also, by separating out the hierarchy and playing different animations on different parts of the skeleton, we increase the risk of animations where different parts of the avatar start interfering with each other and cause intra-avatar collisions, lowering the realism of the simulation.
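A sketch of the arm-replacement case: the "wave" pose overrides the "walk" pose only on the arm bones, with invented bone names and angles:

```python
# Hypothetical per-bone channel replacement: play "wave" on the right arm
# while the rest of the skeleton keeps playing "walk". Bone names invented.

WALK_POSE = {"spine": 5.0, "left_leg": 30.0, "right_leg": -30.0,
             "left_arm": -10.0, "right_arm": 10.0}
WAVE_POSE = {"right_arm": 120.0, "right_forearm": 45.0}

RIGHT_ARM_BONES = {"right_arm", "right_forearm"}

def combine(base_pose, overlay_pose, overlay_bones):
    """Overlay replaces base only on the listed bones; base wins elsewhere."""
    out = dict(base_pose)
    for bone in overlay_bones:
        if bone in overlay_pose:
            out[bone] = overlay_pose[bone]
    return out

print(combine(WALK_POSE, WAVE_POSE, RIGHT_ARM_BONES))
```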
Note: some of this has been supplanted by the implemented facial animation system.
The facial animation system will have to be able to overlay the visual representation of the phonemes with the emotional state of the avatar at any given time. This should be doable via straight superposition of the data as long as there are well-defined constraint limiters on the muscles. Moreover, by restricting to one facial expression and one phoneme expression we can allow for a wide variety of facial animations yet at the same time keep the system tractable. The intuition behind this is that mood and speech largely determine facial state.
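A sketch of that superposition, adding a mood contraction vector to a phoneme contraction vector and clamping each muscle to its constraint limit (muscle names and values are illustrative):

```python
# Hypothetical straight superposition of a mood vector and a phoneme vector
# over the facial muscles, clamped by per-muscle constraint limits.

HAPPY = {"zygomatic": 0.7, "orbicularis_oculi": 0.3}
PHONEME_OH = {"orbicularis_oris": 0.8, "jaw": 0.5}
LIMITS = {"zygomatic": 1.0, "orbicularis_oculi": 1.0,
          "orbicularis_oris": 1.0, "jaw": 1.0}

def superpose(mood, phoneme):
    """Add contraction vectors, then clamp each muscle to its limit."""
    muscles = set(mood) | set(phoneme)
    return {m: min(mood.get(m, 0.0) + phoneme.get(m, 0.0), LIMITS.get(m, 1.0))
            for m in muscles}

print(superpose(HAPPY, PHONEME_OH))
```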
We will probably offer the ability to either slow down or speed up the playback of a gesture by changing the rate of interpolation of the data, as well as the option to loop the gesture. Note I'm talking about a simple fixed gesture, one that consists of a fixed-length set of biped-based key frames. Janak: Hmmm… are we assuming all gestures/moods have the same duration rules? Maybe we should just come up with two or three duration rules – i.e. fixed for X time, decay over Y time, and immediate (though the last could even be fixed for a {VERY SMALL} time). Then we can enumerate sample gestures and moods and give examples that fit inside our duration structure. Scripting can simplify a lot of this.
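A sketch of those three duration rules, plus the rate and loop options, as one small gesture record (field names are invented):

```python
# Hypothetical encoding of the three duration rules for a gesture:
# fixed for X seconds, decay over Y seconds, or immediate (very small X).
from dataclasses import dataclass

@dataclass
class Gesture:
    name: str
    rule: str          # "fixed", "decay", or "immediate"
    duration: float    # X or Y in seconds; tiny for "immediate"
    rate: float = 1.0  # playback speed multiplier for the keyframe data
    loop: bool = False

def weight_at(gesture, t):
    """Blend weight of the gesture t seconds after it was triggered."""
    if gesture.rule == "decay":
        return max(0.0, 1.0 - t / gesture.duration)
    return 1.0 if t < gesture.duration else 0.0   # fixed and immediate

smile = Gesture("smile", "decay", duration=3.0)
print(weight_at(smile, 1.5))
```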
Another obvious problem with this is that people act very differently when they are doing high-level complex behaviors. Generalizing these actions into one prototypical behavior will be problematic. Back to our high-level task of flirting (bit transfixed, aren't we): when the user selects "flirting" and a target, the avatar would automatically enter into a flirty mood. However, what one person describes as flirting may not be remotely what another describes as flirting – me versus the guys in the village. A way to solve this would be to customize the FuSM, so that the user could select which actions constitute flirting or adjust the probabilities of different sub-actions. Or just have a few prototypical FuSMs, a few for men and a few for women. (GP Note: This is a problem that also emerges in the lower-level motion capture (mocap) data; you don't want things to look the same. I currently see no solution to this except for real-time video analysis.)
Currently,
animations that are responses to events are called gestures. But
since I can think of all of the animations as being responses to events, I’m
having a problem not lumping them together.
Each of these has a muscle contraction vector that defines the state.
Basic expressions: Smile, Frown, Fear, Look angry, Disgust, mirrorDisgust, openmouth, Left_Blink, Right_Blink, close_eyes, Pout, Raise Eyebrows, Suspicious, Surprise
Phonemes (based on LIPSinc): bump, cage, church, earth, eat, fave, if, new, oat, ox, Roar, size, though, told, wet
Head motions: turnleft, turnright, tiltright, tiltleft
Compound expressions: no, skeptical_no, laugh, look_around, yawn, whistle, wahha, roll_head
Moods: Anger, Boredom, Confusion, Contempt, Disgust, Afraid, Friendly, Happy, Pleading, Sadness, Sexy, Skepticism, Stoic, Surprise, Suspicion, Tired, Triumph, Others?
The major question is: how many of these are discrete expressions vs. moods that are conveyed over time?
GNP: Actually I meant it in a much more philosophical sense, so I stared at a mirror (QuickCam) for a while.
Janak: Well – maybe there's a way we can combine them somewhat – discrete expressions can be states lasting a time of zero. The system would automatically interpret that as being an immediate wham-bam thing. We could also have a time of –1, which would mean the expression is presented only if the mouse is clicked. Something like that.
Animations that we see as necessary in the standard library for basic and extended motion and the like:
Aim/ targeting
Turn head - left/right/up/down
Run/Walk forwards/backwards
Side step - left/right
Turn
Jump: forward/back/up/right/left
Crouch
Pick up - right hand/left hand
Die
Basic street fighting punch - left/right arms
Basic street fighting kick - left/right legs
Throw
Climb (with knees bent, like stairs)
Climb (with hands and feet, like a rope)
Bend down (at knees or at back)
Sit: flat/bent surface
Looking at something
Looking at someone
Holding something in hand - left/right
Holding something out in front of you - one hand/both hands
Lie down
Wave
Janak:
Lots and lots of them – we have to get experts to do this – people who are
familiar with human physiology.