Issues for merging on-line programs and documents

Position paper for June 1-4, 1999 workshop
Ken Perlin, NYU


Thanks to the World Wide Web and Java, we now have the opportunity to disseminate information in a way that allows the viewer to actively manipulate facts and concepts. This can lead to ubiquitous "smart" documents that allow many people to share information cooperatively. How far can we go in blurring the distinction between documents and programs? More specifically, at what point do programs and documents begin to merge seamlessly? We give some examples of what has happened in this area and where the world may be going.


In an ideal world, there would be no hard distinction between documents and on-line applets (and their ilk). Both are methods that people use to communicate knowledge asynchronously across time and distance. Of course, achieving a seamless fusion of program and document into a single species of cybernetic "information artifact" will require some iterations and false starts.

Ultimately, success in this unification will depend not mainly on technology (which constantly changes) but on the fixed properties of people: how we learn, how we communicate, and what sorts of interfaces will help us use our remarkable human powers of intuition and insight, rather than locking us into preexisting modes of thought. For these reasons, it is a good idea to temper long-term goals and wish lists with short-term milestones and experiments.

In this position paper, we lay out some long-term goals for how the paradigms of "on-line interactive program" and "on-line shared interactive document" might be merged. We then lay out some short-term milestones that seem to be reasonable steps toward those goals, given current knowledge. Finally, we show some concrete experiments we have done to work toward those goals and milestones, and briefly analyze their results.

Long-Term Goals

We take as our long-term goals a set of principles that arose out of recent discussions among faculty in New York University's Media Research Lab; Georgia Tech's Graphics, Visualization and Usability Center; and the State University of New York at Stony Brook's Center for Visual Computing. These discussions sought to establish a shared long-term research agenda on the "future of documents." Two primary agreements emerged from the conversations: the importance of a "leveraging" approach, and the identification of the most significant limitations of current documents. From these agreements, five principles for future research were articulated.

Leveraging. Our target applications for future documents are high-information activities (e.g., business, education, health care, research). These activities require that we make full use of human communicative power, and that we optimally harness the knowledge of individuals who can provide complex information. To meet these goals, new information tools must leverage expert knowledge to help non-expert users create powerful program-documents. Point-and-shoot cameras and desktop publishing programs already demonstrate the effectiveness of this leveraging approach. In each case expert knowledge becomes part of an information tool, so that it can be used invisibly by a user who may know nothing about the underlying complications, such as focal lengths, ligatures, or real-time responsive animations.

Limitations to Address. The new generation of tools, as well as the program-documents created through them, must address a number of deep problems with the communication tools currently at our disposal. For instance, while humans perceive and create information using multiple sensory and expressive modalities, current tools for creating and disseminating information often make use of only one or two modalities (e.g., only text or speech). At this point, electronic communication tools that closely integrate multiple, simultaneous media forms (e.g., text, speech, animation, and gesture) are the special province of experts, not average citizens. Further, even the most advanced communication forms available lack any ability to adapt to the user, and do not facilitate the active role that cognitive scientists and educators know to be necessary for effective communication and learning. Finally, today's tools are often unavailable when and where they are needed, due to lack of mobility and/or high cost.

Research Principles:

  1. Interface. We must support the variety and depth of human input and output modalities. Research should encompass text input, speech analysis, position tracking, gesture recognition, graphical interaction, computational perception, autostereoscopic display, haptic feedback, situated audio, graphic display, and animation.
  2. Analysis. In order to be responsive to human users, we must be able to analyze human actions accurately, using appropriate models of human behavior, perception, and learning. This allows the user to operate on a higher semantic level, and provides an implicit collaboration between the non-expert user and the expert author. The results of this analysis can then be used to guide software agents that understand and follow users' high-level directives, so that the user can work at a simple and intuitive level. Effective modeling of the user requires two different thrusts: modeling human perception and modeling human cognition.
  3. Synthesis. Systems must effectively synthesize active and engaging interface agents, ones that assist both authors and receivers in the construction of understanding. The embodiment of these communication agents may be abstract, schematic, or anthropomorphic. Given good models of user intentions, these interfaces can support the modeling behavior necessary for successful authoring by non-experts.
  4. Delivery. Synthesized responses must be available wherever they are needed, requiring techniques for delivering these results across thin wires to inexpensive mobile devices.
  5. Ongoing Assessment. We must set research goals by understanding the contexts into which program-documents are deployed, recognizing the forms that promote understanding and learning, and continually performing user tests of the work that researchers produce.
A shorter-term milestone is to allow authors to make assumptions about users, without full user modeling happening in the background. This can be very effective, as current authored documents show, and it is the ground on which the MRL (Media Research Lab) is currently working. We are also working on tools that let authors create and shape the reactions and rules that might one day be driven by user models and responses.

Short-Term Milestones

The short-term milestones on which we are currently working are embodied in the experiments described in the following sections, which represent some of our attempts to merge the "on-line interactive program" and "on-line document" paradigms. These experiments are primarily in the area of human animation and figure modeling, as that is the area in which our Laboratory has the expertise required to pursue the related user-interface issues most effectively.

All of the examples shown are Java 1.01 applets. We chose this lowest-common-denominator platform because it makes our on-line experiments simultaneously available to the largest possible number of users.

Animation modeling

The figures above are from an interactive applet designed to investigate the question "how simple can we make it for people to design customized animated characters?" The "character" on the left actually contains seven interactive zones. Dragging the mouse within any of these zones causes certain regions of the body to change shape so as to follow the mouse movement. Certain parts are coupled together, so that dragging one always causes the other to change as well.
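The zone-and-coupling behavior described above can be sketched in a few lines. This is a minimal illustration in present-day Java, not the applet's code: the class `DeformZones`, its rectangular hit boxes, and the linear coupling gains are hypothetical assumptions, and each zone's shape change is reduced to a single 2D offset.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of drag-driven deformation zones with coupling.
 * Zone names, hit boxes, and gains are illustrative, not taken from the applet.
 */
public class DeformZones {
    /** A coupling forwards a scaled copy of one zone's drag to another zone. */
    static final class Coupling {
        final Zone target;
        final double gain;
        Coupling(Zone target, double gain) { this.target = target; this.gain = gain; }
    }

    /** One interactive zone: a rectangular hit box plus the offset it applies to its body region. */
    static final class Zone {
        final String name;
        final double x0, y0, x1, y1;  // hit box in screen coordinates
        double dx, dy;                // accumulated deformation offset for this region
        final List<Coupling> couplings = new ArrayList<>();

        Zone(String name, double x0, double y0, double x1, double y1) {
            this.name = name;
            this.x0 = x0; this.y0 = y0; this.x1 = x1; this.y1 = y1;
        }

        boolean contains(double x, double y) {
            return x >= x0 && x <= x1 && y >= y0 && y <= y1;
        }
    }

    final List<Zone> zones = new ArrayList<>();

    Zone addZone(String name, double x0, double y0, double x1, double y1) {
        Zone z = new Zone(name, x0, y0, x1, y1);
        zones.add(z);
        return z;
    }

    /** Dragging `from` by some amount also moves `to` by gain times that amount. */
    void couple(Zone from, Zone to, double gain) {
        from.couplings.add(new Coupling(to, gain));
    }

    /**
     * Mouse drag from press point (px, py) to (x, y). The first zone containing the press
     * point follows the mouse; its directly coupled zones follow by their gains. Couplings
     * are not followed transitively, so a cycle of couplings cannot loop forever.
     */
    void drag(double px, double py, double x, double y) {
        double mx = x - px, my = y - py;
        for (Zone z : zones) {
            if (z.contains(px, py)) {
                z.dx += mx;
                z.dy += my;
                for (Coupling c : z.couplings) {
                    c.target.dx += c.gain * mx;
                    c.target.dy += c.gain * my;
                }
                return;
            }
        }
    }
}
```

For example, a drag from (5, 5) to (9, 7) inside a "head" zone coupled to a "neck" zone with gain 0.5 moves the head region by (4, 2) and the neck region by (2, 1). Following couplings only one level deep is a deliberate simplification that keeps circular couplings from feeding back on themselves.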

We can view this as a very simple example of program-document leveraging on multiple layers. The base substrate provides the potential for deformable shapes and for deformations to drive other parameters (including other deformations, and perhaps modes of action, or even personality traits). The example body is simple enough to be designed or customized by a user with nearly any level of drawing expertise. Expert-created rules can then be applied, which might be stored in a series of humanoid rule sets (for deformations and correspondences) between which the individual author/user can select and blend.
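One way to picture selecting and blending between expert rule sets is to treat each set as a table of coupling gains and interpolate between two tables. The sketch below assumes that representation; the class `RuleBlend` and rule names such as "head->neck" are hypothetical, and linear interpolation is just one plausible blending rule.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch: an expert-authored rule set maps a named coupling
 * (e.g. "head->neck") to a gain, and an author blends two rule sets.
 */
public class RuleBlend {
    /**
     * Linear interpolation between rule sets a (t = 0) and b (t = 1).
     * A rule present in only one set is treated as having gain 0 in the other.
     */
    static Map<String, Double> blend(Map<String, Double> a, Map<String, Double> b, double t) {
        // Written this way so that NaN is rejected as well as out-of-range values.
        if (!(t >= 0 && t <= 1)) throw new IllegalArgumentException("t must be in [0, 1]");
        Set<String> keys = new TreeSet<>(a.keySet());
        keys.addAll(b.keySet());
        Map<String, Double> out = new HashMap<>();
        for (String k : keys) {
            double ga = a.getOrDefault(k, 0.0);
            double gb = b.getOrDefault(k, 0.0);
            out.put(k, (1 - t) * ga + t * gb);
        }
        return out;
    }
}
```

Blending {head->neck: 0.5, hip->knee: 1.0} with {head->neck: 1.0} at t = 0.5 gives head->neck 0.75 and hip->knee 0.5. Treating a missing rule as gain 0 makes a rule fade out as the author moves away from the set that defines it; keeping the other set's value instead would be an equally defensible choice.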

Questions for the workshop:

Animation design

In the case of animation design, layers of expert-created rules can provide a non-expert user with individual movements to include or with overall stylistic guidance. For each character, this work can happen at the level of direct manipulation of the character and a limited set of parameters. The above figure was created using a prototype applet that allows an animator to work on a higher level of "attitude" control, somewhat as though the graphical puppet were a human actor being given motivational instructions.

Yet for more precise or unexpected control, or for the purposes of a content-creating expert, a more detailed representation may be required: one that is both the document explaining the layered actions and the interface through which those actions are manipulated.

Questions for the workshop:

Cognitive modeling

In our improvisational animation work, we have shown how to make an embodied agent react with responsive facial expression, without using repetitive prebuilt animations, and how to mix those facial expressions to simulate shifting moods and attitudes. The result is real-time interactive facial animation with convincing emotive expressiveness.
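Mixing facial expressions to suggest shifting moods can be illustrated with a weighted sum of expression elements whose weights ease toward a target mood each frame. This is a hedged sketch in present-day Java, not the method of the Improv or layered facial-compositing work listed in the references; the class `FaceMixer`, the parameter layout, the assumption that the neutral face is all zeros, and the exponential easing rule are all illustrative choices.

```java
/**
 * Hypothetical sketch: a face is a weighted sum of expression elements, and the
 * weights drift toward a target mood so expressions blend rather than snap.
 */
public class FaceMixer {
    final double[][] expressions; // expressions[e][p]: value of facial parameter p in element e
    final double[] weights;       // current mix weight of each element
    final double[] target;        // weights the character is drifting toward

    FaceMixer(double[][] expressions) {
        this.expressions = expressions;
        this.weights = new double[expressions.length];
        this.target = new double[expressions.length];
    }

    void setTarget(int element, double weight) { target[element] = weight; }

    /** Per-frame update: move each weight the fraction `rate` of the way to its target. */
    void step(double rate) {
        if (!(rate >= 0 && rate <= 1)) throw new IllegalArgumentException("rate must be in [0, 1]");
        for (int e = 0; e < weights.length; e++) weights[e] += rate * (target[e] - weights[e]);
    }

    /** Facial parameters for the current mix; all-zero weights give the (assumed) neutral face. */
    double[] face() {
        double[] out = new double[expressions[0].length];
        for (int e = 0; e < expressions.length; e++)
            for (int p = 0; p < out.length; p++) out[p] += weights[e] * expressions[e][p];
        return out;
    }
}
```

With elements {1, 0} and {0, 1}, a full-rate step toward the first gives the face {1, 0}; then targeting the second at half rate gives {1, 0.5}. Because weights approach their targets gradually, a change of mood produces a continuous transition rather than a replayed, prebuilt animation.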

The eventual goal of this research is to give computer-mediated documents the ability to represent the subtleties we take for granted in face-to-face communication, so that they can function as agents for an emotional point of view.

In an experiment to discover a viable vocabulary for such a capability, we isolated the minimal number of facial expression elements that will produce a "convincing" impression of character and personality. This is, of course, only a subset of the full range of expression of which the human face is capable; Paul Ekman's pioneering work on the Facial Action Coding System gives a functional description of that full range.

This was also an experiment to test whether it was feasible to implement a 3D character in Java without using any 3D plug-ins (i.e., doing the 3D rendering entirely in the Java applet itself), which would allow us to conduct widely distributed user studies using web technology already in place.
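Rendering 3D entirely inside an applet ultimately rests on projecting 3D points to screen pixels in ordinary Java code. The sketch below shows only that core step, under stated assumptions: a pinhole camera at the origin looking down +z, a single yaw rotation, and the hypothetical helper `SoftProjector.project`. It is written in present-day Java and is not the code used in the experiment.

```java
/**
 * Hypothetical sketch of the core step of software 3D rendering in plain Java:
 * rotate a model-space point about the vertical axis, move it in front of a
 * pinhole camera at the origin looking down +z, and project it to pixel coordinates.
 */
public class SoftProjector {
    /**
     * @param yaw      rotation about the y axis, in radians
     * @param distance how far in front of the camera the model's origin is placed
     * @param focal    focal length in pixels
     * @param cx       screen x of the image center
     * @param cy       screen y of the image center
     * @return {screenX, screenY}, or null if the point is at or behind the camera
     */
    static double[] project(double x, double y, double z, double yaw, double distance,
                            double focal, double cx, double cy) {
        double c = Math.cos(yaw), s = Math.sin(yaw);
        double xr = c * x + s * z;              // rotate about the y axis
        double zr = -s * x + c * z + distance;  // rotated depth, pushed in front of the camera
        if (zr <= 0) return null;               // caller should cull or clip this point
        // Perspective divide; screen y grows downward, so model +y maps to smaller screen y.
        return new double[] { cx + focal * xr / zr, cy - focal * y / zr };
    }
}
```

With a 100-pixel focal length, the model 5 units away, and the image center at (160, 120), the point (1, 1, 0) lands at pixel (180, 100). A complete software renderer would apply this to every vertex, order polygons by depth (for example with a painter's algorithm), and draw them with `java.awt.Graphics.fillPolygon`.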

Questions for the workshop:

Multilayer interaction

Documents are typically divided into nested semantic levels: summary information in an introduction or section heading expands into detailed information in a later section. The same is generally true of large interaction tasks. We are asking how to create interactive control of information that makes the best use of people's intuitive and cultural notions of this nesting.

The above sequence of images shows time-sequential snapshots of a user interacting with a zoomable, multi-scale user interface for controlling and editing animation. The user zooms into the controls for a particular animated character and chooses an animatable parameter to edit. The user can then build nested expressions of animation sliders and key-frame curves to provide various controls for the animation.
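The nested expressions of sliders and key-frame curves described above map naturally onto a small expression tree evaluated at each frame time. This is a minimal sketch under assumptions of our own: the names `AnimExpr`, `Slider`, `Curve`, `sum`, and `product` are hypothetical, curves are piecewise linear, and the zoomable interface itself is omitted.

```java
/**
 * Hypothetical sketch: an animation channel built from nested expressions of
 * sliders and key-frame curves, evaluated at time t.
 */
public class AnimExpr {
    /** Any node in the expression tree. */
    interface Expr { double eval(double t); }

    /** A user-adjustable constant, e.g. bound to an on-screen slider. */
    static final class Slider implements Expr {
        double value;
        Slider(double value) { this.value = value; }
        public double eval(double t) { return value; }
    }

    /** Piecewise-linear key-frame curve that holds its end values outside the key range. */
    static final class Curve implements Expr {
        final double[] times, values;

        Curve(double[] times, double[] values) {
            if (times.length == 0 || times.length != values.length)
                throw new IllegalArgumentException("need matching, non-empty key arrays");
            for (int i = 1; i < times.length; i++)
                if (times[i] <= times[i - 1])
                    throw new IllegalArgumentException("key times must strictly increase");
            this.times = times.clone();
            this.values = values.clone();
        }

        public double eval(double t) {
            int n = times.length;
            if (t <= times[0]) return values[0];
            if (t >= times[n - 1]) return values[n - 1];
            int i = 1;
            while (times[i] < t) i++;  // i is now the first key at or after t
            double u = (t - times[i - 1]) / (times[i] - times[i - 1]);
            return values[i - 1] + u * (values[i] - values[i - 1]);
        }
    }

    /** Combinators that let users nest sliders and curves inside one another. */
    static Expr sum(Expr a, Expr b) { return t -> a.eval(t) + b.eval(t); }
    static Expr product(Expr a, Expr b) { return t -> a.eval(t) * b.eval(t); }
}
```

For a curve through keys (0, 0), (1, 10), (2, 0) scaled by a slider set to 2.0, the channel reads 10.0 at t = 0.5; moving the slider to 0.5 changes that to 2.5 without rebuilding the tree, because the product node evaluates the slider on every call.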

Questions for the workshop:

Textural simulation

The above images are a set of progressively rendered versions of a synthetic planet created entirely from procedural texture synthesis algorithms. Using an entirely procedural planet generator lets us do work in which users "steer" the evolution or appearance of a planet through high-level controls that directly affect the aesthetic parameters of its evolution and appearance.

For example, the user can continuously vary the ratio of ocean area to land mass area. By giving the user a number of such controls, we can begin to conduct experiments on user experience of higher level control of procedurally and aesthetically defined artifacts in a highly shared computer-mediated document.
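A continuous ocean-versus-land control can be implemented by choosing the sea level as a quantile of the terrain heights: to make a fraction f of the surface ocean, set the sea level at the height below which a fraction f of the samples fall. An ocean-to-land area ratio r corresponds to f = r / (1 + r). The sketch below assumes a flat square height map built from fractal value noise, a simple stand-in for whatever procedural texture functions the planet generator actually uses; `OceanControl` and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical sketch of a "fraction of ocean" control for procedural terrain.
 * Heights come from fractal value noise on a flat square grid (not a sphere).
 */
public class OceanControl {
    static double lerp(double a, double b, double t) { return a + t * (b - a); }

    /** Sum of octaves of bilinearly interpolated random lattices; each octave has half the amplitude and twice the frequency. */
    static double[] heightField(int size, int octaves, long seed) {
        Random rng = new Random(seed);
        double[] h = new double[size * size];
        double amp = 1.0;
        int cells = 4; // lattice cells across the map at the coarsest octave
        for (int o = 0; o < octaves; o++, amp *= 0.5, cells *= 2) {
            double[][] lattice = new double[cells + 1][cells + 1];
            for (double[] row : lattice)
                for (int j = 0; j < row.length; j++) row[j] = rng.nextDouble();
            for (int y = 0; y < size; y++) {
                for (int x = 0; x < size; x++) {
                    double fx = (double) x * cells / size, fy = (double) y * cells / size;
                    int ix = (int) fx, iy = (int) fy;
                    double tx = fx - ix, ty = fy - iy;
                    double top = lerp(lattice[iy][ix], lattice[iy][ix + 1], tx);
                    double bottom = lerp(lattice[iy + 1][ix], lattice[iy + 1][ix + 1], tx);
                    h[y * size + x] += amp * lerp(top, bottom, ty);
                }
            }
        }
        return h;
    }

    /** Height below which (approximately) the requested fraction of cells lies. */
    static double seaLevel(double[] heights, double oceanFraction) {
        double[] sorted = heights.clone();
        Arrays.sort(sorted);
        int k = (int) Math.floor(oceanFraction * sorted.length);
        k = Math.max(0, Math.min(k, sorted.length - 1));
        return sorted[k];
    }

    /** Fraction of cells strictly below the given sea level. */
    static double oceanFraction(double[] heights, double sea) {
        int below = 0;
        for (double v : heights) if (v < sea) below++;
        return (double) below / heights.length;
    }
}
```

For r = 7/3 the target fraction is 0.7; on a 64 by 64 map the measured fraction lands within one cell's worth (1/4096) of it, assuming no tied heights. Because the height field is generated once and only the threshold moves, the control can be dragged continuously without regenerating the terrain.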

Questions for the workshop:

Correspondences and issues for the future

As the experiments above show, each of them corresponds to work to be done on program-documents in a number of areas, including animation design, animation modeling, cognitive modeling, pedagogical presentation and organization, information search and retrieval, and scientific visualization.

A summary of our general thesis might be stated as follows: (i) the layering approach is effective; (ii) blurring the distinction between programs and documents is a worthwhile goal; (iii) this blending, when done on the web, is effective for collaboration and information sharing; and (iv) breakthroughs will come from making accessible tools that creative people can use in unexpected ways.


N. Badler, B. Barsky, D. Zeltzer, Making Them Move: Mechanics, Control, and Animation of Articulated Figures, Morgan Kaufmann Publishers, San Mateo, CA, 1991.

N. Badler, C. Phillips, B. Webber, Simulating Humans: Computer Graphics, Animation, and Control, Oxford University Press, 1993.

J. Bates, A. Loyall, W. Reilly, Integrating Reactivity, Goals and Emotions in a Broad Agent, Proceedings of the 14th Annual Conference of the Cognitive Science Society, Indiana, July 1992.

B. Bederson, J. Hollan, K. Perlin, J. Meyer, D. Bacon, and G. Furnas, Pad++: A Zoomable Graphical Sketchpad for Exploring Alternate Interface Physics, Journal of Visual Languages and Computing, 7:3-31, 1996.

B. Blumberg, T. Galyean, Multi-Level Direction of Autonomous Creatures for Real-Time Virtual Environments, Computer Graphics (SIGGRAPH '95 Proceedings), 30(3):47-54, 1995.

A. Bruderlin, L. Williams, Motion Signal Processing, Computer Graphics (SIGGRAPH '95 Proceedings), 30(3):97-104, 1995.

R. Brooks, A Robust Layered Control System for a Mobile Robot, IEEE Journal of Robotics and Automation, 2(1):14-23, 1986.

J. Chadwick, D. Haumann, R. Parent, Layered construction for deformable animated characters, Computer Graphics (SIGGRAPH '89 Proceedings), 23(3):243-252, 1989.

D. Ebert et al., Texturing and Modeling: A Procedural Approach, Academic Press, London, 1994.

P. Ekman, Facial expression and emotion, American Psychologist, 48:384-392, 1993.

M. Girard, A. Maciejewski, Computational modeling for the computer animation of legged figures, Computer Graphics (SIGGRAPH '85 Proceedings), 19(3):263-270, 1985.

B. Hayes-Roth and R. van Gent, Improvisational puppets, actors, and avatars, in Proceedings of the Computer Game Developers' Conference, Santa Clara, CA, 1996.

J. Hodgins, W. Wooten, D. Brogan, J. O'Brien, Animating Human Athletics, Computer Graphics (SIGGRAPH '95 Proceedings), 30(3):71-78, 1995.

M. Johnson, WavesWorld: A Testbed for Three Dimensional Semi-Autonomous Animated Characters, PhD Thesis, MIT, 1994.

M. Karaul, personal communication.

P. Maes, T. Darrell, and B. Blumberg, The Alive System: Full Body Interaction with Autonomous Agents, in Computer Animation '95 Conference, Switzerland, April 1995, IEEE Press, pages 11-18.

B. Mandelbrot, The Fractal Geometry of Nature, W. H. Freeman and Co., 1983.

M. Minsky, Society of Mind, MIT Press, 1986.

C. Morawetz, T. Calvert, Goal-directed human animation of multiple movements, Proc. Graphics Interface, pages 60-67, 1990.

S. Mukherjea, J. Foley, and S. Hudson, Visualizing Complex Hypermedia Networks through Multiple Hierarchical Views, Proceedings of CHI '95, ACM Press, 331-337, 1995.

F. K. Musgrave, Methods for Realistic Landscape Imaging, Ph.D. Thesis, Dept. of Computer Science, Yale University, 1994.

K. Perlin, An image synthesizer, Computer Graphics (SIGGRAPH '85 Proceedings), 19(3):287-293, 1985.

K. Perlin, Danse interactif, SIGGRAPH '94 Electronic Theatre, Orlando.

K. Perlin, Real Time Responsive Animation with Personality, IEEE Transactions on Visualization and Computer Graphics, 1(1), 1995.

K. Perlin, A. Goldberg, The Improv System, Technical Report, NYU Department of Computer Science, 1996.
(online at

K. Perlin, E. Hoffert, Hypertexture, Computer Graphics, Vol. 23, No. 3, 1989.

K. Perlin, A. Goldberg, Sid and the Penguins, SIGGRAPH '98 Electronic Theatre, Orlando, 1998.

K. Perlin, Layered Compositing of Facial Expression, SIGGRAPH '97 Technical Sketch, Los Angeles, 1997.

K. Perlin, A. Goldberg, Improv: A System for Scripting Interactive Actors in Virtual Worlds, Computer Graphics, Vol. 29, No. 3, 1996.

K. Perlin and D. Fox, Pad: An Alternative Approach to the Computer Interface, Proceedings of SIGGRAPH '93, ACM Press, 57-64, 1993.

K. Sims, Evolving virtual creatures, Computer Graphics (SIGGRAPH '94 Proceedings), 28(3):15-22, 1994.

N. Stephenson, Snow Crash, Bantam Doubleday, New York, 1992.

S. Strassman, Desktop Theater: Automatic Generation of Expressive Animation, PhD thesis, MIT Media Lab, June 1991.
(online at

D. Terzopoulos, X. Tu, and R. Grzeszczuk, Artificial Fishes: Autonomous Locomotion, Perception, Behavior, and Learning in a Simulated Physical World, Artificial Life, 1(4):327-351, 1994.

R. F. Voss, "Fractal Forgeries," in Fundamental Algorithms for Computer Graphics, R. A. Earnshaw, ed., Springer-Verlag, 1985.

A. Witkin, Z. Popovic, Motion Warping, Computer Graphics (SIGGRAPH '95 Proceedings), 30(3):105-108, 1995.