An additional problem is that any device held in the hand can become awkward
while gesturing. We have found this even with a simple pointing device,
such as a stick with a few buttons [Seay99].
In addition, unless fairly skilled, a user often has to pause to identify and select
buttons on the stick. With accurately tracked hands, most of this
awkwardness disappears. We are adept at pointing in almost any direction
and can quickly pinch fingers, for example, without looking at them.
Finally, physical objects are often natural interactors (such as phicons [Ullmer97]). However, with current systems these objects must be inserted in advance or specially prepared. One would like the system to accept objects that one chooses spontaneously for interaction.
Fortunately, the computer vision community has taken up the task of tracking the user's hands and identifying gestures. While generalized vision systems track the body in room and desk-based scenarios for games, interactive art, and augmented environments [Bobick96, Wren95, Wren97], reconstruction of fine hand detail involves carefully calibrated systems and is computationally intensive [Rehg93]. Even so, complicated gestures such as those used in sign language [Starner98, Vogler98] or the manipulation of physical objects [Sharma97] can be recognized. The Perceptive Workbench uses computer vision techniques to maintain a wireless interface.
Most directly related to the Perceptive Workbench, the "Metadesk" [Ullmer97]
identifies and tracks objects placed on the desk's display surface using a
near-infrared computer vision recognizer, originally designed by Starner.
Unfortunately, since not all objects reflect infrared light and infrared shadows
are not used, objects often need infrared reflective "hot mirrors" placed in
patterns on their bottom surfaces to aid tracking and identification. Similarly,
Rekimoto and Matsushita's "Perceptual Surfaces" [Rekimoto97]
employ 2D barcodes to identify objects held against the "HoloWall" and
"HoloTable." In addition, the HoloWall can track the user's hands (or other body
parts) near or pressed against its surface, but its potential recovery of the
user's distance from the surface is relatively coarse compared to the 3D
pointing gestures of the Perceptive Workbench. Davis and Bobick's SIDEshow [Davis98]
is similar to the HoloWall except that it uses cast shadows in infrared for
full-body 2D gesture recovery. Some augmented desks have cameras and projectors
above the surface of the desk and are designed to augment the process of
handling paper or interacting with models and widgets through the use of
fiducials or barcodes [Arai95, Kobayashi98, Underkoffler98, Wellner93].
Krueger's VIDEODESK [Krueger91],
an early desk-based system, used an overhead camera and a horizontal visible
light table (for high contrast) to provide hand gesture input for interactions
displayed on a monitor on the far side of the desk. In contrast with the
Perceptive Workbench, none of these systems address the issues of introducing
spontaneous 3D physical objects into the virtual environment in real-time and
combining 3D deictic (pointing) gestures with object tracking and identification.
A ring of seven similar near-infrared light-sources is mounted on the ceiling surrounding
the desk (Figure
2). Each computer-controlled light casts distinct shadows on the
desk's surface based on the objects on the table (Figure
3a). A second camera, this one in color, is placed next to the desk to
provide a side view of the user's arms (Figure
3b). This side camera is used solely for recovering 3D pointing gestures.
All vision processing is done on two SGI R10000 O2s (one for each camera),
which communicate with a display client on an SGI Onyx via sockets.
However, the vision algorithms could also be run on one SGI with two digitizer
boards or be implemented using inexpensive, semi-custom signal-processing hardware.
We use this setup for three different kinds of interaction which will be explained in more detail in the following sections: recognition and tracking of objects placed on the desk surface based on their contour, full 3D reconstruction of object shapes on the desk surface from shadows cast by the ceiling light-sources, and recognition and quantification of hand and arm gestures.
For display on the Perceptive Workbench, we use the Simple Virtual
Environment Toolkit (SVE), a graphics and sound library developed by the Georgia
Tech Virtual Environments Group [Kessler97].
SVE permits us to rapidly prototype applications used in this work. In
addition we use the workbench version of VGIS, a global terrain visualization
and navigation system [Lindstrom96, Lindstrom97],
as an application for interaction using hand and arm gestures. The workbench
version of VGIS has stereoscopic rendering and an intuitive interface for navigation.
Both systems are built on OpenGL and have both SGI and PC implementations.
To achieve this goal, we use an improved version of the technique described
in Starner et al. [Starner00].
The underside of the desk is illuminated by two near-infrared light-sources (Figure
2). Every object close to the desk surface (including the user's
hands) reflects this light and can be seen by the camera under the display
surface (Figure 4). Using a combination of intensity thresholding and background
subtraction, we extract interesting regions of the camera image and analyze
them. The resulting blobs are classified as different object types based on a
set of features, including area, eccentricity, perimeter, moments, and the shape of the contour.
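A minimal sketch of this kind of segmentation and classification pipeline is shown below, written in Python with OpenCV rather than the original SGI code. The thresholds, the exact feature vector, and the nearest-prototype matching are illustrative assumptions, not the system's actual parameters.

```python
# Sketch only: background subtraction + thresholding, then simple blob features.
import cv2
import numpy as np

def extract_blobs(frame_gray, background_gray, diff_thresh=30, min_area=200):
    """Combine background subtraction with intensity thresholding and return
    the contours of the remaining regions (candidate objects or hands)."""
    diff = cv2.absdiff(frame_gray, background_gray)
    _, mask = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [c for c in contours if cv2.contourArea(c) >= min_area]

def blob_features(contour):
    """Area, perimeter, eccentricity (from central moments), and Hu moments."""
    m = cv2.moments(contour)
    area, perimeter = m["m00"], cv2.arcLength(contour, True)
    mu20, mu02, mu11 = m["mu20"], m["mu02"], m["mu11"]
    common = np.sqrt((mu20 - mu02) ** 2 + 4.0 * mu11 ** 2)
    lam1 = (mu20 + mu02 + common) / 2.0
    lam2 = (mu20 + mu02 - common) / 2.0
    eccentricity = np.sqrt(1.0 - lam2 / lam1) if lam1 > 0 else 0.0
    return np.hstack([area, perimeter, eccentricity, cv2.HuMoments(m).flatten()])

def classify(features, prototypes):
    """Nearest-prototype matching against a small set of trained object classes."""
    names = list(prototypes)
    dists = [np.linalg.norm(features - prototypes[n]) for n in names]
    return names[int(np.argmin(dists))]
```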
In this work, we are using the object recognition and tracking capability
mainly for "cursor objects". Our focus is on fast and accurate position
tracking, but the system may be trained on a different set of objects to be used
as navigational tools or physical icons [Ullmer97].
A future project will explore different modes of interaction based on this
For navigation and object manipulation in a virtual environment, many
gestures are likely to have a deictic component. It is usually not enough to
recognize that an object should be rotated, but we will also need to know the
desired amount of rotation. For object selection or translation, we want to
specify the object or location of our choice just by pointing at it. For these
cases, gesture recognition methods that only take the hand shape and trajectory
into account will not be sufficient. We need to recover 3D information about the
user's hand and arm in relation to his body.
In the past, this information has largely been obtained by using wired gloves or suits, or magnetic trackers [Bolt92, Bimber99]. Such methods provide sufficiently accurate results but rely on wires and have to be tethered to the user's body, or to specific interaction devices, with all the aforementioned problems. Our goal is to develop a purely vision-based architecture that facilitates unencumbered 3D interaction.
With vision-based 3D tracking techniques, the first issue is to determine which information in the camera image is relevant, i.e. which regions represent the user's hand or arm. This task is made even more difficult by variation in user clothing or skin color and by background activity. Although typically only one user interacts with the environment at a given time using traditional methods of interaction, the physical dimensions of large semi-immersive environments such as the workbench invite people to watch and participate.
In a virtual workbench environment, there are few places
where a camera can be placed to provide reliable hand position
information. One camera can be set up next to the table without overly
restricting the available space for users, but if a similar second camera were
to be used at this location, either multi-user experience or accuracy would be
compromised. We have addressed this problem by employing our shadow-based
architecture (as described in the hardware
section). The user stands in front of the workbench and extends an arm over
the surface. One of the IR light-sources mounted on the ceiling to the left of,
and slightly behind the user, shines its light on the desk surface, from where
it can be seen by the IR camera under the projector (see Figure
5). When the user moves his arm over the desk, it casts a shadow on the desk
surface (see Figure
6a). From this shadow, and from the known light-source position, we can
calculate a plane in which the user's arm must lie.
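The plane computation itself is a small piece of geometry. The sketch below (Python/NumPy, with made-up coordinates) takes the known light-source position and the two extreme points of the shadow on the desk surface, modeled here as the plane z = 0, and returns the plane containing the arm; the corresponding constraint from the side camera view can then be intersected with it to pin the arm down in 3D.

```python
# Geometry sketch: the arm lies in the plane through the light source and the
# shadow line on the desk. Coordinates below are hypothetical examples.
import numpy as np

def arm_plane(light_pos, shadow_shoulder, shadow_fingertip):
    """Plane (point, unit normal) through the light source and the two extreme
    shadow points, measured on the desk surface (taken here as z = 0)."""
    L = np.asarray(light_pos, dtype=float)                      # ceiling light [x, y, z]
    S = np.append(np.asarray(shadow_shoulder, dtype=float), 0.0)
    F = np.append(np.asarray(shadow_fingertip, dtype=float), 0.0)
    n = np.cross(S - L, F - L)                                  # normal of plane L, S, F
    return L, n / np.linalg.norm(n)

# Made-up numbers: light about 2 m above and behind the left edge of the desk
point_on_plane, normal = arm_plane(light_pos=[-0.5, -0.3, 2.0],
                                   shadow_shoulder=[0.1, 0.0],
                                   shadow_fingertip=[0.4, 0.6])
```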
Obviously, the success of the gesture tracking capability
relies very strongly on how fast the image processing can be done. It is
therefore necessary to use simple algorithms. Fortunately, we can make some
simplifying assumptions about the image content.
We must first recover arm direction and fingertip position from both the camera and the shadow image. Since the user is standing in front of the desk and the user's arm is connected to the user's body, the arm's shadow should always touch the image border. Thus our algorithm exploits intensity thresholding and background subtraction to discover regions of change in the image and searches for areas in which these touch the front border of the desk surface (which corresponds to the top border of the shadow image or the left border of the camera image). It then takes the middle of the touching area as an approximation for the origin of the arm (Figure 6b and Figure 7b). For simplicity we will call this point the "shoulder", although in most cases it is not. Tracing the contour of the shadow, the algorithm searches for the point that is farthest away from the shoulder and takes it as the fingertip. The line from the shoulder to the fingertip reveals the 2D direction of the arm.
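A minimal version of this search, assuming an already-segmented binary shadow mask whose top row corresponds to the front edge of the desk, might look as follows (Python/OpenCV; a sketch of the described procedure, not the original implementation):

```python
# Sketch: find the "shoulder" on the image border and the fingertip as the
# farthest contour point, as described in the text.
import cv2
import numpy as np

def arm_endpoints(mask):
    """mask: binary shadow image whose top row (y = 0) corresponds to the desk's
    front border. Returns (shoulder, fingertip) as (x, y) tuples, or None."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    if not contours:
        return None
    arm = max(contours, key=cv2.contourArea)       # largest changed region
    pts = arm.reshape(-1, 2).astype(float)         # (x, y) points along the contour
    touching = pts[pts[:, 1] == 0]                 # contour points on the image border
    if len(touching) == 0:
        return None
    shoulder = touching.mean(axis=0)               # middle of the touching area
    fingertip = pts[np.argmax(np.linalg.norm(pts - shoulder, axis=1))]
    return tuple(shoulder), tuple(fingertip)       # shoulder->fingertip = 2D arm direction
```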
Another problem can be seen in Figure 7b, where segmentation based on color background subtraction detects both the hand and the change in the display on the workbench. A more recent implementation replaces the side color camera with an infrared spotlight and a monochrome camera equipped with an infrared-pass filter. By adjusting the angle of the light to avoid the surface of the desk or any other close objects, the user's arm is illuminated and made distinct from the background, and changes in the workbench's display do not affect the tracking.
A bigger problem is caused by the actual location of the side camera. If the
user extends both of his arms over the desk surface, or if more than one user
tries to interact with the environment at the same time, the images of these
multiple limbs can overlap and be merged into a single blob. As a consequence, our
approach will fail to detect the hand positions and orientations in these cases.
A more sophisticated approach using previous position and movement information
could yield more reliable results, but we chose, at this first stage, to accept
this restriction and concentrate on high frame rate support for one-handed
interaction. This may not be a serious limitation for a single user for certain
tasks; a recent study shows that for a task normally requiring
two hands in a real environment, users have no preference for one versus two hands in a virtual environment that does not
model effects such as gravity and inertia [Seay99].
However, the necessity to move either the camera or the object imposes severe
constraints on the working environment. To reconstruct an object with these
methods, it is usually necessary to interrupt the user's interaction with
it, take the object out of the user's environment, and place it into a
specialized setting (sometimes in a different room). Other approaches make use
of multiple cameras from different view points to avoid this problem at the
expense of more computational power to process and communicate the results.
In this project, using only one camera and the infrared light sources, we analyze the shadows cast on the object from multiple directions (see Figure 9). As the process is based on infrared light, it can be applied independently of the lighting conditions and without interfering with the user's natural interaction with the desk.
Existing approaches to reconstructing shape from shadows or silhouettes fall into two camps. The volume approach, pioneered by Baumgart [Baumgart74], intersects view volumes to create a representation of the object. Common representations for the resulting model are polyhedra [Baumgart74, Connolly89] or octrees [Srivastava90, Chien86]. The surface approach reconstructs the surface as the envelope of its tangent planes and has been realized in several systems [Boyer96, Seales95]. Both approaches can be combined, as in Sullivan's work [Sullivan98], which uses volume intersection to create an object and then smooths the surfaces with splines.
We have chosen to use a volume approach to create
polyhedral reconstructions for several reasons. We want to create models that
can be used instantaneously in a virtual environment. Our focus is not on
getting a photorealistic reconstruction, but on creating a quick low
polygon-count model for an arbitrary real-world object in real-time, without
interrupting the ongoing interaction. Polyhedral models offer significant
advantages over other representations, such as generalized cylinders,
superquadrics, or polynomial splines. They are simple and computationally
inexpensive, Boolean set operations can be performed on them with reasonable
effort, and most current VR engines are optimized for fast rendering of
polygonal scenes. In addition, polyhedral models are the basis for many later
processing steps. If desired, they can still be refined with splines, as in Sullivan's approach [Sullivan98].
Our approach is fully automated and does not require any special hardware (e.g. stereo cameras, laser range finders, structured lighting, etc.). The method is extremely inexpensive, both in hardware and in computational cost. In addition, there is no need for extensive calibration, which is usually necessary in other approaches to recover the exact position or orientation of the object in relation to the camera. We only need to know the approximate position of the light-sources (+/- 2 cm), and we need to adjust the camera to reflect the size of the display surface, which must be done only once. Neither the camera, the light-sources, nor the object is moved during the reconstruction process, so recalibration is unnecessary. We have replaced all mechanical moving parts, which are often prone to wear and imprecision, with a series of light beams from known locations.
An obvious limitation for this approach is that we are confined to a fixed number of different views from which to reconstruct the object. The turntable approach, on the other hand, allows the system to take an arbitrary number of images from different view points. However, Sullivan's work [Sullivan98] and our experience with our system have shown that even for quite complex objects, usually seven to nine different views are enough to get a reasonable 3D model of the object. Note that the reconstruction uses the same hardware as the deictic gesture tracking capability discussed in the previous section. Thus, it comes at no additional cost.
The speed of the reconstruction process is mainly limited by the switching
time of the light sources. Whenever a new light-source is activated, the image
processing system has to wait for several frames to receive a valid image. The
camera under the desk records the sequence of shadows cast by an object on the
table when illuminated by the different lights. Figure
10 shows two series of contour shadows extracted from two sample objects by
using different IR sources. By approximating each shadow as a polygon (not
necessarily convex) [Rosin95],
we create a set of polyhedral "view cones", extending from the light source to
the polygons. The intersection of these cones creates a polyhedron that roughly
contains the object. Figure
11 shows some more shadows with the resulting
polygons and a visualization of the intersection of polyhedral cones.
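The sketch below illustrates the idea for a single cone under simplifying assumptions: each shadow is treated as a convex polygon on the desk plane z = 0 (the actual system also handles non-convex shadows), its cone is represented by the half-spaces through the light source and each polygon edge, and the intersection is approximated here by carving a coarse voxel grid rather than intersecting the polyhedra directly.

```python
# Simplified stand-in for the cone-intersection step (illustration only).
import numpy as np

def cone_halfspaces(light_pos, shadow_polygon):
    """Half-spaces of the view cone from a light source through a convex shadow
    polygon given as (x, y) vertices on the desk plane z = 0."""
    L = np.asarray(light_pos, dtype=float)
    P = [np.array([x, y, 0.0]) for x, y in shadow_polygon]
    centroid = np.mean(P, axis=0)
    planes = []
    for a, b in zip(P, P[1:] + P[:1]):             # one plane per polygon edge
        n = np.cross(a - L, b - L)                 # plane through L, a, b
        if np.dot(n, centroid - L) < 0:            # orient normal toward the cone interior
            n = -n
        planes.append((n, L))                      # inside the cone: n . (x - L) >= 0
    return planes

def inside_all_cones(point, cones):
    x = np.asarray(point, dtype=float)
    return all(np.dot(n, x - L) >= 0 for planes in cones for (n, L) in planes)

def carve(cones, bounds, resolution=32):
    """Boolean voxel volume approximating the intersection of all view cones."""
    axes = [np.linspace(lo, hi, resolution) for lo, hi in bounds]
    vol = np.zeros((resolution,) * 3, dtype=bool)
    for i, x in enumerate(axes[0]):
        for j, y in enumerate(axes[1]):
            for k, z in enumerate(axes[2]):
                vol[i, j, k] = inside_all_cones((x, y, z), cones)
    return vol
```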
Spherical or cylindrical objects are an exception; the quality of their reconstruction depends largely on the number of available views. With only seven light-sources, the resulting model will appear faceted. This problem can be addressed either by adding more light-sources or by refining the model with splines.
Apart from this, the accuracy with which objects can be
reconstructed is bounded by another limitation of our architecture. All
our light-sources are mounted to the ceiling. From this point of view they
cannot provide full information about the object's shape. There is
a pyramidal "blind spot" above all horizontal flat surfaces that the
reconstruction cannot eliminate. The slope of these pyramids depends on
the angle between the desk surface and the rays from the light-sources.
For our current hardware setting, this angle ranges between 37° and 55°,
depending on the light-source. Only structures with a greater slope will
be reconstructed entirely without error. This problem is intrinsic to the
method and also occurs with the turntable approach, but on a much smaller scale.
We expect that we can greatly reduce the effects of this error by using the image from the side camera and extracting an additional silhouette of the object from this point of view. This will help to keep the error angle well below 10°. Calculations based on the current position of the side camera (optimized for the gesture recognition) promise an error angle of only 7°.
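As a rough illustration of how this angle arises, the blind-spot slope for a given light is simply the elevation angle of that light above the desk point in question. The dimensions in the sketch below are hypothetical (chosen to reproduce the lower end of the stated range), not the actual workbench geometry.

```python
# Illustration only: elevation angle of a ceiling light above a desk point.
import math

def elevation_angle_deg(light_pos, desk_point):
    """Angle between the desk surface (z = 0) and the ray from the desk point
    to the light source; only structures steeper than this escape the blind spot."""
    dx = light_pos[0] - desk_point[0]
    dy = light_pos[1] - desk_point[1]
    dz = light_pos[2]                    # height of the light above the desk
    return math.degrees(math.atan2(dz, math.hypot(dx, dy)))

# Hypothetical: a light 1.5 m above the desk and 2.0 m away horizontally -> ~37 degrees
print(round(elevation_angle_deg((2.0, 0.0, 1.5), (0.0, 0.0)), 1))
```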
In the current version of our software, an additional error is introduced because we do not yet handle holes in the shadows. This is merely an implementation issue, which will be resolved in a future extension to our project.
Even so, we expect that simple improvements in the socket communication
between the vision and rendering code and in the vision code itself will improve
latency significantly. For the terrain navigation task below, rendering
speed provides a limiting factor. However, Kalman filters may compensate
for render lag and will also add to the stability of the tracking system.
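One possible realization of that idea is a constant-velocity Kalman filter over the tracked pointer position, which both smooths the measurements and extrapolates past the render latency. The sketch below (Python/NumPy) shows a generic filter of this kind; the state layout, sampling interval, noise levels, and lag value are illustrative assumptions, not measured system parameters.

```python
# Generic constant-velocity Kalman filter for a 2D pointer (illustrative values).
import numpy as np

class PointerFilter:
    def __init__(self, dt=1 / 15.0, process_noise=1e-2, meas_noise=4.0):
        # state: [x, y, vx, vy]
        self.x = np.zeros(4)
        self.P = np.eye(4) * 100.0
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1,  0],
                           [0, 0, 0,  1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * process_noise
        self.R = np.eye(2) * meas_noise

    def update(self, measurement):
        # predict one frame ahead
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # correct with the measured (x, y) pointer position
        z = np.asarray(measurement, dtype=float)
        y = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]

    def predict_ahead(self, lag=0.1):
        # extrapolate by the rendering latency to compensate for render lag
        return self.x[:2] + self.x[2:] * lag
```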
To measure error, we used the Metro tool [Cignoni98]. It approximates the real distance between the two surfaces by choosing a set of (100,000-200,000) points on the reconstructed surface, and then calculating the two-sided distance (Hausdorff distance) between each of these points and the ideal surface. This distance is defined as

H(S_1, S_2) = \max \{ E(S_1, S_2), \, E(S_2, S_1) \}

with E(S_1, S_2) denoting the one-sided distance between the surfaces S_1 and S_2:

E(S_1, S_2) = \max_{p \in S_1} \, \min_{p' \in S_2} \, \| p - p' \|
The Hausdorff distance directly corresponds to the reconstruction
error. In addition to the maximum distance, we also calculated the mean
and mean square distances. Table
1 shows the results. In these examples, the relatively large maximal
error was caused by the difficulty in accurately reconstructing the tip of the
cone and the pyramid.
Table 1. Reconstruction errors for the two sample objects.

                      Object 1            Object 2
Maximum Error         0.0215 (7.26 %)     0.0228 (6.90 %)
Mean Error            0.0056 (1.87 %)     0.0043 (1.30 %)
Mean Square Error     0.0084 (2.61 %)     0.0065 (1.95 %)
When the user moves his hand above the display surface, the hand and arm are
tracked as described in Section
4. A cursor appears at the projected hand position on the display surface,
and a ray emanates along the projected arm axis. These can be used in selection
or manipulation, as in Figure
13a. When the user places an object on the surface, the cameras recognize
this and identify and track the object. A virtual button also appears on the
display (indicated by the arrow in Figure
13b). Through shadow tracking, the system determines when the hand overlaps
the button, selecting it. This action causes the system to capture the 3D object
shape, as described in the previous section.
This set provides the elements of a perceptual interface, operating without
wires and without restrictions on the objects employed. For example, we have
constructed a simple application where objects placed on the desk are selected,
reconstructed, and then placed in a "template" set, displayed as slowly rotating
objects on the left border of the workbench display. These objects can then be
grabbed by the user and could act as new physical icons that are attached by the
user to selection or manipulation modes. Or the shapes themselves could be used
in model-building or other applications.
Although there are currently some problems with latency and accuracy (both of
which will be diminished in the future), a user can successfully employ gestures
for navigation. In addition, the set of gestures is quite natural to
use. Further, we find that the vision system can distinguish hand
articulation and orientation quite well. Thus we will be able to attach
interactions to hand movements (even without the larger arm movements). At the
time of this writing, an HMM framework has been developed to allow the user to
train his own gestures for recognition. This system, in association with the
terrain navigation database, should allow more sophisticated interactions in the future.
In conclusion, the Perceptive Workbench uses a vision-based system to enable a rich set of interactions, including hand and arm gestures, object recognition and tracking, and 3D reconstruction of objects placed on its surface. These elements are combined seamlessly into the same interface and can be used in diverse applications. In addition, the sensing system is relatively inexpensive, costing approximately $1000 for the cameras and lighting equipment, plus the cost of a computer with one or two video digitizers, depending on the functions desired. As seen from the multiplayer gaming and terrain navigation applications, the Perceptive Workbench provides an untethered and spontaneous interface that encourages the inclusion of physical objects in the virtual environment.
Arai, T. and K. Machii and S. Kuzunuki.
Retrieving Electronic Documents with Real-World Objects on InteractiveDesk.
UIST'95, pp. 37-38 (1995).
Baumgart, B.G.,
Geometric Modeling for Computer Vision,
PhD Thesis, Stanford University, Palo Alto, CA, 1974.
Bobick, A., S. Intille, J. Davis, F. Baird, C. Pinhanez, L. Campbell, Y. Ivanov, A. Schutte, and A. Wilson,
The KidsRoom: A Perceptually-Based Interactive and Immersive Story Environment,
MIT Media Lab Technical Report (1996).
Bolt R. and E. Herranz,
Two-handed gesture in multi-modal natural dialog,
UIST 92, pp. 7-14 (1992).
Object Models from Contour Sequences,
Proceedings of the Fourth European Conference on Computer Vision, Cambridge (England), April 1996, pp. 109-118.
Bimber, O.,
Gesture Controlled Object Interaction: A Virtual Table Case Study,
Computer Graphics, Visualization, and Interactive Digital Media, Vol. 1, Plzen, Czech Republic, 1999.
Chien, C.H., J.K. Aggarwal,
Computation of Volume/Surface Octrees from Contours and Silhouettes of Multiple Views,
IEEE Conference on Computer Vision and Pattern Recognition (CVPR'86), Miami Beach, FL, June 22-26, 1986, pp. 250-255 (1986).
Cignoni, P., Rocchini, C., and Scopigno, R.,
Metro: Measuring Error on Simplified Surfaces,
Computer Graphics Forum, Vol. 17(2), June 1998, pp. 167-174.
Connolly, C.I., J.R. Stenstrom,
3D Scene Reconstruction from Multiple Intensity Images,
Proceedings IEEE Workshop on Interpretation of 3D Scenes, Austin, TX, Nov. 1989, pp. 124-130 (1989).
Coquillart, S. and G. Wesche,
The Virtual Palette and the Virtual Remote Control Panel: A Device and an Interaction Paradigm for the Responsive Workbench,
IEEE Virtual Reality '99 Conference (VR'99), Houston, March 13-17, 1999.
Daum, D. and G. Dudek,
On 3-D Surface Reconstruction Using Shape from Shadows,
IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'98), 1998.
Davis, J.W. and A. F. Bobick,
SIDEshow: A Silhouette-based Interactive Dual-screen Environment,
MIT Media Lab Tech Report No. 457 (1998).
Fitzmaurice, G.W., Ishii, H., and Buxton, W.,
Bricks: Laying the Foundations for Graspable User Interfaces,
Proceedings of CHI'95, pp. 442-449 (1995).
Ishii, H., and Ullmer, B.,
Tangible Bits: Towards Seamless Interfaces between People, Bits, and Atoms,
Proceedings of CHI'97, pp. 234-241 (1997).
Kessler, G.D., L.F. Hodges, and N. Walker,
Evaluation of the CyberGlove as a Whole-Hand Input Device,
ACM Trans. on Computer-Human Interactions, 2(4), pp. 263-283 (1995).
Kessler, D., R. Kooper, and L. Hodges,
The Simple Virtual Environment Library: User's Guide Version 2.0,
Graphics, Visualization, and Usability Center, Georgia Institute of Technology, 1997.
Kobayashi, M. and H. Koike,
EnhancedDesk: integrating paper documents and digital documents,
Proceedings of 3rd Asia Pacific Computer Human Interaction, pp. 57-62 (1998).
Krueger, M.W.,
Artificial Reality II,
Addison-Wesley, 1991.
Krueger, W., C.-A. Bohn, B. Froehlich, H. Schueth, W. Strauss, G. Wesche,
The Responsive Workbench: A Virtual Work Environment,
IEEE Computer, vol. 28. No. 7. July 1995, pp. 42-48.
Laurentini, A.,
How Many 2D Silhouettes Does It Take to Reconstruct a 3D Object?,
Computer Vision and Image Understanding, Vol. 67, No. 1, July 1997, pp. 81-87 (1997).
Leibe, B., T. Starner, W. Ribarsky, Z. Wartell, D. Krum, B. Singletary, and L. Hodges,
The Perceptive Workbench: Towards Spontaneous and Natural Interaction in Semi-Immersive Virtual Environments,
IEEE Virtual Reality 2000 Conference (VR'2000), New Brunswick, NJ, March 2000, pp. 13-20.
Lindstrom, P., D. Koller, W. Ribarsky, L. Hodges, N. Faust, and G. Turner,
Real-Time, Continuous Level of Detail Rendering of Height Fields,
Report GIT-GVU-96-02, SIGGRAPH 96, pp. 109-118 (1996).
Lindstrom, Peter, David Koller, William Ribarsky, Larry Hodges, and Nick Faust,
An Integrated Global GIS and Visual Simulation System,
Georgia Tech Report GVU-97-07 (1997).
Mäntylä, M.,
An Introduction to Solid Modeling,
Computer Science Press, 1988.
HI-SPACE: A Next Generation Workspace Environment.
Master's Thesis, Washington State Univ. EECS, June 1999.
Rehg, J.M and T. Kanade,
DigitEyes: Vision-Based Human Hand-Tracking,
School of Computer Science Technical Report CMU-CS-93-220, Carnegie Mellon University, December 1993.
Rekimoto, J., N. Matsushita,
Perceptual Surfaces: Towards a Human and Object Sensitive Interactive Display,
Workshop on Perceptual User Interfaces (PUI '97), 1997.
Rosin, P.L. and G.A.W. West,
Non-parametric segmentation of curves into various representations,
IEEE PAMI'95, 17(12) pp. 1140-1153 (1995).
Schmalstieg, D., L. M. Encarnacao, Z. Szalavar,
Using Transparent Props For Interaction With The Virtual Table,
Symposium on Interactive 3D Graphics (I3DG'99), Atlanta, 1999.
Building Three-Dimensional Object Models from Image Sequences,
Computer Vision and Image Understanding, Vol. 61, 1995, pp. 308-324 (1995).
Seay, A.F., D. Krum, W. Ribarsky, and L. Hodges,
Multimodal Interaction Techniques for the Virtual Workbench,
Proceedings of CHI'99.
Sharma R. and J. Molineros,
Computer vision based augmented reality for guiding manual assembly,
Presence, 6(3) (1997).
Srivastava, S.K. and N. Ahuja,
An Algorithm for Generating Octrees from Object Silhouettes in Perspective Views,
IEEE Computer Vision, Graphics and Image Processing, 49(1), pp. 68-84 (1990).
Starner, T., J. Weaver, A. Pentland,
Real-Time American Sign Language Recognition Using Desk and Wearable Computer Based Video,
IEEE PAMI, 20(12), pp. 1371-1375 (1998).
Starner, T., B. Leibe, B. Singletary, and J. Pair,
MIND-WARPING: Towards Creating a Compelling Collaborative Augmented Reality Gaming Interface through Wearable Computers and Multi-modal Input and Output,
IEEE International Conference on Intelligent User Interfaces (IUI'2000), 2000.
Ph.D. Thesis, MIT Media Lab (1992).
Sullivan, S. and J. Ponce,
Automatic Model Construction, Pose Estimation, and Object Recognition from Photographs Using Triangular Splines,
IEEE PAMI, 20(10), pp. 1091-1097 (1998).
TWIN Solid Modeling Package Reference Manual.
Computer Aided Design and Graphics Laboratory (CADLAB), School of Mechanical Engineering, Purdue University, 1995.
Ullmer, B. and H. Ishii,
The metaDESK: Models and Prototypes for Tangible User Interfaces,
Proceedings of UIST'97, October 14-17, 1997.
Underkoffler, J. and H. Ishii,
Illuminating Light: An Optical Design Tool with a Luminous-Tangible Interface,
Proceedings of CHI '98, April 18-23, 1998.
van de Pol, Rogier, William Ribarsky, Larry Hodges, and Frits Post,
Interaction in Semi-Immersive Large Display Environments,
Report GIT-GVU-98-30, Virtual Environments '99, pp. 157-168 (Springer, Wien, 1999).
Vogler C. and D. Metaxas,
ASL Recognition based on a coupling between HMMs and 3D Motion Analysis,
Sixth International Conference on Computer Vision, pp. 363-369 (1998).
Wartell, Zachary, William Ribarsky, and Larry F. Hodges,
Third Person Navigation of Whole-Planet Terrain in a Head-tracked Stereoscopic Environment,
Report GIT-GVU-98-31, IEEE Virtual Reality 99, pp. 141-149 (1999).
Wartell, Zachary, Larry Hodges, and William Ribarsky,
Distortion in Head-Tracked Stereoscopic Displays Due to False Eye Separation,
Report GIT-GVU-99-01, SIGGRAPH 99, pp. 351-358 (1999).
Wellner, P.,
Interacting with paper on the digital desk,
Comm. of the ACM, 36(7), pp. 86-89 (1993).
Wren C., F. Sparacino, A. Azarbayejani, T. Darrell, T. Starner, A. Kotani, C. Chao, M. Hlavac, K. Russell, and A. Pentland,
Perceptive Spaces for Performance and Entertainment: Untethered Interaction Using Computer Vision and Audition,
Applied Artificial Intelligence, 11(4), pp. 267-284 (1995).
Wren C., A. Azarbayejani, T. Darrell, and A. Pentland,
Pfinder: Real-Time Tracking of the Human Body,
IEEE PAMI, 19(7), pp. 780-785 (1997).