J. Bryan Lewis, Lawrence Koved, Daniel T. Ling
Veridical User Environments
IBM Research
T. J. Watson Research Center
P.O. Box 704
Yorktown Heights, New York 10598
We describe a software architecture for virtual worlds, built on a base of
multiple processes communicating through a central
event-driven user interface management system.
The virtual world's behavior is specified by a dialogue composed
of modular subdialogues or rule sets. In order to achieve
high flexibility, device remappability and reusability,
the rule sets should be written as independent modules, each
encapsulating its own state. Each should be designed according
to its purpose in a conceptual hierarchy: it can transform a specific
device into a generic device, or transform a generic device into an interaction
technique, or, at the top level, map interaction techniques to actions.
We achieved a degree of concurrency early in our work by building dedicated servers for the glove and tracker [Wang]. We have continued to add new devices, including speech input, speech output and sound output, as servers. Communication takes the form of asynchronous message-passing over a private network. Coordination is handled by a central kernel-like executive process with which all the devices communicate. Thus the foundation of our architecture, shaped by the performance and multiple-device characteristics, is as shown in the figure.
The central executive process fills the same role as a dialogue manager or user interface management system (UIMS) in conventional desktop applications. In fact we adapted an existing event-driven rule-based UIMS [Rhyne] by adding a front end to receive asynchronous network messages and package them as events for the rule matcher. This allows us to handle events from newly defined devices like the glove, tracker, and speech recognizer, just as easily as the desktop version handled mouse and keyboard events. With this kind of UIMS the dialogue, which specifies and synchronizes the mapping of inputs to actions and results to output, is expressed as a set of production rules. Each rule is fired by an event or pattern of events, and produces a new event or accomplishes an immediate effect by executing semantics embedded in the rule.
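To make the rule-firing cycle concrete, here is a minimal Python sketch of an event-driven rule matcher. It is not the UIMS of [Rhyne]; the class `RuleSet`, its methods, and the use of one handler list per event name are assumptions made for illustration. A handler either performs an immediate effect (its embedded semantics) or returns new events to be queued.

```python
from collections import deque


class RuleSet:
    """Toy rule matcher: each rule maps one event name to a handler."""

    def __init__(self):
        self.rules = {}        # event name -> list of handlers
        self.queue = deque()   # pending events, processed in arrival order
        self.log = []          # names of processed events, kept for inspection

    def rule(self, event_name, handler):
        self.rules.setdefault(event_name, []).append(handler)

    def post(self, name, data=None):
        self.queue.append((name, data))

    def run(self):
        while self.queue:
            name, data = self.queue.popleft()
            self.log.append(name)
            for handler in self.rules.get(name, []):
                # A handler returns None (effect only) or a list of new events.
                for new_event in handler(data) or []:
                    self.queue.append(new_event)


# Usage: one rule produces a new event, another executes embedded semantics.
state = {"mode": None}


def start_flying(data):
    state["mode"] = "flying"


ui = RuleSet()
ui.rule("FLAT", lambda data: [("START_FLYING", None)])
ui.rule("START_FLYING", start_flying)
ui.post("FLAT")
ui.run()
```

In this sketch a front end for network messages would only need to call `post` with the device's event name and payload.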
Another way of looking at flexibility is the avoidance of arbitrary restrictions on interaction techniques. Restrictions should be controllable at a sufficiently high level that they can be dynamically changed. We shall see in the structure of our fluid dynamics world that we use the tracker for three different purposes depending on the user's current task; each task uses a subset of the available degrees of freedom.

To this visualization tool we added real-world-like interactivity. Forming a fist gesture grabs the vortices and rotates them. A flat-hand "flying" gesture navigates around the vortices. The flying metaphor can be thought of as a camera-bearing satellite flying around a globe; the angle of the hand determines the angular velocity and the closeness of the hand to the screen determines the satellite's radius.
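The satellite metaphor can be sketched as a camera moving on a sphere around the origin. The Python below is only an illustration: the function name, the `gain` and `radius_scale` parameters, the choice of which hand angle drives which spherical angle, and the assumption that the radius grows in proportion to the hand's distance from the screen are ours, not details given in the text.

```python
import math


def step_satellite(theta, phi, hand_pitch, hand_yaw, hand_distance, dt,
                   gain=1.0, radius_scale=10.0):
    """Advance the camera by one time step and return its new state.

    theta, phi   : current azimuth and elevation of the camera (radians)
    hand_pitch/yaw: hand angles, used here as angular velocities (assumed)
    hand_distance: distance of the hand from the screen (assumed units)
    """
    theta += gain * hand_yaw * dt
    phi += gain * hand_pitch * dt
    # Keep the elevation within the poles so the camera never flips over.
    phi = max(-math.pi / 2, min(math.pi / 2, phi))
    rho = radius_scale * hand_distance
    position = (rho * math.cos(phi) * math.cos(theta),
                rho * math.cos(phi) * math.sin(theta),
                rho * math.sin(phi))
    return theta, phi, rho, position
```

A level hand (both angles zero) leaves the camera where it is, while a tilted hand keeps it orbiting at a rate proportional to the tilt.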
The user can issue spoken commands to turn on a spotlight mounted on the camera, to reset the camera to its home position, and to pause or resume the use of the hand as an input device.
A set of five sliders allows the user to adjust the experimental conditions -- simulation timestep and surface isovalue -- and the red, blue and green color components of the graphic objects. The user selects a slider by pointing at it, and grabs the slider's adjustment knob with a fist gesture.
We found frequent sound feedback to be very useful. The application's response to an adjustment in timestep or isovalue can take as long as ten seconds, so we play "thinking music" in the interim (see footnote 2). Other uses of sound feedback are a click on every change of hand gesture, a helicopter sound during flying, and a vocal "okay" in response to a spoken command. These seem to help minimize the perceived latency.
We therefore changed the slider dialogue to work with more abstract non-glove-specific events such as SELECT, PICK, MOVE, COMMIT, and DESELECT. Another advantage of the new set of interaction primitives was a wider applicability. The interaction techniques for many other devices such as pulldown menus, selection lists, and pushbuttons could be described by the same primitives. The dialogue's mapping could be changed from one device to another as long as the interaction techniques were similar, and the devices produced the same type of results.
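One way to picture a slider dialogue written against the abstract primitives is the small state machine below. It is a sketch under assumptions: the state names, the `Slider` class, and both device-to-primitive tables (including every device event name in them) are hypothetical. Only the five primitive names come from the text.

```python
class Slider:
    """Slider driven only by SELECT, PICK, MOVE, COMMIT and DESELECT."""

    def __init__(self, lo, hi, value):
        self.lo, self.hi = lo, hi
        self.value = value       # last committed value
        self.pending = value     # value while the knob is being dragged
        self.state = "idle"      # idle -> selected -> picked

    def handle(self, event, delta=0.0):
        if event == "SELECT" and self.state == "idle":
            self.state = "selected"
        elif event == "PICK" and self.state == "selected":
            self.state = "picked"
            self.pending = self.value
        elif event == "MOVE" and self.state == "picked":
            self.pending = min(self.hi, max(self.lo, self.pending + delta))
        elif event == "COMMIT" and self.state == "picked":
            self.value = self.pending
            self.state = "selected"
        elif event == "DESELECT":
            self.state = "idle"
        return self.state


# Hypothetical mappings from two different devices onto the same primitives.
GLOVE_TO_PRIMITIVE = {"point_at_slider": "SELECT", "fist": "PICK",
                      "hand_moved": "MOVE", "fist_released": "COMMIT",
                      "point_away": "DESELECT"}
MOUSE_TO_PRIMITIVE = {"hover": "SELECT", "button_down": "PICK",
                      "drag": "MOVE", "button_up": "COMMIT",
                      "leave": "DESELECT"}


def drive(slider, mapping, device_events):
    """Feed (device event, delta) pairs through a mapping into the slider."""
    for name, delta in device_events:
        slider.handle(mapping[name], delta)
    return slider.value
```

Because the slider never sees a device-specific event, swapping the glove table for the mouse table changes the device without touching the slider logic.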
Thus remappability seemed to be the right paradigm to guide the structuring of the dialogue. We wanted to be able to reuse the same subdialogues for devices with similar interaction techniques. Going further, we wanted to be able to remap devices and techniques dynamically, letting the user switch without interrupting the world. This would provide a good basis for the dynamic flexibility needed in virtual worlds. Doing this required the rule set to be divided into several levels such that the interfaces between levels provided convenient places for identifying similarities in techniques and outputs.
Structure of rule sets

In the lowest level, a rule set forms the interface to a specific hardware or software device. In particular the rule set's function is to encapsulate low-level details, such as packing more than one datum into a single message, or waiting for acknowledgment before sending new data, or smoothing of jittery input streams.
The generic level accepts events coming up from the lower level and further transforms them into more general interaction techniques such as rotation, flying or color events. It also transforms events coming down from the executive as needed to match the requirements of the lower level.
The executive level maps interaction techniques to actions. It handles higher-level synchronization and dialogue mode, e.g., a fist gesture means grabbing a slider bar if it occurs after selecting a slider, but means rotation otherwise.
The three levels are conceptual. There can be many separate rule sets belonging to a given level, and typically this is true in the lower two levels. A level might need further subdivision in some cases, depending on how closely the device or technique matches the desired interface to the next level.
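As an example of a detail that the lowest level can hide from the levels above it, the sketch below smooths a jittery input stream. The text does not say which smoothing method is used; exponential averaging, the `Smoother` class, and the `alpha` parameter are assumptions chosen for brevity.

```python
class Smoother:
    """Exponential moving average over a stream of equal-length samples."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha    # weight of the newest sample, 0 < alpha <= 1
        self.value = None     # smoothed state, private to this rule set

    def update(self, sample):
        if self.value is None:
            # The first sample initializes the filter unchanged.
            self.value = list(sample)
        else:
            self.value = [self.alpha * s + (1.0 - self.alpha) * v
                          for s, v in zip(sample, self.value)]
        return self.value
```

Note that averaging angles component by component, as done here, ignores wraparound; a real tracker filter would need to handle that case.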
Clearly an additional benefit of this structure is reusability. If a device or technique has been given a generic enough interface to allow easy substitution, then it will be just as easy to reuse. Remappability and reusability are equivalent as guidelines for dialogue decomposition.
SET_GLOVE { remember specified handedness;
tell server specified profile };
QUERY_GLOVE { query server for data };
GLOVE { split data into one-glove events }
--> GLOVE1, GLOVE2;
GLOVE1 { determine whether gesture changed }
iftrue gesture_changed
--> GESTURE_CHANGED1;
The rules are represented schematically here. Pseudo-code in braces represents embedded semantics. Rules can produce new events, as represented by "-->". We omit the rules for initialization and quitting, which are common to all the low-level rule sets.
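The GLOVE and GLOVE1 rules above might be rendered in Python roughly as follows. The message layout (a dictionary from glove number to a gesture code) and the class and method names are assumptions; GESTURE_CHANGED2 is produced by analogy with GESTURE_CHANGED1 and does not appear in the schematic rules.

```python
class GloveRuleSet:
    """Low-level glove rules: split combined data, report gesture changes."""

    def __init__(self):
        self.last_gesture = {}   # glove number -> last gesture code seen

    def on_glove(self, message):
        """GLOVE: split one server message into per-glove events."""
        events = []
        for glove_id, gesture in sorted(message.items()):
            events.extend(self.on_one_glove(glove_id, gesture))
        return events

    def on_one_glove(self, glove_id, gesture):
        """GLOVE1 (and its analogue): emit an event only on a change."""
        if self.last_gesture.get(glove_id) == gesture:
            return []
        self.last_gesture[glove_id] = gesture
        return [("GESTURE_CHANGED%d" % glove_id, gesture)]
```

The remembered gesture is state that belongs to this rule set alone, so higher levels see only change events.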
The corresponding rule set in the generic level is the following, which transforms the separated and smoothed GESTURE_CHANGED1 event into a generic one-of-many device. The gesture fans out into one of seven mutually exclusive alternatives.
GESTURE_CHANGED1 { compare to gesture table }
--> oneof
POINT, TWOPOINT, FLAT, FIST,
THIRD, TWOTHIRDS, or THUMBOUT;
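A one-of-many classification of this kind could be sketched as a nearest-template lookup. Everything numeric below is a placeholder: the text does not describe the contents of the gesture table, so the five-finger flex encoding (0 for straight, 1 for bent, ordered thumb to little finger) and the template values are assumptions, and THIRD and TWOTHIRDS are left out because their hand shapes are not described.

```python
# Hypothetical templates: (thumb, index, middle, ring, little) finger flex.
GESTURE_TABLE = {
    "FLAT":     (0, 0, 0, 0, 0),
    "FIST":     (1, 1, 1, 1, 1),
    "POINT":    (1, 0, 1, 1, 1),
    "TWOPOINT": (1, 0, 0, 1, 1),
    "THUMBOUT": (0, 1, 1, 1, 1),
}


def classify(flex, table=GESTURE_TABLE):
    """Return the single gesture whose template is closest to the sample."""
    def squared_distance(template):
        return sum((a - b) ** 2 for a, b in zip(flex, template))
    return min(table, key=lambda name: squared_distance(table[name]))
```

Since exactly one name is returned for each sample, the alternatives are mutually exclusive by construction.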
Finally, at the highest level, our dialogue maps these events to actions.
FLAT --> START_FLYING;
START_FLYING { set mode = flying };
The highest-level dialogue has internal state represented by the mode variable; this state is encapsulated and not shared with any other rule set. A flat gesture starts flying mode; this causes the dialogue to select a different interpretation for the tracker device, as will be described next.
QUERY_TRACK { query track };
TRACK { server data... no action required };
The TRACK event contains six floating point numbers: three positions (still expressed in the tracker's source coordinates) and three orientation angles. In our simple one-device example, TRACK requires no action at this level. The next higher level contains three parallel rule sets. One of them (Tracker to Flying) transforms the tracker into a generic device reporting two angles and a position, as needed for theta, phi and rho in our satellite flying model.
Tracker to Rotating:
TRACK { transform the three angles to world
space and put into ROTATION event }
--> ROTATION;
Tracker to Slider Manipulation:
TRACK { compute relative motions in x,y plane }
--> DELTA_PLANE;
Tracker to Flying:
TRACK { compute hand angles with respect to
x and y directions; combine the two
angles and z position into FLY event }
--> FLY;
Note that the slider-manipulation rule reduces the tracker's six degrees of freedom to two; the tracker is essentially being transformed into a mouse.
All three generic devices are always active. It is only at the highest level that we dynamically choose one of their inputs.
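The fan-out from one TRACK event into three always-active generic devices can be illustrated as below. This is a sketch with assumed details: the argument order of `on_track`, the additive `angle_offset` standing in for the transformation to world space, and the choice of yaw and pitch as the two hand angles used for flying are all hypothetical.

```python
class TrackerFanOut:
    """Turn each TRACK event into ROTATION, DELTA_PLANE and FLY events."""

    def __init__(self, angle_offset=(0.0, 0.0, 0.0)):
        self.angle_offset = angle_offset   # stand-in for a world transform
        self.last_xy = None                # previous position in the x,y plane

    def on_track(self, x, y, z, yaw, pitch, roll):
        # Tracker to Rotating: all three angles, moved into "world" space.
        world = tuple(a + o for a, o in
                      zip((yaw, pitch, roll), self.angle_offset))
        # Tracker to Slider Manipulation: relative motion in the x,y plane.
        if self.last_xy is None:
            dx, dy = 0.0, 0.0
        else:
            dx, dy = x - self.last_xy[0], y - self.last_xy[1]
        self.last_xy = (x, y)
        # Tracker to Flying: two hand angles combined with the z position.
        return [("ROTATION", world),
                ("DELTA_PLANE", (dx, dy)),
                ("FLY", (yaw, pitch, z))]
```

All three events are produced on every sample; choosing among them is left to the level above.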
DELTA_PLANE iftrue mode == manipulating_sliders
--> MOVE_SLIDERS;
ROTATION iftrue mode == rotating
--> RENDER_ROTATE;
FLY iftrue mode == flying
{ map angles and position to theta,
phi, rho; convert to viewpoint }
--> VIEWPOINT;
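The mode test in these three rules amounts to a gate on each generic device. A minimal Python sketch follows; the `GATES` table and `Executive` class are illustrative names, the initial mode "idle" is an assumption, and the conversion of FLY data to a viewpoint is omitted so that the gating itself stays visible.

```python
# Generic event -> (mode in which it is accepted, event it produces).
GATES = {
    "DELTA_PLANE": ("manipulating_sliders", "MOVE_SLIDERS"),
    "ROTATION":    ("rotating", "RENDER_ROTATE"),
    "FLY":         ("flying", "VIEWPOINT"),
}


class Executive:
    """Top-level dialogue: the mode is private state of this rule set."""

    def __init__(self, mode="idle"):
        self.mode = mode

    def set_mode(self, mode):
        self.mode = mode

    def dispatch(self, name, data):
        """Pass an event through only if the current mode selects it."""
        required_mode, output_name = GATES[name]
        if self.mode != required_mode:
            return None          # input from an unselected device is ignored
        return (output_name, data)
```

Changing the mode is therefore enough to remap the tracker from one task to another without touching the lower levels.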
SET_TIMESTEP { send timestep to simulator };
SET_ISOVALUE { if mapper has received the
latest simulation data
then send isovalue, else save it };
MAPPER_GOT_DATA { send isovalue to mapper };
The application content, when fully separated in this way, comprises a very small part of the dialogue. Also notice that the application is treated like any other concurrent "device"; its behavior is coordinated by the executive level.
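The deferred-send behavior of SET_ISOVALUE can be sketched as follows. The class and attribute names are hypothetical, outgoing messages are simply recorded in a list, and the idea that sending a new timestep marks the mapper's data as stale is our assumption; the rules above do not state when the data stops being the latest.

```python
class ApplicationRules:
    """Application-level rules for the simulator and the mapper."""

    def __init__(self):
        self.sent = []                 # (destination, value), for inspection
        self.mapper_has_data = True    # has the mapper seen the latest data?
        self.isovalue = None           # most recent isovalue requested

    def set_timestep(self, timestep):
        self.sent.append(("simulator", timestep))
        # Assumption: a new timestep means fresh data is on its way.
        self.mapper_has_data = False

    def set_isovalue(self, isovalue):
        self.isovalue = isovalue       # always remember the latest request
        if self.mapper_has_data:
            self.sent.append(("mapper", isovalue))

    def mapper_got_data(self):
        self.mapper_has_data = True
        if self.isovalue is not None:
            self.sent.append(("mapper", self.isovalue))
```

If the user adjusts the isovalue several times while the simulator is busy, only the last value is forwarded once the mapper reports new data.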
1 The vorticity is the magnitude of the curl of the velocity. Loosely speaking, it measures the turbulence, or rate of change in flow velocity.
2 Interactivity is not sacrificed during this thinking interval; the user can continue to fly, rotate and issue commands. This is a natural benefit of the concurrent architecture.
[Brooks] Brooks, F.P. Grasping Reality through Illusion. Proceedings ACM SIGCHI, pp.1-11, 1988.
[Wang] Wang, C.P., Koved, L. and Dukach, S. Design for Interactive Performance in a Virtual Laboratory. Computer Graphics (Proceedings 1990 Symposium on Interactive 3D Graphics), Snowbird, Utah, March 1990.
[Rhyne] Rhyne, James R. Extensions to C for Interface Programming. ACM SIGGRAPH Symposium on User Interface Software, Banff, Canada, October 1988.
[Jacob] Jacob, R.J.K. A Specification Language for Direct-Manipulation User Interfaces. ACM Transactions on Graphics, 5(4):283-317, October 1986.
[Foley] Foley, J.D., van Dam, A., Feiner, S.K. and Hughes, J.F. Computer Graphics, Principles and Practice. Addison-Wesley, 1990.
[Wiecha] Wiecha, C., Bennett, W., Boies, S. and Gould, J. Generating Highly Interactive User Interfaces. Proceedings ACM SIGCHI '89, pp.277-282, 1989.
[Zabusky] Gu, Z., Silver, D. and Zabusky, N.J. 3D Visualization and Quantification of Evolving Amorphous Objects. IEEE Proceedings of Visualization 1990, 1990.