A Task-based Architecture for Application-aware Adjuncts
Robert Farrell
Peter Fairweather
T J Watson Research Center
Yorktown Heights, NY 10598 USA
+1 914 945 3398
robfarr@us.ibm.com
Eric Breimer
Computer Science Department
Rensselaer Polytechnic Institute
Troy, NY
breime@cs.rpi.com
ABSTRACT
Users of complex applications need tools that serve as adjuncts to their work, such as helpers, assistants, advisors, and critics. We are experimenting with ways of making adjuncts aware of the history of interaction surrounding the accomplishment of a task. This paper introduces a new architecture for application-aware adjuncts. Using the architecture, we have implemented a prototype Task Critic adjunct that can give task-oriented critiques of application usage. Our approach is generic, widely applicable, and works directly with off-the-shelf software packages.
Keywords
Architecture, events, adjuncts, human-computer interaction, graphical user interface, plan recognition, task-based, performance support.
INTRODUCTION
Help systems for software applications typically provide information on operating an application’s complex commands and options without giving the user help completing tasks [5]. While computer-based instruction [2] may provide effective support for new users executing common tasks, more advanced users are left without adequate assistance. To address this shortcoming, we have been experimenting with "adjuncts" to software applications that can be easily authored to handle new tasks. Our adjuncts execute alongside the application, take advice from the user, and provide task-specific support in the context of the user’s work.
This paper describes our experiences creating application-aware, task-oriented adjuncts. The first section provides information on our generic architecture. The next section explains a prototype we’ve constructed for a simple text editor. The last section discusses some of the research issues.
ARCHITECTURE
Our proposed architecture is shown in Figure 1. It supports the interaction between three independent but interrelated agents:

Figure 1: Architecture for Application-aware Adjuncts
Users (1) can work on more than one target application (2) (e.g., a Text Editor and E-mail) to accomplish a task, and the Support-ware (3) can provide application-aware processing for more than one Adjunct (4).
To receive application-aware communications from an adjunct, a user must register their identity with the adjunct and optionally select a high-level task or tasks (a). An illustrative task for a text editor is shown in Figure 2.

Figure 2: A sample task: Find a word in a file
As the user works on applications, the system interprets the events being generated (b) as working toward one or more of the tasks. The adjunct can use this information (c) to provide in-context assistance, performance summaries, directed feedback, or detailed critiques. Our Support-ware components provide a convenient interface for adjuncts to plug into an application’s event stream and respond to user actions.
Support-ware Components
Support-ware consists of two major components: the Capture Module (d) and the Analysis Module (e).
Capturing Events
Our architecture supports two different methods for capturing events. The Capture Module can subscribe to the events generated by particular applications, and can then ‘normalize’ these events to a generic event model. Alternatively, it can "spy" on events being generated by applications, using the facilities of an application-independent event-processing substrate (e.g., the operating system or event-handling middleware). Using this method, one could provide adjuncts for popular software packages.
There are two major classes of user interface events that can be captured from the operating system: user events and application events. User events ("inputs") are the result of actions the user actually took on the application: key presses, mouse movements, dialog use, and window controls are examples. Application events ("outputs") are generated by the application to control itself, typically in response to user events: creation and destruction of user interface widgets, painting of windows, and population of default values in text fields are all examples of such events.
Primitive events are n-tuples where the first element is an enumerated event type (see Figure 3) and the rest are strings of characters. "Filtering controls" (g) allow the capture module to discard certain classes of events from consideration and enable adjuncts to suspend and resume the capture process.

Figure 3: Primitive User Interface Events
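The event representation and filtering controls described above can be illustrated with a minimal sketch in Python. It assumes events are plain tuples whose first element is an enumerated type; the event type names, the CaptureControls class, and its method names are hypothetical and stand in for the actual types listed in Figure 3.

```python
from enum import Enum, auto


class EventType(Enum):
    """Hypothetical event types; the actual set is the one in Figure 3."""
    KEY_PRESS = auto()
    MOUSE_MOVE = auto()
    WINDOW_CREATE = auto()
    WINDOW_PAINT = auto()


class CaptureControls:
    """Filtering controls: discard classes of events, suspend and resume capture."""

    def __init__(self):
        self.discarded = set()   # event types dropped from consideration
        self.suspended = False   # True while an adjunct has paused capture

    def discard(self, event_type):
        self.discarded.add(event_type)

    def suspend(self):
        self.suspended = True

    def resume(self):
        self.suspended = False

    def accepts(self, event):
        # A primitive event is an n-tuple: (event type, string, string, ...).
        return not self.suspended and event[0] not in self.discarded


controls = CaptureControls()
controls.discard(EventType.WINDOW_PAINT)

key_event = (EventType.KEY_PRESS, "Text Editor", "a")
paint_event = (EventType.WINDOW_PAINT, "Text Editor")
```

In this sketch, controls.accepts(key_event) holds until an adjunct calls suspend(), while paint events are always dropped.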
We have implemented an event-capturing system that uses the "hooks" provided by the Microsoft Windows operating system to intercept user interface events. Event filtering is implemented as a tree in which each node tests a different attribute of the Windows event structure. An event that passes all of the tests is converted into a generic event (f) and passed on to the analysis module.
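A filter tree of this kind might look like the following sketch. It is not the Windows implementation: raw events are modelled as Python dictionaries, and the attribute names ("application", "message", "control", "window_title", "text") and generic event type names are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FilterNode:
    attribute: str                      # name of the raw-event attribute to test
    test: Callable[[object], bool]      # predicate applied to that attribute
    children: List["FilterNode"] = field(default_factory=list)
    event_type: Optional[str] = None    # generic event type, set on leaf nodes


def classify(node, raw):
    """Return the generic event type if `raw` passes every test on some
    root-to-leaf path of the tree, otherwise None."""
    if node.attribute not in raw or not node.test(raw[node.attribute]):
        return None
    if not node.children:
        return node.event_type
    for child in node.children:
        event_type = classify(child, raw)
        if event_type is not None:
            return event_type
    return None


def to_generic(root, raw):
    """Convert a raw event into a generic event tuple, or None if filtered out."""
    event_type = classify(root, raw)
    if event_type is None:
        return None
    return (event_type, str(raw.get("window_title", "")), str(raw.get("text", "")))


# Hypothetical tree: only Text Editor key presses and button presses pass.
root = FilterNode("application", lambda a: a == "Text Editor", [
    FilterNode("message", lambda m: m == "key_down", event_type="KeyPress"),
    FilterNode("message", lambda m: m == "command", [
        FilterNode("control", lambda c: c == "button", event_type="ButtonPress"),
    ]),
])
```

Arranging the tests as a tree lets events that fail an early test (here, the wrong application) be rejected without evaluating the remaining tests.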
Analyzing Events
The Analysis Module receives events from one or more Capture Modules on different computer systems. If the computer supports multiple users, the Capture Module must also report the current user’s identity.
The analysis module adds a time stamp to each incoming generic event, then performs a bottom-up reconstruction of the user’s hierarchical task structure, starting with the incoming generic events.
Each generic event is matched against the available recognition rules. The recognition rules typically do four things:
A sample recognition rule is shown in Figure 4.

Figure 4: Sample Recognition Rule
The Analysis Module compiles recognition rules into a table that indexes primitive Capture Agent event types to possible rules to match.
The Analyzer alternates between reading events and matching recognition rules. Events that do not match the current context are put into a ‘working’ memory, so that they can be matched later. However, otherwise-equivalent more-recent events will be preferred during matching.
Once a new inferred event has been added, the system immediately matches it against other recognition rules, so that event recognition proceeds depth-first, guided by the active contexts. The system also records the supporting relationship between the matched events and the asserted inferred event.
The growing "forest" of inferred events is called an interpretation of events. The primitive events are the leaves of the interpretation forest. There may be more than one interpretation for a given set of events. One possible interpretation for our sample text editing task and the set of primitive events we introduced is shown in Figure 5.

Figure 5: An Interpretation of Events for the 'Find a word in a file' Task
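The recognition process described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the rule language of Figure 4: a rule is reduced to a tuple of event kinds plus the kind it infers, contexts and time stamps are not modelled, and all event and rule names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(eq=False)  # compare events by identity, not by content
class Event:
    kind: str
    args: Tuple[str, ...] = ()
    support: List["Event"] = field(default_factory=list)  # events it was inferred from


@dataclass(frozen=True)
class Rule:
    pattern: Tuple[str, ...]  # event kinds that must all be present
    infers: str               # kind of the inferred event


class Analyzer:
    def __init__(self, rules):
        # Compile rules into a table indexed by event kind.
        self.index = defaultdict(list)
        for rule in rules:
            for kind in set(rule.pattern):
                self.index[kind].append(rule)
        self.working = []  # unmatched events, oldest first

    def _match(self, rule):
        """Pick one working-memory event per pattern element, preferring
        the most recent candidate; return None if any element is missing."""
        chosen, used = [], set()
        for kind in rule.pattern:
            for i in range(len(self.working) - 1, -1, -1):
                if i not in used and self.working[i].kind == kind:
                    used.add(i)
                    chosen.append(self.working[i])
                    break
            else:
                return None
        return chosen

    def add(self, event):
        self.working.append(event)
        for rule in self.index.get(event.kind, []):
            matched = self._match(rule)
            if matched is not None:
                for e in matched:
                    self.working.remove(e)
                # Record the supporting events and match the inferred event
                # immediately, so that recognition proceeds depth-first.
                self.add(Event(rule.infers, support=matched))
                return


analyzer = Analyzer([
    Rule(("OpenDialog", "TypeText", "PressButton"), "FindWord"),
    Rule(("OpenFile", "FindWord"), "FindWordInFile"),
])
for kind in ("OpenFile", "OpenDialog", "TypeText", "PressButton"):
    analyzer.add(Event(kind))
roots = analyzer.working  # roots of the interpretation forest
```

After the four primitive events are read, the working memory holds a single inferred root whose support links lead down to the primitive events at the leaves.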
Now let’s take a look at how we have applied this architecture in an implemented system.
PROTOTYPE ADJUNCT: Task Critic
We have tested portions of our architecture by building a system for critiquing application usage. Critiquing is an effective mechanism for improving human behavior, especially that of near-expert users. The critiquing method introduced here encourages reflective problem solving and challenges the user to create self-explanations [1] for perceived expectation failures.
Hutchins, Hollan, and Norman [5] introduced the idea that in interacting with an unfamiliar and unnatural system there are two major problems: the "gulf of execution" (effort required to determine how to get a system to do what the user wants) and the "gulf of evaluation" (effort required to interpret the feedback provided by the system). While help systems and computer-based training typically reduce the gulf of execution, a critic system can reduce the gulf of evaluation.
Expert critiquing systems have had a large impact where there is a good source of expert knowledge and where complex constraints must be fulfilled (e.g., spelling checkers) [7]. This is precisely the situation we find when the user is presented with a real-world task using a suite of applications, such as writing a letter of acceptance or maintaining a mailing list. The challenges of the real task (e.g., complex formatting, merging multiple files) create a useful role for an adjunct, such as a task critic.
Our Task Critic (see Figure 6) allows the user to select a task and then perform that task using designated desktop applications. At any point, the user can request an evaluation of their work, temporarily suspend the system operation, or retry the task from the beginning.

Figure 6: Task Critic
The Task Critic’s evaluation lists those goals that were satisfied, those that were omitted, and those that were unexpected. A sample evaluation is shown in Figure 7.

Figure 7: A Critique of a user opening a file without searching
The Task Critic consists of the adjunct user interface, a task model, an application model and an evaluation module that computes the critique.
Adjunct User Interface
The Task Critic user interface consists of a single primary window with popup dialogs for browsing tasks and help. The task browser allows the user to select their desired task from a list of task collections. Once the user starts the task, a ‘clock’ icon provides visual feedback on task performance. The small, resizable critic window allows the user maximal screen real estate for performing the task on the application, while the dual-pane design allows the user to see both the task description and the system’s evaluation simultaneously. This design also supports on-demand, incremental evaluation.
Task Model
Tasks are represented as hierarchical AND/OR trees of goals and subgoals (see Figure 8). All subgoals are ordered linearly unless an ‘unordered’ predicate is added to the supergoal’s list of temporal constraints. Goals have descriptions that are identical to event descriptions, except that they may contain incompletely specified elements.
Figure 8: Task model
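The task representation can be illustrated with a small sketch. It is an assumption-laden simplification: goal descriptions are tuples in which None marks an incompletely specified element, the goal and event names are hypothetical, and goals are checked against a flat, time-ordered list of event descriptions rather than against an interpretation forest.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Goal:
    description: Tuple[Optional[str], ...]  # None = unspecified element
    kind: str = "AND"                        # "AND" or "OR" over the subgoals
    subgoals: List["Goal"] = field(default_factory=list)
    unordered: bool = False                  # lifts the default linear ordering


def matches(description, event):
    """A goal description matches an event if every specified element agrees."""
    return len(description) == len(event) and all(
        g is None or g == e for g, e in zip(description, event))


def satisfy(goal, events, start=0):
    """Return the index just past the earliest events that satisfy `goal`
    at or after `start`, or None if the goal cannot be satisfied."""
    if not goal.subgoals:
        for i in range(start, len(events)):
            if matches(goal.description, events[i]):
                return i + 1
        return None
    if goal.kind == "OR":
        ends = [satisfy(s, events, start) for s in goal.subgoals]
        ends = [e for e in ends if e is not None]
        return min(ends) if ends else None
    if goal.unordered:
        ends = [satisfy(s, events, start) for s in goal.subgoals]
        return None if None in ends else max(ends)
    position = start
    for subgoal in goal.subgoals:  # ordered AND: each subgoal follows the last
        position = satisfy(subgoal, events, position)
        if position is None:
            return None
    return position


find_task = Goal(("FindWordInFile",), subgoals=[
    Goal(("OpenFile", None)),
    Goal(("Search", None), kind="OR", subgoals=[
        Goal(("MenuFind", None)),
        Goal(("ShortcutFind", None)),
    ]),
])
```

With the default linear ordering, opening the file after searching does not satisfy find_task; setting unordered=True on the top-level goal accepts either order.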
Application Model
The Application Model must be pre-populated for each target application. It enables the recognition rules to be largely application-independent. For example, the application model stores a mapping from application button names to generic button names. Using this technique, most recognition rules do not have to be modified if we want to build an adjunct for a different text editor.
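As a minimal illustration of such a mapping, the sketch below stores per-application button names in a dictionary. The application names, button labels, and generic names are all hypothetical.

```python
# Hypothetical application model: application-specific button labels are
# mapped to the generic button names used by the recognition rules.
APPLICATION_MODEL = {
    "EditorA": {"Find Next": "find", "OK": "confirm"},
    "EditorB": {"Search": "find", "Apply": "confirm"},
}


def generic_button(application, label):
    """Return the generic name for a button, or the label itself if the
    application model has no entry for it."""
    return APPLICATION_MODEL.get(application, {}).get(label, label)
```

A rule written against the generic name "find" then applies to both editors, and supporting a third editor means adding one more entry to the model rather than rewriting rules.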
The Text Editor Application
The event types for the Text Editor consist of operations on applications (Start/Exit), windows (Open/Close), files (Save), and documents within the editor (searching, editing).
When running the capture module on the Text Editor application, we found that window titles were insufficient to differentiate application windows, so we also report the type of window (the "widget"). For some applications, it is necessary to also determine relative position of the windows. In our previous work [*], we used the unchanging portions of the content to identify windows.
Critiquing
The Task Critic evaluates the user’s inputs by starting with the most recent task given to the Task Critic. The system recursively matches the goals in the task description against the events in the interpretation, starting with the high-level goals. If a goal matches an event, then the goal is marked as achieved, the pairing is stored, and its subgoals are matched against the event’s supporting events. If the goal does not match any events, then it counts as omitted. A second pass is made over the interpretation, recursively querying each of the events to see if it has been paired with a goal. The unpaired events that are highest in the interpretation forest are returned as unexpected. The system passes the lists of events through a simple template-based natural language generator to produce the final output for the user.
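The two-pass evaluation can be sketched as follows. This is a simplified reading of the procedure, not the Task Critic code: ordering constraints, AND/OR distinctions, and the natural language generator are left out, None marks an unspecified element of a goal description, and the goal and event names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Event:
    description: Tuple[str, ...]
    support: List["Event"] = field(default_factory=list)


@dataclass(eq=False)
class Goal:
    description: Tuple[Optional[str], ...]  # None = unspecified element
    subgoals: List["Goal"] = field(default_factory=list)


def matches(goal, event):
    return len(goal.description) == len(event.description) and all(
        g is None or g == e for g, e in zip(goal.description, event.description))


def critique(task, interpretation):
    """Return (achieved goals, omitted goals, unexpected events)."""
    achieved, omitted, paired = [], [], set()

    def visit(goal, candidates):
        # First pass: pair each goal with an unpaired matching event.
        event = next((e for e in candidates
                      if id(e) not in paired and matches(goal, e)), None)
        if event is None:
            omitted.append(goal)
            return
        achieved.append(goal)
        paired.add(id(event))
        for subgoal in goal.subgoals:
            visit(subgoal, event.support)

    def scan(events, unexpected):
        # Second pass: report the highest events that were never paired.
        for event in events:
            if id(event) in paired:
                scan(event.support, unexpected)
            else:
                unexpected.append(event)
        return unexpected

    visit(task, interpretation)
    return achieved, omitted, scan(interpretation, [])


task = Goal(("FindWordInFile",), [
    Goal(("OpenFile", None)),
    Goal(("FindWord", "hello")),
])
interpretation = [
    Event(("FindWordInFile",), [
        Event(("OpenFile", "notes.txt")),
        Event(("FindWord", "world")),
    ]),
    Event(("SaveFile", "notes.txt")),
]
achieved, omitted, unexpected = critique(task, interpretation)
```

In this example the user searched for the wrong word and saved the file, so the search goal is reported as omitted and both the mismatched search and the save are reported as unexpected.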
FUTURE WORK
Critiquing is difficult because users can perform ANY actions between the time they start the task and when they ask for an evaluation. Unlike many critic systems, which are only checking constraints on final solutions, our system examines the entire history. We would like to be able to jump in if we aren’t following the user at all. We would also like to control the event stream by blocking events or sending certain events to the applications.
We would like to evaluate the system in the context of normal use. Before we do this, we want to try to encode all of the application knowledge needed to track what a normal user does on an application like the Text Editor during a typical day. This will require much better task tracking and an expanded vocabulary. It also may require that users answer questions about their behavior.
It may seem that the recognition rules and task model encode the same information: how to perform tasks. While this is true, it is likely that rules for recognition are specially tuned in humans. Consider how easy it is to recognize some tasks, but much harder to execute them. Similarly, we may be able to encode enough knowledge for the Task Critic to be able to recognize a good solution without being able to generate it. However, it might be possible to share some knowledge between the task model and the recognition rules (e.g., the contexts).
As users become more familiar with the application, they are able to keep more of their own actions and the system’s responses in memory, probably using a primarily visual representation. Thus, we conjecture that the grain size of the interaction history being evaluated should increase with the user’s experience and critiques will be more natural if they exactly reproduce portions of the user’s history on the application. We are incorporating these ideas into future versions of the Task Critic.
Because our critiquing system provides commentary on events that have already transpired, it is common for the critique to refer to user interface elements and data that are not currently displayed by any application. One approach is to "record" the interaction and reproduce it with commentary, an approach taken by ‘reflective’ problem-solving systems [4]. Another approach is to produce feedback immediately when a discrepancy is noticed between the task model and the user behavior [5].
We would like to extend our framework to handle collaborative work in organizations. We would need to include primitives for describing constraints on resources and users as well as timing.
CONCLUSIONS
We have introduced an architecture for application-aware adjuncts that separates the interpretation of events from how they are used. It is able to capture events from multiple applications. We have shown how the adjuncts can assist with particular tasks and build a hierarchical interpretation of events. By separating event interpretation from task goals, the system supports critiquing of the entire history of events surrounding a task.
ACKNOWLEDGMENTS
We would like to thank Dan Oblinger for numerous discussions, Jacob Ukelson for his interest and support, and Dick Lam for helping with the GUI code review.
REFERENCES
1. Conati, C. and VanLehn, K. Teaching meta-cognitive skills: implementation and evaluation of a tutoring system to guide self-explanation while learning from examples. In Proceedings of AIED’99: the 9th World Conference on Artificial Intelligence and Education, Le Mans, France, 1999.