Human-Computer Interaction Understanding
Robert Farrell
Learning Sciences Research Group
Research Statement
This research is concerned with understanding how people use software systems and making use of this information to improve human performance and business productivity. In particular, we are concerned with capturing, analyzing, and interpreting human-computer interactions and using this information in 'real-time' for task/learning support and/or business process improvement.
In terms of technology, the work is akin to Natural Language Understanding (NLU). NLU systems build plausible interpretations of natural utterances by people in the real world, which can be leveraged for a variety of applications, including speech control and more natural database queries. Our Human-Computer Interaction Understanding systems will build plausible interpretations of user actions and system responses on computer systems, which can be used to provide assistance or learning support to the user, to build software that is more intelligent in learning from and adapting to users, or to enable highly reactive, intelligent business processes.
Research Questions
When people learn a new application, such as a word processor or database retrieval system, what do they learn, how is it encoded, and what difficulties or impasses do they encounter? How much of what they learn is tied to the particular user interface or application behavior, and how much is general? What kinds of behavior do people exhibit on these applications? How do people notice what options are available to them, where their output is located, or whether they are in an error state? How much variability is there across individuals in a given work group, center, or company?
How can we improve user performance on desktop applications? Does reflection help? Critiquing? How immediate do such interventions need to be?
Training Application
We are building a critiquing system that lets the user select a task (e.g., looking up a person in callup), fill in parameters for that task (e.g., name and location), and then 'watches over their shoulder' as they perform it. It outputs a critique consisting of things they did correctly, things they did not do, and things they did that were unexpected. The system can observe more than one application. Application interfaces should be dialog-based but could be Lotus Notes-based or Web-based. We are currently working with editing and searching tasks.
Components
The Capture Agent is a small, efficient C++ program that uses 'hooks' provided by the Windows operating system to capture and filter over 100 different types of user and system events, including keyboard, mouse, dialog interactions, menus, commands, and windows. Events can be hierarchically categorized and then filtered or reported. During classification, additional information (e.g., window parents) can be gathered and reported.
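The hierarchical categorization and filtering described above can be illustrated with a minimal sketch. It is written in Java for consistency with the Recognition Engine, although the Capture Agent itself is a C++ program. The slash-separated category paths (such as "input/keyboard/keydown") and all class and method names are assumptions made for illustration, not the agent's actual event types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: category paths are illustrative, not the Capture Agent's real event types.
class EventFilterSketch {

    // A captured event tagged with a slash-separated category path.
    static final class Event {
        final String category;   // e.g. "input/keyboard/keydown"
        final String detail;     // free-form payload, e.g. the key pressed

        Event(String category, String detail) {
            this.category = category;
            this.detail = detail;
        }
    }

    // True when the event's category equals the prefix or lies beneath it in the hierarchy.
    static boolean inCategory(Event e, String prefix) {
        return e.category.equals(prefix) || e.category.startsWith(prefix + "/");
    }

    // Keep only events that fall under at least one of the requested categories.
    static List<Event> filter(List<Event> events, List<String> wanted) {
        List<Event> kept = new ArrayList<>();
        for (Event e : events) {
            for (String prefix : wanted) {
                if (inCategory(e, prefix)) {
                    kept.add(e);
                    break;   // report each event at most once
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            new Event("input/keyboard/keydown", "A"),
            new Event("input/mouse/move", "10,20"),
            new Event("window/dialog/open", "Find"));
        for (Event e : filter(events, Arrays.asList("input/keyboard", "window"))) {
            System.out.println(e.category + " " + e.detail);
        }
    }
}
```

Filtering on a prefix such as "input" would pass both keyboard and mouse events, while "input/keyboard" passes only the former, which is one simple way to read "hierarchically categorized and then filtered".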
The Recognition Engine is a Java program that works on a sequence of events. Events are matched and summarized at a higher level. Matching consists of checking consistency of locations (e.g., same window) and objects (e.g., same file). Recognizing a given pattern may require reference to context (e.g., knowledge of object contents or attributes). In addition, pattern recognition may involve time constraints (e.g., the nearest event of a given type). Typically, system responses appear soon after an input, but applications may be asynchronous. The Recognition Engine outputs a set of plausible interpretations. Each interpretation is a tree showing how a higher-level event was recognized from lower-level events.
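As a rough illustration of matching with location and object consistency, the sketch below recognizes one higher-level event from an ordered list of lower-level event types and returns it as a small tree. It is not the Recognition Engine's actual algorithm: it matches greedily, anchors on the first event of the first step type, produces at most one interpretation rather than a set, and ignores context and time constraints. All names (RecognitionSketch, Node, "EditFile", and so on) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of pattern matching with window/object consistency checks.
class RecognitionSketch {

    // A node in an interpretation tree: leaves are captured events,
    // inner nodes are higher-level events recognized from their children.
    static final class Node {
        final String type;
        final String window;   // location of the event
        final String object;   // object acted on, e.g. a file
        final List<Node> children;

        Node(String type, String window, String object, List<Node> children) {
            this.type = type;
            this.window = window;
            this.object = object;
            this.children = children;
        }
    }

    static Node leaf(String type, String window, String object) {
        return new Node(type, window, object, new ArrayList<Node>());
    }

    // Greedily match the step types in order (not necessarily adjacent), requiring
    // every matched event to share the window and object of the first matched event.
    // Returns null when no consistent match is found.
    static Node recognize(String name, List<String> steps, List<Node> events) {
        if (steps.isEmpty()) {
            return null;
        }
        List<Node> matched = new ArrayList<>();
        for (Node e : events) {
            if (matched.size() == steps.size()) {
                break;
            }
            if (!e.type.equals(steps.get(matched.size()))) {
                continue;   // not the step we are waiting for
            }
            if (!matched.isEmpty()) {
                Node first = matched.get(0);
                if (!first.window.equals(e.window) || !first.object.equals(e.object)) {
                    continue;   // right type, but inconsistent location or object
                }
            }
            matched.add(e);
        }
        if (matched.size() < steps.size()) {
            return null;
        }
        Node first = matched.get(0);
        return new Node(name, first.window, first.object, matched);
    }

    static void print(Node n, String indent) {
        System.out.println(indent + n.type + " [" + n.window + ", " + n.object + "]");
        for (Node c : n.children) {
            print(c, indent + "  ");
        }
    }

    public static void main(String[] args) {
        List<Node> events = Arrays.asList(
            leaf("open", "editor", "a.txt"),
            leaf("type", "mail", "draft"),     // skipped: different window and object
            leaf("type", "editor", "a.txt"),
            leaf("save", "editor", "a.txt"));
        Node edit = recognize("EditFile", Arrays.asList("open", "type", "save"), events);
        if (edit != null) {
            print(edit, "");
        }
    }
}
```

A fuller engine would keep several candidate matches alive rather than committing to the first one, which is how a set of plausible interpretations could arise.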
The Evaluator takes a description of a task, including goals to achieve and optionally how to achieve them, and determines whether a likely interpretation of the events achieves the goals. It notes task steps that were omitted as well as unexpected steps that do not fit into the interpretation. A crude natural language generator outputs the results.
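The three-part critique can be sketched as a comparison between the steps a task description expects and the steps found in an interpretation. The version below is a simplification under stated assumptions: steps are plain strings, order is ignored, and goals are not modeled separately from steps. The class, method, and step names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of an order-insensitive critique of observed task steps.
class EvaluatorSketch {

    static final class Critique {
        final List<String> correct = new ArrayList<>();
        final List<String> omitted = new ArrayList<>();
        final List<String> unexpected = new ArrayList<>();

        // A very crude stand-in for natural language generation.
        String toText() {
            return "Done correctly: " + correct + "\n"
                 + "Not done: " + omitted + "\n"
                 + "Unexpected: " + unexpected;
        }
    }

    // Compare expected steps with observed steps, preserving first-seen order.
    static Critique evaluate(List<String> expectedSteps, List<String> observedSteps) {
        Set<String> expected = new LinkedHashSet<>(expectedSteps);
        Set<String> observed = new LinkedHashSet<>(observedSteps);
        Critique c = new Critique();
        for (String step : expected) {
            if (observed.contains(step)) {
                c.correct.add(step);
            } else {
                c.omitted.add(step);
            }
        }
        for (String step : observed) {
            if (!expected.contains(step)) {
                c.unexpected.add(step);
            }
        }
        return c;
    }

    public static void main(String[] args) {
        Critique c = evaluate(
            Arrays.asList("open callup", "enter name", "enter location", "press search"),
            Arrays.asList("open callup", "enter name", "open help", "press search"));
        System.out.println(c.toText());
    }
}
```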
Current Directions
We would like the system to work with as few hints as possible about tasks. Toward this end, we will be experimenting with algorithms to concurrently match a number of possible task patterns. We will also be looking at situations where we don't know the beginning and end of a task.
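One simple way to match several task patterns concurrently is to keep a cursor per pattern and offer every incoming event to all of them. The sketch below shows only that idea; it is not one of the algorithms under experiment, it assumes each pattern is a fixed sequence of event types, and it does not handle unknown task boundaries. The pattern and event names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: advance several task patterns in parallel over one event stream.
class ConcurrentMatchSketch {

    // Tracks how far one task pattern has progressed.
    static final class Tracker {
        final String task;
        final List<String> steps;
        int next = 0;   // index of the step this task is waiting for

        Tracker(String task, List<String> steps) {
            this.task = task;
            this.steps = steps;
        }

        boolean isComplete() {
            return next == steps.size();
        }

        // Advance only if the event is the awaited step; other events are ignored.
        void offer(String eventType) {
            if (!isComplete() && steps.get(next).equals(eventType)) {
                next++;
            }
        }
    }

    // Feed every event to every tracker; return task names in order of completion.
    static List<String> completedTasks(Map<String, List<String>> patterns, List<String> events) {
        List<Tracker> trackers = new ArrayList<>();
        for (Map.Entry<String, List<String>> p : patterns.entrySet()) {
            trackers.add(new Tracker(p.getKey(), p.getValue()));
        }
        List<String> done = new ArrayList<>();
        for (String event : events) {
            for (Tracker t : trackers) {
                if (t.isComplete()) {
                    continue;
                }
                t.offer(event);
                if (t.isComplete()) {
                    done.add(t.task);
                }
            }
        }
        return done;
    }

    public static void main(String[] args) {
        Map<String, List<String>> patterns = new LinkedHashMap<>();
        patterns.put("search", Arrays.asList("open_find", "type_text", "press_ok"));
        patterns.put("save", Arrays.asList("open_menu", "click_save"));
        List<String> events = Arrays.asList(
            "open_find", "open_menu", "type_text", "click_save", "press_ok");
        System.out.println(completedTasks(patterns, events));
    }
}
```

Because every tracker sees every event, interleaved tasks can both be recognized without the user declaring which task is in progress.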
We are expanding the system to report interpretations to a centralized server. A client-server implementation could collect task information across multiple users to provide overall usage statistics or could collect information about work flows or collaborative tasks.
We need to make the recognition engine more sophisticated. It needs to recognize repeated attempts at the same task, iterative tasks (e.g., find all occurrences of the name in the file), and error outcomes. It needs better primitives for reasoning about temporal order.
We would like to have a variety of ways of hooking the recognition engine into applications. Currently, we put our 'capture agent' between the operating system (Windows 32-bit) and the applications by extending operating system code. This reports the low-level user and system events across all applications. We would also like to accept events directly from applications that can provide them.
When a user signals that they have finished or a task interpretation is highly likely, we would like to be able to automatically initiate an action, such as running a Java program, starting a script, or triggering a work flow.
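The triggering behavior could look something like the sketch below, which runs an action once, either when the user signals completion or when an interpretation's likelihood reaches a threshold. The numeric likelihood score, the threshold value, and the class name are all assumptions; the system as described does not yet specify how likelihood is represented.

```java
// Hypothetical sketch: fire an action once when a task is finished or highly likely.
class TriggerSketch {
    private final double threshold;   // assumed likelihood cutoff in [0, 1]
    private final Runnable action;    // e.g. start a script or kick off a work flow
    private boolean fired = false;

    TriggerSketch(double threshold, Runnable action) {
        this.threshold = threshold;
        this.action = action;
    }

    // Returns true if the action ran as a result of this update.
    boolean update(double likelihood, boolean userFinished) {
        if (fired) {
            return false;   // never trigger twice for the same task
        }
        if (userFinished || likelihood >= threshold) {
            fired = true;
            action.run();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TriggerSketch trigger = new TriggerSketch(0.9,
            () -> System.out.println("Task recognized: starting follow-up action"));
        trigger.update(0.4, false);    // too uncertain, nothing happens
        trigger.update(0.95, false);   // crosses the threshold, action runs
    }
}
```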