In a highly dynamic user-computer interaction system such as the one we
support, it is difficult to predict how an interaction will unfold. It
is thus impractical to plan in advance the content and form of all
possible system responses.
To tailor system responses to a user interaction context, we have
developed an intelligent multimedia presentation framework called YOGA
(Your Oration and Graphics Automation). YOGA consists of three key
modules that support the creation of tailored multimedia system
output: 1) content selection, which dynamically decides the proper
response content (e.g., which subset of data attributes to present),
2) media allocation, which allocates the most suitable presentation
media, such as speech or graphics, to convey the intended content, and
3) media-specific designers, which design the most effective
media-specific presentation form (e.g., graphical or verbal output).
More importantly, we take practical issues into account when
developing YOGA technologies to achieve desired system coverage and
extensibility. Specifically, we devise an optimization-based framework
to select response content and allocate suitable media. We combine
machine learning with other approaches to dynamically synthesize
verbal and visual responses. In addition to tailoring responses to
individual requests, our approaches also leverage user input patterns
to customize responses to a specific user interaction flow.
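As a rough illustration of what an optimization-based formulation of content selection and media allocation could look like, the sketch below picks the attribute subset with the highest total relevance that fits a presentation budget, then assigns each chosen attribute to its best-scoring medium. The relevance scores, costs, budget, suitability table, and the function names select_content and allocate_media are all assumptions made for illustration. They are not YOGA's actual objective function or interface, and the exhaustive search simply stands in for whatever solver the real framework uses.

```python
from itertools import combinations


def select_content(relevance, cost, budget):
    """Return the attribute subset with the highest total relevance
    whose total cost fits within the budget (exhaustive search)."""
    attrs = sorted(relevance)
    best, best_score = (), 0
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            if sum(cost[a] for a in subset) > budget:
                continue  # this subset does not fit the presentation budget
            score = sum(relevance[a] for a in subset)
            if score > best_score:
                best, best_score = subset, score
    return list(best)


def allocate_media(selected, suitability):
    """Assign each selected attribute to its best-scoring medium."""
    return {a: max(suitability[a], key=suitability[a].get) for a in selected}


# Hypothetical scores for a real estate request (illustrative values only).
relevance = {"price": 9, "bedrooms": 7, "location": 8, "tax": 2}
cost = {"price": 1, "bedrooms": 1, "location": 2, "tax": 1}
suitability = {
    "price": {"speech": 0.8, "graphics": 0.5},
    "bedrooms": {"speech": 0.6, "graphics": 0.4},
    "location": {"speech": 0.2, "graphics": 0.9},
    "tax": {"speech": 0.5, "graphics": 0.3},
}

selected = select_content(relevance, cost, budget=4)
allocation = allocate_media(selected, suitability)
```

With these made-up numbers, the low-relevance tax attribute is dropped because including it would exceed the budget, and location is routed to graphics while price and bedrooms are spoken.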
In particular, we have developed a natural speech generation system,
called SEGUE (Spoken English Generation Using Examples), which can
automatically produce natural and coherent spoken utterances in a
conversational setting [Pan-INLG04]. SEGUE uses an existing sentence
corpus to dynamically decide proper words, utterance structures, and
sentence boundaries [Pan-ACL05] (e.g., how to generate multiple
sentences to convey the intended content).
Similarly, we have developed IMPROVISE*, an extension of the earlier
planning-based graphics generation system IMPROVISE, to produce
coherent, rich visual output. Unlike IMPROVISE, IMPROVISE* can cover
many more dynamic user interaction situations and can deal with large
and complex data sets by learning from and combining existing
graphics examples [Zhou-IJCAI03].
Our learning engines for both SEGUE and IMPROVISE* can not only reuse
suitable examples but also compose new forms of output by dynamically
combining fragments of different examples [Pan-INLG04, Zhou-IJCAI03]. As a
result, we can cover a wide range of interaction situations using only
a small number of examples. For example, IMPROVISE* uses about 20
visual examples and SEGUE uses around 200 sentence examples for our
real estate application that covers 25+ concepts, each with a number
of attributes (e.g., a house has 40 attributes). Using a small example
set makes it quick to set up a system. Moreover, we can easily
extend a system's capability by adding new examples. Nonetheless, a
case-based learning engine alone is inadequate in meeting all our
needs. For example, it is inefficient to use case-based learning to
abstract sentence aggregation rules, since it would require a large
number of examples. Similarly, case-based learning is inefficient in
learning precise visual arrangements (e.g., exact positions and
sizes). Thus, we use case-based learning to learn overall presentation
structures (e.g., visual or sentence structure) and use other
approaches to fine-tune presentation details (e.g., layout).
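To make the idea of composing an output from fragments of several stored examples more concrete, here is a minimal sketch. It assumes each example can be summarized by the set of attributes it presents, and it uses a simple greedy cover to choose fragments. The example names, attribute sets, and the function compose_from_examples are hypothetical, and the greedy strategy is a stand-in rather than the learning method used in SEGUE or IMPROVISE*.

```python
def compose_from_examples(request, examples):
    """Greedily pick example fragments until the requested attributes
    are covered. Returns (plan, uncovered), where plan is a list of
    (example name, attributes reused from it) pairs."""
    remaining = set(request)
    plan = []
    while remaining:
        # Choose the example that covers the most still-uncovered attributes.
        name, covered = max(
            ((n, remaining & set(attrs)) for n, attrs in examples.items()),
            key=lambda pair: len(pair[1]),
            default=(None, set()),
        )
        if not covered:
            break  # no stored example covers what is left
        plan.append((name, sorted(covered)))
        remaining -= covered
    return plan, sorted(remaining)


# Hypothetical example base: each entry lists the attributes it presents.
examples = {
    "ex_price_size": ["price", "size"],
    "ex_location": ["location", "school"],
    "ex_style": ["style"],
}

plan, uncovered = compose_from_examples(
    ["price", "location", "style", "size"], examples
)
```

Here no single example covers the request, so the plan reuses one fragment from each of the three examples. A request for an attribute that no example covers is reported as uncovered, which is the point at which another approach would have to take over.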
The followup feature is derived during TAICHI's input interpretation to
signal whether a given user request is new or a continuation of a
previous request. In Figure 1, U2 is a follow-up to U1, since it
inherits certain data constraints specified in U1. To maintain
semantic continuity between follow-up requests, the visual designer
uses this feature to compute the amount of visual content overlap
between two successive visual responses. In general, YOGA maximizes the
overlap between follow-on requests, while reducing the overlap when a
new flow starts [Wen-InfoVis05].
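One simple way to picture the overlap computation is sketched below. It assumes each display can be described as a list of visual object identifiers, measures overlap as the Jaccard ratio of two displays, and keeps prior objects only when the request is a follow-up. The functions visual_overlap and plan_display and the object names are illustrative assumptions, not the measure or policy reported in [Wen-InfoVis05].

```python
def visual_overlap(previous, current):
    """Jaccard overlap between the object sets of two displays."""
    prev, cur = set(previous), set(current)
    union = prev | cur
    return len(prev & cur) / len(union) if union else 0.0


def plan_display(previous, required, followup):
    """Keep prior objects on a follow-up request; start fresh on a new flow."""
    kept = list(previous) if followup else []
    return kept + [obj for obj in required if obj not in kept]


previous = ["map", "house1", "house2"]
required = ["house2", "house3"]

follow_display = plan_display(previous, required, followup=True)
fresh_display = plan_display(previous, required, followup=False)

follow_overlap = visual_overlap(previous, follow_display)  # 3 shared of 4 total
fresh_overlap = visual_overlap(previous, fresh_display)    # 1 shared of 4 total
```

In this toy case the follow-up display retains everything already on screen, giving a high overlap, while the new-flow display keeps only what the new request needs.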
Another derived input feature, navDirection, indicates the change of direction in a user's data navigation and also influences YOGA response creation. When exploring a data space, a user may change their data focus in several ways (Figure 1): filtering a data set (U3), expanding a data set (U4), or switching to a different data set (U5). To tailor YOGA output to a user interaction flow, both our visual and verbal designers exploit this feature. First, navDirection helps the visual designer decide how much visual context to maintain between displays. For example, if the system detects that a user is narrowing down a data set, it reduces visual content overlap across displays so that the user can focus on the filtered data set [Wen-InfoVis05]. Likewise, this feature helps the language designer decide how much information to repeat in successive verbal responses. To avoid repetition, the language designer generates progressively terser expressions, such as ellipses, in response to a series of similar requests (R3). It can also use this feature to generate more informative responses that confirm the current navigation direction (R3’).
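The verbal side of this behavior can be sketched as a small policy function. The sketch assumes navDirection takes one of three string values and that the system tracks how many similar requests have occurred in a row. The function verbal_response, the streak counter, the confirm flag, and all of the wording are hypothetical and only illustrate the terse-versus-confirming choice described above; they are not SEGUE's generation method.

```python
# Hypothetical phrases that confirm each navigation direction.
CONFIRM_PHRASES = {
    "filter": "Narrowed down to",
    "expand": "Expanded to",
    "switch": "Switched to",
}


def verbal_response(count, noun, nav_direction, streak, confirm=False):
    """Choose a response form from the navigation direction and the
    number of similar requests seen in a row (streak)."""
    if confirm:
        # More informative variant: state the navigation direction explicitly.
        return f"{CONFIRM_PHRASES[nav_direction]} {count} {noun}."
    if streak == 0:
        return f"I found {count} {noun} that match your request."
    if streak == 1:
        return f"{count} {noun} match."
    return f"{count} {noun}."  # elliptical form after repeated similar requests


first = verbal_response(12, "houses", "filter", streak=0)
later = verbal_response(5, "houses", "filter", streak=2)
confirming = verbal_response(5, "houses", "filter", streak=2, confirm=True)
```

The same request state therefore yields either a clipped answer or one that names the navigation direction, depending on which form the designer prefers.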
Publications