Motivation
It is highly desirable to have a robust and
accurate input interpretation engine that can understand diverse
user expressions in context. Since building a general-purpose
interpretation engine is very difficult, we focus only on
understanding user inputs for information-seeking applications.
Our current multimodal input interpretation framework is called
TAI-CHI (Two-Way Adaptative
Interpretation for Context-sensitive
Human-computer Interaction).
Approach
We employ three complementary strategies to enable robust and accurate
interpretation of diverse user data requests in context. First, we
use a context-driven approach to optimize TAI-CHI interpretation of
multimodal inputs by
exploiting various contexts simultaneously. Second, we provide system
guidance in context to allow users and TAI-CHI to adapt to each other's
expressions over time. Third, we leverage the strength of multiple
modalities to achieve robust interpretation.
Context-driven interpretation
Currently, TAI-CHI focuses on user requests to databases. As we
have observed (e.g., from our WOZ study), while these requests
exhibit substantial syntactic variations, they share a common
semantic structure. Based on this observation, we use a set of semantic constructs to model a user
request. Specifically, a user request includes two top-level constructs:
intention and attention. Intention encodes the user's information-seeking
task (e.g., data access or comparison). Attention
captures the data target of the intention, made up of lower-level
constructs, such as data concepts/attributes to be retrieved, and a
set of constraints that the retrieved data must satisfy. It also
includes derived meta features that characterize the overall properties
of a request. Such meta features are used to tailor TAI-CHI
responses to the query context.
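The request model described above can be pictured as a small data structure. The following is only a minimal sketch under stated assumptions: the class names, field names, the placement of meta features on the request as a whole, and the two meta features computed are all hypothetical and are not taken from TAI-CHI itself.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Constraint:
    """One condition the retrieved data must satisfy (hypothetical shape)."""
    attribute: str   # e.g. "bedrooms"
    operator: str    # e.g. ">="
    value: Any       # e.g. 3


@dataclass
class Attention:
    """The data target of the intention."""
    concepts: List[str] = field(default_factory=list)
    constraints: List[Constraint] = field(default_factory=list)


@dataclass
class Request:
    """A user request: an intention plus the attention it applies to."""
    intention: str                      # e.g. "data_access" or "comparison"
    attention: Attention
    meta: Dict[str, Any] = field(default_factory=dict)


def derive_meta(request: Request) -> Dict[str, Any]:
    """Compute illustrative meta features (assumed, not from the source)."""
    return {
        "concept_count": len(request.attention.concepts),
        "constraint_count": len(request.attention.constraints),
    }


# A possible encoding of the keyword request "colonials 3+ bedrooms".
request = Request(
    intention="data_access",
    attention=Attention(
        concepts=["house"],
        constraints=[
            Constraint("style", "=", "colonial"),
            Constraint("bedrooms", ">=", 3),
        ],
    ),
)
request.meta = derive_meta(request)
```

Keeping intention separate from attention means the same data target could be reused under a different task, for example switching from data access to comparison.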
To interpret an input, TAI-CHI first identifies various semantic constructs
using a lexicon that is largely derived automatically from the databases.
TAI-CHI then resolves references and semantic
ambiguities by uniformly modeling contextual cues, including conversation
history, data semantics, and syntactic information derived
from a syntactic parser, as a set of constraints. It then uses optimization-based
approaches to derive the most probable interpretation of an input
by maximizing the satisfaction of all constraints. In this way, TAI-CHI considers all constraints
simultaneously to optimize interpretation.
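The idea of choosing the interpretation that best satisfies all constraints at once can be illustrated with a toy example. This sketch assumes a brute-force search over a handful of candidates and a weighted sum as the satisfaction measure; the candidate format, the two constraint functions, and the weights are hypothetical and stand in for whatever optimization method TAI-CHI actually uses.

```python
from typing import Callable, Dict, List, Tuple

Candidate = Dict[str, str]
ConstraintFn = Callable[[Candidate], float]

# Hypothetical conversation history, oldest first.
HISTORY = ["town_3", "house_12"]


def recency(candidate: Candidate) -> float:
    """Conversation-history cue: more recently mentioned objects score higher."""
    if candidate["referent"] not in HISTORY:
        return 0.0
    return (HISTORY.index(candidate["referent"]) + 1) / len(HISTORY)


def has_price(candidate: Candidate) -> float:
    """Data-semantics cue: in this toy schema only houses have a price."""
    return 1.0 if candidate["type"] == "house" else 0.0


def best_interpretation(
    candidates: List[Candidate],
    constraints: List[Tuple[ConstraintFn, float]],
) -> Candidate:
    """Return the candidate with the highest weighted constraint satisfaction."""
    def score(candidate: Candidate) -> float:
        return sum(weight * fn(candidate) for fn, weight in constraints)
    return max(candidates, key=score)


# Resolving "it" in a follow-up request such as "how much is it?".
candidates = [
    {"referent": "town_3", "type": "town"},
    {"referent": "house_12", "type": "house"},
]
best = best_interpretation(candidates, [(recency, 1.0), (has_price, 2.0)])
```

Because every cue is expressed as a scoring function over the same candidates, adding a new source of context only means adding one more entry to the constraint list.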
As a result, TAI-CHI can handle a wide range of user expressions
regardless of their syntactic form, ranging from keywords (e.g.,
"colonials 3+ bedrooms") to full English sentences, all in context.
Such flexibility is much appreciated in a practical application,
where TAI-CHI must accommodate various user linguistic styles, and
tolerate imperfect user inputs (e.g., abbreviated and ungrammatical
expressions). Moreover, our approach helps to minimize the effort
for supporting new domains, since it does not require a large training
corpus or a large set of syntactic rules.
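To make the lexicon-driven handling of keyword input concrete, here is a minimal sketch that derives a lexicon from a tiny in-memory table and uses it to read "colonials 3+ bedrooms". The table contents, the function names, the crude plural handling, and the "N+" convention are assumptions made for illustration and do not describe TAI-CHI's actual lexicon or matching rules.

```python
import re
from typing import Any, Dict, List, Tuple

# Hypothetical sample rows standing in for a real-estate database.
ROWS = [
    {"style": "colonial", "bedrooms": 4, "town": "armonk"},
    {"style": "ranch", "bedrooms": 2, "town": "pleasantville"},
]


def build_lexicon(rows: List[Dict[str, Any]]) -> Dict[str, Tuple[str, str]]:
    """Map column names and categorical values to (kind, column) entries."""
    lexicon: Dict[str, Tuple[str, str]] = {}
    for row in rows:
        for column, value in row.items():
            lexicon[column] = ("attribute", column)
            if isinstance(value, str):
                lexicon[value] = ("value", column)
    return lexicon


def interpret_keywords(
    text: str, lexicon: Dict[str, Tuple[str, str]]
) -> List[Tuple[str, str, Any]]:
    """Turn a keyword query into (attribute, operator, value) constraints."""
    constraints: List[Tuple[str, str, Any]] = []
    pending = None  # a numeric bound waiting for the attribute it applies to
    for token in text.lower().split():
        bound = re.fullmatch(r"(\d+)\+", token)
        if bound:
            pending = (">=", int(bound.group(1)))
            continue
        # Crude plural handling: try the token, then the token minus a final "s".
        word = token
        if word not in lexicon and word.endswith("s") and word[:-1] in lexicon:
            word = word[:-1]
        entry = lexicon.get(word)
        if entry is None:
            continue  # unknown words are ignored in this sketch
        kind, column = entry
        if kind == "value":
            constraints.append((column, "=", word))
        elif pending is not None:
            operator, number = pending
            constraints.append((column, operator, number))
            pending = None
    return constraints


lexicon = build_lexicon(ROWS)
constraints = interpret_keywords("colonials 3+ bedrooms", lexicon)
```

Since the lexicon is built from the data itself, pointing the same functions at a different table would yield a different vocabulary without writing new rules.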
Adaptive interpretation
Despite the efforts described above to achieve more accurate
and robust interpretation, TAI-CHI’s interpretation capability may
still be insufficient for our targeted, real-world applications.
Instead of directly improving TAI-CHI’s interpretation capability in a
conventional way, we build a two-way adaptation engine that
allows both users and TAI-CHI to dynamically adapt to each other’s
expressions in the course of interaction [Pan-IUI05].
Consequently, the adaptation enhances the usability of TAI-CHI
by turning a novice user into a power user, who can work effectively
within TAI-CHI’s capability. Moreover, TAI-CHI improves its interpretation
capability through self-adaptation, minimizing the
overall effort of developing an effective interaction system.
Leveraging GUIs and language inputs
Besides combining language inputs and deictic gestures as in
other systems, we have explored the usage of GUIs to
complement language inputs for two reasons. First, it is easier for
users to use GUIs to express certain data requests (e.g., using a
slider for dynamic data query). Second, GUI inputs are explicit
and thus help TAI-CHI to process the accompanying language input.
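One way to picture how an explicit GUI input can support the accompanying language input is to merge the two sets of constraints. The sketch below assumes constraints are (attribute, operator, value) tuples and that a GUI value replaces a language-derived value for the same attribute and operator; both the representation and the precedence rule are hypothetical, not a description of TAI-CHI's fusion method.

```python
from typing import Any, Dict, List, Tuple

Constraint = Tuple[str, str, Any]  # (attribute, operator, value)


def merge_inputs(
    language_constraints: List[Constraint],
    gui_constraints: List[Constraint],
) -> List[Constraint]:
    """Combine both inputs, letting explicit GUI values override language ones."""
    merged: Dict[Tuple[str, str], Constraint] = {
        (attribute, operator): (attribute, operator, value)
        for attribute, operator, value in language_constraints
    }
    for attribute, operator, value in gui_constraints:
        merged[(attribute, operator)] = (attribute, operator, value)
    return list(merged.values())


# Language input: "colonials under 500000"; the slider is then set to 400000.
from_language = [("style", "=", "colonial"), ("price", "<=", 500000)]
from_slider = [("price", "<=", 400000)]
combined = merge_inputs(from_language, from_slider)
```

Keying on both attribute and operator keeps a lower and an upper bound on the same attribute from overwriting each other.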
By default, TAI-CHI interprets a user request in the context of previous
requests. However, users may break from the previous conversation
flow without explicitly signaling it through language cues. While TAI-CHI is able
to detect some of these breaks, it also lets users use a GUI button to explicitly signal
the start of a new flow. In fact, users can use different GUI buttons
to control a conversation flow, including interrupting a TAI-CHI
response (barging in), starting over (wiping out the entire
conversation
Publications
-
Joyce Chai, Shimei Pan and Michelle X. Zhou.
MIND: A Context-based Multimodal Interpretation Framework in
Conversation Systems.
Natural, Intelligent and Effective Interaction in
Multimodal Dialogue Systems, J. Kuppervelt, L. Dybkjaer and
N. Bernsen (eds). Kluwer. 2005. To appear.
-
Shimei Pan, Siwei Shen, Michelle X. Zhou and Keith Houck.
Two-Way Adaptation for Robust Input Interpretation for Practical
Multimodal Interaction.
Proceedings of ACM Conference on Intelligent User Interfaces
(IUI), pages 25-32, 2005.
-
Joyce Chai, Pengyu Hong and Michelle X. Zhou.
A Probabilistic Approach to Reference Resolution in Multimodal
Interfaces.
Proceedings of ACM Conference on Intelligent User Interfaces
(IUI),
pages 70-77, 2004.
-
Joyce Chai, Pengyu Hong, Michelle X. Zhou and Zahar Prasov.
Optimization in Multimodal Interpretation.
Proceedings of Association of Computational Linguistics
(ACL), pages 1-8, 2004.
-
Keith Houck.
Contextual Revision in Information Seeking Conversation Systems.
Proceedings of International Conference on Spoken Language
Processing (ICSLP), 2004.
-
Joyce Chai.
Semantics-based Representation for Multimodal Interpretation in
Conversational Systems.
Proceedings of International Conference on Computational
Linguistics (COLING), 2002.
-
Joyce Chai.
Operations for Context-based Multimodal Interpretation.
Proceedings of International Conference on Spoken Language
Processing (ICSLP), 2002.
-
Joyce Chai, Shimei Pan and Michelle X. Zhou.
MIND: A Semantics-based Multimodal Interpretation Framework for
Conversation Systems.
Proceedings of International CLASS Workshop on Natural,
Intelligent and Effective Interaction in Multimodal Dialog
Systems, 2002.
-
Joyce Chai, Shimei Pan, Michelle X. Zhou and Keith Houck.
Context-based Multimodal Input Understanding in Conversational
Systems.
Proceedings of IEEE International Conference on Multimodal
Interfaces (ICMI), pages 87-92, 2002.