Overview
Imagine being able to obtain information, not through an assortment of
independently predesigned web sites, but rather through an intelligent
multimodal conversation that is tailored to the task you are
performing, customized to your personal preferences, and adapted to
your context and interaction devices. To realize this vision, we are
building an intelligent interaction framework, called Responsive
Information Architect (RIA), which aims to support full-fledged,
context-sensitive information seeking.
Using our intelligent multimodal input and multimedia output
technologies, RIA engages users in a
dynamically generated multimodal, multimedia conversation to aid
users in their information seeking. Unlike the existing
information-browsing paradigm, which forces users to explore information
along predefined paths (e.g., GUI menus), RIA allows users to express their
information requests in a flexible way using multiple modalities, such
as speech, text, and gesture. Using a rich context, such as
conversation history and data semantics, RIA is capable of
understanding user inputs, including complex data queries (e.g.,
"find houses in cities in the north along hudson river with at least
5000 people"), abbreviated requests (e.g., "what about golf courses"),
and imprecise or ambiguous requests. By tracking and mining user
navigation patterns, RIA is able to aid users in navigating large and
complex information spaces proactively. In particular, RIA can help
users refine their queries by indicating the most effective
navigation path when too much data is retrieved, or automatically
relax their queries by recommending the most similar data
when no data is found. In addition, to provide users with a
customized presentation of retrieved information, RIA automatically
synthesizes a multimedia tour of information using graphics, speech,
and video.

The multimodal interpreter is responsible for understanding the meaning of
users' multimodal inputs. Once individual unimodal inputs (e.g., speech
and gesture) are recognized and parsed, our interpreter attempts to
sort out the meanings of user inputs at two levels by using a wide
variety of contexts: the turn level (what the user means at this particular
turn of the conversation) and the discourse level (how the current user input
relates to the overall conversation). As a result, our interpreter
formulates a set of communicative goals that capture the meanings of
user inputs. Our current interpreter is a realization of our
semantics-based multimodal interpretation framework, TAICHI.

Presentation Broker
Upon receiving the conversation acts, the presentation broker sketches a multimedia draft that expresses the content and the structure of a new discourse segment, with proper media punctuation and coordination cues. Based on this draft, the language and visual designers work with each other via a coordinator to author a multimedia blueprint that contains a set of media-specific acts along with spatial and temporal constraints. By solving all the constraints, RIA translates the blueprint into a multimedia script ready for the media producer to render. The presentation broker is an instantiation of our automated multimedia presentation framework, IMPRESA.

Visual Designer
According to the generated presentation draft, the visual designer creates a connected series of displays to depict the intended information visually. The visual designer is an instantiation of our automated graphics generation framework, IMPROVISE.

Language Designer
Based on the formulated presentation draft, our language designer automatically composes natural English speech to convey the desired information. Our current language designer is an instantiation of our language generation framework, SEGUE.

Information Server
To support all components described above, an information server supplies various kinds of contextual information in a uniform manner. In particular, four types of information are available: domain data (e.g., house data in a real-estate domain), conversation history (e.g., all detailed exchanges between RIA and a user), user model (e.g., user profiles), and environment model (e.g., device capabilities). In addition, domain data are augmented with a rich set of metadata, such as media preference (e.g., spatial regions prefer visual media), to facilitate other components' decision-making processes (e.g., media allocation).
RIA's information server is part of the KARMA infrastructure, which supports ontology management using IRIS, spatial data management using ESRI servers, and domain data management using structured sources such as DB2 and Oracle.
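
As a rough illustration of the uniform access described for the information server, the following Python sketch keeps the four information types behind a single query interface and shows how a media-preference annotation on domain data could feed a media allocation decision. It is not RIA's actual API: the class `InformationServer`, the record `DomainItem`, the function `allocate_media`, and all field names and sample values are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# The four information types named in the text (identifiers are hypothetical).
INFO_TYPES = (
    "domain_data",
    "conversation_history",
    "user_model",
    "environment_model",
)


@dataclass
class DomainItem:
    """A piece of domain data plus metadata (hypothetical structure)."""
    name: str
    attributes: Dict[str, Any]
    # Metadata such as a media preference, e.g. {"media_preference": "visual"}.
    metadata: Dict[str, Any] = field(default_factory=dict)


class InformationServer:
    """Stores every information type and exposes one query method for all."""

    def __init__(self) -> None:
        self._stores: Dict[str, List[Any]] = {kind: [] for kind in INFO_TYPES}

    def add(self, kind: str, item: Any) -> None:
        if kind not in self._stores:
            raise KeyError(f"unknown information type: {kind}")
        self._stores[kind].append(item)

    def query(self, kind: str,
              predicate: Callable[[Any], bool] = lambda item: True) -> List[Any]:
        """Return all items of the given type that satisfy the predicate."""
        return [item for item in self._stores[kind] if predicate(item)]


def allocate_media(item: DomainItem, environment: Dict[str, Any]) -> Optional[str]:
    """Pick an output medium: the item's preferred medium if the device
    supports it, otherwise the first medium the device offers (assumed policy)."""
    available = environment.get("output_media", [])
    preferred = item.metadata.get("media_preference")
    if preferred in available:
        return preferred
    return available[0] if available else None


if __name__ == "__main__":
    server = InformationServer()
    server.add("domain_data", DomainItem(
        name="riverfront region",
        attributes={"type": "spatial region"},
        metadata={"media_preference": "visual"},
    ))
    server.add("environment_model", {"output_media": ["visual", "speech"]})

    region = server.query("domain_data")[0]
    device = server.query("environment_model")[0]
    print(allocate_media(region, device))
```

In this sketch the allocation policy is a simple fallback rule. The page describes media preference metadata only as one input to other components' decisions, so a fuller design would presumably weigh it alongside the user model and conversation history.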