Responsive Information Architect

Overview

Imagine being able to obtain information, not through an assortment of independently predesigned web sites, but rather through an intelligent multimodal conversation that is tailored to the task you are performing, customized to your personal preferences, and adapted to your context and interaction devices. To realize this vision, we are building an intelligent interaction framework, called Responsive Information Architect (RIA), which aims to support full-fledged, context-sensitive information seeking.

Using our intelligent multimodal input and multimedia output technologies, RIA engages users in a dynamically generated multimodal, multimedia conversation to aid their information seeking. Unlike the existing information-browsing paradigm, which forces users to explore information along predefined paths (e.g., GUI menus), RIA lets users express their information requests flexibly using multiple modalities, such as speech, text, and gesture. Using a rich context, such as conversation history and data semantics, RIA can understand a range of user inputs, including complex data queries (e.g., "find houses in cities in the north along hudson river with at least 5000 people"), abbreviated requests (e.g., "what about golf courses"), and imprecise or ambiguous requests. By tracking and mining user navigation patterns, RIA can proactively help users navigate large and complex information spaces. In particular, when too much data is retrieved, RIA helps users refine their queries by indicating the most effective navigation path; when no data is found, it automatically relaxes the query and recommends the most similar data. In addition, to give users a customized presentation of the retrieved information, RIA automatically synthesizes a multimedia tour of the information using graphics, speech, and video.
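The refine-or-relax behavior described above can be illustrated with a small sketch. This is not RIA's algorithm, which this page does not specify; it is a toy under stated assumptions: records are flat dictionaries, a query is a set of exact-match constraints, "most similar" means "satisfies the most constraints", and the "most effective navigation path" is approximated by the unconstrained attribute with the most distinct values. The data, the field names, and the function names (matches, answer) are all hypothetical.

```python
# Hypothetical toy data; the field names are illustrative, not RIA's schema.
HOUSES = [
    {"id": 1, "city": "Irvington", "style": "colonial", "bedrooms": 3},
    {"id": 2, "city": "Irvington", "style": "ranch", "bedrooms": 4},
    {"id": 3, "city": "Ossining", "style": "colonial", "bedrooms": 3},
    {"id": 4, "city": "Ossining", "style": "tudor", "bedrooms": 5},
]


def matches(record, query):
    """Count how many exact-match constraints of the query the record satisfies."""
    return sum(1 for key, value in query.items() if record.get(key) == value)


def answer(query, records, max_results=2):
    """Return ("results", rows), ("refine", attribute), or ("relaxed", rows)."""
    exact = [r for r in records if matches(r, query) == len(query)]
    if not exact:
        # Relax: no record satisfies everything, so return the records
        # that satisfy the largest number of constraints.
        best = max((matches(r, query) for r in records), default=0)
        return ("relaxed", [r for r in records if matches(r, query) == best])
    if len(exact) > max_results:
        # Refine: suggest the unconstrained attribute that splits the
        # current result set into the most groups (an assumed heuristic).
        free = [k for k in exact[0] if k != "id" and k not in query]
        if free:
            return ("refine", max(free, key=lambda k: len({r[k] for r in exact})))
    return ("results", exact)
```

For example, answer({"style": "colonial", "bedrooms": 5}, HOUSES) finds no exact match and falls back to the houses that satisfy one of the two constraints, while answer({"bedrooms": 3}, HOUSES, max_results=1) suggests narrowing by "city".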


RIA Architecture

[Figure: RIA architecture. Components shown on the RIA infrastructure: Multimodal Interpreter, Conversation Facilitator, Presentation Broker, Visual Designer, Language Designer, and Information Server.]

Multimodal Interpreter

This component is responsible for understanding the meaning of the user's multimodal inputs. Once individual unimodal inputs (e.g., speech and gesture) are recognized and parsed, our interpreter attempts to sort out the meanings of user inputs at two levels by using a wide variety of contexts: the turn level (what the user means at this particular turn of the conversation) and the discourse level (how the current user input relates to the overall conversation). As a result, our interpreter formulates a set of communicative goals that capture the meanings of user inputs. Our current interpreter is a realization of our semantics-based multimodal interpretation framework, TAICHI.
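As a rough illustration of discourse-level interpretation, the sketch below completes an abbreviated request (such as "what about golf courses") from the previous turn. It is not TAICHI; it assumes, purely for illustration, that a communicative goal is a dictionary with a "target" and a set of "constraints", and that an abbreviated request inherits whatever the user did not restate. The function name interpret and the goal layout are hypothetical.

```python
def interpret(turn, history):
    """Complete a possibly partial turn-level goal using the conversation history.

    turn:    dict with optional "target" and "constraints" (turn-level meaning)
    history: list of previously interpreted goals (discourse context)
    """
    goal = {
        "target": turn.get("target"),
        "constraints": dict(turn.get("constraints", {})),
    }
    if history:
        previous = history[-1]
        # Inherit the target if the user did not name one.
        if goal["target"] is None:
            goal["target"] = previous["target"]
        # Inherit constraints the user did not restate or override.
        for key, value in previous["constraints"].items():
            goal["constraints"].setdefault(key, value)
    history.append(goal)
    return goal
```

With an empty history, a first turn asking for houses in the north is stored as is; a follow-up turn that only names "golf courses" then inherits the "north" constraint from that earlier goal.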

Conversation Facilitator

Based on the current conversation context, including the conversation history and domain data, the facilitator first generates all feasible moves that indicate how RIA may converse with a user. For example, RIA may ask the user for more information, or simply present certain information to the user. Among all the feasible moves, the facilitator then selects the best move for the current context, by considering a number of factors, including data properties (e.g., data volume and complexity) and communication obligations (e.g., the response relevance). Once the right move is chosen, the facilitator formulates a set of corresponding conversation acts, which indicate what type of multimedia response should be generated.
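The generate-then-select procedure above might be sketched as follows. The move names, the Context fields, the threshold, and the scores are assumptions made for illustration only; the page does not describe how the facilitator actually weighs data properties against communication obligations.

```python
from dataclasses import dataclass


@dataclass
class Context:
    result_count: int         # data volume for the current request
    query_is_ambiguous: bool  # whether the request needs clarification


def feasible_moves(ctx):
    """Enumerate every move that is possible in this context."""
    moves = ["ask_for_more_information"]
    if ctx.result_count > 0:
        moves.append("present_results")
    else:
        moves.append("suggest_similar_data")
    return moves


def score(move, ctx, max_presentable=20):
    """Assign an illustrative desirability score to a move (assumed weights)."""
    too_many = ctx.result_count > max_presentable
    if move == "present_results":
        # Data-volume factor: very large result sets are hard to present well.
        return 0.2 if too_many else 1.0
    if move == "ask_for_more_information":
        # Asking is most relevant when the request is ambiguous or too broad.
        return 0.8 if (ctx.query_is_ambiguous or too_many) else 0.3
    if move == "suggest_similar_data":
        return 0.9
    return 0.0


def select_move(ctx):
    """Pick the highest-scoring feasible move."""
    return max(feasible_moves(ctx), key=lambda move: score(move, ctx))
```

Under these assumed scores, a small result set is presented directly, a very large one triggers a request for more information, and an empty one leads to a suggestion of similar data.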

Presentation Broker

Upon receiving the conversation acts, the presentation broker sketches a multimedia draft that expresses the content and the structure of a new discourse segment, with proper media punctuation and coordination cues. Based on this draft, the language and visual designers work with each other via a coordinator to author a multimedia blueprint that contains a set of media-specific acts along with spatial/temporal constraints. By solving all the constraints, RIA translates the blueprint into a multimedia script ready for the media producer to render. The presentation broker is an instantiation of our automated multimedia presentation framework, IMPRESA.
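To make the constraint-solving step more concrete, here is a minimal sketch that covers only one kind of temporal constraint: "act A must come before act B". It orders a set of hypothetical media-specific acts with Python's standard graphlib module. The act names and the schedule function are invented for this example, and IMPRESA's actual constraint language and solver are not described on this page.

```python
from graphlib import TopologicalSorter

# Hypothetical media-specific acts and "before" constraints between them.
ACTS = ["speak_intro", "show_map", "highlight_house", "speak_details"]
BEFORE = [
    ("show_map", "highlight_house"),       # draw the map before highlighting on it
    ("speak_intro", "speak_details"),      # introduce before elaborating
    ("highlight_house", "speak_details"),  # point at the house before describing it
]


def schedule(acts, before):
    """Return an ordering of acts that satisfies every (first, then) constraint."""
    # TopologicalSorter expects a mapping from each node to its predecessors.
    graph = {act: set() for act in acts}
    for first, then in before:
        graph[then].add(first)
    return list(TopologicalSorter(graph).static_order())
```

Calling schedule(ACTS, BEFORE) yields one valid script order; contradictory constraints (a cycle) make static_order raise graphlib.CycleError, which a fuller system would need to handle, for example by dropping a low-priority constraint.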

Visual Designer

According to the generated presentation draft, the visual designer creates a connected series of displays to depict the intended information visually. The visual designer is an instantiation of our automated graphics generation framework, IMPROVISE.

Language Designer

Based on the formulated presentation draft, our language designer automatically composes natural English speech to convey the desired information. Our current language designer is an instantiation of our language generation framework, SEGUE.

Information Server

To support all components described above, an information server supplies various kinds of contextual information in a uniform manner. In particular, four types of information are available: domain data (e.g., house data in a real-estate domain), conversation history (e.g., all detailed exchanges between RIA and a user), a user model (e.g., user profiles), and an environment model (e.g., device capabilities). In addition, domain data are augmented with a rich set of metadata, such as media preference (e.g., spatial regions prefer visual media), to facilitate other components' decision-making processes (e.g., media allocation). RIA's information server is part of the KARMA infrastructure, which supports ontology management using IRIS, spatial data management using ESRI servers, and domain data management using structured sources such as DB2 and Oracle.
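One way to picture "a uniform manner" is a single lookup interface over the four kinds of information, with metadata attached to entries. The sketch below is an assumption-laden toy, not the KARMA API: the class, its methods, the "media_preference" and "has_display" keys, and the allocate_medium helper are all hypothetical names chosen for this example.

```python
from enum import Enum


class Kind(Enum):
    DOMAIN_DATA = "domain_data"
    CONVERSATION_HISTORY = "conversation_history"
    USER_MODEL = "user_model"
    ENVIRONMENT_MODEL = "environment_model"


class InformationServer:
    """Toy in-memory store exposing one access pattern for all four kinds."""

    def __init__(self):
        self._stores = {kind: {} for kind in Kind}
        self._metadata = {}  # (kind, key) -> metadata dict

    def put(self, kind, key, value, metadata=None):
        self._stores[kind][key] = value
        if metadata is not None:
            self._metadata[(kind, key)] = metadata

    def get(self, kind, key):
        return self._stores[kind].get(key)

    def metadata(self, kind, key):
        return self._metadata.get((kind, key), {})


def allocate_medium(server, key, default="speech"):
    """Choose a medium for a domain-data item from its metadata and the device."""
    preferred = server.metadata(Kind.DOMAIN_DATA, key).get("media_preference", default)
    device = server.get(Kind.ENVIRONMENT_MODEL, "device") or {}
    # Fall back to the default medium if the device cannot show visuals.
    if preferred == "visual" and not device.get("has_display", True):
        return default
    return preferred
```

A component such as the presentation broker could then ask for a region's preferred medium without knowing where the region data or the device description is stored.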


Publications

  • Michelle X Zhou, Keith Houck, Shimei Pan, James Shaw, Vikram Aggarwal, and Zhen Wen. Enabling Context-Sensitive Information Seeking. Proceedings of Intelligent User Interface (IUI), 2006, to appear.