Transcoding In Depth

Internet Transcoding Technologies for Universal Access

Abstract

A growing diversity of pervasive devices is gaining access to the Internet and other information sources.  However, much of the rich multimedia content cannot be easily handled by the client devices with limited communication, processing, storage and display capabilities.  In order to improve access, we are developing a system for transcoding multimedia and Internet content.  The system uses an InfoPyramid for managing and manipulating multimedia content composed of video, images, audio and text.  The InfoPyramid manages the different versions of media objects with different fidelities and modalities and generates and selects the versions in order to adapt the delivery to different client devices.  The system enables scalable multimedia delivery for a variety of client devices, including personal digital assistants (PDAs), hand-held computers (HHCs), smart phones, TV browsers and color PCs.

Introduction

An increasing amount of electronic information takes the form of multimedia -- an integration of images, video, graphics, audio and text.  Many technologies are being developed for compression, indexing, searching and filtering in order to better manage multimedia data.  However, enabling effective access to multimedia content for a wide diversity of client devices is becoming an important emerging problem in the era of pervasive computing.

New classes of pervasive computing devices such as personal digital assistants (PDAs), hand-held computers (HHCs), smart phones, automotive computing devices, and wearable computers give users more ubiquitous access to information than ever before.  Many of these devices can serve as calendar tools, address books, pagers, global positioning devices, travel and mapping tools, e-mail clients, and Web browsers.

As users are beginning to rely more heavily on pervasive computing devices, there is a growing need for applications to bring multimedia information to the devices.  However, due to limited device capabilities -- in terms of the display size, storage, processing power, and network access -- there are new challenges for designing the applications that allow these devices to effectively access, store and process multimedia information.  Concurrent with the developments in pervasive computing, advances in storage, networking and authoring tools are driving the production of large amounts of rich multimedia content.  The result is a growing mismatch between the available rich content and the capabilities of the client devices to access and process it.  The mismatch between content requirements and client capabilities impacts a number of applications including Internet access and Web browsing, multimedia presentation and digital libraries.

InfoPyramid Framework

In order to make multimedia content more accessible to pervasive computing devices, we are developing a scalable multimedia delivery system.  We represent the media objects that constitute the multimedia documents using an InfoPyramid data model.  The InfoPyramid manages the different versions of the media objects with different modalities and fidelities.  Scalable multimedia delivery is then enabled by either:

  • storing, managing, selecting, and delivering different versions of the media objects in the InfoPyramid in order to adapt the multimedia documents to the client devices, or
  • manipulating the media objects on-the-fly, such as by using methods in the InfoPyramid for text-to-speech translation, image transcoding and summarization.
     

This allows the multimedia content delivery to adapt to the wide diversity of client device capabilities for communication, processing, storage, and display.

The InfoPyramid provides a general framework for managing and manipulating media objects.  The InfoPyramid manages different versions of media objects with different modalities (video, image, text, and audio) and fidelities (summarized, compressed, and scaled versions).  The InfoPyramid also provides and manages the translation and summarization methods that generate the different versions of the media objects.

Each media object is represented by a cell in the InfoPyramid.  For example, the cell in the lower-left corner of the InfoPyramid corresponds to a high-resolution video.  The cells above this one in the video column correspond to the lower-resolution or compressed versions (lower fidelity) of the video.  The cells to the right correspond to the different image, text and audio versions (different modalities) of the video sequence.  Similarly, the cell at the bottom of the text column corresponds to a fully detailed body of text.  The cells above it in the text column correspond to the summarized and compressed versions (lower fidelity) of the text body.  The cells in the audio column correspond to different versions of the text rendered as audio (different modality), such as by text-to-speech conversion.
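The cell layout described above can be pictured as a small grid keyed by modality (column) and fidelity level (row).  The following is only a minimal sketch: the class name, the `put` and `column` methods, and the convention that level 0 is the bottom (highest-fidelity) row are illustrative assumptions, and the cell contents are placeholder strings rather than real media.

```python
from dataclasses import dataclass, field

# The four modalities named in the text; each one is a column of the grid.
MODALITIES = ("video", "image", "text", "audio")


@dataclass
class InfoPyramid:
    """Versions of one media object, keyed by (modality, fidelity level).

    Assumption for this sketch: level 0 is the bottom row (highest
    fidelity) and larger levels are lower-fidelity versions.
    """
    cells: dict = field(default_factory=dict)

    def put(self, modality, level, content):
        if modality not in MODALITIES:
            raise ValueError(f"unknown modality: {modality}")
        self.cells[(modality, level)] = content

    def column(self, modality):
        """Versions in one modality, ordered from highest to lowest fidelity."""
        levels = sorted(lvl for (mod, lvl) in self.cells if mod == modality)
        return [self.cells[(modality, lvl)] for lvl in levels]


# Placeholder contents standing in for real media data.
pyramid = InfoPyramid()
pyramid.put("video", 0, "high-resolution video")
pyramid.put("video", 1, "low-resolution video")
pyramid.put("image", 0, "key frames")
pyramid.put("text", 0, "full text body")
pyramid.put("text", 1, "summarized text")
pyramid.put("audio", 0, "text rendered as speech")

print(pyramid.column("text"))
```

Moving up a column lowers fidelity, while moving across columns changes modality.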

Translation and summarization

In general, the manipulation operations for media objects in the InfoPyramid alter their modality and fidelity.  The translation methods convert the media objects to different modalities, such as text to audio, or video to images.  The summarization methods generate different versions within the same modality, but with different fidelity.  For example, the summarization methods compress the images, summarize text, and generate video abstractions.  The translation and summarization methods can be cascaded to change both the modality and fidelity of the media objects.
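As a rough illustration of cascading, the sketch below chains a toy summarizer and a toy translator so that both fidelity and modality change.  Everything in it is an assumption made for illustration: the `MediaObject` tuple, the numeric fidelity scale, the leading-sentence "summary", and the tagged string that stands in for synthesized speech.

```python
from typing import NamedTuple


class MediaObject(NamedTuple):
    modality: str
    fidelity: float  # hypothetical scale: 1.0 = original, smaller = reduced
    data: str


def summarize_text(obj, max_sentences=1):
    """Summarizer: same modality, lower fidelity (naively keeps leading sentences)."""
    sentences = [s.strip() for s in obj.data.split(".") if s.strip()]
    if not sentences:
        return obj
    kept = sentences[:max_sentences]
    ratio = len(kept) / len(sentences)
    return MediaObject("text", obj.fidelity * ratio, ". ".join(kept) + ".")


def text_to_audio(obj):
    """Translator: new modality, same fidelity (placeholder for a speech engine)."""
    return MediaObject("audio", obj.fidelity, f"<speech>{obj.data}</speech>")


def cascade(*steps):
    """Compose transcoding steps, applied left to right."""
    def run(obj):
        for step in steps:
            obj = step(obj)
        return obj
    return run


text = MediaObject("text", 1.0, "First point. Second point. Third point. Fourth point.")
spoken_summary = cascade(summarize_text, text_to_audio)(text)
print(spoken_summary)
```

Here the summarizer runs first so that the translator only has to convert the shortened text.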

InfoPyramid data model

The InfoPyramid provides a complete data model for managing and manipulating the media objects.  The data model consists of classes for the different modalities and fidelities (data) and for the transcoders (methods).  It distinguishes between two types of transcoders: translators and summarizers.  The data model is extensible.  We initially define four modalities (video, image, text and audio), and other modalities can be added, such as 3-D graphic models and text languages.  We also initially define several transcoders, such as text-to-speech conversion, image transcoding, video transcoding, video-to-image key-frame extraction and text summarization.  New transcoders can be added by deriving from the translator or summarizer classes.
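One way to picture this extensibility is a small class hierarchy in which new transcoders derive from a translator or summarizer base class.  The sketch below is an assumption about how such classes might look, not the system's actual interface; the class and attribute names are hypothetical, and the two concrete transcoders are deliberately naive stand-ins (every n-th frame as a "key frame", word truncation as a "summary").

```python
from abc import ABC, abstractmethod


class Transcoder(ABC):
    """Base class for methods that produce a new version of a media object."""
    source: str  # input modality
    target: str  # output modality

    @abstractmethod
    def apply(self, data):
        ...


class Translator(Transcoder):
    """Changes modality (source differs from target)."""


class Summarizer(Transcoder):
    """Keeps modality and lowers fidelity (source equals target)."""


class KeyFrameExtractor(Translator):
    """Video-to-image stand-in: keeps every `step`-th frame."""
    source, target = "video", "image"

    def __init__(self, step):
        self.step = step

    def apply(self, frames):
        return frames[::self.step]


class TextTruncator(Summarizer):
    """Text summarization stand-in: keeps the first `max_words` words."""
    source = target = "text"

    def __init__(self, max_words):
        self.max_words = max_words

    def apply(self, text):
        return " ".join(text.split()[:self.max_words])


print(KeyFrameExtractor(2).apply(["f0", "f1", "f2", "f3", "f4"]))
print(TextTruncator(3).apply("rich multimedia content for small devices"))
```

Adding a further transcoder then only requires another subclass that sets `source` and `target` and implements `apply`.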

Content Adaptation

The InfoPyramid framework allows the content to be summarized, translated and converted as needed in order to adapt it to the client devices.  For example, the Web page below is adapted differently to different devices, such as to color workstations, PDAs, and cell phones.

Workstation connected to a local area network: In this case, the client device is capable of retrieving and displaying the Web page in its original form.

PDA using a wireless modem: The device cannot handle large, colorful images and has minimal bandwidth. To adapt the Web page, the image is converted to black and white and is compressed. The text is summarized to one paragraph, and the video is delivered as text, based on related text.

Cellular phone: In this case, only the header is shown on the screen of the cellular phone, while the text, image and video content are all converted to speech.
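The three cases above can be read as a selection problem: among the available versions of an item, pick the best one that the device can present.  The sketch below assumes a very simple device profile (supported modalities, a size budget, and color support) and treats the largest usable version as the highest-fidelity one.  The profile fields, the size figures, and the selection rule are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    modality: str
    size_kb: int
    color: bool = False


@dataclass(frozen=True)
class Device:
    name: str
    modalities: tuple  # modalities the device can present
    max_kb: int        # assumed per-item size budget
    color: bool


def select_version(versions, device):
    """Return the largest version the device can handle, or None if none fits."""
    usable = [
        v for v in versions
        if v.modality in device.modalities
        and v.size_kb <= device.max_kb
        and (device.color or not v.color)
    ]
    return max(usable, key=lambda v: v.size_kb, default=None)


# Hypothetical versions of one image on the page (sizes are made up).
image_versions = [
    Version("image", size_kb=200, color=True),  # original
    Version("image", size_kb=15),               # black and white, compressed
    Version("audio", size_kb=8),                # spoken description
    Version("text", size_kb=1),                 # caption
]

workstation = Device("workstation", ("video", "image", "text", "audio"), 10_000, True)
pda = Device("PDA", ("image", "text"), 20, False)
phone = Device("cell phone", ("audio", "text"), 10, False)

for device in (workstation, pda, phone):
    print(device.name, "->", select_version(image_versions, device))
```

With these made-up numbers the workstation receives the original image, the PDA receives the compressed black-and-white image, and the phone receives the audio version.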

Architecture

The architecture of the InfoPyramid transcoder consists of four components:

  • InfoPyramid Representation: There is an InfoPyramid representation for each multimedia content component, regardless of whether it is an image, a video clip, a section of text, or a combination of these. This representation can accommodate multiple modalities and multiple levels of detail within each modality.
  • Content Analysis: Content analysis modules analyze the inter-relationships between different sections of the content. For example, these analysis modules determine the nature of images (whether they are graphics or images, as shown below) and the purpose of each image (navigational, button, or related to the textual content).
  • Transformation: Examples of modality translation and summarization modules include language translators, text summarizers, modules that generate progressive image representations, speech recognition and text-to-speech synthesis.
  • Content Customization: These modules customize the content based on user interests and client platform capabilities, determining the most appropriate format of the content to send to the client.
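To make the content analysis step more concrete, the sketch below guesses the purpose of an image from a few page-level cues.  It is only a toy heuristic under stated assumptions: the `ImageInfo` fields, the size threshold, and the keyword-overlap rule are hypothetical and are not taken from the system described here.

```python
from dataclasses import dataclass


@dataclass
class ImageInfo:
    width: int
    height: int
    is_link: bool       # the image is wrapped in a hyperlink
    alt_text: str = ""


def image_purpose(img, page_keywords):
    """Guess an image's role: 'button', 'navigational', 'content' or 'unknown'."""
    # Assumed threshold: small linked images are treated as buttons.
    small = img.width <= 100 and img.height <= 40
    if img.is_link and small:
        return "button"
    if img.is_link:
        return "navigational"
    # Assumed rule: alt text sharing a word with the page text marks the
    # image as related to the textual content.
    if set(img.alt_text.lower().split()) & set(page_keywords):
        return "content"
    return "unknown"


keywords = {"bridge", "traffic"}
print(image_purpose(ImageInfo(80, 30, True), keywords))
print(image_purpose(ImageInfo(400, 60, True), keywords))
print(image_purpose(ImageInfo(640, 480, False, "Bridge at sunset"), keywords))
```

A purpose label of this kind could then feed the customization step, for instance by dropping button images on devices that cannot display them.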