Internet Transcoding Technologies for Universal Access
Abstract
A growing diversity of pervasive devices is gaining access to
the Internet and other information sources. However, much of the rich multimedia content cannot be easily handled by the client devices with limited communication, processing, storage and display
capabilities. In order to improve access, we are developing a system for transcoding multimedia and Internet content. The system uses an InfoPyramid for managing and manipulating multimedia content composed
of video, images, audio and text. The InfoPyramid manages the different versions of media objects with different fidelities and modalities and generates and selects the versions in order to adapt the delivery to
different client devices. The system enables scalable multimedia delivery for a variety of client devices, including personal digital assistants (PDAs), hand-held computers (HHCs), smart phones, TV browsers and
color PCs.
Introduction
An increasing amount of electronic information takes the form of multimedia -- an integration of images, video, graphics, audio and text. Many technologies are being developed for
compression, indexing, searching and filtering in order to better manage multimedia data. However, enabling effective multimedia content access for a wide diversity of client devices
is becoming one of the important emerging problems in pervasive computing. New classes of pervasive computing devices such as personal digital assistants (PDAs), hand-held computers (HHCs), smart
phones, automotive computing devices, and wearable computers allow users more ubiquitous access to information than ever. Many of these devices can serve as calendar
tools, address books, pagers, global positioning devices, travel and mapping tools, e-mail clients, and Web browsers. As users begin to rely more heavily on pervasive
computing devices, there is a growing need for applications to bring multimedia information to the devices. However, due to limited device capabilities -- in terms of the display size,
storage, processing power, and network access -- there are new challenges for designing the applications that allow these devices to effectively access, store and process multimedia
information. Concurrent with the developments in pervasive computing, advances in storage, networking and authoring tools are driving the production of large amounts of rich multimedia
content. The result is a growing mismatch between the available rich content and the capabilities of the client devices to access and process it. The mismatch between content
requirements and client capabilities impacts a number of applications including Internet access and Web browsing, multimedia presentation and digital libraries.
InfoPyramid Framework
In order to improve the accessibility of multimedia content by pervasive computing devices, we are developing a scalable multimedia delivery system. We represent the media objects
that constitute the multimedia documents using an InfoPyramid data model. The InfoPyramid manages the different versions of the media objects with different modalities and fidelities. The
scalable multimedia delivery is then enabled by either
- storing, managing, selecting, and delivering different versions of the media objects in the InfoPyramid in order to adapt the multimedia documents to the client devices, or
- manipulating the media objects on-the-fly, such as by using methods in the InfoPyramid for text-to-speech translation, image transcoding and summarization.
This allows the multimedia content delivery to adapt to the wide diversity of client device capabilities for communication, processing, storage, and display.
The InfoPyramid provides a general framework for managing and manipulating media objects. The InfoPyramid manages different
versions of media objects with different modalities (video, image, text, and audio) and fidelities (summarized, compressed, and scaled versions). The InfoPyramid also provides and
manages the translation and summarization methods that generate the different versions of the media objects.
Each media object is represented by a cell in the InfoPyramid.
For example, the cell in the lower-left corner of the InfoPyramid corresponds to a high-resolution video. The cells above this one in the video column correspond to the lower-resolution or
compressed versions (lower fidelity) of the video. The cells to the right correspond to the different image, text and audio versions (different modalities) of the video sequence. On the
other hand, the cell at the bottom of the text column corresponds to a fully detailed body of text. The cells above it in the text column correspond to the summarized and compressed
versions (lower fidelity) of the text body. The cells in the audio column correspond to different versions of the text rendered as audio (different modality), such as by text-to-speech conversion.
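The cell layout described above can be sketched as a small grid keyed by modality and fidelity level. This is only an illustrative sketch: the class names `Cell` and `InfoPyramid`, the integer `level` (0 for the bottom row, larger for lower fidelity), and the sample descriptions are assumptions made for this example, not the system's actual implementation.

```python
from dataclasses import dataclass, field

# The four columns named in the text.
MODALITIES = ("video", "image", "text", "audio")


@dataclass
class Cell:
    modality: str
    level: int          # assumed convention: 0 = bottom row (full fidelity)
    description: str


@dataclass
class InfoPyramid:
    cells: dict = field(default_factory=dict)

    def add(self, modality, level, description):
        if modality not in MODALITIES:
            raise ValueError(f"unknown modality: {modality}")
        self.cells[(modality, level)] = Cell(modality, level, description)

    def column(self, modality):
        """Versions in one modality, ordered from highest to lowest fidelity."""
        return [self.cells[k] for k in sorted(self.cells) if k[0] == modality]

    def row(self, level):
        """Versions at one fidelity level, across the modalities that have one."""
        return [self.cells[(m, level)] for m in MODALITIES if (m, level) in self.cells]


pyramid = InfoPyramid()
pyramid.add("video", 0, "high-resolution video")
pyramid.add("video", 1, "lower-resolution video")
pyramid.add("image", 0, "images from the video")
pyramid.add("text", 0, "full text")
pyramid.add("text", 1, "summarized text")
pyramid.add("audio", 0, "text rendered as speech")

for cell in pyramid.column("video"):
    print(cell.level, cell.description)
```

Moving up a column lowers fidelity within one modality, while moving along a row changes modality, which mirrors the two axes described in the text.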
Translation and summarization
In general, the manipulation operations for media objects in the InfoPyramid alter their modality and fidelity. The translation
methods convert the media objects to different modalities, such as text to audio, or video to images. The summarization methods generate different versions within the same modality,
but with different fidelity. For example, the summarization methods compress the images, summarize text, and generate video abstractions. The translation and summarization methods
can be cascaded to change both the modality and fidelity of the media objects.
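As a rough illustration of cascading, the sketch below chains a summarization step (same modality, lower fidelity) with a translation step (different modality). Everything here is hypothetical: `MediaObject`, the numeric `fidelity` scale, the naive sentence-truncation summarizer, and the placeholder `text_to_speech` function, which only tags the payload and performs no real speech synthesis.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MediaObject:
    modality: str
    fidelity: float   # assumed scale: 1.0 = original, smaller = lower fidelity
    payload: str


def summarize_text(obj, max_sentences=1):
    """Summarizer: keeps the modality, lowers the fidelity (naive truncation)."""
    if obj.modality != "text":
        raise ValueError("summarize_text expects a text object")
    sentences = [s.strip() for s in obj.payload.split(".") if s.strip()]
    if not sentences:
        return obj
    kept = sentences[:max_sentences]
    ratio = len(kept) / len(sentences)
    return replace(obj, fidelity=obj.fidelity * ratio, payload=". ".join(kept) + ".")


def text_to_speech(obj):
    """Translator: changes the modality. Placeholder only, no real synthesis."""
    if obj.modality != "text":
        raise ValueError("text_to_speech expects a text object")
    return replace(obj, modality="audio", payload=f"<speech:{obj.payload}>")


def cascade(*steps):
    """Compose transcoding steps so they are applied left to right."""
    def run(obj):
        for step in steps:
            obj = step(obj)
        return obj
    return run


text = MediaObject("text", 1.0, "First sentence. Second sentence.")
result = cascade(summarize_text, text_to_speech)(text)
print(result)
```

The order of the steps matters in this sketch: the summarizer only accepts text, so it has to run before the translation to audio.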
InfoPyramid data model
The InfoPyramid provides a complete data model for managing
and manipulating the media objects. The InfoPyramid data model consists of classes for the different modalities and fidelities (data) and transcoders (methods). The data model
distinguishes between the two types of transcoders -- translators and summarizers. The data model is extensible: initially, we define four modalities -- video, image, text
and audio. Other modalities can be added, such as 3-D graphic models and text languages. We initially define several transcoders such as text-to-speech conversion, image
transcoding, video transcoding, video-to-image key-frame extraction and text summarization. The data model can also be extended by deriving new transcoders from the translator or summarizer classes.
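One way to picture this extensibility is a small class hierarchy in which new transcoders derive from a translator or summarizer base class. The sketch below is an assumption-laden illustration, not the system's real class design: the names `Transcoder`, `Translator`, `Summarizer`, `KeyFrameExtractor` and `GrayscaleSummarizer` are invented here, and sampling every Nth frame is only a stand-in for real key-frame extraction.

```python
from abc import ABC, abstractmethod


class Transcoder(ABC):
    """Base class: converts data from a source modality to a target modality."""
    source: str
    target: str

    @abstractmethod
    def convert(self, data):
        ...


class Translator(Transcoder):
    """Changes modality: subclasses set different `source` and `target`."""


class Summarizer(Transcoder):
    """Keeps modality and lowers fidelity: subclasses set only `modality`."""
    modality: str

    @property
    def source(self):
        return self.modality

    @property
    def target(self):
        return self.modality


class KeyFrameExtractor(Translator):
    """Video-to-image translator; fixed-step sampling is a stand-in heuristic."""
    source, target = "video", "image"

    def __init__(self, step):
        self.step = step

    def convert(self, frames):
        return frames[::self.step]


class GrayscaleSummarizer(Summarizer):
    """Image summarizer: drops color using the common luma weighting."""
    modality = "image"

    def convert(self, pixels):
        return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]


frames = ["f0", "f1", "f2", "f3", "f4"]
print(KeyFrameExtractor(step=2).convert(frames))
print(GrayscaleSummarizer().convert([(255, 255, 255), (0, 0, 0)]))
```

Adding a new transcoder in this sketch means writing one subclass with a `convert` method, which is the kind of extension by derivation the paragraph describes.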
Content Adaptation
The InfoPyramid framework allows the content to be summarized, translated and converted as needed in order to adapt it to the client devices. For example, the Web page
below is adapted differently for different devices, such as color workstations, PDAs, and cell phones.
Workstation connected to a local area network: In this case, the client device is capable of retrieving and displaying the Web page in its original form.
PDA using a wireless modem: The device cannot handle large, colorful images and has minimal bandwidth. To adapt the Web
page, the image is converted to black and white and compressed. The text is summarized to one paragraph, and the video is replaced by text drawn from its related text.
Cellular phone: In this case, only the header is shown on the screen of the cellular phone, while the text, image and video content are all converted to speech.
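The three cases above can be read as a selection problem: given the available versions of a media object, pick the best one a client can render and afford. The following is a minimal sketch under stated assumptions. The `Version` and `Client` types, the size budgets, and all numbers are hypothetical, and using size in kilobytes as a proxy for fidelity is a simplification made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    modality: str
    size_kb: int      # hypothetical size, also used here as a fidelity proxy
    label: str


@dataclass(frozen=True)
class Client:
    name: str
    modalities: frozenset   # modalities the device can render
    budget_kb: int          # hypothetical per-object transfer budget


def select(versions, client):
    """Return the largest version the client can render within its budget."""
    usable = [
        v for v in versions
        if v.modality in client.modalities and v.size_kb <= client.budget_kb
    ]
    return max(usable, key=lambda v: v.size_kb, default=None)


# Versions of one video object (all values are made up for illustration).
versions = [
    Version("video", 5000, "original video"),
    Version("image", 200, "color key frame"),
    Version("image", 20, "black-and-white compressed key frame"),
    Version("text", 2, "related text"),
    Version("audio", 40, "speech"),
]

workstation = Client("workstation", frozenset({"video", "image", "text", "audio"}), 10000)
pda = Client("PDA", frozenset({"image", "text"}), 10)
phone = Client("cellular phone", frozenset({"audio"}), 50)

for client in (workstation, pda, phone):
    print(client.name, "->", select(versions, client).label)
```

With these made-up budgets, the workstation receives the original video, the PDA falls back to the related text, and the phone receives speech, matching the three cases described above.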
Architecture
The architecture of the InfoPyramid transcoder consists of four components:
- InfoPyramid Representation:
There is an InfoPyramid representation for each multimedia content component, regardless of whether it is an image, a video clip, a section of
text, or a combination of the above. This representation can accommodate multiple modalities and multiple levels of detail within each modality.
- Content Analysis:
Content analysis modules analyze the inter-relationships between different sections of the content. For example, these analysis modules determine the nature
of images (whether they are graphics or images, as shown below) and the purpose of each image (navigation, a button, or content related to the text).
- Transformation:
Examples of modality translation and summarization modules include language translators, text summarizers, modules that generate progressive image
representations, speech recognition and text-to-speech synthesis.
- Content Customization:
These modules customize the content based on user interests and client platform capabilities, determining the most appropriate format of
the content that will be sent to the client.