In recent years, advances in computer and networking
technologies have led to the development of powerful workstations with
audio and video capabilities, server machines with high-capacity
storage devices, and broadband networks that support quality-of-service
(QoS) guarantees. These advances have
spurred interest in the development of distributed multimedia
applications. Deployment of such applications would be
facilitated by the availability of service-enabling software
that hides the details of the underlying network infrastructure from
the application developer. Research is also required to fully
understand the communication requirements of these applications and the
corresponding implications for system and network design.
An important class of distributed multimedia applications is
"presentational" applications, where multimedia documents featuring
continuous (audio and video) or discrete (image and text) data are
accessed interactively by remote users. Application areas include
multimedia news, digital libraries, home shopping, and distance
education. The success of this type of interactive service is heavily
dependent on the ability to deliver the service to a large community of
users in an effective manner.
In September 1993, the Canadian Institute for Telecommunications
Research (CITR), in collaboration with the
IBM Toronto Laboratory Centre for Advanced
Studies (CAS), initiated a major project
on broadband services. The goal of this major project is to provide the
software technologies required to support the development of
distributed multimedia applications. Our work is focused on
presentational applications, where emphasis is placed on efficiency and
service flexibility. By efficiency, we mean the ability to support many
users and many multimedia documents. By service flexibility, we mean
that the application is able to support a wide range of
QoS requirements from the users, adapt to
changing network conditions, and support multiple document types.
The research program is organized as six constituent projects, which
are investigated by a multidisciplinary team from eight institutions
across Canada. An important activity is the development of an
integrated prototype using the research results from the constituent
projects. Multimedia news has been selected as a target application for
development; collaborators at IBM Canada
played a major role in this selection. In this paper, the system
architecture, research results, and prototyping effort are presented.
First, we discuss the key design decisions and the organization of the
Broadband Services research program. Next, the details of the overall
system are described. These include the application programming
interface, the various software modules, and the communications among
the various modules. Our effort in developing an integrated prototype,
and the challenges that we have encountered, are also discussed.
Besides the integrated prototype, the research team has made
advancements to the various research areas under investigation. These
accomplishments are discussed, followed by concluding remarks and a
discussion of future research directions.
Design decisions and project organization
Conceptually, our system is a distributed system where multimedia
documents, stored in databases and file servers, are accessed by remote
users over a broadband network. Its design is based on the
following decisions:
- Uniform treatment of content data and meta-data: In a
multimedia document, the content data correspond to the
text, image, audio, and video data, and meta-data contain
descriptive information about the content data. Meta-data include
annotation information such as keywords, author, and date of creation,
as well as information relevant to system operation, such as document
structure and encoding schemes for audio or video. Our system treats
meta-data and content data uniformly with respect to querying.
- Use of an object-oriented database: In general, a multimedia
document is a structured complex object containing a number of media
objects (video, audio, image, or text). Video and audio objects are
generally large, consisting of digitized samples of analog data. There
is no simple structure to these objects as there is, for example, to
the name, address, and salary attributes of an employee object in a
traditional database management system
(DBMS). Video and audio objects also have
temporal and spatial relationships to one another. Relational
DBMSs are not suitable for supporting
multimedia documents because (1) they are designed to efficiently
manage large numbers of small objects, and (2) they manage fixed data
types and are not extensible. We have adopted an object-oriented
DBMS because features such as abstraction
and encapsulation of complex objects, an extensible type system, and
support for representing various hierarchies are more suitable for
meeting the requirements of multimedia applications.
- Development of a continuous media file server: Content data
come in two varieties depending on whether the data are continuous
(audio and video) or discrete (image and text).
[1] Image and
text data are stored in the DBMS. For
audio and video data, our system must provide guarantees of delivery,
as well as support for synchronization of independent media streams and
QoS. We have therefore decided to develop
a special-purpose continuous-media file server. A consequence of this
decision is that continuous data and discrete data may be stored on
separate servers.
- Synchronization of multiple media streams: In our system, the
media objects that make up a multimedia document may be stored on
different media servers. This facilitates the development of
applications where the same video stream may be combined with one of
several possible audio streams, such as those corresponding to
different languages. A mechanism is needed to request the delivery of
media objects from different servers and to synchronize their
presentation at the client.
- QoS negotiation and adaptation as
an integral part of the system architecture: To achieve service
flexibility, an application must be able to cope with varying network
conditions as well as varying presentation quality requested by the
users. The latter is relevant, for example, when a video document is
available at multiple levels of resolution. The
QoS negotiation process is guided by the
users' preferences and priorities, which can be captured in the form
of user profiles. The system also adapts to changing user priorities,
system parameters, and resource availability. In our system, the
various system components must effectively support
QoS negotiation and adaptation.
Project organization. The CITR
Broadband Services research program consists of six constituent
projects: multimedia data management, continuous media file server,
quality-of-service negotiation and adaptation, scalable video encoding,
synchronization of multimedia data, and project integration. These
projects are organized around different system components and are
investigated by a multidisciplinary team from eight institutions across
Canada. The overall major project is led by J. Wong of the University
of Waterloo. The co-leader is K. Lyons of the
IBM Toronto Laboratory Centre for Advanced
Studies. Apart from project management, an important responsibility of
the major project leader and the co-leader is to coordinate the
milestones of the constituent projects so that the objectives of the
major project are met.
System description
Our system architecture, developed as a result of our design
decisions, is depicted in Figure 1. It is
based on a client/server model. In this section, we first describe the
features of our multimedia news application, and then the software
technologies that can be used to develop such an application. These
include the various software modules, and the communications between
modules. For ease of exposition, we will base our description on an
example document and selected user requests.
Figure 1
Multimedia news application. Our multimedia news application
is designed to support the search, retrieval, and presentation of
multimedia news documents. In general, a news document may contain
media objects such as audio, video, image, and text. These objects may
be stored on different servers. To present a document, the
corresponding media objects are retrieved and synchronized for
presentation at the client workstation. Facilities are provided for a
user to negotiate the quality of the presentation with the system, such
as the encoding scheme used and the frame rate of video. Facilities are
also provided for users to store their QoS
preferences in user profiles.
Multimedia API. A multimedia API
(application programming interface) has been defined to support
application development. Table 1 contains
an overview of the API primitives, organized by function
groups. [2] This is a minimal
API, which has been implemented as a C++ class library,
[2] using the capabilities provided by the
software modules developed by the constituent projects. Details of this
implementation are provided in a later section. Our multimedia news
application has been developed using this API.
Table 1
Multimedia API
| Function Group | Primitive | Explanation (Where Required) |
|---|---|---|
| Initialization | Initialization | Acquire resources and establish connections between system components |
| | Log on | |
| Search | Set search scope | Set range of documents to be searched |
| | Search on meta-data | |
| | Search on content | |
| Retrieval and document presentation | Prepare | Prepare for retrieval; this includes the prefetching of data |
| | Present | Start or resume presentation |
| | Pause | |
| | Fast forward/rewind | |
| | Close | Terminate presentation |
| QoS negotiation | Open profile window | |
| | Get active profile | |
| | Negotiate active profile | |
| | Get document QoS | |
| | Get system QoS | |
| | Negotiate presentation | |
| Shutdown | Log off | |
| | Release | Release resources and terminate connections |
Document representation. We next discuss our approaches to
document representation and document storage. This will facilitate our
discussion of the various software modules in later sections.
SGML/HyTime standard. We follow the SGML/HyTime
standard [3,4]
for representing document structure.
SGML (Standard Generalized Markup
Language) formally specifies document structure by defining element
types such as "paragraph" and "figure" and the relationships
between them using a document type definition (DTD). SGML
does not prespecify the nature of these elements, nor the structure of
the composition hierarchy that contains them. Instead, a document
designer specifies a different DTD for
each category of document being designed.
The representation of spatial and temporal relationships between
media objects is an important consideration because such information is
required to plan the retrieval and presentation of multimedia
documents. Following the HyTime philosophy, we completely separate
the presentation of a document from its content. This has two
implications. First, the user's presentation style preferences must be
stored and accessed when necessary. The second and arguably more
important consideration is to represent the spatial and temporal
relationships in accordance with the HyTime standard.
HyTime defines a number of architectural forms; one of them is the
finite coordinate space (FCS) that is used
to model spatial and temporal relationships. We define an
FCS of three dimensions:
x and y to represent spatial dimensions and
time to model the temporal dimension. A set of ranges along
these axes forms an extent which corresponds to an
event. [3] An event schedule
is used to represent
temporal relationships among the various media objects. Within this
context, our model of spatial and temporal relationships is a set of
type definitions that correspond to the relevant HyTime concepts.
Details of the type system design can be found
elsewhere. [5,6]
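To make the extent idea concrete, the following C++ sketch models a three-axis FCS in which each extent is one (start, size) range per axis, and derives two temporal facts a presentation planner needs: whether two events overlap in time, and whether an event is active at a given instant. The names `Range`, `Extent`, `concurrent`, and `activeAt`, and the use of milliseconds on the time axis, are illustrative assumptions rather than the project's HyTime type system.

```cpp
#include <array>
#include <cstddef>

// Hypothetical axis indices for the three-dimensional FCS described in the text.
enum Axis : std::size_t { X = 0, Y = 1, Time = 2 };

// A range along one axis: a starting position and a size, in axis units
// (pixels for X and Y; milliseconds for Time, an assumed unit).
struct Range {
    long start;
    long size;
    long end() const { return start + size; }  // exclusive end
};

// An extent is one range per axis; it corresponds to an event.
struct Extent {
    std::array<Range, 3> ranges;
    const Range& along(Axis a) const { return ranges[a]; }
};

// True if the two ranges share at least one unit along their axis.
bool overlaps(const Range& a, const Range& b) {
    return a.start < b.end() && b.start < a.end();
}

// Two events must be presented concurrently if their time ranges overlap.
bool concurrent(const Extent& a, const Extent& b) {
    return overlaps(a.along(Time), b.along(Time));
}

// An event is active at time t if t falls inside its time range.
bool activeAt(const Extent& e, long t) {
    const Range& r = e.along(Time);
    return t >= r.start && t < r.end();
}
```

A planner could use `concurrent` to decide which media streams must be synchronized with each other, and `activeAt` to decide what should be on screen at a given moment.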
Example document. We show in Appendix A a complete
SGML markup of a multimedia news document
that describes the CITR Broadband Services
demonstration at the CAS conference
(CASCON) in Toronto in 1996. This
document, as viewed by a user of our prototype, has the layout shown in
Figure 2.
Figure 2
The markup begins with a standard SGML
<!DOCTYPE> declaration identifying the
DTD to which the document conforms. This
is followed by the <article> element, which specifies the
document's unique identifier and its primary language. The
<frontmatter> element contains subelements representing the meta-data
used for document indexing and searching. It includes the document's
editorial information (author, date, keywords, location, subject, and
source) and the document's headlines and abstract.
The next section of the document is marked by the <async> element. It
contains the asynchronous content that may be displayed at any time and
with no timing constraints; text and images fall into this category. We
have two <image> elements corresponding to the two logos in
Figure 2.
Each image element defines an image but makes no mention of its
content; the content is described using a separate <image-variant>
element. In general, an image-variant element includes an identifier,
the name of a file containing the image data, the size of the image
data, and QoS parameters such as the
encoding scheme used, the image's colour space, and the image's
dimensions. Note that multiple variants of the same image may be used
to provide multiple QoS levels. Our
example document, however, contains only one variant of each image.
The next part of the document is marked by the <sync> element and
contains specifications for the document's continuous media objects,
such as audio and video. There are four components to this
specification: axis definition, object declaration, object extent
lists, and QoS variants. The SGML elements <x>, <y>, and <time>
define the HyTime axes, which will be used to place and measure media
objects. These objects are then declared using <audio> and <video>
elements, nested within an event schedule (<av-evsched>)
and a finite coordinate space (<av-fcs>). The object declarations
include the "price" of retrieving the object and a list of
QoS variants that represent the object's content.
The object extent lists define, in the <av-extlist> element, the
sizes and positions of objects that have been defined previously. This
allows different objects to occupy different spaces within the
FCS. An extent list is composed of
(starting position, size) pairs, one for each of the HyTime axes
previously defined. This information is used in media stream
synchronization. Finally, the details of the
QoS variants for each object are
specified. Our example document contains two variants for each of the
audio and video objects. The various user-level
QoS parameters are included with the
<audio-variant> and <video-variant> elements. These parameters are
used in QoS negotiation. Each
QoS variant refers to a <stream>
element, which is the link between the database and the continuous
media file server (CMFS). Audio and video
objects stored in our CMFS are identified
by universal object identifiers (UOIs).
Each <stream> element specifies a UOI
and the size of the object.
When our example SGML document is parsed,
it is converted into objects suitable for insertion into the database.
The text components of the document (i.e., title and abstract) are
included in these objects. The two image files are read in and used to
provide the content for the image objects in the database.
Software modules. In our design, applications reside above an
API module, which communicates with the other software modules in
the system (see Figure 1). These modules
communicate with each other in order to achieve the desired results. In
this section, an overview of each software module is presented.
API module. The API module
insulates the application developer from the workings of the rest of
the system. It provides a set of objects that enable the developer to
use the entire functionality of the system without being aware of the
components involved or how they interconnect. These objects are:
- Session manager: This object represents an application's connection
to the rest of the system. It should be the first item constructed by
the application; likewise its destruction should be the final operation
performed before the application terminates.
- Query manager: This object is the application's interface to the
database management system. It manages the construction of queries,
performs the actual query operations, and makes the results available.
- Document manager: This object is associated with an entire multimedia
document. The methods within this object allow for the retrieval and
examination of the document's content. There is a separate document
manager for each document currently being processed.
- Presentation manager: One such object is associated with each of the
various media components of a document. For a given media component,
the corresponding presentation manager knows how to present
the data to the user. In addition, presentation managers that represent
continuous media data, e.g., the audio and video, also implement a
"virtual VCR" interface, which allows
the application to control the presentation, to query the current
temporal position, and to query the current state of the presentation
(stopped, playing, etc.).
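The virtual VCR interface can be pictured as a small state machine. The sketch below is hypothetical: the class `VirtualVcr`, its state names, and its millisecond position counter are assumptions chosen to mirror the Prepare, Present, Pause, Fast forward/rewind, and Close primitives of Table 1, not the actual class library.

```cpp
#include <stdexcept>

// Hypothetical states of a continuous-media presentation manager.
enum class VcrState { Stopped, Prepared, Playing, Paused };

// Sketch of a "virtual VCR": each method mirrors a Table 1 primitive, and
// invalid transitions are rejected with an exception.
class VirtualVcr {
public:
    void prepare() {                       // Prepare: set up retrieval and prefetching
        require(state_ == VcrState::Stopped);
        state_ = VcrState::Prepared;
    }
    void present() {                       // Present: start or resume
        require(state_ == VcrState::Prepared || state_ == VcrState::Paused);
        state_ = VcrState::Playing;
    }
    void pause() {                         // Pause
        require(state_ == VcrState::Playing);
        state_ = VcrState::Paused;
    }
    void seek(long ms) {                   // Fast forward/rewind to a temporal position
        require(state_ != VcrState::Stopped);
        position_ = ms < 0 ? 0 : ms;
    }
    void close() {                         // Close: terminate the presentation
        state_ = VcrState::Stopped;
        position_ = 0;
    }
    void tick(long elapsedMs) {            // driven by a clock; advances only while playing
        if (state_ == VcrState::Playing) position_ += elapsedMs;
    }
    VcrState state() const { return state_; }     // query the current state
    long position() const { return position_; }  // query the current temporal position (ms)

private:
    static void require(bool ok) {
        if (!ok) throw std::logic_error("invalid virtual VCR transition");
    }
    VcrState state_ = VcrState::Stopped;
    long position_ = 0;
};
```

Throwing on an invalid transition (for example, Pause while stopped) is only one design option; a library could equally return a status code, as the query manager does when reporting query results.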
DBMS module. The database is an object-oriented system that
complies with the SGML/HyTime standard. It
is based on ObjectStore [7]
and therefore the ObjectStore API is used to search and retrieve
documents from the database. Facilities are provided to retrieve a set
of documents, to iterate through the set, and to retrieve a document
based on the instance variables of its objects. ObjectStore is
organized with a server component and a client component; these
components are shown as database server and database client,
respectively, in Figure 1. For convenience, we
use "database" to mean the two components working together.
QoS manager module. The QoS
manager module is responsible for managing user profiles, validating
presentation requests through these profiles, and performing
QoS negotiation.
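As an illustration of profile-driven negotiation, the following sketch selects, among the QoS variants of a video object, the one that satisfies a user profile. The fields `encoding`, `frameRate`, and `price` echo parameters mentioned in this paper, but the `UserProfile` structure and the selection rule (highest frame rate within budget, cheapest on ties) are assumptions made for the example.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// Hypothetical user-level QoS parameters of one video variant.
struct VideoVariant {
    std::string id;
    std::string encoding;  // e.g. "MJPEG"
    int frameRate;         // frames per second
    double price;          // cost of retrieving this variant
};

// Hypothetical user profile: acceptable encodings, a minimum frame rate, and a budget.
struct UserProfile {
    std::vector<std::string> encodings;
    int minFrameRate;
    double maxPrice;
};

// Returns the acceptable variant with the highest frame rate (cheapest on ties),
// or std::nullopt if no variant satisfies the profile.
std::optional<VideoVariant> negotiate(const std::vector<VideoVariant>& variants,
                                      const UserProfile& p) {
    std::optional<VideoVariant> best;
    for (const auto& v : variants) {
        const bool encodingOk =
            std::find(p.encodings.begin(), p.encodings.end(), v.encoding) != p.encodings.end();
        if (!encodingOk || v.frameRate < p.minFrameRate || v.price > p.maxPrice) continue;
        if (!best || v.frameRate > best->frameRate ||
            (v.frameRate == best->frameRate && v.price < best->price)) {
            best = v;
        }
    }
    return best;
}
```

An empty result corresponds to a failed negotiation; the system could then offer the user a relaxed profile instead.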
Synchronization module. The synchronization module is
responsible for the presentation of media objects. Once the
QoS manager has determined the
QoS variants that should be used for the
presentation, it passes this information to the synchronization module,
which then manages the synchronized presentation.
The CMFS module. The CMFS module
provides a capability to deliver continuous media data to the client.
It also provides support for QoS
negotiation and synchronization. Included in this module is the
transport of continuous media data over an
ATM network.
Communication between software modules. We illustrate the
communication between the different software modules by means of two
examples: (1) processing of a query and (2) presentation of a document.
Processing of a query. The software modules involved in
processing a query are depicted in Figure 3.
In this figure, communication between
software modules is shown by the steps indicated:
1. The user requests a keyword search (e.g., find all documents
that include reference to CITR and CASCON).
2. The application collects the user's search criteria and passes
them to the query manager in the API
module.
3. The query manager first retrieves the ObjectStore database root
corresponding to all documents and then retrieves the meta-data of each
document.
4. Upon receiving a response from the database, the query manager
checks each document's meta-data to see if the attributes match the
user's request. Suppose there is more than one matching document. The
objects representing these documents are converted into objects that
are understood by the application and the API.
5. The query manager returns to the application the status of the
query (success in this case).
6. The application requests information about the matching
documents.
7. The query manager responds with the meta-data of the matching
documents.
8. The application presents the user with these matching
documents.
Figure 3
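Step 4 of the query walk-through amounts to a filter over document meta-data. The sketch below assumes a simplified `MetaData` record and exact, case-sensitive keyword matching; both are hypothetical simplifications, since the actual query manager operates on ObjectStore objects.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical meta-data record holding a subset of the <frontmatter> fields.
struct MetaData {
    std::string docId;
    std::string author;
    std::vector<std::string> keywords;
};

// Keeps the documents whose keyword list contains every search term.
std::vector<MetaData> matchKeywords(const std::vector<MetaData>& all,
                                    const std::vector<std::string>& terms) {
    std::vector<MetaData> hits;
    for (const auto& m : all) {
        const bool allFound =
            std::all_of(terms.begin(), terms.end(), [&](const std::string& t) {
                return std::find(m.keywords.begin(), m.keywords.end(), t) != m.keywords.end();
            });
        if (allFound) hits.push_back(m);
    }
    return hits;
}
```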
Presentation of a document. Now suppose that the user wishes
to view a document that matched the search criteria. The software
modules involved and the communication between modules are shown in
Figure 4. The steps are:
1. The user requests the retrieval and presentation of a
specific document.
2. The application contacts the document manager corresponding
to the requested document and asks it to retrieve and present the
document. [8]
3. The document manager asks the presentation managers
associated with the document to retrieve and display their
respective media objects. [9]

Steps 4 to 6 are performed for
each presentation manager that corresponds to a discrete media object,
while steps 7 to 16 are performed for each presentation manager that
corresponds to a continuous media object.

4. The presentation manager retrieves its associated media
object from the database.
5. The database returns the media object.
6. The presentation manager displays the contents of the media
object on the screen.
7. The presentation manager requests the QoS manager to negotiate
QoS parameters and to display its associated continuous media object.
8. The QoS manager queries the database for the QoS parameters of each
available media variant for the requested document.
[10]
9. The database returns the QoS variant information; the QoS manager then
selects the variants that are appropriate to the user's profile.
10. The selected variants are passed to the synchronization module
for display.
11. The synchronization module queries the database for additional
information about the selected variants, such as formats, HyTime
parameters, UOIs, etc.
12. The database responds with the requested information. This
information is then used to construct the presentation scenario, to
configure the media decoders, and to access the CMFS.
13. The synchronization module requests the
CMFS to deliver the media objects.
14. The CMFS sets up the required connections and sends data
packets continuously to the synchronization module.
15. As frames of media are received, the synchronization module
determines when they should be displayed, and at that time passes them
to the appropriate media decoders.
16. The media decoders display the frames to the user.
Figure 4
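The timing decision in step 15 can be sketched as a playout buffer ordered by presentation timestamp. In this hypothetical `PlayoutScheduler`, a frame becomes due when the local clock reaches the presentation start time plus the frame's timestamp, and frames later than a fixed tolerance are dropped to keep streams aligned. The timestamp representation and the drop policy are assumptions; the paper does not specify the synchronization algorithm at this level.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical frame record: which stream it belongs to and its presentation
// timestamp in milliseconds, relative to the start of the presentation.
struct Frame {
    int stream;  // e.g. 0 = video, 1 = audio (assumed numbering)
    std::int64_t pts;
};

// Orders the priority queue so that the earliest timestamp is on top.
struct LaterPts {
    bool operator()(const Frame& a, const Frame& b) const { return a.pts > b.pts; }
};

// Sketch of a playout scheduler: frames are buffered in timestamp order and
// released when the local clock reaches startMs + pts. Frames that are more than
// toleranceMs late are discarded rather than shown out of sync.
class PlayoutScheduler {
public:
    PlayoutScheduler(std::int64_t startMs, std::int64_t toleranceMs)
        : start_(startMs), tolerance_(toleranceMs) {}

    void receive(const Frame& f) { buffer_.push(f); }

    // Returns the frames due at local time nowMs, ready to pass to the decoders.
    std::vector<Frame> due(std::int64_t nowMs) {
        std::vector<Frame> ready;
        while (!buffer_.empty() && start_ + buffer_.top().pts <= nowMs) {
            const Frame f = buffer_.top();
            buffer_.pop();
            if (nowMs - (start_ + f.pts) > tolerance_) {
                ++dropped_;  // too late to present without breaking synchronization
            } else {
                ready.push_back(f);
            }
        }
        return ready;
    }

    std::size_t dropped() const { return dropped_; }

private:
    std::int64_t start_;
    std::int64_t tolerance_;
    std::priority_queue<Frame, std::vector<Frame>, LaterPts> buffer_;
    std::size_t dropped_ = 0;
};
```

Because audio and video frames share one timeline, frames with equal timestamps are released together, which is the property that keeps the streams in step.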
Prototype development and integration challenges. We have been
successful in meeting our milestones and delivering versions of our
integrated prototype. Much of the success is due to the effort of the
project integration team, which was led by R. J. Velthuys
(September 1994 to June 1996) and by D. Evans (since July 1996). This
team consists of research staff and graduate students from all
participating institutions. Team members collaborate at both the design
and implementation levels. The design level is concerned with the
definition of the reference architecture. Of particular importance are
the interfaces between system modules. At the implementation level,
integration team members interact frequently by electronic mail, phone
calls, and short-term visits. Much progress was made at an integration
workshop organized by K. Lyons of IBM CAS
in November 1995. Members of the integration team spent two weeks at
IBM CAS, working out the details of the
interfaces, modifying the software modules as required, collaborating
in debugging these modules, and producing an enhanced version of the
integrated prototype.
The latest version of the integrated prototype was demonstrated at
CASCON in November 1996. In this version the ObjectStore server and
the CMFS reside on an IBM RISC System/6000 (RS/6000) running
AIX. Two client platforms are supported: IBM RS/6000 running
AIX and Sun SPARC running SunOS. The networking technology is
based on ATM (asynchronous transfer mode)
switching equipment from Newbridge Networks Corporation, and
audio/video support is provided by motion
JPEG (Joint Photographic Experts Group)
cards from IBM and Parallax Graphics, Inc.
The multimedia news application contains a news browser with facilities
for QoS negotiation, searching, retrieving, and presenting news documents.
While developing our prototype, we have gained much experience in
working together as a team. We have also encountered challenges that
need to be addressed:
- The CITR Broadband Services major
project involves researchers from eight institutions across Canada.
Very often, research teams at the different institutions progress at
different speeds, resulting in the need to develop simplified versions
of some software modules for testing purposes while waiting for the
completed versions.
- Despite the effort of the project integration team, there were still
instances where the definition of interface primitives was
misunderstood. Much effort was spent in clarifying these primitives.
- The computing environment at the different institutions is not
identical; different hardware and operating system configurations are
in use. Additional effort to understand the differences in system
configuration was required to successfully install the same software
modules at different institutions.
- We have been using leading-edge equipment in our prototype
development. This includes ATM switches,
ATM adapter cards, and motion JPEG cards. Much effort was required to
understand how this equipment works. Compromises were necessary because of the
limited capabilities of this equipment. For example, we were planning to develop
a transport service over a native ATM
interface, but such an interface was not available. The compromise was
then to run UDP (User Datagram Protocol)
over IP (Internet Protocol) over ATM.
- Much effort was spent in the first year of this major project to bring
the team members together to work as a team. This is due to the
differences in established culture at the different institutions and
the conflicting priorities among researchers at different institutions.
- During our prototype development, we had to make compromises on
the features to be included because of resource limitations. For
example, in our synchronization module, much effort was spent on the
design and implementation of algorithms to synchronize audio and
video objects; work on the temporal synchronization of audio/video with
image/text has been delayed. As another example, we have been
successful in the design and implementation of algorithms for scalable
video encoding and decoding. Such algorithms are attractive when one
wishes to effectively support video at different levels of resolution.
Unfortunately, our software implementation is very slow, and hardware
decoding devices are not available. We have therefore decided not to
include the scalable video feature in our integrated prototype.
Research accomplishments
Apart from the collaborative effort that has led to the
development of our integrated prototype, constituent projects have
advanced the state of knowledge in their respective areas of research.
In this section, these research accomplishments are discussed.
Multimedia data management. As mentioned previously, our
database management system is based on ObjectStore. To provide support
for multimedia and be compliant with SGML/HyTime, we have built an extension
layer on top of ObjectStore. [11]
This extension layer
contains the type system and addresses three issues related to
multimedia data management: modeling of the basic media components
(i.e., text, image, audio, and video), document representation based on
SGML/HyTime, and the capture and storage
of meta-data. An example of an SGML
marked-up document has been provided in Appendix A. This document
is consistent with the document type definition
(DTD) for a news document. In general, we
should be able to store different types of documents in one database by
accommodating multiple DTDs. Tools should
also be available to automatically insert marked-up documents into the
database.
Dynamic insertion of new DTDs. Our system is designed to
handle multiple DTDs and support the
creation of types that are induced by these
DTDs. It analyzes new DTDs and automatically generates the types
that correspond to the elements they define. We store the
DTD as an object in the database so that
users can run queries like "Find all DTDs in which a <paragraph> element is
defined." (See Figure 5.)
Figure 5
In our design, a meta-DTD describes a grammar for defining DTDs, and a
DTD parser parses each DTD according to this grammar. While
parsing the DTD, an object is created for
each valid SGML element defined. This
object contains information about the element, such as its name,
attribute list, and content model. If the
DTD is valid, a type generator is used to
automatically generate C++ code that defines a new ObjectStore type for
each element in the DTD. For example, if a "book" DTD is parsed, objects
representing <title>, <authorlist>, <chapter>,
<section>,
<paragraph>, <index>, etc., would be created.
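A minimal sketch of the type-generation step, assuming a hypothetical `ElementDecl` record produced by the DTD parser. It emits plain C++ class text with one string member per attribute and records the content model as a comment; the real generator produces ObjectStore types, whose persistence declarations are omitted here rather than guessed.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record created by the DTD parser for each valid element declaration.
struct ElementDecl {
    std::string name;                     // e.g. "chapter"
    std::vector<std::string> attributes;  // attribute names from the attribute list
    std::string contentModel;             // e.g. "(title, section+)"
};

// Emits the source text of a C++ class for one element, derived from `base`.
// Persistence-specific declarations are deliberately left out of this sketch.
std::string generateType(const ElementDecl& e, const std::string& base) {
    std::string cls = e.name;
    if (!cls.empty()) {
        cls[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(cls[0])));
    }
    std::ostringstream out;
    out << "// content model: " << e.contentModel << "\n";
    out << "class " << cls << " : public " << base << " {\npublic:\n";
    for (const auto& a : e.attributes) out << "    std::string " << a << ";\n";
    out << "};\n";
    return out.str();
}
```

For a "book" DTD, calling `generateType` once per parsed element would yield one class per element, such as `Chapter` or `Section`.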
We have also addressed two important problems related to abstraction so
as to reduce the complexity of the multimedia type system and therefore
reduce maintenance time and errors. First, if two or more elements in
the same DTD share a common feature, then
that feature is automatically extracted and promoted to an abstract
superclass. For example, the Video and
Audio types both share a common duration
attribute, so the abstract supertype
Temporal was created to promote this feature.
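The promotion of a shared feature to an abstract superclass can be shown directly in C++. In this sketch, `Temporal` holds the common duration attribute and `Video` and `Audio` inherit it; the millisecond unit, the `mediaKind` method, and the helper `presentationLength` are illustrative additions, not part of the generated schema.

```cpp
#include <algorithm>
#include <string>

// Abstract supertype holding the feature that Video and Audio share.
class Temporal {
public:
    explicit Temporal(long durationMs) : durationMs_(durationMs) {}
    virtual ~Temporal() = default;
    long duration() const { return durationMs_; }  // the promoted, shared attribute
    virtual std::string mediaKind() const = 0;     // specialized by each subtype
private:
    long durationMs_;
};

class Video : public Temporal {
public:
    using Temporal::Temporal;  // inherit the duration constructor
    std::string mediaKind() const override { return "video"; }
};

class Audio : public Temporal {
public:
    using Temporal::Temporal;
    std::string mediaKind() const override { return "audio"; }
};

// Written once against the supertype: parallel tracks last as long as the longest one.
long presentationLength(const Temporal& a, const Temporal& b) {
    return std::max(a.duration(), b.duration());
}
```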
The second problem is related to common element definitions across
different DTDs. In general, this is a
difficult problem because it leads to the well-known semantic
heterogeneity problem that has been studied extensively within the
multidatabase community. It involves the ability to determine whether
two elements are semantically equivalent. In our design, we have chosen
to give up some abstraction in favor of a semantically "safe" type
system. Specifically, we only reuse types that have well-defined
semantics, e.g., atomic types such as Image and
Text and high-level abstract supertypes
such as TextElement, Structured, and
HyElement. For the rest of the elements in
a given DTD, we create new types. Name conflicts between elements in
different DTDs are resolved automatically by using
the DTD name as prefix during type
creation (for example, Article_Section and
Book_Section).
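The naming rule for types across DTDs can be sketched as a small function. The set of reusable types is taken from the atomic types named above (`Image`, `Text`); the capitalization scheme and the helper names `capitalize` and `typeNameFor` are assumptions.

```cpp
#include <cctype>
#include <set>
#include <string>

// Types with well-defined semantics that are shared by all DTDs (from the text).
const std::set<std::string> kSharedTypes = {"Image", "Text"};

std::string capitalize(std::string s) {
    if (!s.empty()) s[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[0])));
    return s;
}

// Reuse a shared type when one exists; otherwise create a DTD-specific type whose
// name carries the DTD name as a prefix, which avoids cross-DTD name conflicts.
std::string typeNameFor(const std::string& dtd, const std::string& element) {
    const std::string base = capitalize(element);
    if (kSharedTypes.count(base) != 0) return base;
    return capitalize(dtd) + "_" + base;
}
```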
A major advantage of our approach is that new element types are
inserted into the database without costly schema evolution. The
DTD manager takes the DTD file as input and stores the
DTD in the database as an object that can
later be used for parsing documents. As soon as a
DTD is stored in the database, SGML documents of that type can be
inserted. Further details of handling multiple
DTDs are described by Schöne. [12]
Automatic insertion of multimedia documents. Many multimedia
DBMS projects do not develop tools for inserting documents into a
database, because such tools are considered to be outside the scope of
database work. We have, however, developed such tools for our system.
Our approach is to couple the
database with a retrofitted SGML parser.
[13,14] This parser accepts an
SGML document instance from an authoring tool
(see Figure 5). It then fetches the required
DTD from the database and uses this DTD to validate the document instance.
If the document is error-free, an output is generated and passed to an
instance generator. This output is in the form of a parse tree, and
includes a text string for the document that is stripped of the markup,
together with a linked list of nodes containing annotations into the
string, an attribute list, and pointers to "parent" and "next"
nodes. The instance generator traverses the parse tree and instantiates
the appropriate objects in the database corresponding to the elements
in the document. These are persistent objects and can be accessed using
the query interface.
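The parse-tree output and its traversal by the instance generator might look roughly as follows. `ParseNode` follows the description above (an annotation into the markup-free text, an attribute list, and parent and next pointers), while `Instance` is a hypothetical stand-in for a persistent database object; the real generator instantiates ObjectStore objects instead.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Parse-tree node: an annotation (offset, length) into the markup-free document
// text, the element's attributes, and links to its parent and the next node.
struct ParseNode {
    std::string element;
    std::size_t offset;
    std::size_t length;
    std::map<std::string, std::string> attributes;
    const ParseNode* parent;
    const ParseNode* next;
};

// Hypothetical stand-in for a persistent object created in the database.
struct Instance {
    std::string element;
    std::string content;
    std::string parentElement;  // empty for the root element
};

// Walks the node list and creates one instance per element, copying its text slice.
std::vector<Instance> generateInstances(const std::string& text, const ParseNode* head) {
    std::vector<Instance> out;
    for (const ParseNode* n = head; n != nullptr; n = n->next) {
        out.push_back({n->element, text.substr(n->offset, n->length),
                       n->parent ? n->parent->element : std::string()});
    }
    return out;
}
```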
Continuous media file server. We have designed and implemented
a high-performance continuous media file server
(CMFS) that is scalable and provides support for QoS and synchronization.
CMFS architecture. Our CMFS is
based on a set of server nodes, each with a processor and disk storage
on multiple local Fast/Wide SCSI-2 (small computer system interface)
buses. One of these nodes is the administrator node. To achieve
high-speed operation, the server nodes are connected to an ATM network
(see Figure 6). Enough disk drives are attached to the SCSI buses to
provide the required bandwidth. The disks can be "striped" along a
single SCSI bus (up to a maximum of four disks) or across SCSI buses.
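One common way to stripe media data is round-robin block placement. The sketch below assumes that scheme, since the text does not say how blocks are placed, and uses hypothetical functions `stripe_layout` and `locate`. Ordering the disks so that buses alternate spreads consecutive blocks across buses as well as across disks.

```python
from typing import List, Tuple

MAX_DISKS_PER_BUS = 4  # per-bus striping limit stated in the text


def stripe_layout(buses: int, disks_per_bus: int) -> List[Tuple[int, int]]:
    """List (bus, disk) pairs in striping order, alternating between buses
    so that consecutive blocks land on different buses where possible."""
    if not 1 <= disks_per_bus <= MAX_DISKS_PER_BUS:
        raise ValueError("striping on one SCSI bus is limited to 1-4 disks")
    return [(bus, disk) for disk in range(disks_per_bus) for bus in range(buses)]


def locate(block: int, layout: List[Tuple[int, int]]) -> Tuple[int, int, int]:
    """Map a logical block of a media object to (bus, disk, block on disk)."""
    bus, disk = layout[block % len(layout)]
    return bus, disk, block // len(layout)


layout = stripe_layout(buses=2, disks_per_bus=4)
print(locate(1, layout))  # (1, 0, 0): second block goes to the other bus
print(locate(9, layout))  # (1, 0, 1)
```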
Figure 6
As an alternative to interconnecting the nodal computers over an
ATM network, the configuration can consist of processor cards
interconnected via an I/O bus such as VME
(VERSA-Module Eurocard). In either case,
the initial "open" request from the client first goes to the
administrator node. This node then determines which of the server nodes
has the requested media object and forwards the request to that server.
From then on, the client communicates directly with that server node.
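The open-time routing can be sketched as a lookup table at the administrator node. `AdministratorNode`, `ServerNode`, and `register` are hypothetical names. Objects are keyed by UOI, which is how media objects are located in the CMFS, and, for simplicity, `open` returns the responsible node instead of forwarding the request to it.

```python
from typing import Dict


class ServerNode:
    """A CMFS server node (only its name is modeled here)."""

    def __init__(self, name: str) -> None:
        self.name = name


class AdministratorNode:
    """Directs each client's initial 'open' to the node storing the object."""

    def __init__(self) -> None:
        self.location: Dict[int, ServerNode] = {}  # UOI -> server node

    def register(self, uoi: int, node: ServerNode) -> None:
        self.location[uoi] = node

    def open(self, uoi: int) -> ServerNode:
        # Only this lookup involves the administrator; afterwards the
        # client exchanges data with the returned server node directly.
        try:
            return self.location[uoi]
        except KeyError:
            raise LookupError(f"no server node stores object {uoi}") from None


admin = AdministratorNode()
node_a, node_b = ServerNode("node-a"), ServerNode("node-b")
admin.register(1, node_a)
admin.register(3, node_b)
print(admin.open(3).name)  # node-b
```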
Our architecture has the following features:
- Scalability: The performance bottleneck of a CMFS is the
I/O bandwidth, as the following speed differences show: the disk system
(typically 2 to 5 megabytes [MB]/second), the I/O bus (SCSI-2 at 20
MB/second), the internal bus (800 to 1200 megabits [Mb]/second), and an
ATM network (100 to 155 Mb/second). Given that a client typically
requires between 1 and 8 Mb/second, the number of concurrent clients
that a single server node can support is quite limited. However,
our CMFS architecture is scalable, permitting
the use of multiple server nodes as more capacity is needed. Since
the server nodes are independent of each other, the architecture does
not impose any limit on the number of servers that can be added.
- Multimedia support: Our CMFS
design is not restricted to a single media syntax such as
MPEG-2 (Moving Picture Experts Group-2).
We have developed a suitable abstraction for time and for the number of
media units per second. This abstraction makes it possible to display
the same video stream with any of several different audio
streams, each of which may come from a different server.
- Disk I/O bandwidth: In our system,
the disk capability of the server is determined dynamically by
calibrating the disk I/O bandwidth. This
is more accurate than the approach of other studies, which assume a
static disk layout. [15,16]
Two values are determined in this calibration:
the maximum and the minimum number of I/O
operations per second. These values include the hardware overhead
of transferring disk blocks as well as the operating system software
overhead.
- Variable bit rate I/O scheduling:
We have developed a novel I/O scheduling algorithm for
variable bit rate (VBR) streams. It permits the scheduling of
streams that have been compressed using VBR schemes such as motion
JPEG and MPEG-2. An admission control algorithm and
an I/O scheduler for VBR traffic have been developed as part of the
CMFS. A new stream is admitted only if, at every
instant in time, the combined data rate requirements of all streams
(including the new one) do not exceed the I/O bandwidth. Our
algorithm is more efficient than algorithms based on
constant bit rate traffic because it allows a larger number of streams
to be supported simultaneously.
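The admission rule for VBR streams can be sketched as a slot-by-slot check. This is a minimal illustration, not the CMFS algorithm itself: it assumes that time is divided into fixed scheduling slots, that each stream's requirement is known per slot (for example, disk blocks per round), and that all schedules start at the current slot. `admit` is a hypothetical name.

```python
from typing import List, Sequence


def admit(active: List[Sequence[float]], candidate: Sequence[float],
          capacity: float) -> bool:
    """Return True if the candidate stream can be added.

    Each stream is given as its data requirement per scheduling slot,
    starting at the current slot. The candidate is admitted only if, in
    every slot, the combined requirement of all streams (the active ones
    plus the candidate) stays within the capacity per slot.
    """
    horizon = max([len(candidate)] + [len(s) for s in active])
    for slot in range(horizon):
        demand = sum(s[slot] for s in active if slot < len(s))
        if slot < len(candidate):
            demand += candidate[slot]
        if demand > capacity:
            return False
    return True


# Hypothetical example: a capacity of 10 blocks per slot.
playing = [[6, 2, 6, 2]]
print(admit(playing, [2, 6, 2, 6], capacity=10))  # True: peaks are staggered
print(admit(playing, [6, 2, 6, 2], capacity=10))  # False: 6 + 6 > 10
```

In the first call, a test that reserved each stream's peak rate of 6 blocks would reject the new stream (6 + 6 > 10), even though the actual per-slot demand never exceeds 8. This is the kind of gain that checking the time-varying demand can offer.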
Synchronization and QoS support. The
CMFS provides a programming interface that
supports access to the media objects, synchronization, and
QoS. [17] Recall the document presentation steps described earlier. In
steps 13 and 14, the synchronization module requests the
CMFS to set up connections and
deliver the media data. It does so by using the "prepare"
operation to request the CMFS to begin transferring a media object,
and then using the "read" operation to obtain the data queued at the
client. A read is strictly a local client operation that does not
result in a request to the CMFS. Because reads complete almost
instantaneously and the delay of prepare is guaranteed to be bounded,
multiple independent streams can be synchronized at the client even if
the streams originate from different servers.
The CMFS is designed so that once the
prepare operation has returned, the client is guaranteed to
have enough data queued locally to support continuous
presentation of the media object; underflow is therefore avoided. The
prepare operation also has parameters that control the speed and amount
of data transmitted. These parameters are used to vary the QoS.
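The division of labor between prepare and read can be sketched with a client-side queue. `ClientStream`, its `prefill` parameter, and `receive` are hypothetical. The actual CMFS interface also takes speed and amount parameters, which are omitted here, and `receive` merely simulates data arriving from the server.

```python
from collections import deque
from typing import Deque, Iterator, Optional


class ClientStream:
    """Client-side view of one CMFS stream (hypothetical API)."""

    def __init__(self, source: Iterator[bytes], prefill: int) -> None:
        self._source = source        # stands in for data sent by the server
        self._queue: Deque[bytes] = deque()
        self._prefill = prefill      # units queued before prepare returns
        self._prepared = False

    def prepare(self) -> None:
        """Return only once enough units are queued locally for
        continuous playout to begin (bounded delay in the real CMFS)."""
        while len(self._queue) < self._prefill:
            unit = next(self._source, None)
            if unit is None:
                break                # object shorter than the prefill amount
            self._queue.append(unit)
        self._prepared = True

    def receive(self) -> None:
        """Simulate one further unit arriving from the server."""
        unit = next(self._source, None)
        if unit is not None:
            self._queue.append(unit)

    def read(self) -> Optional[bytes]:
        """Purely local: take the next queued unit, with no server request."""
        if not self._prepared:
            raise RuntimeError("read() called before prepare()")
        return self._queue.popleft() if self._queue else None


stream = ClientStream(iter([b"f0", b"f1", b"f2"]), prefill=2)
stream.prepare()
print(stream.read(), stream.read())  # b'f0' b'f1'
```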
Synchronization of multimedia data. We have designed and
developed a novel algorithm to synchronize multiple media streams from
possibly heterogeneous servers. Our algorithm adheres to the intermedia
skew tolerances determined by Steinmetz, [18] which define the
limits of human perception of skew between various media
types. For example, lip synchronization has a tolerance of 120
milliseconds. Our algorithm does not require a global clock among
the various servers, and it keeps buffer requirements to a minimum.
Our synchronization algorithm forms the core of the synchronization
module. Referring again to our document presentation steps, the
synchronization module is involved in steps 11 to 15. More precisely,
the synchronization module queries the database for meta-data such as
the length of the document, frame rate, HyTime parameters, and the
UOIs used to locate the media objects in
the CMFS. These meta-data are then used to
construct a presentation scenario, which segments the media streams
into small pieces (for example, segments of one-second duration) and
defines the temporal relationship among the segments.
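A presentation scenario of this kind can be sketched as a list of fixed-length segments per stream. `Segment` and `build_scenario` are hypothetical names. Durations are in seconds, rates are media units (frames or samples) per second, and segments with the same index are assumed to be presented together.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Segment:
    stream: str      # stream identifier, e.g. "video"
    index: int       # segment number within the stream
    start: float     # presentation start time (s)
    end: float       # presentation end time (s)
    first_unit: int  # first media unit (frame or sample) in the segment
    units: int       # number of media units in the segment


def build_scenario(durations: Dict[str, float], rates: Dict[str, float],
                   seg_len: float = 1.0) -> List[Segment]:
    """Cut every stream into seg_len-second segments; segments that share
    an index share a start time, which is the relation the MSCs enforce."""
    scenario = []
    for stream, duration in durations.items():
        rate = rates[stream]
        t, i = 0.0, 0
        while t < duration:
            end = min(t + seg_len, duration)
            first = round(t * rate)
            scenario.append(Segment(stream, i, t, end, first,
                                    round(end * rate) - first))
            t, i = end, i + 1
    return sorted(scenario, key=lambda s: (s.start, s.stream))


scen = build_scenario({"video": 3.0, "audio": 3.0},
                      {"video": 15, "audio": 22050})
print(len(scen))  # 6
```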
Our synchronization algorithm is executed at two levels. At the first
level, the decoding delays are estimated, and the Time Flow Graph
method [19] is used to determine
the times at which the media
servers should start transmitting their respective media streams. At
the client, a media synchronization controller
(MSC) is activated for each media stream.
The MSCs are responsible for opening and
controlling transport connections. They read the required segments
according to the presentation scenario so as to play out the multimedia
document in synchrony.
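The first-level computation can be illustrated for the simplest case, in which all streams must begin playout together. This is a simplification, not the Time Flow Graph method, which handles general temporal relationships. `transmission_offsets` and `playout_delay` are hypothetical names, and each per-stream delay is assumed to be the estimated network plus decoding delay. Offsets are measured from a reference instant on the client's own clock.

```python
from typing import Dict


def playout_delay(delays: Dict[str, float]) -> float:
    """Time from the reference instant until all streams can start."""
    return max(delays.values())


def transmission_offsets(delays: Dict[str, float]) -> Dict[str, float]:
    """How long after the reference instant each server should start
    sending, so that every stream's first unit is ready at the same time:
    the slowest stream starts at once, faster ones are held back."""
    latest = playout_delay(delays)
    return {stream: latest - d for stream, d in delays.items()}


delays = {"video": 0.5, "audio": 0.125}  # seconds, hypothetical estimates
print(transmission_offsets(delays))      # {'video': 0.0, 'audio': 0.375}
```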
Scheduling and predicting the traffic are sometimes not sufficient to
maintain synchronized multistream delivery, since the network
may introduce random delays and losses, resulting in jitter and
gaps within the data stream. Such errors are compensated for at
the second level of synchronization, as follows. If, during a segment
time interval, the video MSC receives data that are
out of synchronization with the audio stream (for example, with a skew
greater than 120 milliseconds), it informs the audio
MSC of the actual time skew. During the
next one-second interval, both MSCs shift
their data presentations by the previously encountered skew, thus
recovering synchronization. Details of the synchronization control
system and performance evaluation results are reported in Lamont et
al. [20]
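The second-level correction can be sketched as follows, under one reading of the text: at the end of an interval the measured skew is compared with the 120-millisecond tolerance, and if it is exceeded, both MSCs push back the start of their next segment by that skew. The stream whose data arrived late thereby gains the time it needs, so the two streams realign at the next segment boundary. `MSC`, `start_of`, and `resynchronize` are hypothetical names, and all times are in milliseconds.

```python
from typing import List

TOLERANCE_MS = 120.0   # lip-synchronization tolerance cited in the text
SEGMENT_MS = 1000.0    # one-second segments, as in the text's example


class MSC:
    """Minimal media synchronization controller (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.shift_ms = 0.0  # accumulated shift of the presentation schedule

    def start_of(self, segment: int) -> float:
        """Scheduled presentation start of a segment, on the client clock."""
        return segment * SEGMENT_MS + self.shift_ms


def resynchronize(mscs: List[MSC], video_pos_ms: float,
                  audio_pos_ms: float) -> float:
    """Measure the skew at the end of an interval; if it exceeds the
    tolerance, shift every MSC's next segment by that skew."""
    skew = abs(video_pos_ms - audio_pos_ms)
    if skew > TOLERANCE_MS:
        for msc in mscs:
            msc.shift_ms += skew
    return skew


video, audio = MSC("video"), MSC("audio")
resynchronize([video, audio], video_pos_ms=5000, audio_pos_ms=4850)
print(video.start_of(6), audio.start_of(6))  # 6150.0 6150.0
```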
QoS negotiation and adaptation. The overall goal of
QoS negotiation is to find the system configuration that best
satisfies the user's QoS constraints. A framework
for QoS negotiation has been defined that
includes all system components, such as the client workstation,
network, and servers. [21,22]
The global configuration involved in
a given instance of an application can be selected based on the user's
QoS requirements and the resource
availability at the different system components. For access to
multimedia documents, the system may take advantage of the presence of
several media variants. [23] For
presentation to a specific user, the system selects the most appropriate
media variant depending on the QoS preferences of the user
(including cost) and the current availability of system resources. This
selection involves the evaluation of various configuration
alternatives. If the negotiated QoS cannot
be maintained during the presentation of the document, possibly due to
network or server congestion, the QoS
manager may perform an automatic reconfiguration in order to maintain
the originally agreed-upon QoS characteristics.
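Variant selection and reconfiguration can be sketched as choosing, among the stored variants, one that current resources and the user's budget allow, and rerunning that choice when conditions change. `Variant`, `select_variant`, the bandwidth and cost fields, and the tie-breaking rule (highest frame rate, then lowest cost) are assumptions for illustration, and the figures in the usage lines are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Variant:
    name: str
    frame_rate: float     # frames per second
    bitrate_kbps: float   # network bandwidth the variant needs
    cost: float           # price charged to the user


def select_variant(variants: List[Variant], available_kbps: float,
                   max_cost: float) -> Optional[Variant]:
    """Among the variants the network can carry now and the user can
    afford, pick the highest frame rate (cheapest on ties)."""
    feasible = [v for v in variants
                if v.bitrate_kbps <= available_kbps and v.cost <= max_cost]
    if not feasible:
        return None
    return max(feasible, key=lambda v: (v.frame_rate, -v.cost))


variants = [Variant("low", 5, 400, 2), Variant("high", 15, 1200, 5)]
print(select_variant(variants, available_kbps=2000, max_cost=5).name)  # high
# Congestion reduces the available bandwidth: reconfigure by re-selecting.
print(select_variant(variants, available_kbps=800, max_cost=5).name)   # low
```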
Based on the above framework, a protocol for
QoS negotiation has been designed and
implemented. Its details were omitted from our earlier discussion of
the document presentation steps, where we
assumed a match between the QoS variant
found in the document meta-data and that contained in the user profile
(see step 9). If there is no match, however, the protocol
proceeds as follows.
A user may define different QoS profiles,
each containing a set of selection criteria.
[22] For each
relevant QoS parameter, the criteria may
include a minimum value and a preferred value. A priority ordering
based on these parameters is also provided, either in absolute terms or
in terms of a weighted sum. The latter is important because it allows
trade-offs between conflicting preferences, such as low cost and high
presentation quality. A match is not found if the system cannot provide a
configuration that satisfies the minimum requirements of the user. In
this case, the user is invited to accept certain quality reductions
based on the feasible system configurations. The user may accept this
alternative or simply abort the negotiation.
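A weighted-sum profile of this kind can be sketched as follows. `Criterion`, `satisfaction`, `meets_minimum`, and `choose` are hypothetical. Each parameter is scored linearly from 0 at its minimum to 1 at its preferred value, which is only one possible scoring rule. A configuration is acceptable only if every parameter meets its minimum; otherwise, the ranked list of configurations is what the user could be offered as a quality reduction.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Criterion:
    minimum: float     # worst acceptable value
    preferred: float   # value at which the user is fully satisfied
    weight: float      # importance in the weighted sum


def satisfaction(value: float, c: Criterion) -> float:
    """0 at the minimum, 1 at or beyond the preferred value; works for
    'higher is better' and 'lower is better' parameters alike."""
    if c.preferred == c.minimum:
        return 1.0
    s = (value - c.minimum) / (c.preferred - c.minimum)
    return min(max(s, 0.0), 1.0)


def meets_minimum(value: float, c: Criterion) -> bool:
    higher_is_better = c.preferred >= c.minimum
    return value >= c.minimum if higher_is_better else value <= c.minimum


def choose(configs: Dict[str, Dict[str, float]],
           profile: Dict[str, Criterion]) -> Tuple[Optional[str], List[str]]:
    """Return (best acceptable configuration, []) or, if none meets every
    minimum, (None, all configurations ranked by weighted score)."""
    def score(params: Dict[str, float]) -> float:
        return sum(c.weight * satisfaction(params[k], c)
                   for k, c in profile.items())

    ranked = sorted(configs, key=lambda n: score(configs[n]), reverse=True)
    acceptable = [n for n in ranked
                  if all(meets_minimum(configs[n][k], c)
                         for k, c in profile.items())]
    return (acceptable[0], []) if acceptable else (None, ranked)


configs = {"A": {"frame_rate": 15, "cost": 5}, "B": {"frame_rate": 25, "cost": 8}}
profile = {"frame_rate": Criterion(10, 25, 2.0), "cost": Criterion(8, 2, 1.0)}
print(choose(configs, profile))  # ('B', [])
```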
Scalable video encoding. Scalable video encoding
[24]
is an important feature in the design of distributed multimedia
applications because it provides efficient support for video objects at
multiple levels of resolution. One can then accommodate requests to
display video objects on terminals with different capabilities and to
transport video over networks with a range of QoS availability.
Three types of scalability are commonly identified: spatial
scalability, where the different levels have different spatial
resolution; signal-to-noise ratio (SNR)
scalability, where the different levels have the same spatial
resolution but different amplitude resolution (or
SNR); and temporal scalability, where the
different levels have different temporal resolution. For the
MPEG-2 standard, spatial scalability is
the most relevant for meeting our objectives because it allows us to
support receivers with different display capabilities.
MPEG-2 provides for two levels of
resolution in the spatial scalable extension, and we have developed
such a two-level encoder/decoder in software. The original sequence at
the higher level is first filtered and down-sampled to produce a lower
resolution picture. A typical situation would be to retain one sample
out of two horizontally and one out of two vertically for a total
spatial subsampling factor of 4:1. This picture is then encoded using
the appropriate MPEG-2 configuration, and
the corresponding bit stream is stored or transmitted. At both the
encoder and the decoder, the low-resolution picture is decoded and
up-converted to the original resolution, using spatial interpolation
to reconstruct the missing samples. The up-converted picture is then
available to assist in encoding (and, at the decoder, in reconstructing)
the original full-resolution picture. The prediction of a
picture in the high-resolution sequence is formed using the previous or
subsequent high-resolution reference picture(s), the up-converted
picture from the low level (at the same time instant), or a combination
of the two.
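The spatial prediction structure can be sketched on small pictures represented as lists of rows. This is a simplification of MPEG-2 spatial scalability: the low-resolution picture is not actually coded, the prediction is chosen once per picture rather than per macroblock, there is no motion compensation, and the 2x2 averaging filter and linear interpolation are assumed choices. All function names are hypothetical.

```python
from typing import Dict, List

Picture = List[List[float]]


def downsample(pic: Picture) -> Picture:
    """Filter with a 2x2 average, then keep one sample in two in each
    direction (4:1 spatial subsampling). Assumes even dimensions."""
    return [[(pic[y][x] + pic[y][x + 1] + pic[y + 1][x] + pic[y + 1][x + 1]) / 4
             for x in range(0, len(pic[0]), 2)]
            for y in range(0, len(pic), 2)]


def _double(line: List[float]) -> List[float]:
    """Double a line: keep each sample, interpolate the missing one."""
    out: List[float] = []
    for i, v in enumerate(line):
        nxt = line[min(i + 1, len(line) - 1)]
        out.extend([v, (v + nxt) / 2])
    return out


def upconvert(pic: Picture) -> Picture:
    """Separable linear interpolation back to twice the width and height."""
    rows = [_double(r) for r in pic]
    cols = [_double(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def sad(a: Picture, b: Picture) -> float:
    """Sum of absolute differences between two pictures."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def best_prediction(current: Picture, temporal_ref: Picture,
                    low_level: Picture) -> str:
    """Choose the temporal, spatial (up-converted), or averaged prediction
    with the smallest SAD from the current high-resolution picture."""
    spatial = upconvert(low_level)
    combined = [[(t + s) / 2 for t, s in zip(rt, rs)]
                for rt, rs in zip(temporal_ref, spatial)]
    candidates: Dict[str, Picture] = {
        "temporal": temporal_ref, "spatial": spatial, "combined": combined}
    return min(candidates, key=lambda k: sad(current, candidates[k]))


current = [[float(2 * x) for x in range(4)] for _ in range(4)]
low = downsample(current)
blank = [[0.0] * 4 for _ in range(4)]       # e.g. reference after a scene cut
print(best_prediction(current, blank, low))  # spatial
```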
Related work. Enabling technology for distributed multimedia
applications is an active area of research, and it is not practical in
this paper to provide a survey of related work. Nevertheless, it is
important to mention some related publications. The technology includes
multimedia data management, [25-27]
continuous media file servers, [28,16] QoS negotiation
and adaptation, [29,30] media stream
synchronization, [31] and scalable video
encoding. [32,33]
In addition, the recently announced
ObjectStore Version 5 has features that support multimedia, and the
Internet Engineering Task Force has developed protocols such as
RSVP [34] and
RTP [35] that support the
transport of media streams over the Internet.
Concluding remarks and future direction
In this paper, we have described the software technologies
developed by the CITR Broadband Services
major project. Our technologies have a number of salient features that
are not present in most other systems:
- Our continuous media file server is scalable without the need for
special hardware; it also supports QoS,
variable-bit-rate transfer, and synchronization of media streams.
- Our database is based on an object-oriented design with an efficient
storage structure, a uniform treatment of multiple media and meta-data,
and a database model that is compliant with the
SGML/HyTime standard.
- Our QoS management framework supports a
dynamic choice of available services; that is, it selects an optimal
configuration of the system components based on factors such as cost
and resource availability.
- Our media synchronization algorithm is based on the time-flow-graph
approach and does not require a global clock.
Our approach of organizing the constituent projects according to
system components has worked very well. An important success factor is
the close collaboration among members of the integration team. Our
integration effort has led to improved understanding of the research
issues related to each system component. Some of these issues might not
have surfaced if the research had focused on a specific component only.
We now have a testbed that can be used for research and development
work in distributed multimedia applications.
We have recently started work on extending our technologies to include
a conversational capability. This would allow users to engage in
videoconferencing and, at the same time, access multimedia documents
from a multimedia database. Such a capability would effectively support
applications such as telelearning and remote consultation. We plan to
develop the conversational capability together with a telelearning
application.
Acknowledgments
We gratefully acknowledge the contributions made to this project
by many people other than the authors, including K. Bennet, M. Daami,
R. Dssouli, S. El-Medani, D. Finkelstein, J. Gecsei, M. Ito, J.
Jarmasz, J. Konrad, L. Li, Y. Liu, C. Louvard, R.
Mechler, R. Ng, S.-L. Ooi, S. Panchanathan, W. Robbins,
M. Schöne, R. Somalingam, C. Vittel, and A. Vogel.
This work was supported by a grant from the Canadian Institute for
Telecommunications Research under the National Centres of Excellence
program of the Government of Canada, and by the
IBM Toronto Laboratory Centre for Advanced
Studies.
Appendix A: Sample document markup
<!doctype article SYSTEM "newsMM.dtd">
<article id="CITRNews-RV-01"
language="English">
<frontmatter>
<edinfo>
<author>David Evans</author>
<date>11/15/1996</date>
<keywords>CITR, IBM, CAS, CASCON, Events,
Conference.</keywords>
<loc>Toronto</loc>
<subject>Report on CITR Broadband Services
participation at CASCON '96</subject>
<source>CITR</source>
</edinfo>
<hdline>CITR Broadband Services Major Project
Demo at CASCON '96 a Success
</hdline>
<subhdline>Attendees impressed with demonstration
of results of CITR Broadband Services Project.
</subhdline>
<abs-p>
<paragraph>
The CITR Broadband Services Major Project
presented its latest work at the CASCON '96
conference in Toronto.
</paragraph>
</abs-p>
</frontmatter>
<async>
<section>
<paragraph>
<figure>
<image id="CITR-news-image-1"
variantspec="image1-variant">
</figure>
<figure>
<image id="CITR-news-image-2"
variantspec="image2-variant">
</figure>
</paragraph>
</section>
<image-variant id="image1-variant"
filename="images/citr_logo.gif"
format="gif" size="525" width="95"
height="64" color="colour">
<image-variant id="image2-variant"
filename="images/cascon.gif"
format="gif" size="476" width="88"
height="80" color="colour">
</async>
<sync>
<audio-visual id="audio-visual-1">
<x id="x-1">
<y id="y-1">
<time id="time-1">
<av-fcs id="fcs-1">
<av-evsched id="evsched-1">
<audio id="audio-1" price="5"
variantspec="audio-variant-1
audio-variant-2" exspec="extlist-1">
<video id="video-1" price="5"
variantspec="video-variant-1
video-variant-2" exspec="extlist-1">
</av-evsched>
</av-fcs>
<av-extlist id="extlist-1">
<xdimspec id="xdimspec-1">1
320</xdimspec>
<ydimspec id="ydimspec-1">1
240</ydimspec>
<tdimspec id="tdimspec-1">1
156</tdimspec>
</av-extlist>
<video-variant id="video-variant-1"
format="mjpeg" streamspec="video-stream-1"
site="bristol-atm" duration="155"
width="320" height="240" framerate="5"
bitrate="20185" color="colour">
<video-variant id="video-variant-2"
format="mjpeg" streamspec="video-stream-2"
site="bristol-atm" duration="155"
width="320" height="240" framerate="15"
bitrate="20185" color="colour">
<audio-variant id="audio-variant-1"
format="g728" streamspec="audio-stream-1"
site="bristol-atm" duration="155"
samplerate="22050" bps="16"
quality="CD" language="English">
<audio-variant id="audio-variant-2"
format="g728" streamspec="audio-stream-2"
site="bristol-atm" duration="155"
samplerate="22050" bps="16"
quality="CD" language="English">
<stream id="video-stream-1" uoi="1"
size="22050">
<stream id="video-stream-2" uoi="2"
size="100000">
<stream id="audio-stream-1" uoi="3"
size="22050">
<stream id="audio-stream-2" uoi="4"
size="322050">
</audio-visual>
</sync>
</article>
*Trademark or registered trademark of International Business
Machines Corporation.
**Trademark or registered trademark of Object Design, Inc. or Sun
Microsystems, Inc.
Cited references and notes
Accepted for publication May 30, 1997.