1. Introduction
Traffic in the next generation of the Internet is expected to be rich in multimedia content [1, 2], thanks to quality of service (QoS) [3, 4], mechanisms for resource reservation [5], and high-bandwidth pipes [6, 7], as well as increased expectations [8]. At the same time, the tremendous proliferation of multimedia content, multimedia formats, and multimedia-capable devices demands a rethinking of fundamental distribution mechanisms found in the current Internet [9]. The next generation of multimedia networks should be capable of efficiently mitigating the interaction of multiple performance-limiting factors during the distribution of the new multimedia traffic payloads. Examples of performance-limiting factors include receiver capabilities such as access bandwidth and media processing power, and link capabilities such as loss and error rates.
Consider the following scenario, in which the same content is to be distributed to a diverse group of receivers. In particular, consider the following types of receivers: (a) network computers (NC), (b) personal computers (PC), (c) palmtops (PDA), and (d) workstations (WS). On one hand, NCs are relatively low-CPU processing nodes connected via a high-bandwidth pipe at low error rates, whereas PCs have more CPU processing power but are connected via a shoestring-bandwidth pipe at higher error rates. Similarly, workstations are typically very-high-CPU processing nodes connected via a very-high-bandwidth pipe at very low error rates, whereas palmtops are very-low-CPU processing nodes connected via a very-small-bandwidth pipe at typically high error rates.
Today's Internet is simply not capable of multicasting multimedia content to such receivers. Consider the operational space of these receivers (Figure 1) in terms of three primary factors that limit the performance and efficiency of end-to-end delivery:
- Available bandwidth (B/W).
- Receiver computational performance (CPU).
- Error tolerance (TOL).
Figure 1
Whereas the CPU axis characterizes the ability of a receiver to process a given media type, the B/W and TOL axes characterize the end-to-end transport and the media. A point in this space characterizes a (receiver, media, link) triple.
These particular receivers do not lie within a linear or planar region. On one hand, in terms of increasing bandwidth, these receivers would be ordered as (c, b, a, d); on the other hand, in terms of increasing processing power, they would be ordered as (c, a, b, d). Moreover, with respect to decreasing error tolerance, they would be ordered as (c, b, d, a). A projection from this space results in a set of incompatible receivers being grouped together. Moreover, the resulting performance compromise across such grouped receivers is neither specifiable nor controllable in today's Internet. End-to-end provisioning to such widely heterogeneous receivers requires network intelligence to accommodate the differences in receiver connectivity, processing capabilities, and prevalent error rates. In general, making the network aware of such receiver disparities enables it to provision the same content in formats/representations customized for the needs and preferences of individual receivers.
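The three orderings above can be checked with a small sketch. The ordinal ranks below are assumptions chosen only to reproduce the orderings stated in the text (1 = lowest); they are not measurements, and the names `RECEIVERS` and `order_by` are hypothetical.

```python
# Hypothetical ordinal ranks (1 = lowest) chosen only to reproduce the
# orderings stated in the text; they are not measured values.
RECEIVERS = {
    "a": {"bandwidth": 3, "cpu": 2, "tolerance": 1},  # network computer (NC)
    "b": {"bandwidth": 2, "cpu": 3, "tolerance": 3},  # personal computer (PC)
    "c": {"bandwidth": 1, "cpu": 1, "tolerance": 4},  # palmtop (PDA)
    "d": {"bandwidth": 4, "cpu": 4, "tolerance": 2},  # workstation (WS)
}


def order_by(axis, descending=False):
    """Return receiver labels sorted along one axis of the operational space."""
    return tuple(
        sorted(RECEIVERS, key=lambda r: RECEIVERS[r][axis], reverse=descending)
    )


by_bandwidth = order_by("bandwidth")                   # increasing bandwidth
by_cpu = order_by("cpu")                               # increasing processing power
by_tolerance = order_by("tolerance", descending=True)  # decreasing error tolerance

# True only if a single ordering holds on all three axes.
consistent = by_bandwidth == by_cpu == by_tolerance
```

Because `consistent` evaluates to false for these ranks, no single axis can stand in for the other two, which is why grouping receivers along one projection mixes incompatible devices.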
For example, if audio content were to be broadcast to receiver types (a, b, c, d), the network could provision a with a relatively high-bandwidth representation of low decoding overhead and high tolerance to errors (e.g., phone-quality µ-law1 audio); b with a relatively lower-bandwidth representation but with higher decoding overhead and lower tolerance to errors (e.g., Internet-quality audio); c with a negligible-bandwidth representation of negligible decoding overhead and high tolerance to errors (e.g., text transcripts); and d with a high-bandwidth representation regardless of decoding overhead and tolerance to errors (e.g., CD-quality audio). Thus, to efficiently accommodate the operational characteristics of diverse receiver types such as a through d, multiple media representations of a given media content may have to be generated and distributed.
A straightforward solution would require each content source to provide the same content in each of the desired media representations. Each receiver's client software to access (i.e., decode and play back) multimedia content would have the ability to understand all of the media representations that could possibly be desired by any receiver. This solution, however, is not feasible in practice for several reasons. First, because of the rapid proliferation of new and diverse information and entertainment appliances/devices, it may be practically impossible to outline the most common set of receiver characteristics that define the targeted receiver audience. As a result, and because a given content may be accessed by any number of receivers, a complete set of desired media representations for the content may not be known a priori. Second, even if the targeted receiver audience were to be determined in advance, it would be prohibitively expensive for a content provider to publish content in a large set of media representations. Finally, supporting every media representation in the client software would impose significantly higher resource (especially memory and/or disk) requirements on the receivers, hence increasing their cost. Moreover, this makes future representation of content in new media formats difficult if not impossible. All of these problems can be addressed by enabling the network to provide rich "content connectivity" between sources and receivers.
In this paper, we present an evolutionary framework for provisioning enhanced multimedia distribution-of-content services to a group of receivers. The framework addresses various degrees of heterogeneity among "distribution paths" from a common content source to diverse receivers, where a distribution path carries information from a content-aware multicast distribution tree between the source and one or more receivers. The goal of the framework, given the broadcast of some multimedia content, is to generate one or more distribution paths that are compatible with various receivers as they join the broadcast/multicast. The framework enables the realization of a multimedia overlay network that provides sophisticated content-based programmability over the provisioning, management, and distribution of media flows from content sources to receivers.
Toward this end, we utilize a number of evolutionary measures. First, to address receiver heterogeneity, we introduce awareness of receiver and link capabilities into the network. Second, we decouple media content from its representation and provision individual receivers with the best feasible representation that suits the receiver and the network. Third, we introduce intelligence into the network that allows it to exploit and adapt to request patterns across receivers (and groups of receivers). The above techniques support a feasible performance spectrum capable of supporting heterogeneity with regard to receiver capabilities and preferences, and performance tradeoffs among error resiliency, network bandwidth, and congestion.
The overlay requires one or more of each of the following four basic types of multimedia-aware nodes: sources, receivers, distributors, and controllers. The delivery of multimedia content originates at a source and terminates at a receiver connected by a traversal path (referred to as the distribution path) over one or more distributors. The overlay performs arbitration roles between sources and receivers, such as supplying content in a format suited to the performance capabilities of individual receivers. The construction and subsequent extension of content-distribution trees is determined by the types and numbers of receivers subscribing to a given content and representation.
To decouple content from representation--a key feature of our framework--the overlay supports content-based mechanisms to reconfigure distribution paths so as to address a wide range of receiver heterogeneity. This allows the overlay to provision content to a particular receiver by "transforming" such content from an ill-suited representation to a better-suited one. Given a core content distribution tree, the overlay constitutes a programmable network capable of autonomously generating branches from the core distribution tree. The branching criteria are based on cost criteria such as receiver capabilities (e.g., bandwidth, CPU processing power) and link capabilities (e.g., loss rate, error rate).
Our work builds upon and extends the notion of overlay concepts [10, 11] and application-level framing (ALF) [12]. In contrast to Clark's original ALF discussion, the notion of a distribution path is used to expose interior, application-level control points during the relay of a media flow as opposed to just at its ends.
The rest of the paper is structured as follows. First, we outline the goals and requirements necessary for evolving toward a next generation of programmable multimedia network services. Then, in light of these principles, we present an evolutionary perspective on high-function network elements and services. We then describe the key aspects of our framework for programmable multimedia overlay networks that satisfies the goals and requirements mentioned above. Section 5 discusses a critical component of our framework, namely, management of such overlay networks. In Section 6, we briefly describe a number of existing Internet-related technologies and necessary enhancements to them for effective realization of the proposed framework. Section 7 concludes the paper.
2. Goals and requirements
In this section we outline the key goals and requirements which must be satisfied by any framework in order to realize autonomous and polymorphic distribution of multimedia content. As argued earlier, a single representation cannot efficiently address the content requirements of truly diverse (i.e., heterogeneous) receivers. Managing diverse media representations, however, is a significant burden for designers and developers of multimedia content, applications, and tools. Receiver diversity can be transparently accommodated by ensuring independence of a particular content from its media representation (referred to as goal G1). Moreover, the distribution of content to receivers must be efficient and controllable (referred to as goal G2). These two goals impose two key requirements that the network must satisfy: promotion of content awareness and deployment of network services that leverage such awareness. We elaborate further on these requirements below.
Promotion of content awareness
Figure 2 illustrates the key features required of multimedia applications and the network to meet our dual goals of media independence and efficient distribution of content to heterogeneous receivers. Our approach introduces a new level of receiver-adaptive responsibilities into the network. Each receiver (i.e., client or application) is characterized by a capabilities bank, which represents the particular needs and preferences of a receiver with regard to any multimedia content. Each receiver exports a characterization of its capability bank to the overlay, which functions as a programmable multimedia distribution network. The overlay augments a particular receiver's characterization (including its processing power) with a network-oriented characterization of the receiver's attached link to the network (e.g., available bandwidth, loss rate, and error rate).
Figure 2
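As a minimal sketch of the exchange described above, a receiver's exported characterization and the overlay's link-oriented augmentation could be modeled as two records. All class names, field names, and values here are assumptions made for illustration; the paper does not specify a concrete schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LinkProfile:
    """Network-side view of the receiver's attached link (assumed fields)."""
    bandwidth_kbps: float
    loss_rate: float
    error_rate: float


@dataclass(frozen=True)
class CapabilitiesBank:
    """Receiver-exported needs and preferences (assumed fields)."""
    receiver_id: str
    cpu_score: float                    # relative processing power
    accepted_formats: tuple = ()        # representations the client can decode
    link: Optional[LinkProfile] = None  # filled in by the overlay, not the receiver


def augment(bank, link):
    """Overlay step: attach the observed link profile to the exported bank."""
    return CapabilitiesBank(
        bank.receiver_id, bank.cpu_score, bank.accepted_formats, link
    )


# A palmtop-class receiver exports its bank; the overlay adds the link view.
exported = CapabilitiesBank("pda-1", cpu_score=0.1, accepted_formats=("text",))
profile = augment(
    exported, LinkProfile(bandwidth_kbps=9.6, loss_rate=0.05, error_rate=0.01)
)
```

Keeping the link profile separate from the receiver-supplied fields mirrors the division of responsibility in the text: the receiver describes itself, and the overlay describes the attachment.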
Deployment of network capabilities
The overlay also contains an encompassing services bank of (continuous media) stream-oriented capabilities. These multimedia services are distributed throughout the overlay, and their form and function are programmable (i.e., controllable) from within the overlay. To address G1, the overlay may deploy a variety of media capabilities to achieve media polymorphism (e.g., transcoders or enhancers)2 over selected overlay nodes. To address G2, these capabilities must be autonomously deployed on the basis of some cost criteria that best match the needs and preferences of receivers. Given a core content distribution tree, the overlay constitutes a programmable network capable of autonomously generating branches from the core distribution tree. The branching criteria incorporate both receiver capabilities (e.g., bandwidth, CPU processing power) and link capabilities (e.g., loss rate, error rate). The decision to branch off is weighted by the number of such receivers, bringing similar cost weights to the core distribution tree. Moreover, when the cost weight is substantially large, alternative representations of the broadcast are considered for distribution over the newly created branch.
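The branching rule can be illustrated with a short sketch. The cost function (the fraction of the core stream's bandwidth that a receiver cannot carry), both thresholds, and all names are assumptions made for this example; the text fixes only that the decision is weighted by the number of receivers with similar cost weights and that a large cost weight triggers consideration of alternative representations.

```python
def mismatch_cost(required_kbps, available_kbps):
    """Assumed cost: fraction of the core stream a receiver cannot carry."""
    if available_kbps >= required_kbps:
        return 0.0
    return 1.0 - available_kbps / required_kbps


def branch_decision(core_kbps, receiver_kbps,
                    branch_threshold=1.0, transcode_threshold=0.5):
    """Decide whether to branch off the core tree for a group of receivers.

    The decision weight is the summed cost of the group, so it grows with
    the number of receivers. Both thresholds are illustrative assumptions.
    """
    costs = [mismatch_cost(core_kbps, kbps) for kbps in receiver_kbps]
    weight = sum(costs)
    mean_cost = weight / len(costs) if costs else 0.0
    branch = weight >= branch_threshold
    return {
        "weight": weight,
        "branch": branch,
        # A large per-receiver cost suggests a different representation.
        "consider_alternative": branch and mean_cost >= transcode_threshold,
    }


# Three receivers on 16 kbps links against a 64 kbps core stream.
group = branch_decision(64.0, [16.0, 16.0, 16.0])
# A single such receiver does not carry enough weight to justify a branch.
single = branch_decision(64.0, [16.0])
```

With these assumed thresholds the group of three triggers a branch and an alternative representation, whereas the single receiver does not, which reflects the weighting by receiver count.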
By satisfying the above requirements, our framework defines a distribution network capable of "intelligently" adapting the distribution and representation of multimedia content into one that is suitable to the needs, preferences, and capabilities of its receivers. Figure 3 uses the specific scenario of a multicast of speech content to illustrate, via an example, the sort of intelligence that can be provided on a single programmable network element (referred to as the distributor in Figure 3) within our framework. An input stream of PCM-encoded speech [13] must be recoded into another format for some group of clients.3 In particular, an on-line speech-to-text [14] capability could be used as an adaptation measure by the network to match the operational space of one or more receivers, as well as a multimedia service to provide a value-added text-captioning enhancement. Similarly, a text-to-speech capability (i.e., increasing the streaming requirements) could be used by the network as a media service to suit the preferences of a particular set of receivers.
Figure 3
Next, we present an evolutionary perspective to motivate and justify the continuing development of enhanced network elements and services, and distinguish our framework from other approaches. Subsequently we describe in detail the key features of our framework.
3. An evolutionary perspective
Since the arrival of the World Wide Web (WWW) [15], there has been a gradual but steady trend toward provisioning high-function (i.e., intelligent) network elements and increasing the functionality provided by these elements to coordinate end-to-end data transfer between a network (e.g., Web) client and server. This evolution of high-function network elements and their associated services has been driven primarily by two aspects, both relating to the way in which users perceive and utilize network services:
- The performance of content delivery as experienced by clients.
- The diversity of client devices (and their connectivity to the Internet and content servers).
It is argued that future evolution of network services will be driven by, in addition to the above aspects, the ability of network elements to provide enhanced multimedia services to any client anywhere [16]. Future network elements must be capable of transparently accommodating and adjusting to client and content heterogeneity.
Terminology
For the ensuing discussion to be comprehensible, we first outline the terminology we adopt in describing the evolution of high-function network services. We refer to a source of network content as the server and a sink of network content as the client. A client requests delivery of content from a server, and in response content flows from the server to the client through the network. When traversing the network, the content may flow through one or more intermediary network elements that together constitute the network fabric connecting the clients and servers. Intermediaries can either be routers that relay (i.e., forward) network data onto one or more attached networks, or proxies that may perform more intelligent forwarding and transformations on the network data.
A number of clients may be communicating simultaneously with a number of servers over multiple unicast [17, 18] or multicast [19, 20] sessions. While a unicast session coordinates content transfer between a single client and server, a multicast session coordinates content transfer between a server and a set of clients wanting to receive the same content. Multicast transfers are much more efficient (in terms of server processing load and network bandwidth) than unicast transfers if the same content is to be delivered from a server to more than one client.
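The bandwidth argument can be made concrete by counting link traversals. The sketch below assumes a hypothetical tree topology (the `PARENT` map and all node names are invented for illustration) and compares one unicast copy per client against a single multicast copy per tree link.

```python
# Hypothetical tree: each node maps to its parent; "server" is the root.
PARENT = {
    "r1": "server",
    "r2": "r1",
    "r3": "r1",
    "c1": "r2",
    "c2": "r2",
    "c3": "r3",
}


def path_links(client):
    """Links (child, parent) on the path from a client up to the server."""
    links = []
    node = client
    while node != "server":
        links.append((node, PARENT[node]))
        node = PARENT[node]
    return links


def unicast_cost(clients):
    """One copy per client crosses every link on that client's path."""
    return sum(len(path_links(c)) for c in clients)


def multicast_cost(clients):
    """One copy crosses each distinct link of the distribution tree."""
    return len({link for c in clients for link in path_links(c)})


clients = ["c1", "c2", "c3"]
unicast_total = unicast_cost(clients)      # copies summed over all paths
multicast_total = multicast_cost(clients)  # distinct tree links only
```

In this topology the shared links near the server are crossed once under multicast but once per client under unicast; with a single client the two costs coincide.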
Location and nature of intelligence
Figure 4 illustrates the alternative locations in which intelligence about available network services may be placed. From a client's perspective (location 1), this intelligence typically constitutes awareness of the services provided by different (possibly replicated) network servers. It may also include estimates of the end-to-end performance the client can expect when communicating with functionally equivalent servers. Since clients and servers can be heterogeneous, this intelligence may also include knowledge of the functional disparities (in terms of supported protocols and content types) between the servers. The client can utilize this knowledge to select the most appropriate server and mechanism in order to obtain the available content.
Figure 4
As an alternative, this knowledge (and the associated burden) could be entirely or partially transferred to the individual servers (location 3), or could reside inside the network (location 2). In the rest of this section, we present the evolution of high-function network elements and services (i.e., placement of intelligence in location 2 in Figure 4). Figure 5 succinctly illustrates the variations in client-server connectivity capturing this evolution. We highlight key shortcomings in supporting continuous media applications and accommodating content and client heterogeneity. We consider the following cases: no intermediary, single intermediary, multiple intermediaries, and intelligent intermediaries.
No intermediary
Figure 5
This represents the simplest form of Internet client-server communication, characterized by a direct (logical) connection between the client and server, i.e., with no intermediary proxies (case 1 in Figure 5). Data must still flow through the set of network routers that provide the connectivity between the client and the server. There are a number of advantages of this simple configuration. No extra functionality is required in the network elements to provide connectivity between the client and server. Moreover, the client may utilize local intelligence to select one from a set of (replicated) servers with which to communicate. As mentioned above, this intelligence is usually based on the measured end-to-end performance (latency and bandwidth) between the client and the servers.
However, despite the aforementioned advantages, this configuration suffers from a number of significant drawbacks. First, it requires direct connectivity between the client and server; such connectivity is at times not possible, since it may be explicitly forbidden for security reasons, and even where it is possible it may result in poor end-to-end performance. Often the server with which a client desires to communicate resides inside the administrative domain of a company. To maintain a high degree of security, the company may not allow direct access to the server's content. In fact, the server may not even be on a network that is directly connected to the outside world, i.e., the Internet. Even with direct connectivity, performance (and hence scalability) can suffer because each client must communicate with the server for content access. This results in higher server load and wastes network bandwidth, because there is no provision for content reuse among different clients. In another situation, mobile users communicating directly with one another may experience poor end-to-end network performance because of highly lossy and unreliable wireless links. Second, the client and server must agree on the type of content delivered; i.e., the client must be able to process/display the delivered content. The burden of handling client heterogeneity lies completely on the server, which must support the necessary intelligence and incur additional processing load. This burden is most severe for multicast sessions, which must be handled by the server as multiple unicast sessions. Finally, if the servers cannot support client heterogeneity, an unnecessary burden is placed on the clients to support multiple protocol and content standards and to communicate with the "most compatible" server.
Single intermediary
Several of the above-mentioned drawbacks are partially mitigated by adding a single intermediary proxy between the client and server (case 2 in Figure 5). This proxy may reside at the edge of the network, either close to a set of clients or as a front end to a set of content servers. The main role played by the proxy is to forward network data between the client and the server while often performing limited well-defined operations over such data. There are three primary benefits of using such an intermediary proxy: improved connectivity and security semantics, support for content customization, and higher performance.
Connectivity and security semantics are improved because the proxy can act as a bridge between the client and server, providing support for indirect, authenticated, and restricted access to content servers (e.g., acting as a firewall). The presence of a proxy also provides the opportunity for content customization, i.e., application of useful protocol translation and content transcoding functions to compensate for the "impedance mismatch" resulting from protocol and client heterogeneity. An intermediary proxy can also result in higher performance (e.g., by better supporting client mobility and multicast sessions). The proxy can act as a base station connecting a mobile client to the Internet, improving performance by locally adapting to the characteristics of the wireless link to/from the client [21]. The proxy can also perform content-aware transformations such as adapting the delivery of multimedia content over wireless links via media scaling [22, 23]. Similarly, the proxy can provide some support for managing multicast sessions, freeing the server from this burden and improving overall performance.
However, a single intermediary does not provide sufficient support for large-scale high-function multimedia transfers between heterogeneous clients. Significantly greater functionality is needed to support a diverse set of networked multimedia applications. For example, while the proxy may perform limited transcoding functions, there is no support for media independence and hence for content heterogeneity. The functionality supported is typically ad hoc, and not sufficiently intelligent and programmable. Further, a single proxy cannot naturally capture the distributed and localized nature of multicast communication. More significantly, a single proxy provides poor scalability because of location and capacity constraints. That is, a single proxy cannot exploit the geographic locality naturally exhibited by clients, and it is unlikely to have sufficient capacity to manage and compensate for client and content heterogeneity.
Multiple intermediaries
It is likely that there will be more than one intermediary proxy in the path between a client and a server (case 3 in Figure 5). Each proxy is typically dedicated to a specific function, such as a firewall or a base station, supporting a limited role based primarily on access control and physical location requirements. While there are numerous advantages of specialization of individual proxies, such a scenario results in multiple proxies with scattered, uncoordinated functionality.
Many multimedia applications involving large-scale content distribution naturally require and benefit from support for multicast transfers. In recent years significant attention has been paid to network and protocol support for reliable multicast sessions [24, 25]. These efforts resulted in enhanced and explicit support for scalability and reliability in multicast communication. This is achieved via explicit support for IP multicast groups, appropriate end-to-end protocol support, multiple distributed multicast routers, and local repair capabilities at designated routers [24]. While multiple network elements work together to efficiently carry multicast traffic, no support is provided for the specific requirements of heterogeneous continuous media traffic.
Media awareness and client heterogeneity must be accommodated at the servers and clients via support for multiple media types and intelligent adaptation. For example, client heterogeneity may be supported via multilayered hierarchically coded media streams generated by the server; a receiving client could subscribe to one of these streams on the basis of its capabilities [26]. However, we believe that such approaches are insufficient to accommodate client and content heterogeneity in the scalable transfer of continuous media. Forcing media awareness and adaptation only at the endpoints places an undue burden on clients and servers. Furthermore, this makes it extremely difficult for them to support a variety of media types and representations, not all of which may have well-defined layered representations. Explicit support for media independence and programmability within the network promises to eliminate this burden while providing seamless integration of heterogeneous clients and content.
Intelligent intermediaries
Recently there has been an explosion of interest in intelligent intermediaries within the network, as evidenced by various research efforts in active networks [10, 11, 27]. Various forms of intelligent intermediaries have been proposed. For the WWW, one or more proxy caches may serve as intelligent intermediaries, improving content delivery performance significantly by caching popular Web content [28]. These proxies can be placed and configured to exploit geographic locality and client access patterns to reduce network server load and thus improve scalability. In other forms, proxies can act as intelligent front ends [29] performing load balancing at a server farm. For mostly static multimedia data, special proxies may perform advanced content translation and distillation functions to partially support content and client heterogeneity [30]. Advanced proxies may also have the capability to perform load balancing in conjunction with content awareness and affinity [31]. Specific support for continuous media filters has also been proposed for (programmable) heterogeneous networking [32-34].
The active network proposals to date target network level programmability without being content-aware, in contrast to our proposal, which targets content-aware application-level programmability. Moreover, while supporting varying degrees of media awareness, the aforementioned approaches either apply only to static content (i.e., do not extend directly to continuous media) or are media- and content-specific. We argue that the explosion of media types necessitates a network infrastructure that frees the clients and servers from such media dependency and the burden of managing content and client heterogeneity. This is particularly critical for continuous media because of its demanding resource requirements for processing, translation, and transmission.
That is, the next-generation network infrastructure must combine media awareness with a high degree of intelligent adaptivity in order to achieve true media independence and serve heterogeneous clients. In the next section, we present our framework for a scalable network infrastructure realizing media independence via programmable network intelligence.
4. Framework
In this section, we take a detailed look at key issues in the realization of G1 via a group of content-aware intermediaries which we call the "overlay." The G1 group decouples multimedia sources from receivers. Whereas G1 explores the mechanisms to provide media independence to heterogeneous receivers, the G2 goal is to realize efficient source-to-sink traversals through the group. To this end, in the next section, we show the realization of G2 via the ability to program this G1 group to react to network and application requirements. The resulting G2|G1 group of intermediaries is better described as a programmable overlay for the relay of multimedia content. Next, we proceed with the specification of such overlay groups, which are hereafter called "clouds" for distinction in this discussion.
Nodes
The cloud consists of four new types of high-function network elements: receivers, sources, distributors, and controllers. The cloud opens to users at well-defined points referred to as exterior nodes. Multimedia receiver nodes are exterior nodes that open the cloud to subscribers (S), whereas multimedia source nodes open the cloud to providers (P). A provider supplies content to the cloud, and a subscriber requests content from the cloud, leading naturally to a supply-and-demand model in which the cloud serves as an arbitrator for both the routing and brokering of content. The distribution of multimedia content originates at one or more sources and terminates at zero or more receivers. The distribution of multimedia content from a source to a receiver is accomplished by "relaying" such content across one or more of these interior points in the cloud. As argued earlier, multiple intermediaries are used as a way of breaking up the end-to-end distribution problem into k smaller end-to-end subproblems between interior points in the G2|G1 cloud.
The flow of content across any two nodes inside the cloud is referred to as a media flow. The original notion of a media flow was introduced by Clark [12] to expose application-level framing at either end of an end-to-end network connection. In contrast, here a media flow represents a first-class entity4 within the interior of the overlay, as opposed to just at its exterior endpoints. This notion is critical in highlighting the likelihood of exposing a media flow to network capabilities (i.e., multimedia services) at any of these interior points. These points are referred to as multimedia distribution nodes. Each distributor opens a service control point as well as a routing control point over the traversal of a media flow through the interior of the cloud.
To coordinate traversals across interior points, specialized nodes referred to as multimedia controllers implement the algorithmic functions necessary to orchestrate these distributed resources. Controllers provide a control entry point to the overlay, whereas sources and receivers realize the media entry points. There are important differences between the transport of media and control information. Whereas media connections support media flows, control connections relay the commands that manage these media flows. There are two basic approaches to implementing their transport:
- Media information and control data are relayed over a common channel.
- A separate channel is used to relay control data.
The first approach could be implemented by tagging and interleaving command capsules between media units. The second approach could be implemented by downloading such capsules over a separate control channel. However, the first approach is weaker because it ignores differences between the relay of media and control, requires the disruption of encoding formats, and introduces significant overhead at each node. Moreover, whereas the lifespan of a media connection is tied to that of its underlying media flow, the lifespan of a control connection is instead tied to that of its nodes. Finally, media flows are QoS-oriented, but control flows are priority-oriented. The first approach lacks the ability to prioritize the delivery of control commands, whereas the second approach provides both a QoS media channel and a priority delivery channel for commands as out-of-band data.
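The second approach can be sketched as two independent queues on a node: a FIFO for media units and a priority queue for control commands. This is only an illustration of out-of-band, priority-ordered control delivery; the class, its method names, and the convention that a lower number means higher urgency are all assumptions, and QoS handling of the media channel is not modeled.

```python
import heapq
from collections import deque
from itertools import count


class DistributorChannels:
    """Separate media and control channels on one node (illustrative only)."""

    def __init__(self):
        self._media = deque()  # FIFO: media units stay in encoding order
        self._control = []     # heap: control commands ordered by priority
        self._seq = count()    # tie-breaker keeps equal priorities FIFO

    def push_media(self, unit):
        self._media.append(unit)

    def push_control(self, priority, command):
        # Lower number = more urgent (an assumption of this sketch).
        heapq.heappush(self._control, (priority, next(self._seq), command))

    def next_control(self):
        """Pop the most urgent pending command, or None if there is none."""
        return heapq.heappop(self._control)[2] if self._control else None

    def next_media(self):
        """Pop the oldest pending media unit, or None if there is none."""
        return self._media.popleft() if self._media else None


ch = DistributorChannels()
ch.push_media("frame-1")
ch.push_media("frame-2")
ch.push_control(5, "report-stats")
ch.push_control(0, "reroute")
first_command = ch.next_control()  # most urgent command, not the first to arrive
first_media = ch.next_media()      # media order is unaffected by control traffic
```

Because commands never enter the media queue, the encoding format is left undisturbed and an urgent command is not delayed behind queued media units.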
The components of our programmable G2|G1 cloud are illustrated in Figure 6. Throughout this paper, let ci be a controller, si a source, di a distributor, and ri a receiver; let ni represent any node {si, di, ri} in the G2|G1 cloud. Figure 7 illustrates some relationships among these components. On one hand, Figure 7(a) depicts a multimedia source (s1) sending a media flow to a distributor (d2). On the other hand, Figure 7(b) shows a view of the interior of the cloud, a controller (MCN) with control connections to the distributors (d1, d3, d5, d7) it manages. Finally, Figure 7(c) depicts a multimedia receiver (r1) with an incoming media flow from a distributor (d5).
Figure 6
Figure 7
Distribution paths
The connection of a source to a receiver through distributors realizes an end-to-end traversal path through the cloud, referred to as a distribution path. A distribution path is said to emit a media representation. The distribution path is used for the relay of content m from a provider P to a subscriber S; for example, Figure 8(a) shows the distribution path p[P1, S2, m]. The distribution path p[P, S, m] is a content-aware distributed object that describes a controllable "walk" over the overlay. These walks typically start at some source s and terminate at some receiver r while traversing one or more distributors di. For example, Figure 8(b) illustrates the traversal of the distribution path p[P1, S2, m] between a source and a receiver over three interior points (d1, d3, d5) in the cloud. The distribution path as a whole represents an exterior end-to-end problem, and each of its segments represents an interior end-to-end subproblem.
Figure 8
Path segments
Content is relayed between any two nodes in a distribution path via a content-aware media connection referred to as a path segment. A path segment p[n1, n2, m] is a content-aware distributed object between two nodes n1 and n2 in the G1 cloud that is said to emit a representation [e.g., α(m)] of a media flow associated with some content m. This is represented as

$$p[n_1, n_2, m] \rightarrow \alpha(m), \tag{1}$$

where α(m) is one possible content representation [e.g., α(m), β(m), …] for content m. A distribution path is composed of one or more path segments [for example, Figure 8(a) shows the distribution path p[P1, S2, m], and Figure 8(b) shows one of the constituent path segments (i.e., p[d1, d5, m]) over which the distribution path p[P1, S2, m] is realized]. Path segments enable dynamic rerouting of distribution paths over the distributor overlay and provide the reconfiguration granularity used to branch off (i.e., reuse) a distribution path. For example, Figure 8(c) shows reuse and reconfiguration of path segments.
Next, we formally define the relationships among nodes, distribution paths, and path segments.
Object management services
A distribution path can be represented as an ordered tuple that lists nodes on a traversal ordered from provider to subscriber. Equivalently,

$$p[P, S, m] = (P, s, \{d\}^{+}, r, S). \tag{2}$$

Now, we refer to the predecessor pre(ni) of a node ni ∈ p[P, S, m] as simply ni-1 and to its successor suc(ni) as simply ni+1. For example, if p[P, S, m] is a distribution path, then p[ni, ni+1, m] is its ith underlying path segment. For convenience, we define the term "core distribution tree," CDT'(m), as the set of distributors in a distribution path (i.e., {d}+).
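The ordered-tuple view of a distribution path can be sketched in a few lines of Python. This is only an illustration: the node labels `s1` and `r1`, and the convention that distributor names start with `d`, are assumptions made here and are not part of the original model.

```python
from typing import List, Optional, Tuple

def pre(path: List[str], node: str) -> Optional[str]:
    """Predecessor n_(i-1) of a node, or None at the provider end.
    Assumes each node appears once in the path."""
    i = path.index(node)
    return path[i - 1] if i > 0 else None

def suc(path: List[str], node: str) -> Optional[str]:
    """Successor n_(i+1) of a node, or None at the subscriber end."""
    i = path.index(node)
    return path[i + 1] if i + 1 < len(path) else None

def path_segments(path: List[str]) -> List[Tuple[str, str]]:
    """The underlying path segments (n_i, n_(i+1)), in traversal order."""
    return list(zip(path, path[1:]))

def core_distribution_tree(path: List[str]) -> List[str]:
    """Distributors on the path; the 'd' prefix is a naming assumption."""
    return [n for n in path if n.startswith("d")]

# Hypothetical traversal ordered from provider to subscriber.
path = ["P1", "s1", "d1", "d3", "d5", "r1", "S2"]
segments = path_segments(path)
```

Here `segments[2]` is the segment between `d1` and `d3`, and `core_distribution_tree(path)` returns the three distributors.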
A path segment p[ni, ni+1, m] is controlled by its spanning node ni. To reflect this asymmetry, we classify path segments as incoming or outgoing. At node ni, incoming path segment in(pid, ni) of node ni in a distribution path is denoted by p[pre(ni), ni, pid]. Similarly, the outgoing segment out(pid, ni) of node ni in a distribution path pid is denoted by p[ni, suc(ni), pid]. The path identifier pid is used to "tie in" (within a node) incoming segments to outgoing segments by associating segment identifiers sharing the same path identifiers.5
Object directory services (e.g., object request brokers [35]) can be used to track these distributed objects, nodes, and their relationships. On one hand, to track a distribution path p[P, S, m], the controller assigns a globally unique identifier pid(p[P, S, m]) to each distribution path. This identifier is referred to as the path identifier. An object proxy [35, 36] of the path identifier is made available to every node on its corresponding distribution path. On the other hand, to track a path segment, each node in a distribution path associates a unique identifier sid(p[ni, ni+1, m]) with each outgoing path segment. This identifier is referred to as the segment identifier. The segment identifier is a distributed object known only by nodes ni and ni+1 at either side of the corresponding path segment.
Path and segment identifiers allow the representation of the relationships between path segments and distribution paths as follows. First, a distribution path can be composed of one or more path segments. To represent this relationship, it is necessary to determine, given a path identifier, the segment identifiers associated with it. To this end, each path segment in a distribution path is associated with the same path identifier. Second, a path segment can be associated with zero or more distribution paths. To represent this relationship at a node, it is necessary to determine, given a segment identifier, all path identifiers associated with it. At the controller, given a path identifier, it is necessary to determine all nodes found along the corresponding distribution path. This way, only nodes in a distribution path must be tracked by the controller.
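The two lookups described above (from a path identifier to its segment identifiers, and from a segment identifier to every path that uses it) amount to a many-to-many index kept at each node. A minimal sketch follows; the class name `NodeDirectory` and the identifier strings are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set

class NodeDirectory:
    """Per-node index tying path identifiers (pid) to segment identifiers (sid)."""

    def __init__(self) -> None:
        self._sids_by_pid: Dict[str, Set[str]] = defaultdict(set)
        self._pids_by_sid: Dict[str, Set[str]] = defaultdict(set)

    def bind(self, pid: str, sid: str) -> None:
        """Record that segment sid carries distribution path pid."""
        self._sids_by_pid[pid].add(sid)
        self._pids_by_sid[sid].add(pid)

    def unbind(self, pid: str, sid: str) -> None:
        """Remove the association; unknown pairs are ignored."""
        self._sids_by_pid[pid].discard(sid)
        self._pids_by_sid[sid].discard(pid)

    def segments_of(self, pid: str) -> Set[str]:
        """Given a path identifier, the segment identifiers associated with it."""
        return set(self._sids_by_pid.get(pid, ()))

    def paths_of(self, sid: str) -> Set[str]:
        """Given a segment identifier, all path identifiers associated with it."""
        return set(self._pids_by_sid.get(sid, ()))

# One segment shared by two distribution paths.
directory = NodeDirectory()
directory.bind("pid-1", "sid-a")
directory.bind("pid-2", "sid-a")
shared = directory.paths_of("sid-a")
```

A segment whose `paths_of` set becomes empty is associated with zero distribution paths, which is the case the text allows for.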
Capabilities
The path segment realizes a content-aware connection used to expose a media flow to application-level processing at either of its ends. As content is relayed through the cloud, distributors are capable of exposing an incoming media flow to application-level capabilities (e.g., multimedia services). A capability is defined as an application-level filter f() that takes as input a media flow having content representation α(m) and outputs a media flow in content representation β(m). Equivalently,

$$f[\alpha(m), t] = [\beta(m), t + t_d], \tag{3}$$

where td is an upper bound on the aforementioned application-level processing overheads. Examples of capabilities are encryption of a media flow, multiplexing of multiple incoming media flows into a single outgoing media flow, content-based filtering, content translation, etc. Capabilities such as adaptive filtering can be used to adapt the incoming media flow to the requirements of outgoing path segment(s) [22, 23, 37].
Assuming that it is possible to provide on demand any capability at any distributor, the concatenation of path segments (i.e., the cascading of distributor capabilities) appears unnecessary. This fails to take into account, however, that the path segment abstraction is used not only to satisfy G1 but to satisfy G2 as well. On one hand, because resources are finite, the cascading of capabilities across distributors is needed to span distribution paths capable of emitting representations that would otherwise not be possible because of resource allocation constraints. On the other hand, as discussed later, each distributor also serves as a control point over the routing of a media flow. Consequently, the cascading of capabilities provides the means to introduce multiple route control points over the routing of a media flow over the overlay. Moreover, the cascading of capabilities increases the opportunity of reusing path segments.
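To make the notion of a capability and of cascading concrete, the following sketch models a media flow as a (representation, time) pair and a capability as a filter that changes the representation and adds its processing bound td. Reading Equation (3) as adding td to the time component is an assumption, as are the representation labels and the delay values.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

Flow = Tuple[str, float]  # (representation label, time t)

@dataclass(frozen=True)
class Capability:
    """Application-level filter: src representation in, dst representation out."""
    src: str
    dst: str
    t_d: float  # upper bound on processing overhead (assumed to be in seconds)

    def __call__(self, flow: Flow) -> Flow:
        rep, t = flow
        if rep != self.src:
            raise ValueError(f"expected representation {self.src}, got {rep}")
        return (self.dst, t + self.t_d)

def cascade(flow: Flow, capabilities: Iterable[Capability]) -> Flow:
    """Apply capabilities in order, as when path segments are concatenated."""
    for capability in capabilities:
        flow = capability(flow)
    return flow

# Hypothetical two-distributor cascade: alpha -> beta -> gamma.
to_beta = Capability("alpha", "beta", t_d=0.02)
to_gamma = Capability("beta", "gamma", t_d=0.03)
result = cascade(("alpha", 1.0), [to_beta, to_gamma])
```

The cascade emits a representation that neither capability produces alone, at the price of the accumulated overhead bound.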
5. Management of the overlay
In this section, we explore three important issues on the management of the overlay: global management of capabilities, end-to-end management of distribution paths, and subscription management.
Capability management
Clearly, a management authority is needed to manage capabilities throughout the overlay. To facilitate the discussion, controllers are entrusted with the management of (subsets of) distributors and their capabilities.6 In terms of our algorithmic needs, distributors can be represented in terms of a combination of application and network state. To support capability management, it is necessary to track each distributor in terms of its currently loaded capabilities (Ci) and their respective capacities (|Ci|). Moreover, to support application-level routing across distributors, it is necessary to differentiate among distributors, for example, in terms of their total capacity Tallot(d) and their utilization state U(d). To this end, a distributor d having k capabilities and total capacity Tallot is modeled as a tuple of the form

$$(T_{allot}, \{C_i, |C_i|\}^{k}) \quad \text{where} \quad T_{allot}(d) = \sum_{i=0}^{k} |C_i|. \tag{4}$$

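A distributor record of this form might be held by a controller as follows. The sketch is illustrative only: the field names, the capability labels, and the treatment of spare capacity as one more entry in the capacity map are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Distributor:
    """Controller-side record of a distributor and its loaded capabilities."""
    name: str
    # capability label -> capacity |C_i| (here expressed in percent of the node)
    capacities: Dict[str, int] = field(default_factory=dict)

    def total_allotment(self) -> int:
        """T_allot(d): the sum of the capacities of all loaded capabilities."""
        return sum(self.capacities.values())

    def capacity_of(self, capability: str) -> int:
        """|C_i| for one capability, or 0 if it is not loaded."""
        return self.capacities.get(capability, 0)

# Hypothetical configuration: 40% for one capability, 60% for another.
d1 = Distributor("d1", {"cap-a": 40, "cap-b": 60})
# Hypothetical configuration with part of the node kept as spare capacity.
d2 = Distributor("d2", {"cap-a": 60, "spare": 40})
```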
Route control versus service insertion points
Will more than one distributor be needed for the purposes of transforming content into a suitable representation? On one hand, because resources are finite, the cascading of capabilities across distributors allows the generation of representations that would otherwise not be available. Furthermore, each distributor serves as an explicit control point over the routing of a media flow. Moreover, multiple distributors can be used as application-level relays (routers) by routing a media flow via a "pass-through" capability at each such distributor. Each such distributor increases the effectiveness of application-level routing over the overlay, since each introduces an additional control point for the routing of media flows through the underlying packet-switched networks. However, each distributor introduces significant application-level overhead to the end-to-end delay between sources and receivers (i.e., when compared to a traditional network router). Clearly, a tradeoff exists between the number of distributors used for routing control and the cumulative application-level overhead (i.e., the sum of td over all nodes ni ∈ p[P, S, m]) introduced by such distributors.
The overlay also opens new opportunities in the integration of route and service control. Multiple distribution paths sharing a common link may benefit from network (as opposed to client-based) integration. Our approach differs significantly from that of a low-level packet flow multiplexor in that our multimedia concentrator capability creates the opportunity to introduce awareness of receiver/subscriber capabilities. For example, the concentrator could rely on an earliest-deadline-first scheduler to remove media frames that end up being simply overhead to a particular class of subscribers. This approach results in several benefits. First, it shifts media integration from the subscriber to the overlay, thus easing the tasks of "low-CPU-powered" subscribers. Second, it induces common routing behavior for multiplexed media flows that, after all, are to be provisioned to the same receiver. On the downside, the concentrator capability reduces the ability of the overlay to provision under scarce available bandwidth. A concentrator imposes a harder constraint over admission control: finding k small holes across k links is easier than finding one hole large enough on a single link.
On a more subtle point, the capabilities available on a distributor also provide the overlay with a way of indirectly biasing the traffic on the underlying packet-switched networks (hereafter referred to as the path inducement effect of the overlay). For example, if a distributor were exclusively allocated to a specific capability (for example, a flow decryption distributor), such a distributor would then be capable of a significant path inducement effect on the routing of most media flows requiring such services. The routing of such media flows would likely consider and/or reuse path segments through this distributor. On the other hand, if the distributor were to have too many capabilities, its path inducement effect would be decreased. Thus, choosing capabilities for a distributor is a difficult task. For this reason, means to dynamically reconfigure capabilities on a distributor become desirable.
Dynamic reconfiguration
Because we may not know in advance which capabilities are most needed, the ability to dynamically reconfigure capabilities in a distributor is desirable. In particular, we would like to
- Monitor and forecast capability usage.
- Pre-load critical capabilities into distributors.
- Dynamically deploy the remaining capabilities throughout the overlay.
Our approach is to pre-load popular capabilities into distributors and pre-allocate spare capabilities and capacities through the overlay so as to make room for dynamic downloading of less popular capabilities. Spare and unused capacities provide controllers with a highly available resource pool (as a swap space for the overlay) that would be used, for example, to swap less popular capabilities for more popular ones. Similarly, the capacity of a capability would be dynamically reconfigured, for example on the basis of demand for a multimedia service. On the basis of overlay performance (e.g., the response of the overlay to demand), a controller could then recommend that a particular distributor update its capability banks and, if necessary, download new capabilities. One approach to implement the downloading of capabilities (i.e., software) across heterogeneous distributors would be via the use of platform-independent program capsules (e.g., Java** [38] program capsules). In contrast to other proposed active networks such as Mobiware [33], our mechanism targets the aggregate long-term performance of the overlay--capabilities are downloaded as needed in response to the performance of the overlay as opposed to the needs of a particular media flow (i.e., steady vs. transient response). On one hand, our approach enables the overlay to deploy and optimize capabilities based on criteria such as critical mass. On the other hand, our approach increases the response time of the overlay to adaptation of per-flow needs.
Measurements approach
To branch new path segments from a given distributor, the controller determines whether it is possible to generate a particular representation (i.e., as needed for service provisioning) at a distributor (i.e., as needed by routing control). For this reason, the capacity and availability of resources at each distributor are tracked by the controller. It is important, however, that the overhead of such tracking be low. If the updates are too frequent, too large an overhead is incurred. On the other hand, if the updates are too infrequent, the controller could end up relying on an incorrect assessment of the state of its distributors when constructing distribution paths across the overlay. Elsewhere [39], we presented a robust statistical process-control framework to derive reliable long-term measurements among widely distributed resources. Our approach consists in relaxing the need for periodic measurements by relying on the use of a special kind of watermark based on well-known statistical process control principles [40]. Watermarks are used to remove heterogeneity across distributors and to reliably measure the utilization state U(d) of a distributor. High and low watermarks are used to bound a critical region of capacity on the distributor (for example, (low, high) = [84%, 93%]). The region below the low watermark is referred to as the "green zone," the region between watermarks is referred to as the "yellow zone," and the region above the high watermark is referred to as the "red zone." Consequently, a distributor is said to be in a red, yellow, or green state by the controller in terms of its reported overall available capacity. The distributor needs only to report long-term changes around its watermarks. Moreover, the actual definition of a watermark needs to be known only by a distributor. This way, watermarks serve a dual functionality. Figure 9 illustrates the use of this scheme across two distributors d1 and d2. The approach is summarized as follows.
Figure 9
- When a distributor capability is green, the controller may freely allocate new paths to this capability.
- When a capability at a distributor is yellow or red, the controller does not allocate the capability to new configuration requests.
- A distributor considers new configuration requests only when green.

Note that race conditions are possible. Specifically, a controller may believe a distributor to be green while the distributor is no longer green. In such a case, depending on its own admission controls, a distributor may or may not accept such requests. The region between low and high watermarks provides resiliency for the delay between distributor reports and the generation of new paths.
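The watermark scheme can be sketched as a zone classifier plus a reporter that speaks only when the zone changes. This is a simplified illustration, not the statistical process-control framework of [39]: it reports every zone crossing without any smoothing for "long-term" changes, and the treatment of the exact boundary values is an assumption. The thresholds reuse the example (low, high) = (84%, 93%).

```python
from typing import Optional

LOW, HIGH = 0.84, 0.93  # example watermarks from the text

def zone(utilization: float, low: float = LOW, high: float = HIGH) -> str:
    """Map a utilization level in [0, 1] to a green, yellow, or red zone."""
    if utilization < low:
        return "green"
    if utilization <= high:
        return "yellow"
    return "red"

class WatermarkReporter:
    """Distributor-side helper: emits a report only when the zone changes."""

    def __init__(self, low: float = LOW, high: float = HIGH) -> None:
        self.low, self.high = low, high
        self.last: Optional[str] = None

    def observe(self, utilization: float) -> Optional[str]:
        """Return the new zone if it differs from the last report, else None."""
        current = zone(utilization, self.low, self.high)
        if current != self.last:
            self.last = current
            return current
        return None

def controller_may_allocate(reported_zone: str) -> bool:
    """Controller-side rule: allocate new paths only to green capabilities."""
    return reported_zone == "green"

reporter = WatermarkReporter()
reports = [reporter.observe(u) for u in (0.50, 0.60, 0.90, 0.95, 0.70)]
```

Because only the zone name is reported, the numeric watermark values stay private to the distributor, which matches the observation that their definition needs to be known only there.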
Illustration
To support configuration of capabilities across the overlay, controllers must also be capable of assessing the state of the overlay. In contrast to traditional routers, distributors require accounting for application-level costs during routing over the overlay. New ways are needed to represent and integrate application and network costs into the rerouting of distribution paths across the overlay. Table 1 illustrates one approach for the integration and representation of application and network costs for content representation α(m). The (ith, jth) entry of the table contains the tuple [rttij, cij], which is associated with path segment p[ni, nj, m] → α(m). Whereas rttij equals the round-trip time between ni and nj, cij represents a weight factor associated with such a path segment. On one hand, rttij represents the network cost of a path segment. On the other hand, cij represents application-level processing costs associated with such a path segment. Since either cost factor can be arbitrarily large, routing across such a path segment must consider both cost factors.
Table 1 Integrated routing lookup table. The controller maintains the state of network conditions between distributors and of application-level processing costs within each distributor. Unlike traditional network elements, distributors introduce content processing overheads that must be accounted for during the routing of path segments and distribution paths.

| α(m) | n1 | n2 | n3 |
|---|---|---|---|
| n1 | (0, 1) | rtt(n1,n2), c12 | rtt(n1,n3), c13 |
| n2 | rtt(n2,n1), c21 | (0, 1) | rtt(n2,n3), c23 |
| n3 | rtt(n3,n1), c31 | rtt(n3,n2), c32 | (0, 1) |

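One way a controller could use such a table is to run a shortest-path search in which each path segment is charged a single combined cost. The text does not say how the two factors are combined; the sketch below assumes the product rtt × c, which is consistent with the diagonal entries (0, 1) but is otherwise a guess. The table values are made up for illustration.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Table = Dict[Tuple[str, str], Tuple[float, float]]  # (ni, nj) -> (rtt, c)

def cheapest_route(table: Table, src: str, dst: str) -> Optional[Tuple[List[str], float]]:
    """Dijkstra over the routing table; edge cost is rtt * c (an assumption)."""
    graph: Dict[str, List[Tuple[str, float]]] = {}
    for (u, v), (rtt, c) in table.items():
        if u != v:  # diagonal entries describe a node, not a segment
            graph.setdefault(u, []).append((v, rtt * c))
    best = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == dst:
            break
        if cost > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, weight in graph.get(u, []):
            candidate = cost + weight
            if candidate < best.get(v, float("inf")):
                best[v] = candidate
                prev[v] = u
                heapq.heappush(heap, (candidate, v))
    if dst not in best:
        return None
    route = [dst]
    while route[-1] != src:
        route.append(prev[route[-1]])
    return route[::-1], best[dst]

# Hypothetical entries: the direct segment is fast but expensive to process.
table: Table = {
    ("n1", "n1"): (0.0, 1.0),
    ("n1", "n2"): (10.0, 1.0),
    ("n2", "n3"): (10.0, 1.0),
    ("n1", "n3"): (15.0, 2.0),
}
route, cost = cheapest_route(table, "n1", "n3")
```

With these values the two-segment route through `n2` wins even though the direct segment has the lower round-trip time, which is the kind of outcome that considering both cost factors is meant to allow.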
Table 2 illustrates, via an example, the tracking of routing information (for content m). The table enables the controller to manage content m (e.g., allocate, reuse, destroy, adapt) across the overlay. The (ith, jth) entry in this table contains the tuple [f(m1 → m2)@b(m2)@(ni … nj)]. Whereas f(m1 → m2) represents a multimedia capability taking as input content in representation m1 and transforming it to representation m2, b(m2) represents the requirements of such a multimedia capability, and (ni … nj) represents the set or group of distributors possessing such a capability. Because membership in this set is dynamic, each tuple can be visualized as a process group provisioning the corresponding representation (e.g., m2).
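The capability tuple described above maps naturally onto a lookup keyed by the (input, output) representation pair, with a dynamic membership set per entry. The following sketch is hypothetical in its names, in the numeric form of the requirement b, and in its example values.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

@dataclass
class CapabilityEntry:
    """One table entry: requirement b(m2) and the group of nodes offering f."""
    requirement: float  # b(m2); unit and meaning are assumptions
    nodes: Set[str] = field(default_factory=set)

class CapabilityTable:
    """Controller-side lookup from (m1, m2) to the nodes providing f(m1 -> m2)."""

    def __init__(self) -> None:
        self.entries: Dict[Tuple[str, str], CapabilityEntry] = {}

    def join(self, src: str, dst: str, node: str, requirement: float) -> None:
        """Add a node to the group for f(src -> dst), creating the entry if needed."""
        entry = self.entries.setdefault((src, dst), CapabilityEntry(requirement))
        entry.nodes.add(node)

    def leave(self, src: str, dst: str, node: str) -> None:
        """Remove a node from the group, e.g., after a capability is swapped out."""
        entry = self.entries.get((src, dst))
        if entry is not None:
            entry.nodes.discard(node)

    def providers(self, src: str, dst: str) -> Set[str]:
        """Nodes currently able to transform src into dst."""
        entry = self.entries.get((src, dst))
        return set(entry.nodes) if entry is not None else set()

capabilities = CapabilityTable()
capabilities.join("m1", "m2", "n3", requirement=64.0)
capabilities.join("m1", "m2", "n4", requirement=64.0)
capabilities.leave("m1", "m2", "n3")
```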
Figure 10
Table 2 Capability lookup table. To manage path segments and content representations [α(m), …], the controller must track available capabilities f[α(m) to β(m)] within the overlay, in addition to their requirements b, location (e.g., nj), and utilization state (shown in Figure 10). Note that in this example not all capabilities are found at every node.

| m | α(m) | β(m) | γ(m) |
|---|---|---|---|
| α(m) | | | f(γ → α)@b(α)@(n1, n4) |
| β(m) | f(α → β)@b(β)@(n2) | | |
| γ(m) | | f(β → γ)@b(γ)@(n3, n4) | |

Figure 10 illustrates, via an example, the tracking of four different distributors by a controller. The example shows one particular content type m (e.g., audio) and three known representations [α(m), β(m), γ(m)] (e.g., PCM, H.323, and text). As stated earlier, the ability to provision a representation is referred to as a network capability (or multimedia service). In this example, the network capabilities (α → β, , ) are known to be deployed on the overlay, as shown in Figure 10. In this particular example, distributor d1 is configured 40% for capability (α → β), with its remaining 60% reserved for capability ( ). Similarly, distributor d2 is configured 60% for capability (α → β), but the remaining 40% is reserved as spare capacity for on-demand deployment of a new capability. Finally, distributor d3 is configured 100% as spare capacity reserved for on-demand deployment of new capabilities.
Figure 10 also highlights several important issues on the management of capabilities. First, the allotment of a distributor can be spread across zero or more capabilities. For example, the d3 allotment is fully dedicated to the capability ( ), whereas both d1 and d2 provide partial allotment for the capability (α → β) and dedicate the rest to other capabilities [( ) and (α → β)], respectively. Second, part of a distributor's allotment could be set aside by the controller for later use (for example, 40% of the d4 allotment is set aside as spare capacity for use by the overlay).
Management of distribution paths
In this section, we discuss issues on the generation of distribution paths and the branching of path segments. A distribution path is said to be feasible if there exists some distributor along the path capable of emitting the requested content representation. The following algorithm determines whether a feasible distribution path exists. The algorithm attempts to find the availability of the requested representation as close as possible to the requesting receiver.
Algorithm 1
Given a core distribution tree CDT'(m) provisioning representation β(m) of content m, determine whether placement of a provisioning request R from subscriber S at receiver r for a requested representation of m results in a branch of the core distribution tree CDT'(m).
- If the requested representation is already provisioned at receiver r (for example, as needed for some other subscriber S' or for caching purposes), the controller requests r to multicast it to S as well.
- Otherwise, find and rank the set of distributors already provisioning the requested representation according to some cost criteria such as estimated delay, utilization state, or capacity.
- If the above set is not empty, find the closest distributor di willing to create the path segment p[di, r, m] emitting the requested representation.
- Otherwise, delay this request until critical mass is created according to some cost criteria such as the number of similarly pending requests.
- Find and rank the set of distributors in CDT'(m) provisioning representation β(m) according to some cost criteria such as estimated delay, utilization state, or capacity.
- Find the closest distributor dj willing to apply a network capability that transforms β(m) into the requested representation and branch out the path segment p[dj, r, m] emitting it.
Otherwise, if no such distributor is found, the requested representation cannot be provisioned at the current time by the overlay. In such a case, it is desirable to wait and determine whether a critical mass of requests for that representation exists to construct a new core distribution tree CDT"(m) capable of emitting it, and then apply Algorithm 1 to the critical mass of requests. It is also desirable in such a case to migrate the branches for that representation out into the new core distribution tree CDT"(m).
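The decision sequence of Algorithm 1 can be approximated by the sketch below. It is a simplification under stated assumptions: "willingness" of a distributor is not modeled, ranking uses a single numeric cost per distributor, and deferral until critical mass is treated as the final fallback. All names and values are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

def place_request(
    requested: str,
    tree_rep: str,
    receiver_reps: Set[str],
    provisioning: Dict[str, Set[str]],
    capabilities: Dict[str, Set[Tuple[str, str]]],
    tree: List[str],
    cost: Dict[str, float],
) -> Tuple[str, Optional[str]]:
    """Return (action, distributor) for a request placed at a receiver."""
    def rank(candidates):
        return sorted(candidates, key=lambda d: cost.get(d, float("inf")))

    # The receiver already has the representation: reuse it locally.
    if requested in receiver_reps:
        return ("reuse_at_receiver", None)
    # Some distributor already provisions it: branch from the cheapest one.
    ranked = rank(d for d, reps in provisioning.items() if requested in reps)
    if ranked:
        return ("branch_from_provider", ranked[0])
    # A tree distributor can transform the tree's representation into it.
    ranked = rank(d for d in tree if (tree_rep, requested) in capabilities.get(d, set()))
    if ranked:
        return ("transform_and_branch", ranked[0])
    # Nothing available: wait for a critical mass of similar requests.
    return ("defer_until_critical_mass", None)

provisioning = {"d1": {"beta"}, "d3": {"beta"}}
capabilities = {"d1": set(), "d3": {("beta", "gamma")}}
tree = ["d1", "d3"]
cost = {"d1": 1.0, "d3": 2.0}
decision = place_request("gamma", "beta", set(), provisioning, capabilities, tree, cost)
```

In this example no distributor provisions `gamma` yet, so the request is served by transforming the tree's `beta` flow at `d3`.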
Once the need for a new distribution path is established, the controller instructs each node ni-1 along a distribution path p[P, S, m] to set up and reserve an outgoing path segment to its successor ni. After reserving outgoing path segments, the controller configures individual path segments into one distribution path (pointed to by pid) as follows. At each node ni along a distribution path p[P, S, m], the set of incoming path segments configured for pid are associated with the set of outgoing path segments configured for pid, thus creating the end-to-end distribution path p[P, S, m]. Each node associates the resulting coupling to a path identifier pid(p[P, S, m]).
Algorithm 2
A controller orchestrates an end-to-end distribution path in a manner analogous to that of the two-phase swipe used in RSVP [5]:
- In the forward swipe,
  - The controller "requests" each node ni in a distribution path to establish an outgoing path segment to its designated successor ni+1 (or successors when constructing a multicast tree).
  - At each node, network resources (e.g., sockets, bandwidth, buffers) are reserved and associated with the distribution path pointed to by pid.
  - To connect a source to a distributor, the controller instructs the source to connect to its distributor(s).
  - Once connected, the successor connects to its own successor(s).
  This process is repeated until the subscriber is reached, at which point the distribution path physically exists for the first time.
- Once the distribution path p[P, S, m] is physically established, a multimedia flow can be started over it. The backward swipe accomplishes this task.
  - To enable the flow of media between a source(s) and receiver(s), the controller starts the backward swipe by signaling receiver(s).
  - The receiver then signals its predecessor(s).
  - The predecessor then signals its predecessor(s), and so on.
  This process is repeated until it reaches the source(s), at which point the distribution path starts relaying content for the first time.
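The ordering of the two phases can be illustrated with a short sketch that lists the segment reservations of the forward swipe and the start signals of the backward swipe for a single unicast path. Resource reservation, admission control, and multicast branching are omitted, and the node names are hypothetical.

```python
from typing import List, Tuple

def forward_swipe(path: List[str]) -> List[Tuple[str, str]]:
    """Segments reserved source-to-receiver: each node connects to its successor."""
    return list(zip(path, path[1:]))

def backward_swipe(path: List[str]) -> List[Tuple[str, str]]:
    """Start signals receiver-to-source: each node signals its predecessor."""
    return [(path[i], path[i - 1]) for i in range(len(path) - 1, 0, -1)]

path = ["s1", "d1", "d3", "d5", "r1"]
reservations = forward_swipe(path)  # first entry: ("s1", "d1")
signals = backward_swipe(path)      # first entry: ("r1", "d5")
```

The path exists once the last reservation reaches the receiver end, and content starts flowing once the last signal reaches the source.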
The tear-down of a distribution path is asymmetrical (based on a second-chance-to-live principle [41]) to its creation because we would like to foster the reuse of path segments. A node sends a tear-down request to its predecessor to indicate that it is no longer interested in an incoming path segment. The predecessor signals its own predecessor and decreases its reference count for this path segment. If this path segment is no longer in use, a tear-down timer is initiated, causing the tear-down of the path segment to be delayed. This provides an opportunity for the reuse of this path segment by other media flows. If the timer expires and the path segment is still not being used, the path segment is destroyed and data structures are updated.
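A reference count combined with a delayed tear-down can be sketched as follows. To keep the example deterministic, time is passed in explicitly rather than read from a clock; the class name, the `grace` parameter, and the numeric values are assumptions.

```python
from typing import Optional

class PathSegment:
    """Reference-counted segment whose tear-down is delayed to invite reuse."""

    def __init__(self, grace: float) -> None:
        self.grace = grace                      # tear-down delay
        self.refs = 0                           # media flows using the segment
        self.expires_at: Optional[float] = None
        self.alive = True

    def acquire(self) -> None:
        """A flow starts using the segment; any pending tear-down is cancelled."""
        if not self.alive:
            raise RuntimeError("segment already destroyed")
        self.refs += 1
        self.expires_at = None

    def release(self, now: float) -> None:
        """A flow stops using the segment; start the timer if it is now unused."""
        if self.refs == 0:
            raise RuntimeError("segment is not in use")
        self.refs -= 1
        if self.refs == 0:
            self.expires_at = now + self.grace

    def tick(self, now: float) -> bool:
        """Destroy the segment if its timer expired; return whether it is alive."""
        if self.alive and self.expires_at is not None and now >= self.expires_at:
            self.alive = False
        return self.alive

segment = PathSegment(grace=5.0)
segment.acquire()
segment.release(now=0.0)           # unused: timer runs until t = 5
alive_at_3 = segment.tick(now=3.0)
segment.acquire()                  # reused before expiry: timer cancelled
segment.release(now=4.0)           # unused again: timer runs until t = 9
alive_at_8 = segment.tick(now=8.0)
alive_at_9 = segment.tick(now=9.0)
```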
In addition to the above setup functions, each node also implements path segment monitoring and maintenance tasks (see Table 3). Each node ni in a distribution path estimates the performance of an outgoing path segment p[ni, ni+1, m] with respect to the requirements of its corresponding distribution path. An assessment that indicates a need for remedial action at the node triggers fine-tuning of the media resolution on the outgoing path segment (i.e., p[ni, ni+1, m]). If the assessment indicates that remedial action is needed at a predecessor, back-pressure is applied to request predecessors to adapt to the conditions of the outgoing path segment p[ni-1, ni, m]. It is easy to see that each receiver oversees the dynamic reconfiguration for all of the distribution paths leading to it. When a receiver hands off to another receiver, the receiver also transfers the ownership of the distribution path.
Table 3 Adaptive control primitives for media flows.

| Node | Command |
|---|---|
| Distributor | adapt(ni, ni+1, pid, (m)) |
| | back-pressure(ni, ni-1, pid, m) |

Illustration
Figure 11 depicts the realization of a distribution path in response to a subscription request. To respond to a subscription request and span a distribution path, overlay nodes perform several control tasks and exchange control messages with other nodes. A subscriber S is first authenticated, authorized, and assigned to a receiver r. Although each node is independently addressable, only the controller must know their addresses. The controller is aware of its distributors, receivers, and sources, as well as of other controllers in the overlay. The controller uses the configure() command to instruct a node ni-1 along a distribution path p[P, S, m] to set up an outgoing path segment to its successor ni. Admission controls are applied, and resources (sockets, buffers, etc.) are reserved at each node. After reserving an outgoing path segment, the controller uses the connect() command to integrate individual path segments into one distribution path (pointed to by pid). Every media node implements the associate() command, used to concatenate an incoming path segment to an outgoing path segment and associate the resulting coupling with a path identifier pid(p[P, S, m]). At each node ni along a distribution path p[P, S, m], the set of incoming path segments configured for pid is associated with the set of outgoing path segments configured for pid, thus creating the end-to-end distribution path p[P, S, m]. It is possible (and desirable) for path segments to be reused across distribution paths. Once the distribution path p[P, S, m] is physically established, a multimedia flow can be started over it. A node ni in a distribution path uses the distribute() command to request, from its predecessor, the relay of multimedia content. The process is started by the receiver, which sends a distribute() command to its predecessor(s); each predecessor executes the command and then issues distribute() commands to its own predecessor(s). The process repeats until it reaches the source(s), at which point the distribution path starts relaying content for the first time. Last, the close() command is used by a node ni to tear down an outgoing path segment p[ni, ni+1, m]. To encourage reuse by other distribution paths, a node may choose not to release an outgoing path segment until an associated timer expires. Table 4 summarizes these setup and control primitives. Figure 12 depicts the scope of these primitives with respect to nodes in a distribution path.
Figure 11
Figure 12
Table 4 Content distribution primitives for handling media flows.

| Node | Command |
|---|---|
| Source | connect(p[ni, ni+1, m], pid) |
| Distributor | associate(cidin, cidout, pid) |
| Receiver | distribute(ni, pid) |
| | close(ni, ni+1, pid) |

Subscription management
Service provisioning is found in utility models, and its applicability to the Internet has recently been the focus of increasing research [2]. The overlay represents an open distribution network (much like the Advanced Intelligent Network [42]) shared by independent and possibly competing content providers, over which content is routed in the form of media flows to subscribers in an efficient and accountable manner. To provision subscription requests, the overlay must perform several new application-level tasks not found in today's end-to-end routing of multimedia subscriptions to multimedia servers. These tasks (Table 5) involve different amounts of application-level negotiation of resources and services distributed throughout the overlay, where each such negotiation may be performed according to a different optimization policy. In particular, the overlay performs the following tasks:
Table 5 Handling of a provisioning request p[x, S, m] by the controller with the ultimate goal of producing a binding between some source x = P and the requesting receiver r(S).

| Step | Task |
|---|---|
| Step 1 | Bind provider (x = P) to S for provisioning of content m. |
| Step 2 | Determine suitable content representation α(m) for content m. |
| Step 3 | Generate feasible distribution path p[P, S, m] → α(m). |
| Step 4 | Secure the distribution path. |
| Step 5 | Create record of transaction. |
| Step 6 | Enable media flow to path p[P, S, m]. |

- Negotiate with candidate content providers for a request and determine their suitability.
- Select a binding to a suitable provider x = P according to some cost criteria.
- Explore possible distribution paths so as to span a feasible distribution path from P to S through r suited to the capabilities of S while attempting to make best use/reuse of existing overlay resources.
To subscribe to multimedia services, a subscriber S places a query to the controller requesting a local point of entry to the overlay. The controller responds by determining a suitable receiver r for S. The entry point to the overlay is simply the address of any such receiver. Note that whereas the address of the controller is well-known to subscribers, the addresses of other nodes (such as sources and receivers) need not be known by subscribers. Conventional authentication and authorization techniques could be enforced by the controller to entitle the subscriber to global overlay services. Additionally, individual nodes (e.g., receivers and sources) could also request the authentication and authorization of a subscriber to enforce access controls. Successfully authenticated and authorized subscribers are referred to as trusted subscribers.
Once a subscriber S is considered trusted, it places a provisioning request R (e.g., for content characterized by m at a cost of up to c* units per minute) to its receiver r to request content from the overlay. Because subscribers are likely to be heterogeneous, a receiver must address the "impedance mismatch" (see footnote 7) between subscribers and network capabilities. To do so, a subscriber characterizes its capabilities to its (current) receiver. For example, recall that in our generic content-adaptive application model (shown in Figure 2), each subscriber/client is characterized in terms of a capabilities bank, which the overlay uses to select and configure the multimedia services suited to the capabilities of that subscriber. In our model, a receiver uses this subscriber characterization to augment a subscriber-oriented request into a network-oriented request p[x, S, m] (where x may be either known or unknown). The receiver then places the provisioning request p[x, S, m] (on behalf of S) with some controller on the overlay. The controller finds a suitable provider x = P and generates a feasible distribution path from P to S through r suited to the capabilities of S.
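The controller's handling of a provisioning request p[x, S, m] can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Offer`, `Binding`, and `provision`, the cheapest-offer cost criterion, and the in-memory `ledger` are all hypothetical. Step 2 of Table 5 (choosing a content representation) is omitted because the text does not specify how it is done.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """A hypothetical offer made by a candidate source x for content m."""
    provider: str        # candidate source x
    cost_per_min: float  # quoted cost, in the same units as the bound c*
    path: tuple          # overlay nodes from the provider to a receiver


@dataclass
class Binding:
    """Result of provisioning: a path from provider P to subscriber S."""
    provider: str
    subscriber: str
    content: str
    path: tuple
    secured: bool = False
    flowing: bool = False


def provision(subscriber, receiver, content, max_cost, offers, ledger):
    """Handle p[x, S, m]; return a Binding, or None if nothing is feasible."""
    # Keep only offers within the cost bound whose path reaches receiver r.
    feasible = [
        o for o in offers
        if o.cost_per_min <= max_cost and o.path and o.path[-1] == receiver
    ]
    if not feasible:
        return None
    # Step 1: bind to a provider (assumed criterion: lowest quoted cost).
    best = min(feasible, key=lambda o: o.cost_per_min)
    # Step 3: the distribution path runs from P through r to S.
    binding = Binding(best.provider, subscriber, content,
                      best.path + (subscriber,))
    # Step 4: secure the path (placeholder flag only).
    binding.secured = True
    # Step 5: record the transaction.
    ledger.append((best.provider, subscriber, content, best.cost_per_min))
    # Step 6: enable the media flow.
    binding.flowing = True
    return binding
```

In this sketch the receiver has already turned the subscriber's request into `(subscriber, receiver, content, max_cost)`; a real controller would negotiate offers with sources rather than receive them as a list.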
In a large network, subscribers need a way to discover content of interest. One approach is to allow subscribers to explicitly choose providers; another approach is to defer such a task to the overlay. If the number of "channels" grows too large, the latter mechanism becomes more attractive. Regardless, we assume that the controller owns the responsibility of binding a subscriber to a provider. Receivers then forward subscription interests to a controller, which explores, negotiates, and secures subscription offers from different sources. This process involves exploring the feasibility of end-to-end application constraints (e.g., cost constraints and QoS parameters) as well as finding matches for subscription interests (e.g., keywords and content features).
In general, it is desirable for sources and receivers to regard the interior of the overlay solely as a distribution network, since the overlay is open to competing providers as well as subscribers. Therefore, means to secure the relay of content across the interior of the overlay (such as an encrypted media tunnel between sources and receivers) are desirable. Moreover, security mechanisms must accommodate differences among providers as well as among subscribers. When required by subscribers, encryption measures could be relaxed to those associated with a shared channel, that is, to measures that merely discourage casual content snooping. For example, when provisioning subscribers having reduced computational capabilities, selective encryption of a media flow (e.g., the encryption of critical media frames such as I-frames) could be used to trade off the robustness of content privatization against its corresponding computational decryption overhead.
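The selective-encryption trade-off can be illustrated with a small sketch that encrypts only I-frames and passes other frames through untouched. Everything here is an assumption made for illustration: the `(frame_type, payload)` frame model, the function names, and especially the toy SHA-256 keystream cipher, which stands in for a real cipher and must not be used to protect actual content.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce, and a counter (NOT secure)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big"))
        out += block.digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(data: bytes, key: bytes, nonce: bytes) -> bytes:
    """XOR data with the keystream; applying it twice restores the data."""
    stream = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def protect_flow(frames, key, encrypt_types=frozenset({"I"})):
    """Encrypt only frames whose type is in encrypt_types.

    frames is a list of (frame_type, payload) pairs; the frame index is
    used as a per-frame nonce.
    """
    protected = []
    for index, (frame_type, payload) in enumerate(frames):
        if frame_type in encrypt_types:
            payload = xor_cipher(payload, key, index.to_bytes(8, "big"))
        protected.append((frame_type, payload))
    return protected
```

A subscriber with limited processing power then decrypts only the I-frames; widening `encrypt_types` to include P- and B-frames increases both the protection and the decryption cost.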
In general, service provisioning requires a means of charging users. The construction of an information economy [44] that would encourage usage, generate revenue, and avoid chaos is a complex problem. Aside from asymptotic behavior, it is unclear even whether we should charge for connection time or contracted bandwidth, or rely instead on application-defined costs (e.g., per-capability service fees). Regardless of the charging scheme used, the framework must provide the ability to monitor usage throughout the overlay. However, tracking and monitoring are expensive functions that affect the scalability, security, and fault tolerance of the overlay. Centralized billing is weak in terms of scalability and fault tolerance, but in general it is better for security. Distributed billing is better in terms of scalability and fault tolerance, but weaker in security. To this end, upon securing a distribution path, the receiver triggers the metering (i.e., usage tracking) of the path segment to a subscriber. Once a receiver terminates a subscriber connection, the receiver forwards the usage record to the source node(s) associated with the corresponding path segments. This is a fair model for the billing of actually delivered content when accounting for subscriber disconnection and failures.
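Receiver-side metering as described above might look like the following sketch. The record fields, the class and function names, and the per-minute charging rule are hypothetical; the text deliberately leaves the charging scheme open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """Usage of one path segment by one subscriber (hypothetical fields)."""
    subscriber: str
    source: str
    path_id: str
    seconds: float


class ReceiverMeter:
    """Tracks usage from path setup until the subscriber disconnects."""

    def __init__(self):
        # (subscriber, path_id) -> (source, start time in seconds)
        self._open = {}

    def start(self, subscriber, source, path_id, now):
        """Begin metering once the distribution path has been secured."""
        self._open[(subscriber, path_id)] = (source, now)

    def stop(self, subscriber, path_id, now):
        """End metering; the returned record is forwarded to the source."""
        source, started = self._open.pop((subscriber, path_id))
        return UsageRecord(subscriber, source, path_id, now - started)


def charge(record: UsageRecord, rate_per_min: float) -> float:
    """One possible scheme: charge for connection time at a flat rate."""
    return record.seconds / 60.0 * rate_per_min
```

Because the record is closed when the connection ends, for whatever reason, a subscriber that disconnects or fails early is billed only for the time actually metered.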
6. Realization
In this section, we motivate extensions to existing technologies that must be developed in order to implement the proposed framework. In particular, we focus on networking protocols such as IP multicast, the experimental multicast backbone (MBONE), receiver-driven layered multicast (RLM), and Resource reSerVation Protocol (RSVP). We briefly describe each of these technologies and highlight their shortcomings in the context of building programmable multimedia overlay networks.
Existing technologies
IP multicast [19, 20] provides the underlying network support for the efficient delivery of datagrams to groups of receiving hosts (i.e., receivers) referred to as multicast (host) groups. Host groups are identified by multicast group addresses. A datagram sent to a multicast group address is forwarded to all of its members by relaying the datagram over a set of routers capable of processing multicast addresses (i.e., multicast routers). Multicast routers that interface with one or more hosts are referred to as edge routers, whereas multicast routers that interface only with other routers are referred to as interior routers. Since individual receivers in a host group may be distributed across the whole Internet, it is likely that forwarding may traverse multiple multicast routers. To support efficient forwarding of multicast datagrams, IP multicast specifies two key functions: a group membership protocol and a multicast routing protocol. IP multicast specifies the Internet Group Management Protocol (IGMP) [45] to provide lightweight group monitoring at multicast edge routers. IGMP tracks multicast group addresses subscribed to by hosts (i.e., receivers) on the same network as an edge router. IP multicast specifies its routing function through distributed routing algorithms such as the Distance Vector Multicast Routing Protocol (DVMRP) [46] and Protocol Independent Multicast [47, 48] so that multicast datagrams sent to a group can be forwarded to the group members as efficiently as possible. A limitation of IP multicast routing is that it is not content-aware, since routing and group functions are driven only on network indicators (e.g., time-to-live counts).
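For concreteness, the following sketch shows how a receiving host joins a multicast group through the standard sockets interface; the join causes the host's IP stack to report the membership to the edge router via IGMP. The function names and the default interface address are illustrative choices; `IP_ADD_MEMBERSHIP` and the 8-byte membership request (group address followed by interface address) are the standard IPv4 socket-level interface.

```python
import ipaddress
import socket


def build_mreq(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the membership request: 4-byte group + 4-byte interface."""
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast group address")
    return socket.inet_aton(group) + socket.inet_aton(interface)


def join_group(group: str, port: int, interface: str = "0.0.0.0"):
    """Open a UDP socket and join the given multicast group.

    Requires a multicast-capable network interface, so it is defined
    here but not called.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                         socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group, interface))
    return sock
```

Note that nothing in the join describes the content or the receiver: the group address is the only information the network sees, which is the limitation pointed out above.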
MBONE [49] refers to the experimental "multicast backbone" that has been implemented on today's Internet by selectively deploying IP multicast technologies in routers. Since not all routers in the Internet are multicast-capable, MBONE utilizes "tunnels" to relay multicast datagrams across a cloud of non-multicast-capable routers. Upon entering a tunnel, IP-multicast packets are encapsulated as unicast IP packets (using the "IP-over-IP" [50] protocol). This encapsulation is removed upon exit from the tunnel, and the packet continues to be forwarded as a multicast datagram. MBONE provides user discovery and subscription of multicast groups via the Session Description Protocol (SDP) [51], which supports operations such as lookup and join (i.e., bind/subscribe) on the active multicast groups. IP multicast focuses solely on efficient forwarding of identical content to heterogeneous hosts. However, as discussed earlier, because of receiver heterogeneity and network congestion, it is highly desirable to adapt content delivery to match receiver capabilities and the network performance experienced by each receiver.
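The tunneling step can be pictured with a toy model in which a multicast datagram is wrapped in a unicast datagram at the tunnel entry and unwrapped at the exit. The `Datagram` class and the function names are hypothetical, and the model ignores real header fields; it only shows the nesting.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Datagram:
    """Toy IP datagram: addresses plus a payload (bytes or a datagram)."""
    src: str
    dst: str
    payload: Union[bytes, "Datagram"]


def enter_tunnel(inner: Datagram, entry: str, exit_: str) -> Datagram:
    """Wrap a multicast datagram in a unicast one between tunnel ends."""
    return Datagram(src=entry, dst=exit_, payload=inner)


def exit_tunnel(outer: Datagram) -> Datagram:
    """Strip the unicast wrapper and recover the multicast datagram."""
    if not isinstance(outer.payload, Datagram):
        raise ValueError("not an encapsulated datagram")
    return outer.payload
```

Routers inside the tunnel see only the outer unicast addresses, so they need no multicast support; the inner datagram resumes multicast forwarding after `exit_tunnel`.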
Receiver-driven Layered Multicast (RLM) [26] is a set of enhancements to IP multicast that transparently (to the network) addresses heterogeneity of receivers and the variability in network performance. With RLM, a layered encoder at the media source splits a given media stream m into multiple media layers (l1, ..., lj), where l1 is the base layer and the rest are enhancement layers [52]. These media layers are transmitted to separate multicast groups, each associated with the same multicast session. That is, each layer li is given its own multicast group, say g(m, li), and a session name, say s(m), is used to track the mapping of m onto the groups g(m, li) produced by its layered encoding. An RLM receiver interested in receiving a media stream transparently joins one or more of these multicast groups, depending on its requirements and the prevailing network congestion (as measured by packet loss). RLM receivers seeking s(m) are automatically subscribed to the base layer and to zero or more enhancement layers on the basis of a network-oriented performance indicator. Thus, a receiver may be a member of multiple multicast groups for the same multicast session. The receiver's layered decoder automatically joins and drops the enhancement multicast groups associated with the multicast session: increases in congestion/packet loss cause enhancement layers to be dropped, whereas decreases in congestion cause enhancement layers to be added. Since RLM relies primarily on network congestion/packet loss to determine the operational capabilities/characteristics of the receiver, it lacks explicit awareness of receiver capabilities.
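The join/drop behavior can be sketched as a small state machine driven by observed packet loss. This is a deliberate simplification under stated assumptions: the fixed loss threshold, the one-layer-per-report step, and the group naming scheme are hypothetical, and the timers and join experiments used by actual RLM are not modeled.

```python
class RlmReceiver:
    """Simplified receiver: adds or drops one enhancement layer per report."""

    def __init__(self, num_layers: int, loss_threshold: float = 0.05):
        if num_layers < 1:
            raise ValueError("a session has at least the base layer")
        self.num_layers = num_layers
        self.loss_threshold = loss_threshold
        self.subscribed = 1  # the base layer l1 is always received

    def groups(self, session: str):
        """Multicast groups g(m, li) currently joined for session s(m)."""
        return [f"{session}/l{i}" for i in range(1, self.subscribed + 1)]

    def on_loss_report(self, loss_rate: float) -> int:
        """Drop the top layer under congestion; otherwise try to add one."""
        if loss_rate > self.loss_threshold:
            if self.subscribed > 1:
                self.subscribed -= 1
        elif self.subscribed < self.num_layers:
            self.subscribed += 1
        return self.subscribed
```

The only input to `on_loss_report` is a network-level loss rate; nothing about the receiver's processing power or error tolerance enters the decision.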
RSVP [5, 53, 54] is a receiver-initiated end-to-end signaling protocol that provides network-level signaling for reserving resources in end systems and network routers on the path between the sender and one or more receivers. With RSVP, reservation "soft states" are installed at all of the nodes participating in a reservation for an end-to-end flow, and maintained via periodic refreshes. RSVP supports resource reservation for unicast as well as multicast flows, and the primary model corresponds to one reservation per end-to-end application flow.
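The soft-state idea can be sketched as a table whose entries expire unless they are refreshed. The class name, the single fixed lifetime, and the explicit `now` argument are assumptions for illustration; the sketch does not reproduce RSVP's message formats or timer rules.

```python
class SoftStateTable:
    """Reservation state that disappears unless periodically refreshed."""

    def __init__(self, lifetime: float):
        self.lifetime = lifetime
        self._expiry = {}  # flow id -> time at which the state lapses

    def refresh(self, flow_id: str, now: float) -> None:
        """Install the state or extend it (the effect of a refresh)."""
        self._expiry[flow_id] = now + self.lifetime

    def expire(self, now: float):
        """Remove and return flows whose state was not refreshed in time."""
        stale = sorted(f for f, t in self._expiry.items() if t <= now)
        for flow_id in stale:
            del self._expiry[flow_id]
        return stale

    def active(self):
        """Flows that currently hold reservation state at this node."""
        return sorted(self._expiry)
```

No explicit teardown is needed in this model: if a receiver stops refreshing, each node on the path reclaims the reservation on its own when `expire` runs.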
Necessary enhancements
Realizing our framework requires a number of enhancements and extensions to the above-mentioned technologies, none of which is content-aware. Our network design philosophy was to introduce application-level intelligence (i.e., content and receiver awareness) into the network to promote a more efficient control over the routing of shared media flows. From the viewpoint of receivers, our goals were twofold:
-
To enable receivers to subscribe to content with the best possible quality suiting their capabilities and preferences.
-
To enable sources to provision content without a priori knowledge of the capabilities of the actual receiver audience.
We argue that while the existing technologies described above are indispensable, they do not provide sufficient support to meet our goals. We also provide a glimpse into key extensions that we are developing to embed content awareness into the existing network infrastructure.
As mentioned, IP multicast provides efficient forwarding of the same content to multiple receivers, while MBONE provides the necessary connectivity among multicast islands. However, neither of them is content-aware. At first glance, it may seem that RLM addresses the goals mentioned above. However, this is not the case, as is next explained. By augmenting multicast groups with the notion of optional receiver membership into one or more enhancement layers, RLM retrofits a limited degree of media awareness into multicast sessions without modifying the underlying IP multicast infrastructure. However, for the same reason it fails to satisfy the two goals above.
RLM fails to meet the first goal in that it relies inherently on a network-oriented characterization (i.e., a congestion signal) to approximate receiver capabilities and preferences. Thus, RLM effectively maps the entire range of receiver capabilities (such as network bandwidth, processing power, and error tolerance) into a single measure of available network bandwidth. This constitutes a loss of capability information about the receiver audience, which is why the first goal is not met. RLM also does not satisfy the second goal. To generate an appropriate layered encoding, RLM implicitly requires each media source to determine a priori the entire serviceable operational space (in terms of receiver capabilities and preferences) of the expected receiver audience. Such an approach is tractable only if relatively few discernible receiver operational points exist and can easily be determined by all media sources.
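The loss of capability information can be made concrete with a small sketch that groups receivers first by bandwidth alone and then by a fuller profile. The receiver names follow the scenario in Section 1, but the numeric capability values, the bucket sizes, and the function names are invented purely for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Capabilities(NamedTuple):
    """Hypothetical capability profile of a receiver."""
    bandwidth_kbps: int
    cpu_score: int
    error_tolerance: float


def by_bandwidth_only(receivers, layer_kbps):
    """Group receivers by the number of layers their bandwidth supports."""
    groups = defaultdict(list)
    for name, cap in receivers.items():
        groups[cap.bandwidth_kbps // layer_kbps].append(name)
    return dict(groups)


def by_full_profile(receivers, layer_kbps, cpu_step):
    """Group receivers by bandwidth and processing power together."""
    groups = defaultdict(list)
    for name, cap in receivers.items():
        key = (cap.bandwidth_kbps // layer_kbps, cap.cpu_score // cpu_step)
        groups[key].append(name)
    return dict(groups)


receivers = {
    "NC": Capabilities(1000, 10, 0.01),   # high bandwidth, low CPU
    "WS": Capabilities(1000, 90, 0.01),   # high bandwidth, high CPU
    "PDA": Capabilities(50, 5, 0.10),     # low bandwidth, low CPU
}
```

With these made-up values, `by_bandwidth_only(receivers, 500)` places the network computer and the workstation in the same group even though their processing power differs widely, whereas `by_full_profile(receivers, 500, 50)` separates them.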
However, as argued in Section 1 and illustrated in Figure 1, we believe that this will not continue to be the case, given the advent of new client devices with limited processing power, such as thin clients and a variety of consumer hand-held devices, the increasing use of wireless and mobile network access, and an ever-increasing disparity in available processing power and network bandwidth for different clients. Moreover, the RLM approach requires that a feasible layered representation of media content be produced for each media type. However, not all media types may yield layered-encoding schemes capable of spanning all serviceable operational points of interest. As heterogeneity becomes prevalent, the number of discernible operational points in terms of receiver capabilities and preferences will increase. The operational space spanned by such extensive receiver heterogeneity must be efficiently managed for rapid proliferation of multimedia content. We believe that introducing explicit awareness of receiver capabilities and preferences into the network greatly facilitates application-level routing decisions to engineer available network resources in order to differentiate and match receiver heterogeneity and preferences. Such awareness can be harnessed to enhance content delivery while provisioning network resources for a variety of heterogeneous receivers.
As mentioned earlier, scalable provisioning of content-aware network resources involves the dynamic setup, management, and aggregation of end-to-end media flows. For several reasons, RSVP, the primary resource reservation signaling protocol for the Internet, is not immediately suitable for this purpose. First, RSVP is geared toward end-to-end network flows with no explicit awareness of media content. Second, RSVP provides a receiver-initiated mode of resource reservation in which a receiver communicates with the nearest previous-hop router. Finally, RSVP assumes a tighter coupling between the network view of a media source (i.e., the traversing of RSVP "path" messages) and the network view of a receiver (the traversing of RSVP "resv" messages). These reasons necessitate extensions to RSVP before it can be deployed to realize our framework, for which key abstractions (i.e., path segments, distribution paths, content distributors, and content distribution controllers) constitute application-level entities controlling media flows within the network.
Key extensions
We are currently developing key extensions to the IP multicast, MBONE, RLM, and RSVP technologies in order to facilitate realization of our framework. These extensions introduce content awareness and application-state management into multicast routing and network resource provisioning. Two key extensions are content-aware multicast group management and multicast session directory services for discovery and subscription. Developing these extensions involves
-
Extending IP multicast semantics to associate content representation with multicast sessions.
-
Grouping together receivers with homogeneous characteristics and preferences for more efficient content delivery.
-
Defining new group membership policies based on content representation and receiver capabilities.
Details of the extensions and implementation of our framework are beyond the scope of this paper; they are the subject of forthcoming papers.
7. Concluding remarks
We have presented an evolutionary framework for multimedia distribution services that efficiently addresses the needs of groups of heterogeneous receivers. The framework integrates several evolutionary measures. First, we have introduced receiver and content awareness into the network. Second, we have modeled the distribution of content between sender and receiver as a relay across multiple intermediaries. Finally, we have exposed media flows to content-aware network services. We propose the realization of these measures by overlaying a programmable multimedia network over the packet-switched network. We have modeled the overlay as a content-aware active network and illustrated the benefits of content-aware programmability over the routing and management of media flows.
The framework addresses various degrees of heterogeneity among "distribution paths" to receivers. The distribution path realizes a content-aware connection between a distribution tree and one or more receivers. The framework enables efficient generation of distribution paths to dynamically adapt to groups of receivers, thereby creating a programmable network capable of autonomously generating branches out of the core distribution tree of a multicast.
Our approach treats media flows as first-class entities within the interior of the overlay, in contrast to just at the exterior of the overlay. This allows the overlay to exploit application-level management of distribution paths and overlay network services to influence the routing of multimedia payloads on the underlying packet-switched network. In summary, our framework provides a vehicle for retrofitting today's packet-switched networks into the next generation of intelligent and programmable multimedia networks.
Acknowledgments
The authors thank F. Hendriks, J. Von Kaenel, L. Lumelsky, the e-Media Technology Group at the IBM Thomas J. Watson Research Center, and the anonymous reviewers of this paper for feedback on an earlier draft of this manuscript. An earlier draft of this manuscript was circulated at the IBM Thomas J. Watson Research Center from January through May 1998.
**Trademark or registered trademark of Sun Microsystems, Inc.
Footnotes
1
µ-law encoding is a form of logarithmic quantization used in telephone-quality coder/decoders (codecs).
2
A transcoder takes as input content in representation α and recodes such content to representation β. An enhancer takes as input content in representation α and outputs representations (α + β).
3
ITU-T Recommendation G.711 is used to compress, expand, or convert digitized PCM data among µ-law, A-law, and linear formats.
4
The notion is used in an object-oriented sense, and is intended to mean that the media flow itself is modeled as having behavior, state, and properties.
5
It is possible that one or more (e.g., m) incoming segments map to one or more (e.g., n) outgoing segments for a given path identifier pid. In this case, the node ni acts as an m × n concentrator/multiplexor for content m.
6
This does not imply that the capability management authority is centralized nor that distributors lose their autonomy to such authority. It is best to think of such authority as a recommendation system.
7
The term impedance mismatch is borrowed from engineering circuit analysis [43] to denote the incompatibility between interfaces of two or more "plug-in" components.
Received June 10, 1998; accepted for publication July 15, 1999
|