Project
 
 Research Home  >> Smart Networking >> Peer to Peer Networking


Peer to Peer Networking

Addressing Scaling Issues in Peer-to-peer Architecture

Motivation

Recent music file sharing systems like Napster, Gnutella have spurred tremendous interests in peer-to-peer systems. Although the concepts involved in these file sharing systems are not new to large scale distributed systems, highly connected end-nodes, more power at desktop have made several imaginations of the past a reality. In addition to stirred up controversies in intellectual property rights, it has brought quite a few technical challenges that need to be addressed in the light of emerging growth in high speed networking. While Napster, Gnutella architectures have spurned very interesting applications, these systems beg for several improvements to handle several million users in a seamless manner that can provide a secure, scalable, and reliable peer-to-peer system. The query routing in a fully distributed architecture such as Gnutella is very naïve and often lead to inefficient use of bandwidth. While Napster solves this problem by centralized directory service, it is bogged down with well known problems associated with any centralized system, i.e., hot-spot due to bottleneck node and central point of failure. In this study, we intend to address scalability issues associated with a fully distributed system such as Gnutella. Our vision is that an efficient and scalable peer-to-peer system will be the basic building block of content distribution system of the future.

The scope of our work is to carve an architecture with efficient algorithms for routing, overlay formation so that Gnutella like systems scales to millions of users without current problems observed in a fully distributed system. Specifically we are focusing on two basic building blocks, namely, efficient query routing, and overlay formation among the dynamic set of users for efficient usage of bandwidth and timely location of objects. Our approach contrasts the methods in [Freenet, OceanStore] in that they depend upon systematic naming convention on the published documents in the system. In our approach, we simply assume that the objects are represented by a unique identification number. We do not require or assume any object identification number dependent object placement.

First, we examine two distinct architectures [Napster, Gnutella] to identify the various scaling issues associated with existing peer-to-peer models. Briefly, in [Napster] model, there are a handful of powerful servers that manage and serve the search and routing issues. While this model provides fast query response to a handful of clients, it suffers from the classical overload problem due to centralized nature of this architecture. In [Gnutella] based model, each client could choose to serve a query and routing issues. While this model addresses the overload problem of [Napster] architecture, it has several drawbacks that has made this less attractive in the existing implementation. First, the response time for serving a query could be unexpectedly high due to (i) inefficient construction of gnutella connectivity among the clients, and (ii) slower clients might prove to be bottleneck in serving and routing queries. Second, the query routing in [Gnutella] model leads to flooding the network with a large number of redundant query forwarding. We quantify the extent of these scaling problems in our model to motivate the need for an improved architecture.

In addressing the scaling problems in gnutella like architecture, we propose two improvements that we incorporate in Gnutella like model to address both the overload problem in Napster and inefficiency issues in Gnutella architecture.

Efficient overlay construction

Given n numbers of clients that are connected according to the existing bootstrap as in Gnutella, we choose m efficient clients that are members of an overlay that we construct based on latency measurements. In the static version of this problem, we address how to construct this overlay for efficient request routing. We need to determine what the value of m will be to handle requests efficiently and yet not overload the network with redundant messages. Next, we address the dynamic overlay construction that accounts for change in the clients and latency among the client paths. Here the challenge is to minimize the overhead of this overlay algorithm.

Cache-based routing for eliminating message flooding

In Gnutella model, query requests are handled using naïve flooding based request routing. Although there are other approaches proposed in [Freenet, OceanStore], these suffer from similar scaling problems. We propose a cache-based routing algorithm to address this issue. The crux of this algorithm is to cache the location of the documents in a cache while the original holder of a document replies to the query requested from a client. Using this information, we show that it can effectively handle the flooding problem in Gnutella request routing. The challenges associated in this algorithm are to address which node should cache the information pertaining to a query, how to handle object replication. Next, we further refine out cache-based routing with construction of spanning tree that can further reduce the overhead in query routing. Here the challenge is in the construction of spanning tree among a dynamic set of nodes. We also intend to explore if further benefit can be achieved by using some attribute-based aggregate information with regard to object locations as hints for guiding the query.

 
 Privacy | Legal | Contact | IBM Home | Research Home | Project List | Research Sites | Page Contact