

SMART QUERY

With the growth of e-commerce, personalized portals, shopping, and games on the Web, a large fraction of Web content is becoming dynamic (personalized pages, form-based queries, changing content, advertisements). Such content increases the processing load on the back-end server, which, coupled with network congestion and latency, results in unacceptable response times for the end user. To reduce client latency and the load on the origin server, such dynamic content needs to be cached and served at intermediate edge server caches. Current proxy caches, however, cache only static content (text, images, etc.), because dynamically generated content is explicitly marked as uncacheable. A large fraction of the dynamic content consists of form-based queries (simple attribute values or keywords) at e-commerce sites and search engines.

At such sites, the typical steps are:
(i) the user fills out a form to enter their selections, which are sent to the web server in an HTTP POST request or embedded in the URL,
(ii) the web server or the corresponding application server parses the request and issues an SQL query to a back-end database to get the result,
(iii) the query results are processed, formatted, combined with other data, and sent back to the client in an HTTP response.
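
As an illustration only, the three steps above can be sketched in Python for the case where the selections are embedded in the URL. The table `items`, its columns, the parameter names `category` and `max_price`, and the host `shop.example` are all hypothetical; an in-memory SQLite database stands in for the back-end database.

```python
import sqlite3
from urllib.parse import parse_qs, urlsplit


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory catalog (hypothetical schema and data)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE items (name TEXT, category TEXT, price REAL)")
    db.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [("pen", "office", 1.5), ("desk", "office", 120.0), ("ball", "toys", 4.0)],
    )
    return db


def handle_request(url: str, db: sqlite3.Connection) -> str:
    # (i) Extract the form selections embedded in the URL.
    params = parse_qs(urlsplit(url).query)
    category = params["category"][0]
    max_price = float(params["max_price"][0])

    # (ii) Issue a parameterized SQL query to the back-end database.
    rows = db.execute(
        "SELECT name, price FROM items "
        "WHERE category = ? AND price <= ? ORDER BY price",
        (category, max_price),
    ).fetchall()

    # (iii) Format the results into the body of the HTTP response.
    items = "".join(f"<li>{name}: {price:.2f}</li>" for name, price in rows)
    return f"<html><body><ul>{items}</ul></body></html>"


page = handle_request(
    "http://shop.example/search?category=office&max_price=10", make_demo_db()
)
```

Every request of this kind reaches the database, which is the load an edge-server query cache is meant to absorb.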

The query caching project aims to answer a query at an intermediate edge server using the cached responses to previous related queries that were serviced by the origin server. Queries can be broadly classified as keyword searches, list queries, range queries, and transactional queries. Among them, only the queries that result in "reads" are cacheable. Thus transactional queries (which could result in updates or writes) are not considered. Query caching is clearly viable for applications such as search engines, catalog browsing at shopping sites, comparison shopping sites, and range queries at travel and real estate sites.
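
A minimal sketch of the read-versus-write distinction follows. It assumes the cache can see the SQL text and uses a deliberately crude heuristic: only statements that begin with `SELECT` are treated as reads. The function name `is_read_only` is hypothetical, and a real deployment would need a more careful test.

```python
def is_read_only(sql: str) -> bool:
    """Crude heuristic: treat a statement as a cacheable read only if it
    starts with SELECT. Anything else (INSERT, UPDATE, DELETE, ...) is
    treated as transactional and is never cached."""
    words = sql.split()
    return bool(words) and words[0].upper() == "SELECT"


cacheable = is_read_only("SELECT name FROM items WHERE price <= 10")
not_cacheable = is_read_only("UPDATE items SET price = 2 WHERE name = 'pen'")
```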

As with any caching scheme, the success of query caching depends on three factors:
(i) the existence of a "hot" set of queries,
(ii) locality among queries, and
(iii) consistency maintenance with the back-end.
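
One way to check the first factor on a query log is to measure what fraction of all requests the `k` most popular queries account for. The sketch below assumes the log is simply a list of normalized query strings; the function name `hot_set_coverage` is hypothetical.

```python
from collections import Counter
from typing import Sequence


def hot_set_coverage(query_log: Sequence[str], k: int) -> float:
    """Fraction of requests in the log that hit the k most frequent queries."""
    if not query_log:
        return 0.0
    counts = Counter(query_log)
    hits = sum(count for _, count in counts.most_common(k))
    return hits / len(query_log)


log = ["laptops", "laptops", "laptops", "phones", "cameras"]
coverage = hot_set_coverage(log, 1)  # 3 of the 5 requests are for "laptops"
```

A coverage close to 1 for a small `k` would indicate that a small cache can serve most of the workload.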

It is conjectured that a small set of queries is extremely popular and that a large fraction of queries are refinements of an earlier, more general query. In addition, updates are infrequent, which makes server-driven consistency techniques a viable option. The key feature of our query cache design is that it is completely dynamic and fully transparent to the underlying application. The solution is completely portable, as it requires no change to the underlying application, database, web server, or client.
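
To make the idea of a refinement concrete, here is a small sketch for range queries, not the project's actual algorithm. It assumes each cached entry holds the complete result for its range, so a narrower query over the same category can be answered by filtering the cached rows. The `RangeQuery` type, the `(name, price)` row shape, and `answer_from_cache` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, float]  # (item name, price)


@dataclass(frozen=True)
class RangeQuery:
    category: str
    low: float
    high: float

    def contains(self, other: "RangeQuery") -> bool:
        """True if `other` asks for a sub-range of this query."""
        return (
            self.category == other.category
            and self.low <= other.low
            and other.high <= self.high
        )


def answer_from_cache(
    cache: Dict[RangeQuery, List[Row]], query: RangeQuery
) -> Optional[List[Row]]:
    """Answer `query` from any cached, more general query, else return None."""
    for cached_query, rows in cache.items():
        if cached_query.contains(query):
            # The refinement is answered by filtering the cached rows locally.
            return [row for row in rows if query.low <= row[1] <= query.high]
    return None  # cache miss: the origin server must be contacted


cache = {
    RangeQuery("office", 0, 100): [("pen", 1.5), ("lamp", 30.0), ("chair", 80.0)]
}
refined = answer_from_cache(cache, RangeQuery("office", 20, 50))
```

A query that is wider than every cached range, such as `RangeQuery("office", 0, 200)` here, is a miss and returns `None`.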

We are investigating a dynamic and application-transparent design for query caching. Our aim is to provide automatic methods for cache selection, replacement, and consistency. We are studying algorithms that use the cached data to serve exact as well as partial query results. At a very high level, a query cache consists of a mapping between query results and responses. We use a cache replacement policy to flush out unused query responses and manage the cache space. One important aspect of the project is to compare the different caching alternatives using realistic workloads and to understand their performance implications.
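
The sketch below illustrates the general shape of such a cache under stated assumptions. The text does not name a replacement policy, so least-recently-used (LRU) eviction is assumed here purely for illustration; the class `QueryCache`, its methods, and the `invalidate` hook (standing in for a server-driven consistency message) are hypothetical.

```python
from collections import OrderedDict
from typing import Optional


class QueryCache:
    """Maps a query key to a cached response, with LRU replacement (assumed)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def get(self, query: str) -> Optional[str]:
        if query not in self._entries:
            return None  # miss: forward the query to the origin server
        self._entries.move_to_end(query)  # mark as most recently used
        return self._entries[query]

    def put(self, query: str, response: str) -> None:
        self._entries[query] = response
        self._entries.move_to_end(query)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used

    def invalidate(self, query: str) -> None:
        """Drop an entry, e.g. when the origin server reports an update."""
        self._entries.pop(query, None)


cache = QueryCache(capacity=2)
cache.put("laptops", "<html>laptops</html>")
cache.put("phones", "<html>phones</html>")
cache.get("laptops")                           # "laptops" is now most recent
cache.put("cameras", "<html>cameras</html>")   # evicts "phones"
```

Other replacement policies could be swapped in behind the same `get`/`put` interface, which is one way to compare caching alternatives on the same workload.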

 