
Introduction

The past several years have seen tremendous technological advances in processors, mass and tertiary storage devices, gigabit networks, and I/O devices. These advances have made it feasible for a much broader community to access digital libraries that contain large quantities of high-resolution video, images, audio, and text. As an example, the instruments on the first two Earth Observing System (EOS) platforms, to be launched in 1998 and 2000, will generate data at a rate of 281 GB/day. The raw data generated by the various EOS platforms will be processed and stored in distributed active archive centers (DAACs) located throughout the United States [16]. Other examples arise in cultural preservation for museums, seismic and medical imaging, and news video clips, where terabytes of data are continuously acquired and stored.

In principle, the research and education communities, as well as the general public, can benefit enormously from the availability of this data. Unfortunately, the infrastructure used in current database and data warehousing systems does not support the powerful search, storage, retrieval, and transmission techniques required to fully realize such benefits. Mechanisms are needed to support automatic indexing and retrieval based on the content of the data. Furthermore, because of the sheer amount of data, content-based search mechanisms must scale well with both the size of the data and the number of users.

In general, the content of image and video objects can be specified at three different levels of abstraction, namely, the pixel level, the feature level, and the semantic level. Recently, several image and video database systems allowing content-based queries have been developed. These systems allow image or video indexing through the use of low-level image features such as shape, color histogram, and texture. Prominent examples for photographic images include the MIT PhotoBook [1], IBM QBIC [23], VisualSeek from Columbia University [2], and the Multimedia Datablade from Informix/Mirage [3]. These techniques have also been applied to medical images [4, 21], artwork [20], and video clips [15, 5, 6, 7, 17]. Despite the tremendous progress in this area, a number of outstanding issues remain: (1) a replicable data model that is applicable to both photographic images and domain-specific data such as satellite images, medical images, stock sequences, and seismic data; (2) an extensible framework that allows content-based queries using pre-extracted low-level features as well as user-defined features and semantics; (3) a storage and retrieval system that scales with both the amount of data and the number of users; (4) a query interface that is both intuitive and expressive enough to specify constraints on multimedia objects. Note that the most challenging task is to establish a framework that is both extensible and scalable. Existing systems can be scalable if retrieval is based only on pre-extracted features. However, such approaches are not extensible enough to be useful across different image types and domains.

In this paper, we describe the architecture and implementation of an extensible and scalable framework for storage and retrieval of image and video data from a large archive. The extensibility of the framework is achieved by allowing the search constraints to be specified at one or more abstraction levels. The most commonly used features, feature indices, and object semantics are pre-extracted and stored in a database to maximize retrieval efficiency. Meanwhile, the user can also define new features and new object types at query time, which makes the query system flexible and extensible. The scalability of the proposed architecture is achieved by adopting a progressive framework, in which a hierarchical scheme is used to decorrelate and reorganize the information contained in the images at all of the abstraction levels. Consequently, the search operators can be applied to much smaller portions of the data and can progressively refine the search results. This technique achieves a significant speedup compared to more conventional implementations. For template matching (the search operator at the raw pixel level) and classification (the search operator at the semantic level), the speedup is more than a factor of 20. A 400% to 800% speedup has also been achieved for texture extraction and matching (the search operator at the feature level).
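To make the progressive idea concrete, the following is a minimal sketch of coarse-to-fine template matching at the pixel level. It is an illustration under stated assumptions and not the implementation described in this paper: the hierarchy here is a single 2x2 averaging step rather than whatever decomposition the framework actually uses, the match score is a plain sum of squared differences, and all function names and the `keep` parameter are hypothetical.

```python
# Illustrative sketch only: a one-level averaging pyramid stands in for the
# paper's hierarchical scheme, and SSD stands in for its matching criterion.

def downsample(img):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(cols)] for r in range(rows)]

def ssd(img, tmpl, top, left):
    """Sum of squared differences between tmpl and the window at (top, left)."""
    return sum((img[top + r][left + c] - tmpl[r][c]) ** 2
               for r in range(len(tmpl)) for c in range(len(tmpl[0])))

def candidate_positions(img, tmpl):
    """All top-left corners at which tmpl fits entirely inside img."""
    height = len(img) - len(tmpl) + 1
    width = len(img[0]) - len(tmpl[0]) + 1
    return [(r, c) for r in range(height) for c in range(width)]

def exhaustive_match(img, tmpl):
    """Conventional baseline: score every position at full resolution."""
    return min(candidate_positions(img, tmpl),
               key=lambda p: ssd(img, tmpl, p[0], p[1]))

def progressive_match(img, tmpl, keep=3):
    """Score all positions on the coarse level, then refine only the best few.

    `keep` (hypothetical) is the number of coarse candidates that survive.
    """
    coarse_img, coarse_tmpl = downsample(img), downsample(tmpl)
    ranked = sorted(candidate_positions(coarse_img, coarse_tmpl),
                    key=lambda p: ssd(coarse_img, coarse_tmpl, p[0], p[1]))
    max_r = len(img) - len(tmpl)
    max_c = len(img[0]) - len(tmpl[0])
    fine = set()
    for r, c in ranked[:keep]:
        # Each coarse position maps to a 3x3 neighbourhood at full resolution.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = 2 * r + dr, 2 * c + dc
                if 0 <= rr <= max_r and 0 <= cc <= max_c:
                    fine.add((rr, cc))
    return min(sorted(fine), key=lambda p: ssd(img, tmpl, p[0], p[1]))
```

For a 16x16 image and a 4x4 template, the exhaustive search scores 169 full-resolution windows, whereas this sketch scores 49 coarse windows (each a quarter of the size) and at most 27 full-resolution windows. The trade-off is that a coarse level can miss a match that is poorly aligned with the averaging grid, which is why more than one coarse candidate is kept for refinement.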

The remainder of this paper is organized as follows. Section 2 contains preliminaries on the existing types of multimedia databases and on the abstraction levels used in content-based search. The current system architecture is discussed in Section 3. Section 4 describes the strategy for combining search results from multiple abstraction levels. The query frontend is specified in Section 5. Section 6 describes the progressive framework that serves as the foundation for data representation, image navigation, and content-based indexing. Section 7 describes data representation and outlines how we use it to make processing more efficient. Sections 8 and 9 are devoted to the navigation and content-search operators. A query example is given in Section 10. Finally, Section 11 concludes the paper.

