
Preliminaries

 

Content-based retrieval from image and video databases usually involves comparing a query object (also called the target object) with the objects stored in the data repository. In our case, we can operate on both indexed objects and objects extracted at run-time. The search is usually based on a similarity comparison rather than on an exact match, and the retrieved results are ranked according to a similarity index, e.g., a metric. When objects are represented as n-dimensional feature vectors $\mathbf{x}$ and $\mathbf{y}$, where $\mathbf{x} = (x_1, \ldots, x_n)$ and $\mathbf{y} = (y_1, \ldots, y_n)$, the most commonly used similarity metric is the Euclidean distance between these vectors:

$$ d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} $$

or, in general, the $L_p$ distance metric, which is defined as

$$ d_p(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p} $$
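As a minimal sketch of ranking by the distance defined above, the following Python fragment computes the general distance for a chosen exponent p (p = 2 gives the Euclidean case) and sorts stored objects by their distance to the query. The function names, the dictionary layout of the database, and the sample vectors are assumptions made for illustration, not part of the system described here.

```python
def lp_distance(x, y, p=2.0):
    """L_p distance between two equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same dimension")
    if p < 1:
        raise ValueError("p must be >= 1 for the result to be a metric")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def rank_by_similarity(query, database, p=2.0):
    """Return (object_id, distance) pairs, most similar first."""
    scored = [(obj_id, lp_distance(query, vec, p)) for obj_id, vec in database.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical two-dimensional feature vectors keyed by object identifier.
database = {"img_a": [0.0, 0.0], "img_b": [3.0, 4.0], "img_c": [1.0, 1.0]}
ranking = rank_by_similarity([0.0, 0.0], database)
```

A linear scan like this is only meant to make the ranking step concrete; it says nothing about how an indexed search would be organized.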

The objects of an image or video database can be defined and referred to at different abstraction levels, as described below (shown in Fig. 1):

  1. Raw Pixels:   At the lowest abstraction level, objects are simply aggregations of raw pixels from the image. Comparison between objects or regions is done pixel by pixel, and commonly used similarity measures include the correlation coefficient and the $L_p$ distance. Comparison at the pixel level is very specific, and is therefore only used when a relatively precise match is required.
  2. Feature:   The next higher abstraction level for representing images is the feature level. An image feature is a distinguishing primitive characteristic or attribute of an image [8]. Some features, such as luminance, shape descriptors, and gray-scale texture, are natural in that they correspond to the visual appearance of an image. Other features, such as the amplitude histogram, color histogram, and spatial frequency spectra, are artificial in that they are usually obtained from specific manipulations of an image. Using a set of n features grouped into a feature vector, each image in an image archive can be segmented into regions whose feature vectors are homogeneous. Similarity search in the n-dimensional feature space thus consists of comparing the target feature vector with the feature vectors stored in the database. These feature vectors can be predefined and pre-extracted, user-defined and pre-extracted, or even user-defined and extracted at query time. Various spatial indexing schemes, such as the R-tree, also exist to facilitate feature space indexing.
  3. Semantic:   This is the highest abstraction level at which a content-based search can be performed. Semantic information about an image is usually extracted by a pre-trained classifier or supplied through human interpretation. For satellite images, this information could include the type of land cover of a specific area, such as water, forest, or urban. For medical images, this information could include the type of organ in a specific area, such as liver, stomach, or colon. A semantic network can then be constructed to group similar semantic terms into categories. For example, pine trees and maple trees are grouped into trees, roses and sunflowers are grouped into flowers, corn and wheat are grouped into crops, etc. The purpose of constructing such a semantic network is to allow retrieval to be generalized at the semantic level.
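To illustrate the difference between the first two levels in the list above, the sketch below compares two regions once at the pixel level, using the correlation coefficient, and once at the feature level, using a normalized gray-level histogram as the feature vector. The region values, the bin count, and the function names are assumptions chosen for the example; real regions would be two-dimensional and far larger.

```python
import math


def correlation_coefficient(a, b):
    """Pearson correlation between two equal-sized regions given as flat pixel lists."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("regions must be non-empty and of equal size")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant region")
    return cov / math.sqrt(var_a * var_b)


def gray_histogram(pixels, bins=4, max_value=256):
    """Normalized gray-level histogram, usable as a small feature vector."""
    counts = [0] * bins
    for value in pixels:
        counts[min(value * bins // max_value, bins - 1)] += 1
    return [count / len(pixels) for count in counts]


# Hypothetical 8-bit gray values; `shuffled` holds the same pixels in another order.
region = [10, 50, 200, 250]
shuffled = [250, 10, 50, 200]

pixel_level = correlation_coefficient(region, shuffled)
feature_level_equal = gray_histogram(region) == gray_histogram(shuffled)
```

The two regions have identical histograms but a low pixel-wise correlation, which is one way to see why pixel-level comparison is reserved for cases that need a relatively precise match.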

Figure 1: Abstraction levels of an image.
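The generalization that a semantic network allows at the highest abstraction level can be sketched as a lookup through parent links. The example below uses only the groupings mentioned in the text (trees, flowers, crops); the dictionary representation, the function names, and the region labels are assumptions, and a real semantic network could have many more levels and relation types.

```python
# Hypothetical parent links, taken from the examples given in the text.
PARENT = {
    "pine tree": "tree",
    "maple tree": "tree",
    "rose": "flower",
    "sunflower": "flower",
    "corn": "crop",
    "wheat": "crop",
}


def generalize(term):
    """Return the term followed by each of its ancestor categories."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def matches(label, query_term):
    """True if an object labelled `label` satisfies a query for `query_term`."""
    return query_term in generalize(label)


# Hypothetical semantic labels attached to image regions.
labels = {"region_1": "pine tree", "region_2": "rose", "region_3": "maple tree"}
hits = sorted(region for region, label in labels.items() if matches(label, "tree"))
```

A query for "tree" thus retrieves regions labelled with either of the more specific terms, without those labels having to match the query string exactly.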

We distinguish between simple (or atomic) objects and composite objects. The definition of a simple object reflects the three levels of abstraction defined above: a simple object can be defined at the raw pixel level, at the feature level, or at the semantic level.

We also allow additional constraints to be imposed on the attributes of a simple object, such as its size and location, and on features that are not used in the object definition.

A composite object consists of a set of simple objects and a set of spatial or temporal constraints. For example, given three previously defined simple objects, the composite object shown in Fig. 2 consists of those three simple objects together with three spatial rules that relate them.

Figure 2: Example of a composite object.

In general, spatial relationships such as left, right, adjacent, north, and within d kilometers are pairwise relationships. As a result, on the order of $K n^2$ pairwise relationships can be generated for a composite object consisting of n simple objects and K different types of spatial relationships. Examples of composite objects include bridges, which can be defined as a body of water separated by a road, and airports, which can be defined as regions containing parallel or perpendicular intersecting runways.
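One possible way to represent a composite object is as a set of named simple objects plus a list of pairwise rules, each rule being a relation applied to two named parts. The sketch below follows that idea. The class and function names, the centroid coordinates, the three sample rules, and the choice to count unordered pairs (giving K n(n-1)/2 candidate relationships) are all assumptions made for illustration; they are not the definitions used in Fig. 2.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class SimpleObject:
    name: str
    x: float  # hypothetical centroid coordinates, in kilometers
    y: float


def left_of(a, b):
    """Pairwise relation: a lies to the left of b."""
    return a.x < b.x


def within(d):
    """Build a pairwise relation: a and b are at most d kilometers apart."""
    def check(a, b):
        return math.hypot(a.x - b.x, a.y - b.y) <= d
    return check


@dataclass
class CompositeObject:
    parts: dict  # name -> SimpleObject
    rules: list  # (relation, name_a, name_b) triples

    def is_satisfied(self):
        """True if every pairwise rule holds for the named parts."""
        return all(rel(self.parts[a], self.parts[b]) for rel, a, b in self.rules)


def count_pairwise_relations(n, k):
    """Candidate relationships for n objects and k relation types, unordered pairs."""
    return k * n * (n - 1) // 2


objects = (SimpleObject("o1", 0.0, 0.0), SimpleObject("o2", 1.0, 0.0), SimpleObject("o3", 2.0, 1.0))
parts = {obj.name: obj for obj in objects}
composite = CompositeObject(
    parts=parts,
    rules=[(left_of, "o1", "o2"), (left_of, "o2", "o3"), (within(3.0), "o1", "o3")],
)
pairs = list(combinations(sorted(parts), 2))
```

If directional relations such as left and right are counted separately for each ordering of a pair, the count doubles to K n(n-1); either way it grows quadratically in n.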

Within our system, an object can be persistent or transient. Persistent objects will stay in the system as long as the corresponding images exist. In contrast, transient objects only exist in the system for the duration of a user session.
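The two lifetimes can be sketched with a small registry in which persistent objects are tied to an image and transient objects are tied to a user session. This is a hypothetical illustration only: the class, its methods, and the identifiers are assumptions, and the source does not describe how the system actually stores objects.

```python
class ObjectRegistry:
    """Toy registry separating persistent objects from per-session transient ones."""

    def __init__(self):
        self._persistent = {}  # object_id -> image_id
        self._transient = {}   # session_id -> {object_id: image_id}

    def add_persistent(self, object_id, image_id):
        self._persistent[object_id] = image_id

    def add_transient(self, session_id, object_id, image_id):
        self._transient.setdefault(session_id, {})[object_id] = image_id

    def end_session(self, session_id):
        # Transient objects exist only for the duration of the session.
        self._transient.pop(session_id, None)

    def remove_image(self, image_id):
        # Persistent objects stay only as long as their image exists.
        self._persistent = {
            obj: img for obj, img in self._persistent.items() if img != image_id
        }

    def visible(self, session_id):
        """Object identifiers a given session can currently refer to."""
        session_objects = self._transient.get(session_id, {})
        return sorted(set(self._persistent) | set(session_objects))


registry = ObjectRegistry()
registry.add_persistent("lake_17", "scene_001")
registry.add_transient("session_a", "query_region", "scene_001")
during_session = registry.visible("session_a")
registry.end_session("session_a")
after_session = registry.visible("session_a")
registry.remove_image("scene_001")
after_image_removed = registry.visible("session_a")
```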

