Next: Scalable Architecture
Up: Scalable Content-Based Retrieval from
Previous: Introduction
Content-based retrieval from image and video databases usually
involves comparing a query object
(also called the target object)
with the objects stored in the data repository.
In our case, we can operate on both
indexed objects and objects extracted at run-time.
The search is usually based on a similarity comparison
rather than on exact match, and the retrieved
results are ranked according to a similarity index,
e.g., a metric.
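When objects are represented as n-dimensional feature vectors
$\mathbf{x}$ and $\mathbf{y}$,
where $\mathbf{x} = (x_1, \ldots, x_n)$
and $\mathbf{y} = (y_1, \ldots, y_n)$,
the most commonly used similarity metric
is the Euclidean distance between these vectors:
$$d_2(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
or, in general, the $L_p$
distance metric,
which is defined as
$$d_p(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$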
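The ranking step described above can be sketched in a few lines of Python. This is a minimal illustration, not the system's implementation: the function names `lp_distance` and `rank_by_similarity`, the dictionary-based database, and the sample vectors are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple


def lp_distance(x: Sequence[float], y: Sequence[float], p: float = 2.0) -> float:
    """L_p distance between two equal-length feature vectors (p >= 1)."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same dimension")
    if p < 1:
        raise ValueError("p must be >= 1 for L_p to be a metric")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def rank_by_similarity(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    p: float = 2.0,
) -> List[Tuple[str, float]]:
    """Return (name, distance) pairs, most similar (smallest distance) first."""
    scored = [(name, lp_distance(query, vec, p)) for name, vec in database.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical 3-dimensional feature vectors, for illustration only.
database = {
    "img_a": [0.0, 0.0, 0.0],
    "img_b": [3.0, 4.0, 0.0],
    "img_c": [1.0, 1.0, 1.0],
}
ranking = rank_by_similarity([0.0, 0.0, 0.0], database)
```

With p = 2 this reduces to the Euclidean distance, and with p = 1 to the sum of absolute differences. A linear scan like this one is only practical for small collections.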
The objects of an image or video database
can be defined and referred to
at different abstraction levels,
as described below (shown in Fig. 1):
- Raw Pixels:
At the lowest abstraction level,
objects are simply aggregations of
raw pixels from the image.
Comparison between objects or regions is done
pixel-by-pixel, and commonly used similarity measures
include the correlation coefficient and the
$L_p$ distance.
Comparison at the pixel level is very specific, and therefore is
only used when a relatively precise match is required.
- Feature:
The next higher abstraction level for representing images is
at the feature level.
An image feature is a distinguishing primitive
characteristic or attribute of an image [8].
Some features such as luminance, shape descriptor, and gray scale texture
are natural as they correspond to visual appearance of an image.
Other features such as amplitude histogram, color histogram, and
spatial frequency spectra are artificial as they are usually obtained
from specific manipulations of an image.
Each image in an image archive can be segmented
into regions of homogeneous feature vectors,
where each feature vector groups a set of n features.
Similarity search in the n-dimensional feature space
thus consists of comparing the target
feature vector with the feature vectors
stored in the database.
These feature vectors can be predefined and pre-extracted,
or user-defined and pre-extracted, or even user-defined
and extracted at query time.
Various spatial indexing schemes, such as the R-tree,
also exist to facilitate
feature space indexing.
- Semantic:
This is the highest abstraction
level at which a content-based search can be performed.
Semantic information about an image is usually extracted
by a pre-trained classifier or supplied
through human interpretation.
For satellite images, this information could include the type
of land cover of a specific area
such as water, forest, or urban.
For medical images, this information could include
the type of organ of a specific area
such as liver, stomach, or colon.
A semantic network can then be constructed to group
similar semantic terms into categories.
For example, pine trees and maple trees are grouped into trees,
roses and sunflowers are grouped into flowers,
corn and wheat are grouped into crops, etc.
The purpose of constructing such a semantic network
is to allow the generalization of
retrieval at the semantic level.
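The generalization that a semantic network enables can be sketched as follows. This is a minimal illustration under stated assumptions: the `PARENT` mapping, the helper names, and the region labels are hypothetical, and the top-level `vegetation` category is added only to show multi-level generalization; it does not appear in the text above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical semantic network: each term maps to its parent category.
PARENT: Dict[str, str] = {
    "pine tree": "tree",
    "maple tree": "tree",
    "rose": "flower",
    "sunflower": "flower",
    "corn": "crop",
    "wheat": "crop",
    # Assumed extra level, not taken from the text.
    "tree": "vegetation",
    "flower": "vegetation",
    "crop": "vegetation",
}


def generalizes(category: str, term: str) -> bool:
    """True if `term` is `category` itself or one of its descendants."""
    seen = set()
    current: Optional[str] = term
    while current is not None and current not in seen:
        if current == category:
            return True
        seen.add(current)  # guards against accidental cycles in the network
        current = PARENT.get(current)
    return False


def semantic_search(category: str, labeled_regions: List[Tuple[str, str]]) -> List[str]:
    """Return ids of regions whose semantic label falls under `category`."""
    return [rid for rid, label in labeled_regions if generalizes(category, label)]


regions = [("r1", "pine tree"), ("r2", "wheat"), ("r3", "maple tree"), ("r4", "rose")]
tree_hits = semantic_search("tree", regions)
```

A query for `tree` then matches regions labeled as pine or maple trees without the user having to enumerate the specific terms.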
Figure 1: Abstraction levels of an image.
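As a concrete instance of the feature level, the sketch below computes an amplitude (gray-level) histogram and uses it as a feature vector. The function name, the bin count, and the tiny sample images are assumptions chosen for illustration; a real system would typically use more bins and larger images.

```python
from typing import List


def amplitude_histogram(
    pixels: List[List[int]], bins: int = 4, max_value: int = 255
) -> List[float]:
    """Normalized gray-level histogram of a 2-D list of ints in [0, max_value]."""
    counts = [0] * bins
    total = 0
    for row in pixels:
        for value in row:
            # Map the gray level to a bin index; clamp the top value into the last bin.
            index = min(value * bins // (max_value + 1), bins - 1)
            counts[index] += 1
            total += 1
    if total == 0:
        raise ValueError("image has no pixels")
    return [c / total for c in counts]


# Hypothetical 2x2 gray-scale images.
dark = [[0, 10], [20, 30]]
mixed = [[0, 100], [150, 255]]
h_dark = amplitude_histogram(dark)
h_mixed = amplitude_histogram(mixed)
```

The resulting vectors live in a fixed n-dimensional space (here n = 4), so two images of different sizes can be compared with any vector distance.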
We distinguish between simple (or atomic)
objects and composite objects.
The definition of a simple object
reflects the three levels of abstraction
defined above:
a simple object is
- a connected region of raw pixels, or
- a connected region where selected features
are homogeneous
(e.g., texture feature), or
- a connected region with homogeneous semantics
(e.g., forest, urban, water).
We also allow additional constraints
to be imposed on the attributes
of a simple object, such as its size and location,
and on features that
are not used in the object definition.
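One way to extract simple objects of this kind is connected-component labeling over a grid of per-pixel labels. The sketch below is an assumption-laden illustration rather than the method used in the system: it assumes 4-connectivity, treats "homogeneous" as exact label equality, and uses a hypothetical `min_size` parameter to stand in for a size constraint.

```python
from collections import deque
from typing import Hashable, List, Set, Tuple

Cell = Tuple[int, int]


def connected_regions(
    labels: List[List[Hashable]], min_size: int = 1
) -> List[Tuple[Hashable, Set[Cell]]]:
    """Find 4-connected regions of equal label that have at least min_size cells."""
    rows = len(labels)
    cols = len(labels[0]) if rows else 0
    visited = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if visited[r][c]:
                continue
            label = labels[r][c]
            cells: Set[Cell] = set()
            queue = deque([(r, c)])
            visited[r][c] = True
            # Breadth-first flood fill over neighbors that share the same label.
            while queue:
                cr, cc = queue.popleft()
                cells.add((cr, cc))
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (
                        0 <= nr < rows
                        and 0 <= nc < cols
                        and not visited[nr][nc]
                        and labels[nr][nc] == label
                    ):
                        visited[nr][nc] = True
                        queue.append((nr, nc))
            if len(cells) >= min_size:  # size constraint on the simple object
                regions.append((label, cells))
    return regions


# Hypothetical 3x3 grid of semantic labels.
grid = [
    ["water", "water", "forest"],
    ["water", "urban", "forest"],
    ["forest", "urban", "forest"],
]
objects = connected_regions(grid, min_size=2)
```

The same routine applies at the feature level if the grid holds quantized feature values instead of semantic labels.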
A composite object consists of
a set of simple objects and a set of spatial
or temporal constraints.
For example, suppose that we have defined
three simple objects;
then the composite object
shown in Fig. 2
consists of these three objects
and of three spatial rules,
each relating a pair of them:
- one object is to the north and east of another,
- one object is to the north and west of another,
- one object is to the west of another.
Figure 2: Example of a composite object.
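Checking a candidate composite object against its spatial rules can be sketched as below. Everything here is illustrative: the object names A, B, and C, the centroid-based predicates, and the particular assignment of rules to object pairs are assumptions and are not meant to reproduce the objects of Fig. 2.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class SimpleObject:
    name: str
    x: float  # centroid easting (larger means further east)
    y: float  # centroid northing (larger means further north)


# Assumed predicates: spatial relations are decided by comparing centroids.
PREDICATES: Dict[str, Callable[[SimpleObject, SimpleObject], bool]] = {
    "north_of": lambda a, b: a.y > b.y,
    "east_of": lambda a, b: a.x > b.x,
    "west_of": lambda a, b: a.x < b.x,
}

Rule = Tuple[str, str, str]  # (subject name, relation, reference name)


def satisfies(objects: Iterable[SimpleObject], rules: List[Rule]) -> bool:
    """True if every pairwise spatial rule holds for the given simple objects."""
    by_name = {o.name: o for o in objects}
    return all(
        PREDICATES[relation](by_name[subject], by_name[reference])
        for subject, relation, reference in rules
    )


a = SimpleObject("A", x=2.0, y=2.0)
b = SimpleObject("B", x=1.0, y=0.0)
c = SimpleObject("C", x=0.0, y=1.0)
rules = [
    ("A", "north_of", "B"), ("A", "east_of", "B"),  # A is north and east of B
    ("C", "north_of", "B"), ("C", "west_of", "B"),  # C is north and west of B
    ("C", "west_of", "A"),                          # C is west of A
]
is_match = satisfies([a, b, c], rules)
```

A compound rule such as "north and east of" is expressed as two pairwise rules, which keeps every constraint binary.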
In general, spatial relationships such as left,
right, adjacent, north, and within d kilometers
are pairwise relationships.
As a result, $O(Kn^2)$
pairwise relationships
can be generated for a composite object consisting of n
simple objects and K different types of spatial relationships.
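To make the growth concrete, the sketch below enumerates every candidate relationship for a small composite object. It counts ordered pairs, which is an assumption (it suits directional relations such as north, where "A north of B" differs from "B north of A"); under that assumption the count is K n (n - 1), which is quadratic in n. The function name and sample inputs are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def candidate_relationships(
    object_names: Sequence[str], relation_types: Sequence[str]
) -> List[Tuple[str, str, str]]:
    """All (subject, relation, reference) triples over ordered pairs of distinct objects."""
    return [
        (a, relation, b)
        for a, b in permutations(object_names, 2)
        for relation in relation_types
    ]


# n = 3 objects and K = 2 relation types give 2 * 3 * 2 = 12 candidates.
candidates = candidate_relationships(["A", "B", "C"], ["north_of", "adjacent_to"])
```

For symmetric relations such as adjacency, unordered pairs would suffice and the count would be halved.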
Examples of composite objects include bridges, which
can be defined as a body of water separated by a road,
and airports,
which can be defined as regions containing parallel or
perpendicular intersecting runways.
Within our system, an object can be persistent or transient.
Persistent objects will stay in the system as long as
the corresponding images exist.
In contrast, transient objects only exist in the system
for the duration of a user session.
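The two lifetimes can be illustrated with a small bookkeeping class. This is a hypothetical sketch, not the system's actual data model: the class name `ObjectStore`, its methods, and the identifiers used below are all assumptions.

```python
from typing import Dict


class ObjectStore:
    """Tracks which objects currently exist in the system."""

    def __init__(self) -> None:
        self._persistent: Dict[str, str] = {}  # object_id -> image it belongs to
        self._transient: Dict[str, str] = {}   # object_id -> session that created it

    def add_persistent(self, object_id: str, image_id: str) -> None:
        self._persistent[object_id] = image_id

    def add_transient(self, object_id: str, session_id: str) -> None:
        self._transient[object_id] = session_id

    def remove_image(self, image_id: str) -> None:
        """Persistent objects disappear together with their image."""
        self._persistent = {
            obj: img for obj, img in self._persistent.items() if img != image_id
        }

    def end_session(self, session_id: str) -> None:
        """Transient objects disappear when the user session ends."""
        self._transient = {
            obj: ses for obj, ses in self._transient.items() if ses != session_id
        }

    def exists(self, object_id: str) -> bool:
        return object_id in self._persistent or object_id in self._transient


store = ObjectStore()
store.add_persistent("forest_17", image_id="scene_3")
store.add_transient("query_region", session_id="session_a")
store.end_session("session_a")
```

After the session ends, only the persistent object remains, and it remains until its image is removed.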