IBM Research, Tokyo Research Laboratory (TRL)

Video enrichment/still images



In this research, regions of an image that have similar color and texture are extracted automatically. The image can then be encoded based on the extracted regions, and each region can be manipulated as a separate object.

Region segmentation

The region segmentation method was designed to parallelize effectively, to work automatically on any image without depending on a splitting threshold, to avoid oversplitting inside a texture region, and to use Gaussian Markov Random Fields (GMRF) to represent texture properties. The method is based on region merging. The GMRF model could be used both for the initial conditions and for merging contour sub-regions by employing the following:

  • pseudo KL transform (transformation for less correlated color planes)
  • multiple GMRF models (4 neighbors and 8 neighbors)
  • hypothesize-and-verify scheme for merging small regions

As a result, GMRF parameters can be processed linearly, and regions larger than 6 pixels are merged by testing their likelihood.
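
The idea behind a transformation to less correlated color planes can be illustrated with a fixed linear transform. The sketch below uses Ohta's I1, I2, I3 color features, a well-known fixed approximation of the KL transform for natural color images. Whether the pseudo KL transform in this research is that exact transform is not stated, so it is used here only as an assumption. The pixel values and function names are hypothetical.

```python
from math import sqrt

def ohta_transform(pixel):
    """Map an (R, G, B) pixel to Ohta's (I1, I2, I3) features."""
    r, g, b = pixel
    i1 = (r + g + b) / 3.0      # overall intensity
    i2 = (r - b) / 2.0          # red versus blue
    i3 = (2 * g - r - b) / 4.0  # green versus red and blue
    return (i1, i2, i3)

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Hypothetical pixels whose R, G and B values rise and fall together.
pixels = [(10, 12, 9), (50, 48, 55), (120, 118, 125), (200, 205, 190), (90, 95, 85)]

# Correlation between the first two planes before and after the transform.
rgb_corr = correlation([p[0] for p in pixels], [p[1] for p in pixels])
feats = [ohta_transform(p) for p in pixels]
ohta_corr = correlation([f[0] for f in feats], [f[1] for f in feats])
```

For these sample pixels the R and G planes are almost perfectly correlated, while the first two transformed planes are noticeably less so, which is the property a texture model working plane by plane benefits from.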

[Figures: quad-tree splitting; region merging; result from segmentation]
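
The two stages named in the figures, quad-tree splitting followed by region merging, can be sketched as below. This is a minimal illustration only: it assumes a grayscale image and replaces the GMRF likelihood test with a simple intensity tolerance `tol`, which is exactly the kind of fixed threshold the method described above avoids. All names and values are hypothetical.

```python
def quadtree_split(img, x, y, size, tol, min_size=2):
    """Return a list of (x, y, size) square blocks that are homogeneous or minimal."""
    values = [img[j][i] for j in range(y, y + size) for i in range(x, x + size)]
    if size <= min_size or max(values) - min(values) <= tol:
        return [(x, y, size)]
    half = size // 2
    blocks = []
    for dy in (0, half):
        for dx in (0, half):
            blocks.extend(quadtree_split(img, x + dx, y + dy, half, tol, min_size))
    return blocks

def block_mean(img, block):
    x, y, size = block
    vals = [img[j][i] for j in range(y, y + size) for i in range(x, x + size)]
    return sum(vals) / len(vals)

def adjacent(a, b):
    """True if two square blocks share part of an edge (corners do not count)."""
    ax, ay, asz = a
    bx, by, bsz = b
    overlap_x = ax < bx + bsz and bx < ax + asz
    overlap_y = ay < by + bsz and by < ay + asz
    touch_x = ax + asz == bx or bx + bsz == ax
    touch_y = ay + asz == by or by + bsz == ay
    return (touch_x and overlap_y) or (touch_y and overlap_x)

def merge_blocks(img, blocks, tol):
    """Group adjacent blocks whose own means differ by at most tol (union-find)."""
    parent = list(range(len(blocks)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    means = [block_mean(img, b) for b in blocks]
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            if adjacent(blocks[i], blocks[j]) and abs(means[i] - means[j]) <= tol:
                parent[find(i)] = find(j)
    regions = {}
    for i, b in enumerate(blocks):
        regions.setdefault(find(i), []).append(b)
    return list(regions.values())

# Hypothetical 4x4 image: a bright square in the lower-right corner.
img = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 90, 90],
    [10, 10, 90, 90],
]
blocks = quadtree_split(img, 0, 0, 4, tol=5)
regions = merge_blocks(img, blocks, tol=5)
```

Here the image is split into four 2x2 blocks, and merging then joins the three dark blocks into one region while the bright block stays separate.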

Image compression using hierarchical vector quantization

Images are decomposed into minimal blocks of two by two pixels (1st order block), and each 1st order block is approximated by one of the following patterns:
[Figure: 1st order block patterns]

Four N-th order blocks, arranged 2x2, form one (N+1)-th order block. The ID of the (N+1)-th order block is generated from the four IDs of its N-th order blocks and is written into a codebook. Data compression is achieved by combining such blocks hierarchically.
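
The way IDs of one order are combined into IDs of the next order can be sketched as follows. This is a simplified illustration under stated assumptions: the image is square with a side that is a power of two, each level has its own codebook, and 2x2 groups are matched exactly instead of being approximated by a fixed pattern set, so the sketch is lossless where the scheme described above is not. All names are hypothetical.

```python
def encode_level(ids, codebook):
    """Combine each 2x2 group of IDs into one higher-order ID."""
    out = []
    for y in range(0, len(ids), 2):
        row = []
        for x in range(0, len(ids[0]), 2):
            key = (ids[y][x], ids[y][x + 1], ids[y + 1][x], ids[y + 1][x + 1])
            if key not in codebook:
                codebook[key] = len(codebook)  # assign the next free ID
            row.append(codebook[key])
        out.append(row)
    return out

def encode(image):
    """Return the single top-level ID and the list of per-level codebooks."""
    codebooks = []
    level = image
    while len(level) > 1:
        codebook = {}
        level = encode_level(level, codebook)
        codebooks.append(codebook)
    return level[0][0], codebooks

def decode(root_id, codebooks):
    """Expand the top-level ID back into pixels, one order at a time."""
    level = [[root_id]]
    for codebook in reversed(codebooks):
        inverse = {v: k for k, v in codebook.items()}
        size = len(level) * 2
        nxt = [[0] * size for _ in range(size)]
        for y, row in enumerate(level):
            for x, block_id in enumerate(row):
                a, b, c, d = inverse[block_id]
                nxt[2 * y][2 * x], nxt[2 * y][2 * x + 1] = a, b
                nxt[2 * y + 1][2 * x], nxt[2 * y + 1][2 * x + 1] = c, d
        level = nxt
    return level

# Hypothetical 4x4 image with repeated 2x2 blocks.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
root, codebooks = encode(image)
restored = decode(root, codebooks)
```

Because the four 2x2 blocks in this image take only two distinct forms, the first codebook holds two entries and the second holds one, which is where the saving comes from when blocks repeat.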

[Figure: hierarchical blocks]

Image representation based on regions

A hierarchical structure was given to the image, with the "region layer" as the unit for manipulating each region. A region layer consists of an overlay order, shape data, an approximated region layer, and several data units of residual region layers.
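
A region layer with these four parts can be pictured as a small record, and an image as a stack of such records painted in overlay order. The sketch below is only an illustration: the field types, the convention that a larger overlay order is drawn later (on top), and the use of a single gray value as the approximated region layer are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class RegionLayer:
    overlay_order: int   # assumed: larger values are drawn later, i.e. on top
    shape: list          # shape data as a 2D mask of 0/1
    approximation: int   # approximated region layer, here one gray value
    residuals: list = field(default_factory=list)  # residual region layers (unused here)

def compose(layers, height, width, background=0):
    """Paint the layers onto a canvas from the bottom of the stack to the top."""
    canvas = [[background] * width for _ in range(height)]
    for layer in sorted(layers, key=lambda lyr: lyr.overlay_order):
        for y in range(height):
            for x in range(width):
                if layer.shape[y][x]:
                    canvas[y][x] = layer.approximation
    return canvas

# Two hypothetical regions that overlap in the middle column.
tree_mask = [[1, 1, 0], [1, 1, 0]]
sky_mask = [[0, 1, 1], [0, 1, 1]]

tree_on_top = compose(
    [RegionLayer(1, tree_mask, 200), RegionLayer(0, sky_mask, 50)], 2, 3)
tree_at_bottom = compose(
    [RegionLayer(0, tree_mask, 200), RegionLayer(1, sky_mask, 50)], 2, 3)
```

Changing only the overlay order moves a region to the top or the bottom of the stack without touching its shape or content.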
[Figures: original image; tree region on the top of the stack; tree region at the bottom of the stack]

Scalable image representation

The tradeoff between amount of data and image quality can be chosen for each region, at levels ranging from an approximated region to a lossless image, by deciding the order of the residual region layer. The approximation methods include color averaging, triangular patches, orthogonal transformations, and fractal representations.
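
One way to picture this scalability is a region stored as a color average plus a sequence of residual layers, where decoding more layers gives a closer reconstruction. The sketch below assumes integer gray values and a successive-refinement residual with hypothetical step sizes 16, 4 and 1. How the residual region layers are actually coded is not described here, so this is a stand-in only.

```python
def encode_region(pixels, steps=(16, 4, 1)):
    """Return the average value and one residual layer per quantization step."""
    mean = round(sum(pixels) / len(pixels))       # approximation by averaging
    remaining = [p - mean for p in pixels]
    layers = []
    for step in steps:
        q = [round(r / step) for r in remaining]  # coarse code for this layer
        layers.append((step, q))
        remaining = [r - step * qi for r, qi in zip(remaining, q)]
    return mean, layers

def decode_region(mean, layers, order):
    """Reconstruct the region from the average and the first `order` layers."""
    out = [mean] * len(layers[0][1])
    for step, q in layers[:order]:
        out = [o + step * qi for o, qi in zip(out, q)]
    return out

# Hypothetical gray values of one region.
pixels = [100, 104, 97, 130, 128, 99]
mean, layers = encode_region(pixels)

approx_only = decode_region(mean, layers, order=0)   # average only
refined = decode_region(mean, layers, order=1)       # one residual layer
lossless = decode_region(mean, layers, order=3)      # all residual layers
```

With a final step of 1 and integer input, decoding every layer restores the region exactly, while stopping earlier trades quality for less data.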

[Figures: htree; images ki0, ki1, ki2, ki3, ki6, ordered from approximation (left) to lossless image (right)]

Image            ki0      ki1      ki2      ki3      ki6
Amount of data   1%       11%      28%      46%      100%
PSNR             24.3 dB  26.6 dB  34.2 dB  37.5 dB  -
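
For reference, PSNR compares a degraded image with the original through the mean squared error. The sketch below assumes 8-bit samples (peak value 255) given as flat lists; the function name is hypothetical.

```python
from math import log10

def psnr(original, degraded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, degraded)) / len(original)
    if mse == 0:
        return float("inf")  # identical images: no finite PSNR
    return 10 * log10(peak ** 2 / mse)

# Hypothetical samples with a small error in two positions.
value = psnr([100, 100, 100, 100], [110, 90, 100, 100])
```

A lossless reconstruction has zero error, so its PSNR is not a finite number, which is why the last column of the table shows "-".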

The image on the left below is the original image. The image on the right was constructed by setting the image quality of each region to a slightly degraded level at which the human eye cannot detect the degradation. The image on the right, which contains extra data such as shape data and overlay order, has 35% more data than the original image.
[Figures: original image (left); image with quality selected per region (right)]

With this technology, it is possible to search for objects in an image by specifying shape, color, and texture, even if the image has no caption. Regions can also be manipulated separately, so they can easily be moved or combined. Another advantage is that a single image can be displayed at several levels of quality, from a coarse image with a small amount of data to the fully detailed original. Example applications are image search in a digital library, fast data transfer over channels with limited bandwidth (such as the Internet), and support for creators who want to add value to their images.

Last modified 30 September 1999