Executive Summary

A primary feature of public health analyses is that they typically affect large numbers of people over extended geographic regions. While detailed methodologies have been developed within epidemiology to characterize non-spatial aspects of risk assessment, the use of spatial data for risk evaluation and intervention has lagged. Preliminary studies indicate that spatial information, such as remotely sensed data, adds significantly to our ability to understand these public health problems. However, these problems usually reflect temporally and spatially dynamic processes that are not sufficiently captured by a single, static data set and simple queries. Because of severe resource constraints, comprehensive investigations have so far often been confined to focused areas. The models established through this approach are usually difficult to generalize to larger scales. Furthermore, because some of the data lack full coverage, these models cannot accurately predict future threats.

The recent rapid growth of large-scale remotely sensed images and data from programs such as Mission to Planet Earth has presented an exciting and unprecedented array of opportunities for environmental epidemiology. The availability of these new data enables comprehensive spatial, spectral, and temporal coverage of the entire world. Furthermore, rapid advances in content-based retrieval, data mining, and knowledge discovery have made it possible to efficiently discover simple and complex trends, association rules, and patterns in large amounts of data.

The goal of this work is to perform the basic research required for applying content-based retrieval techniques to a set of federated image and data archives in order to generate and validate environmental epidemiology models. The key problems to be explored in this research project are:

To illustrate the goal of this project, consider the following Hantavirus scenarios. This public health problem is not easily resolved at present; however, if the proposed research succeeds, researchers will have a powerful set of tools and methods at their disposal for developing new solutions.

The research proposed in this project solves two key problems:

In the first two of the following public health scenarios, the role of content-based retrieval is critical. In the third scenario, content-based interaction within a federated system is essential. Concurrent access to large databases by large numbers of users is required by scenarios 4 and 5.

Recently, techniques for implementing content-based querying of images have been explored. In particular, the IBM QBIC project and the Virage system (one of the Informix datablades) allow the retrieval of images based on texture, color histogram, and shape. The Alexandria project from UCSB allows the retrieval of images based on local texture features. The SaFe/VisualSeek project from Columbia University and the Blobworld/Bodyplan project from UC Berkeley allow the retrieval of image objects based on their spatial configurations.
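As a rough illustration of what retrieval "based on color histogram" means, the following minimal sketch ranks archive images by histogram intersection with a query image. It is not the method used by QBIC, Virage, or any of the systems named above; the function names, the 4-bins-per-channel quantization, and the in-memory dictionary standing in for an archive are all assumptions made for the example.

```python
def color_histogram(pixels, bins=4):
    """Quantize 8-bit RGB pixels into bins**3 cells and normalize to sum to 1."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins  # width of one bin per channel (assumes 8-bit values)
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    total = sum(hist) or 1.0  # avoid division by zero for empty images
    return [h / total for h in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_color(query_pixels, archive):
    """Return (score, name) pairs sorted from most to least similar.

    `archive` is a hypothetical dict mapping image names to pixel lists.
    """
    q = color_histogram(query_pixels)
    scored = [
        (histogram_intersection(q, color_histogram(pixels)), name)
        for name, pixels in archive.items()
    ]
    return sorted(scored, reverse=True)
```

A query image whose pixels fall into the same color cells as an archive image scores close to 1.0 against it, while an image with no cells in common scores 0.0. Global histograms of this kind ignore where in the image the colors occur, which is one reason the composite-object queries discussed next need more than this.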

However, these systems are insufficient for developing and validating the proposed sophisticated models. For example, we need effective methods for defining intricate composite objects (as illustrated in scenarios 1 and 2) and for efficiently querying those composite objects. We need flexibility in specifying the simple object constructs of composite objects in terms of user-defined features, pixels, semantics, and user annotations. We also need an extensible rule set for defining relationships between the objects along spatial, temporal, and spectral dimensions.

The approach we propose is based on structural decomposition of the search target. Each search target (a potential location for a disease outbreak) is decomposed into a list of entities, possibly with spatial, temporal, and spectral constraints. Each entity can be described by specific pixel patterns, features (texture, spectral histogram, NDVI, or shape), semantics (such as urban or grassland), and time series patterns. The structural relationships among entities and the weighting of each entity are related to the statistical model that can be used for predicting future disease outbreaks.
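One way to picture this decomposition is as a small data structure: a search target holds weighted entities plus constraints, and each candidate region is scored against it. The sketch below is only an illustration under stated assumptions. The class names, the dictionary-based region record, the weighted-fraction scoring rule, and every threshold are hypothetical and are not taken from the proposal or from any epidemiological model. NDVI is computed with its standard definition, (NIR - Red) / (NIR + Red).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical per-region feature record (band reflectances, distances, etc.).
Region = Dict[str, float]


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom


@dataclass
class Entity:
    """One component of a search target, with its weight in the model."""
    name: str
    weight: float
    matches: Callable[[Region], bool]


@dataclass
class SearchTarget:
    """A target decomposed into weighted entities plus hard constraints."""
    entities: List[Entity]
    constraints: List[Callable[[Region], bool]] = field(default_factory=list)

    def score(self, region: Region) -> float:
        """Weighted fraction of matched entities; 0.0 if a constraint fails."""
        if not all(check(region) for check in self.constraints):
            return 0.0
        total = sum(e.weight for e in self.entities)
        if total == 0:
            return 0.0
        matched = sum(e.weight for e in self.entities if e.matches(region))
        return matched / total


# Illustrative target only: all thresholds below are made-up placeholders.
target = SearchTarget(
    entities=[
        Entity("dense vegetation", 0.6,
               lambda r: ndvi(r["nir"], r["red"]) > 0.4),
        Entity("near dwelling", 0.4,
               lambda r: r["dwelling_distance_m"] < 500),
    ],
    constraints=[lambda r: r["elevation_m"] > 1000],
)
```

Calling `target.score(region)` on each candidate region yields a ranking of locations. In a real system the weights would come from the fitted statistical model rather than being set by hand, and the constraints would span temporal and spectral dimensions as well as spatial ones.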

We intend to build a prototype that embodies the results of this research, based on digital images from EOSDIS and other missions to planet Earth. We propose to make the modeling system, search engine, raw data, and modeling output available via the Internet to other ESIP partners and research communities. The testbed will operate on a data collection of significant size, so that its results can be used to estimate the quality of the techniques when applied to databases much larger than the testbed database.

