Computational Biology and Medical Informatics
Computer Science Brochure
Biology provides some of the most important, as well as the most complex, scientific challenges of our time. These problems include understanding the human genome, discovering the structure and function of the proteins that the genes encode, and using this information efficiently for drug design. Most of these problems are extremely computationally intensive. Recently, IBM announced a large-scale research initiative to build a petaflop supercomputer, called Blue Gene, with the aim of tackling some of today's most computationally intensive biological problems. Raw computing power, however, must be complemented by smart algorithms. The IBM Computational Biology Center (CBC) has assembled a cross-disciplinary team of researchers. By leveraging their expertise, the CBC is now at the forefront of international research activity in computational biology. Several of these activities are being pursued jointly with the Deep Computing Institute.
Understanding the structure and function of proteins is among the most fundamental questions in biology today. Researchers in the CBC are tackling this problem using several complementary approaches. One approach to analyzing protein structure and function is ab initio molecular dynamics simulation. This method, which has been successfully employed in the investigation of smaller molecular systems, is now being fine-tuned to handle the larger molecules typical of biological applications.
Another key problem in protein structure analysis is understanding how proteins fold. The application of molecular dynamics algorithms in this area, however, is beyond the capacity of even today's largest supercomputers, except for the simplest of proteins. Our goal is to simulate the folding mechanism and to understand how the amino acid sequence uniquely determines a protein's three-dimensional structure.
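To give a feel for the time-stepping at the core of any molecular dynamics simulation, here is a minimal sketch of the velocity Verlet integrator, a standard scheme for advancing positions and velocities. It is an illustration only, not the CBC's method: the force is a hypothetical one-dimensional harmonic spring rather than a quantum-mechanical (ab initio) force, and the names `velocity_verlet`, `spring_force`, and `total_energy` are assumptions made for this example.

```python
import math

K = 1.0  # spring constant of the toy force (assumed value)

def spring_force(x):
    """Toy force F = -K * x; a stand-in for a real interatomic force."""
    return -K * x

def velocity_verlet(x, v, force, mass, dt, steps):
    """Advance one particle in 1D and return the list of (x, v) states."""
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass              # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update with averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory

def total_energy(x, v, mass=1.0):
    """Kinetic plus potential energy of the spring system."""
    return 0.5 * mass * v * v + 0.5 * K * x * x

# Start at rest, displaced by 1.0, and integrate for 10 time units.
traj = velocity_verlet(x=1.0, v=0.0, force=spring_force, mass=1.0, dt=0.01, steps=1000)
```

For this toy system the total energy stays very close to its initial value over the run, which is one reason schemes of this kind are popular for long simulations. A real biomolecular simulation repeats the same loop over many thousands of atoms with far more expensive force evaluations, which is where the computational cost comes from.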
Over the next few years, we will develop and test molecular dynamics simulation methodologies across a wide range of size and complexity scales. We anticipate that this research will provide a new window into the size and time scales of molecular simulations in biophysics.
While many processes must be simulated to be fully understood, others lend themselves well to analytic exploration. For instance, many biological mechanisms manifest themselves through a rich variety of genomic patterns. Protein motifs, TATA boxes, and gene expression clusters are but a few examples. We have developed a variety of computational tools and models to discover and classify these patterns.
As a counterpart to the molecular dynamics approach, we are also investigating methods of structure prediction based on the discovery of repeated patterns in amino acid sequences. Pattern discovery tools are first used to identify all statistically and biologically significant repeating patterns in families of proteins. These one-dimensional patterns are then correlated with their corresponding three-dimensional structures. By using the patterns as signatures on target sequences, structure prediction can be attempted.
Pattern analysis algorithms are also being used to efficiently solve some key problems in biological sequence analysis, including correctly aligning several similar sequences and identifying similarities between a query sequence and a database of proteins. Recently, we have developed and implemented sophisticated algorithms for solving both problems. We have also compiled a comprehensive set of patterns that covers the sequence space of currently known natural proteins.
In the new area of gene expression analysis, novel computational and statistical techniques have also been developed. These have been shown to accurately classify a variety of complex phenotypes in human cancer.
These approaches are expected to lead to a variety of innovative diagnostic and therapeutic tools.
Another key use of the knowledge of protein structure is in rational drug design and drug discovery. Current approaches do not fully address the three-dimensional nature of the molecules. We are building on techniques developed for scalable similarity searching of small flexible molecules. The methodology promises to deliver a scalable, highly parallel approach to molecular search and retrieval. In addition, we are developing novel solutions to provide unified access to heterogeneous databases containing complex information from diverse origins.
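The pattern-based approach described earlier (find patterns that recur across a family of proteins, then use them as signatures on a target sequence) can be sketched in a few lines. This is a deliberately simplified illustration, not the CBC's tools: it treats a "pattern" as an exact substring of fixed length `k` and uses a plain support count where real pattern discovery would allow wildcards and apply statistical significance tests. The function names and the toy sequences are made up for this example.

```python
from collections import defaultdict

def shared_kmers(family, k, min_support):
    """Return the k-mers present in at least `min_support` sequences of the family."""
    support = defaultdict(set)
    for idx, seq in enumerate(family):
        for i in range(len(seq) - k + 1):
            support[seq[i:i + k]].add(idx)   # record which sequences contain this k-mer
    return {kmer for kmer, seqs in support.items() if len(seqs) >= min_support}

def scan_target(target, patterns):
    """List (position, pattern) hits of the discovered patterns in a target sequence."""
    hits = []
    for pattern in sorted(patterns):
        start = target.find(pattern)
        while start != -1:
            hits.append((start, pattern))
            start = target.find(pattern, start + 1)
    return sorted(hits)

# Made-up toy "protein family" and target; not real biological data.
family = ["MKTAYIAKQR", "GGTAYIAKLL", "PPTAYIAWWW"]
patterns = shared_kmers(family, k=5, min_support=3)
hits = scan_target("AATAYIAGG", patterns)
print(patterns)  # {'TAYIA'}
print(hits)      # [(2, 'TAYIA')]
```

In a structure prediction setting, each discovered pattern would additionally carry the three-dimensional structural context observed for it in the family, so that a hit on a target sequence suggests a local structure for that region.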
Please contact Paridhi Verma to obtain copies of the Computer Science Brochure.