IBM Research

Deep Computing Institute


Description of Deep Computing

CASE ONE: Finding a cure for a disease requires years of study, enormous investment, and quite a bit of luck. Thousands of potential compounds are identified and experimented with in the laboratory, the most promising are tested on animals, and the most successful are evaluated on humans. A complete understanding of our biology -- in the way that engineers understand machines -- could dramatically shorten and simplify this process. Knowing exactly how a gene or protein operates, what other compounds it is related to, and where on its complex three-dimensional structure chemical binding actually takes place would also allow researchers to build drugs for specific problems.

CASE TWO: Building a better food crop can take years: selecting plants with desirable traits, breeding them, watching for new characteristics in the next generation, selecting and breeding again, testing the results... and still not getting what you want. Understanding what is occurring at the genetic level -- which genes make a plant resistant to which diseases, which result in greater crop production -- could speed the process up quite a bit.

In both cases a common problem of the information age crops up: an overwhelming amount of available data. By some counts, the information being gathered in public databases on genes and proteins is doubling every 12 to 14 months. That's a faster pace than even the heralded Moore's Law for microprocessor advancement. It is not uncommon for life science companies to accumulate data at a rate of several hundred gigabytes to a terabyte per week.
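The comparison of doubling rates can be made concrete with a little arithmetic. The sketch below is illustrative only: the 12-to-14-month figure comes from the text above, while the 18-month period used for Moore's Law is a commonly quoted value assumed here, not one stated in this article. The function name growth_factor is likewise just a label chosen for the example.

```python
def growth_factor(elapsed_months, doubling_months):
    """Multiplicative growth of a quantity that doubles every `doubling_months`."""
    return 2.0 ** (elapsed_months / doubling_months)


if __name__ == "__main__":
    five_years = 60  # months
    scenarios = [
        ("gene/protein data, 12-month doubling", 12),
        ("gene/protein data, 14-month doubling", 14),
        ("Moore's Law, assumed 18-month doubling", 18),  # assumption, not from the article
    ]
    for label, period in scenarios:
        print(f"{label}: x{growth_factor(five_years, period):.1f} in five years")
```

Under these assumptions, the data grows severalfold more than processor capability over the same five years, which is the gap the paragraph above describes.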

To make matters worse, the really interesting genes are the ones with the most subtle variations from, or relationships to, known ones, i.e., the most difficult ones to locate and classify. And some of these genes, known as "orphan" genes (because they have no currently known gene "relatives"), become far more common as the complexity of the creature being studied increases. In humans, as much as 90% of the genome may be composed of "orphan" genes.

Supercomputing speed alone is no match for this immense store of genetic data. Only Deep Computing, with its advanced pattern matching and discovery methods, can succeed.

Pattern Matching and Discovery

MONSANTO HAS A rather bold agenda for introducing new life-science products. A visit to their Web site reveals a long list of new drugs and genetically engineered plants coming down the pipeline. Some sound like standard improvements: plants that will yield more; drugs with fewer side effects. But many sound like the stuff of science fiction: Life and potatoes capable of protecting themselves from insects and viral diseases; corn impervious to weed-killers, allowing farmers to spray more effective herbicides with less frequency; and a strain of canola (rape) plant that has such high levels of beta-carotene it can provide a daily supply of vitamin A in a single teaspoon of oil -- a natural vitamin supplement, spawning a field called "nutraceuticals."

In order to achieve these aggressive goals, and speed the process of bringing these developments to market, Monsanto enlisted the help of IBM, entering a joint research agreement to identify and map the genetic structure of plant groups and human diseases. Monsanto supplies much of the biological domain knowledge, and IBM's Deep Computing solution provides sophisticated algorithms to speed the analysis of the terabytes of available genetic information.

Using two distinct approaches -- pattern discovery and pattern matching -- the IBM team has been able to help speed the process and also make discoveries that would not be possible using existing tools. For instance, an expert biologist attempting the particularly difficult task of associating a novel protein with only very subtle relations to some known protein would typically take several hours to make such an association. (Such attempts are frequently documented in scientific "competitions" in which experts "race" each other.) Amazingly, Teiresias, an algorithm developed by IBM researchers, can make such an identification in about two minutes!

Teiresias is a "pattern discovery" algorithm: it will search a store of data and find all patterns occurring two or more times. It's not necessary to know what to look for -- in fact, the beauty of the process is that it can return previously unknown patterns in an extremely short time. And although it is being widely used in the field known as bioinformatics (a new discipline constituting the crossroads between computer science and biology), it can just as easily be applied to text (the complete works of Charles Dickens), financial data (closing prices of stocks over a given historical period) -- just about any store of data where patterns may exist.
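The idea of pattern discovery, finding everything that occurs two or more times without being told what to look for, can be sketched in a few lines. The code below is not Teiresias: it is a brute-force stand-in that only counts exact substrings, and the function name, parameters, and length limits are assumptions made for illustration.

```python
from collections import defaultdict


def discover_repeated_patterns(sequences, min_len=3, max_len=8, min_support=2):
    """Return {pattern: [(sequence_index, offset), ...]} for every exact
    substring of length min_len..max_len that occurs at least min_support
    times across all input sequences."""
    occurrences = defaultdict(list)
    for seq_index, seq in enumerate(sequences):
        for length in range(min_len, max_len + 1):
            for offset in range(len(seq) - length + 1):
                pattern = seq[offset:offset + length]
                occurrences[pattern].append((seq_index, offset))
    # Keep only patterns seen often enough to count as "discovered".
    return {p: locs for p, locs in occurrences.items() if len(locs) >= min_support}


if __name__ == "__main__":
    data = ["ACGTACGT", "TTACGA"]
    for pattern, locations in sorted(discover_repeated_patterns(data, 3, 4).items()):
        print(pattern, locations)
```

Because the input is just a list of strings, the same sketch runs unchanged on text or on any other data that can be encoded as characters, which mirrors the point made above about applying discovery beyond biology. Enumerating every substring does not scale to terabytes; it is meant only to show what the output of a discovery step looks like.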

On the other hand, "pattern matching" algorithms attempt to find something identical or almost identical to a known item. This algorithmic approach is a perfect partner to pattern discovery in genomic research. Once interesting patterns (genes or proteins, for example) have been discovered in a data set, then these new patterns can be compared to other databases of known types to find the nearest match. From these comparisons, inferences can be made about the function and purpose of the newly discovered pattern.
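A minimal sketch of the matching step follows, assuming that "almost identical" is measured by edit distance (the number of single-character insertions, deletions, and substitutions). That measure is a simple stand-in chosen for this example; the article does not say which similarity measure the IBM team used, and the names edit_distance, nearest_match, and the sample database are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            current.append(min(previous[j] + 1,      # delete from a
                               current[j - 1] + 1,   # insert into a
                               substitution))
        previous = current
    return previous[-1]


def nearest_match(query, known):
    """Return (name, distance) of the entry in `known` closest to `query`.
    `known` maps a label to a sequence."""
    scored = ((name, edit_distance(query, seq)) for name, seq in known.items())
    return min(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    known_families = {"family_a": "ACGTACGT", "family_b": "TTTTGGGG"}
    print(nearest_match("ACGTTCGT", known_families))
```

The label of the nearest known entry is what supports the inference step described above: a newly discovered pattern that sits close to a known family is a candidate for sharing its function.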

Thanks to Deep Computing, Monsanto expects to reduce development cycles substantially, from the 7 to 12 years it currently takes to develop new plant varieties to 4 to 6 years. And it hopes soon to make rational drug design a reality: finding how a drug needs to interact with various body compounds, and then being able to fabricate the appropriate molecules for the task.

FUTURE APPLICATIONS: One of the grand challenges of biology is to extend our knowledge from what comprises a protein in a two-dimensional sense (which molecules of what in what order) to a far greater mystery: why proteins fold into the complex three-dimensional configurations they do, something not at all obvious at the more generally understood chemical formula level. Many proteins, after being unfolded in a test tube -- even synthetic proteins with no other biology present -- refold themselves in about a second. Although it's pure chemical physics, scientists have so far failed to emulate this process by direct means. Deep Computing is attacking this challenge on two fronts. First, using pattern discovery and then matching, researchers are finding sequences related to known three-dimensional structures. They can then deduce "reasons" for the structure and make inferences about its function.

But a far more ambitious second approach is that being carried out by a handful of zealots: trying to understand protein folding from first principles -- in essence, uncovering the subtle laws that will allow scientists to predict how a given protein will fold and what its function will be. By first calculating the force between atoms in the chemical matrix of the molecule, then calculating various possible three-dimensional structures for a given molecule, the researchers attempt to identify the one structure with the least "free energy," the most likely form the protein will take. Subsequent simulations and testing against known data help confirm hypothetical models.
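The select-the-lowest-energy step can be illustrated with a toy calculation. Everything in the sketch below is an assumption made for illustration: it scores a handful of hand-written candidate structures with a Lennard-Jones pair potential in reduced units, and it uses plain potential energy as a stand-in for free energy, which in reality also includes entropy and solvent effects. It is not the researchers' method, only the shape of the computation.

```python
import itertools
import math


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Pair interaction energy at separation r (reduced units)."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 * ratio6 - ratio6)


def total_energy(coords):
    """Sum the pair energy over every pair of atoms in one candidate structure."""
    return sum(lennard_jones(math.dist(p, q))
               for p, q in itertools.combinations(coords, 2))


def lowest_energy_structure(candidates):
    """Return (name, coords) of the candidate with the lowest total energy."""
    return min(candidates.items(), key=lambda item: total_energy(item[1]))


if __name__ == "__main__":
    # Two hypothetical three-atom arrangements: a straight chain and a triangle.
    candidates = {
        "extended": [(0.0, 0.0, 0.0), (1.12, 0.0, 0.0), (2.24, 0.0, 0.0)],
        "compact": [(0.0, 0.0, 0.0), (1.12, 0.0, 0.0), (0.56, 0.97, 0.0)],
    }
    name, _ = lowest_energy_structure(candidates)
    print("lowest-energy candidate:", name)
```

A real protein has thousands of atoms and an astronomically large set of possible structures, so candidates cannot simply be listed by hand; that search problem is what makes the first-principles approach so computationally demanding.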

What's at stake is of far-reaching importance: an approach that could accurately predict protein folding would have application beyond rational drug design and agrochemicals. Nanotechnologists, for example, and others in the computer industry could put such understanding of matter on the molecular scale to use in designing new polymers and other materials.

As in all the Deep Computing applications, pattern matching and discovery is not a totally independent type of application. Often, it can form the initial part of a larger solution that may also employ other parts of Deep Computing. Patterns may be located and inferences drawn, then simulations run to test those inferences. In medical science, for instance, Deep Computing offers the rather heady promise of one day modeling both the human organism and the various diseases that afflict it, then testing via simulation various treatments or drugs.

In time, such an approach could also be individualized, with treatments for killers such as cancer being designed for the genetic makeup of the individual being treated. Currently, this approach -- treating a particular person's cancer as a unique disease -- has shown promise, but it is too time-consuming, difficult, and costly to be viable. Deep Computing, by helping to furnish the fundamental knowledge of how things work at the protein and genetic level, and by enabling simulation of metabolic functions and cells gone awry, may one day make such treatment routine.
