CASE ONE: Finding a cure for a disease requires years of study, enormous investment, and quite a bit of luck. Thousands of potential compounds are identified and screened in the laboratory, the most promising are tested on animals, and the most successful are evaluated in humans. A complete understanding of our biology -- in the way that engineers understand machines -- could dramatically shorten and simplify this process. Knowing exactly how a gene or protein operates, which other compounds it is related to, and where on its complex three-dimensional structure chemical binding actually takes place would also allow researchers to design drugs that target specific problems.
CASE TWO: Building a better food crop can take years: selecting plants with desirable traits, breeding them, watching for new characteristics in the next generation, selecting and breeding again, testing the results... and still not getting what you want. Understanding what is occurring at the genetic level -- which genes make a plant resistant to which diseases, and which result in greater yields -- could speed the process up considerably.
In both cases a common problem of the information age
crops up: an overwhelming amount of available data. By
some counts, the information being gathered in public
databases on genes and proteins is doubling every 12 to
14 months. That outpaces even the heralded Moore's Law
for microprocessor advancement, commonly cited as a doubling
every 18 to 24 months. It is not uncommon for life-science
companies to accumulate data at a rate of several hundred
gigabytes to a terabyte per week.
To make matters worse, the really interesting genes are the
ones with the most subtle variations from, or relationships
to, known genes, which makes them the most difficult to
locate and classify. Some of these, known as "orphan"
genes (because they have no currently known gene
"relatives"), become far more common as the complexity of
the creature being studied increases. In humans, as much
as 90% of the genome may be made up of "orphan"
genes.
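To make the comparison with Moore's Law concrete, the short Python sketch below compounds each doubling period over the same span of time. The five-year horizon and the 18- and 24-month Moore's Law figures are illustrative assumptions rather than figures from the text, and `growth_factor` is a hypothetical helper name.

```python
def growth_factor(months, doubling_months):
    """Factor by which a quantity grows in `months` if it doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)

HORIZON_MONTHS = 60  # five years (illustrative assumption)

RATES = [
    ("genomic data, 12-month doubling", 12),
    ("genomic data, 14-month doubling", 14),
    ("Moore's Law, 18-month doubling (assumed)", 18),
    ("Moore's Law, 24-month doubling (assumed)", 24),
]

for label, period in RATES:
    print(f"{label}: about {growth_factor(HORIZON_MONTHS, period):.1f}x")
```

Over five years, a 12- or 14-month doubling works out to growth of roughly 32x or 19.5x, against roughly 10x or 5.7x for the assumed Moore's Law rates.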
Supercomputing speed alone is no match for this immense
store of genetic data. Only Deep Computing, with its
advanced pattern matching and discovery methods, can
succeed.
MONSANTO HAS A rather bold agenda for introducing new
life-science products. A visit to its Web site reveals a
long list of new drugs and genetically engineered plants
coming down the pipeline. Some sound like standard
improvements: plants that will yield more; drugs with fewer
side effects. But many sound like the stuff of science
fiction: Life and potatoes capable of protecting
themselves from insects and viral diseases; corn
impervious to weed-killers, allowing farmers to spray more
effective herbicides less frequently; and a strain of
canola (rapeseed) with such high levels of beta-carotene
that a single teaspoon of its oil can provide a daily supply
of vitamin A. Such natural vitamin supplements have
spawned a field called "nutraceuticals."
To achieve these aggressive goals and speed the
process of bringing these developments to market,
Monsanto enlisted the help of IBM, entering into a joint
research agreement to identify and map the genetic
structure of plant groups and human diseases. Monsanto
supplies much of the biological domain knowledge, and
IBM's Deep Computing solution provides sophisticated
algorithms to speed the analysis of the terabytes of
available genetic information.
Using two distinct approaches -- pattern discovery and
pattern matching -- the IBM team has been able to help
speed the process and also make discoveries that would
not be possible using existing tools. For instance, associating
a novel protein with a known protein to which it is only very
subtly related is a particularly difficult task: an expert
biologist would typically need several hours to make such an
association. (Such attempts are frequently
documented in scientific "competitions" in which experts
"race" each other.) Amazingly, Teiresias, an algorithm
developed by IBM researchers, can make such an
identification in about two minutes!
Teiresias is a "pattern discovery" algorithm: it will search a
store of data and find all patterns occurring two or more
times. It's not necessary to know what to look for -- in fact,
the beauty of the process is that it can return previously
unknown patterns in an extremely short time. And although
it is widely used in the field known as bioinformatics
(a new discipline at the crossroads of computer science
and biology), it can just as easily be applied to text (such as
the complete works of Charles Dickens), to financial data (such
as the closing prices of stocks over a given historical period),
or to just about any store of data where patterns may exist.
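The idea of reporting every pattern that occurs at least twice can be illustrated with a deliberately simplified Python sketch. It is not Teiresias: as published, Teiresias discovers maximal patterns that may contain wildcard positions, whereas this sketch (built around a hypothetical helper, `find_repeated_patterns`) only counts exact substrings of one fixed length `k` across a few made-up protein sequences. Like Teiresias, though, it needs no prior idea of what to look for.

```python
from collections import defaultdict

def find_repeated_patterns(sequences, k, min_support=2):
    """Map every length-k substring seen at least `min_support` times
    to the list of (sequence_index, offset) positions where it occurs."""
    occurrences = defaultdict(list)
    for seq_idx, seq in enumerate(sequences):
        for offset in range(len(seq) - k + 1):
            occurrences[seq[offset:offset + k]].append((seq_idx, offset))
    # Keep only the patterns that repeat; nothing was specified in advance.
    return {pattern: locs for pattern, locs in occurrences.items()
            if len(locs) >= min_support}

# Made-up toy sequences in one-letter amino-acid code.
TOY = ["MKVLAAGIVG", "GGMKVLAQ", "AAGIVGKK"]

for pattern, locs in sorted(find_repeated_patterns(TOY, k=4).items()):
    print(pattern, locs)
# AAGI [(0, 4), (2, 0)]
# AGIV [(0, 5), (2, 1)]
# GIVG [(0, 6), (2, 2)]
# KVLA [(0, 1), (1, 3)]
# MKVL [(0, 0), (1, 2)]
```

The same function works unchanged on any sequence of symbols, such as words of text or discretized price movements, which mirrors the text's point that pattern discovery is not tied to biology.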
On the other hand, "pattern matching" algorithms attempt
to find something identical or almost identical to a known
item. This algorithmic approach is a perfect partner to
pattern discovery in genomic research. Once interesting
patterns (genes or proteins, for example) have been
discovered in a data set, these new patterns can be
compared against databases of known genes and proteins to find the
nearest match. From these comparisons, inferences can be
made about the function and purpose of the newly
discovered pattern.
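Finding the nearest known match can be sketched in Python with plain edit (Levenshtein) distance: the number of single-character insertions, deletions, and substitutions separating two strings. This is a simplified stand-in chosen for illustration; practical sequence-comparison tools typically score alignments with substitution matrices and gap penalties (Smith-Waterman and BLAST are well-known examples). The database entries and helper names below are hypothetical.

```python
from typing import Dict, Tuple

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed one DP row at a time."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]

def nearest_match(query: str, known: Dict[str, str]) -> Tuple[str, int]:
    """Return (name, distance) for the known sequence closest to `query`."""
    return min(
        ((name, edit_distance(query, seq)) for name, seq in known.items()),
        key=lambda item: item[1],
    )

# Hypothetical "database" of known sequences.
KNOWN = {
    "protein_A": "MKVLAAGIVG",
    "protein_B": "MSTNPKPQRK",
}

print(nearest_match("MKVLSAGIVG", KNOWN))  # ('protein_A', 1)
```

Here the query differs from `protein_A` by a single substitution, so it is reported as the nearest match; an inference about the query's function would then start from what is known about `protein_A`.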
Thanks to Deep Computing, Monsanto expects to reduce
development cycles substantially, from the 7 to 12 years it
currently takes to develop new plant varieties to 4 to 6
years. It also hopes soon to make rational drug design a
reality: determining how a drug needs to interact with
various compounds in the body, and then designing and
fabricating the appropriate molecules for the task.
FUTURE APPLICATIONS: One of the grand challenges of
biology is to extend our knowledge from what makes up a
protein in a two-dimensional sense (which amino acids, in
what order) to a far greater mystery: why proteins fold
into the complex three-dimensional configurations they
do, something not at all obvious from the more generally
understood chemical formula. Many proteins, after being
unfolded in a test tube (even synthetic proteins with no
other biology present), refold themselves in about a
second. Although the process is governed purely by
chemical physics, scientists have so far failed to
simulate it directly.
Deep Computing is attacking this challenge on two fronts.
First, using pattern discovery and then matching,
researchers are finding sequences related to known
three-dimensional structures. They can then deduce
"reasons" for the structure and make inferences about its
function.
The second approach, far more ambitious, is being
pursued by a handful of zealots: understanding protein
folding from first principles, in essence uncovering the
subtle laws that will allow scientists to predict how a
given protein will fold and what its function will be. By
first calculating the forces between the atoms of the
molecule, then calculating various possible
three-dimensional structures for that molecule, the
researchers attempt to identify the one structure with
the least "free energy," the most likely form the protein will
take. Subsequent simulations and testing against known
data help confirm these hypothetical models.
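The selection step described above, scoring candidate structures and keeping the lowest-energy one, can be sketched in a few lines of Python. This is a toy under stated assumptions: atoms interact only through a Lennard-Jones pair potential in reduced units, the candidate structures are supplied in advance rather than generated, and potential energy stands in for free energy (which would also account for entropy and solvent). The candidate names and coordinates are made up for illustration.

```python
import itertools
import math

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair potential in reduced units; minimum of -epsilon at r = 2**(1/6) * sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def total_energy(coords):
    """Sum the pair potential over every pair of atoms in one candidate structure."""
    return sum(lennard_jones(math.dist(p, q))
               for p, q in itertools.combinations(coords, 2))

def lowest_energy_structure(candidates):
    """Return (name, energy) for the candidate structure with the least energy."""
    return min(((name, total_energy(coords)) for name, coords in candidates.items()),
               key=lambda item: item[1])

# Three hypothetical three-atom "conformations" (reduced units).
R_MIN = 2 ** (1 / 6)
CANDIDATES = {
    "compact":   [(0, 0, 0), (R_MIN, 0, 0), (R_MIN / 2, R_MIN * math.sqrt(3) / 2, 0)],
    "stretched": [(0, 0, 0), (1.5, 0, 0), (3.0, 0, 0)],
    "squashed":  [(0, 0, 0), (0.9, 0, 0), (1.8, 0, 0)],
}

for name, coords in CANDIDATES.items():
    print(f"{name}: {total_energy(coords):.3f}")
print("most likely:", lowest_energy_structure(CANDIDATES)[0])  # compact
```

In the compact triangle every pair of atoms sits at the potential's minimum, so it wins; the squashed chain has positive energy because its atoms are pushed closer than that minimum. A real first-principles calculation differs mainly in scale: thousands of atoms, a full force field, and a search over an enormous space of candidate structures.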
What's at stake is of far-reaching importance: an approach
that could accurately predict protein folding would have
applications beyond rational drug design and
agrochemicals. Nanotechnologists and others in the
computer industry, for example, could put such an
understanding of matter at the molecular scale to use in
designing new polymers and other materials.
As with all Deep Computing applications, pattern
matching and discovery is not an entirely independent type
of application. Often, it can form the initial part of a larger
solution that may also employ other parts of Deep
Computing. Patterns may be located and inferences drawn,
then simulations run to test those inferences. In medical
science, for instance, Deep Computing offers the rather
heady promise of one day modeling both the human
organism and the various diseases that afflict it, then
testing via simulation various treatments or drugs.
In time, such an approach could also be individualized,
with treatments for killers such as cancer being designed
for the genetic makeup of the individual being treated.
Currently, this approach -- treating a particular person's
cancer as a unique disease -- has shown promise, but it is
too time-consuming, difficult, and costly to be viable. Deep
Computing, by helping to furnish the fundamental
knowledge of how things work at the protein and genetic
level, and by enabling simulation of metabolic functions
and cells gone awry, may one day make such treatment
routine.