Hunting for Patterns with SPLASH

SPLASH is a new high-performance deterministic algorithm for pattern discovery. It works equally well with discrete and continuous signals. It can handle very general patterns defined through arbitrary homology metrics, so SPLASH is not limited to detecting identity in signals: it can just as easily detect similarity.
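
To make the distinction between identity and similarity concrete, here is a minimal Python sketch. It is not SPLASH itself. The function names (`identity`, `within`, `matching_positions`) and the tolerance-based similarity test are illustrative assumptions, standing in for whatever homology metric an application would supply.

```python
from typing import Callable, List, Sequence


def identity(a, b) -> bool:
    """Strict match: two symbols agree only if they are equal."""
    return a == b


def within(tolerance: float) -> Callable[[float, float], bool]:
    """Hypothetical similarity test for continuous values:
    two values agree if they differ by at most `tolerance`."""
    def match(a: float, b: float) -> bool:
        return abs(a - b) <= tolerance
    return match


def matching_positions(x: Sequence, y: Sequence,
                       match: Callable[..., bool]) -> List[int]:
    """Return the aligned positions where x and y agree under `match`."""
    return [i for i, (a, b) in enumerate(zip(x, y)) if match(a, b)]


# Discrete signal, identity metric.
dna_hits = matching_positions("ACGT", "ACCT", identity)

# Continuous signal, similarity metric (tolerance chosen arbitrarily).
signal_hits = matching_positions([1.0, 2.0, 3.0], [1.05, 2.5, 2.95], within(0.1))
```

Swapping the `match` function changes what counts as agreement without touching the comparison loop, which is the sense in which one procedure can serve both discrete and continuous data.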

Applications

Applications of this algorithm range from the analysis of weak motifs in multiple DNA or protein sequences, using mutation probability matrices, to the analysis of gene expression data and the prediction of protein 3D structural motifs and folding. Some advanced applications of this technology are being explored in collaboration with scientists at the Whitehead Institute and at the Mayo Clinic.

Performance

SPLASH is currently the highest-performance algorithm of its kind. For instance, it can process the Histone 1 database, with more than 200 sequences, to find about 25,000 motifs occurring 100 times or more, with at least 4 matching residues in any window of 12, in about 14 seconds on a 266 MHz Pentium II desktop.
It is also embarrassingly parallel in nature and can fully exploit the scalable parallel architecture of the IBM RS/6000 SP. Coupled with a statistical framework developed by IBM scientists, SPLASH can determine the statistical relevance of discovered patterns and use combinations of discovered patterns to classify new data such as orphan sequences or gene expression data.
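
The density requirement quoted above (at least 4 matching residues in any window of 12) can be illustrated with a short Python sketch. This is one simplified reading of that constraint, not the SPLASH implementation. The pattern notation (a string with `.` as a wildcard), the function name `satisfies_density`, and the handling of patterns shorter than the window are all assumptions made for illustration.

```python
def satisfies_density(pattern: str, min_matches: int = 4,
                      window: int = 12, wildcard: str = ".") -> bool:
    """Check a simplified density constraint on a pattern.

    Every run of `window` consecutive positions must contain at least
    `min_matches` non-wildcard (fixed) positions. A pattern no longer
    than the window is treated as a single window (an assumption).
    """
    fixed = [c != wildcard for c in pattern]
    if len(fixed) <= window:
        return sum(fixed) >= min_matches
    return all(
        sum(fixed[i:i + window]) >= min_matches
        for i in range(len(fixed) - window + 1)
    )


dense = satisfies_density("A..C..G..T..")        # 4 fixed residues in 12 positions
sparse = satisfies_density("A.....C.....")       # only 2 fixed residues
long_dense = satisfies_density("A..C..G..T..A..C")
long_sparse = satisfies_density("AC..........GT")
```

A constraint of this kind keeps the search from reporting patterns that are mostly wildcards, since such patterns would match almost anywhere.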
