Exhaustive Motif Discovery
Splash can be used to exhaustively analyze a sequence database for all non-overlapping, statistically significant motifs. This is useful for discovering, in order of relative sequence support, all regions of a protein family that have been preserved by evolution and may therefore play a functional or structural role. The approach runs the pattern discovery algorithm in a loop. First, pattern discovery is run with the standard discovery options. If no pattern is found, the density constraint is first reduced according to a user-defined formula and the minimum support is then reduced by 5%; this is repeated until a pattern is found. The most statistically significant of the patterns found is reported in a result file. That pattern is then masked wherever it occurs in InputFile, and the discovery is repeated. The procedure stops when either a user-defined maximum number of motifs has been identified or the minimum support drops below a user-defined threshold.
This approach is illustrated by the following block diagram:
[Figure: block diagram of the exhaustive motif discovery procedure]
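The loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Splash's actual implementation: `exhaustive_discovery`, `toy_discover`, and all parameter names are hypothetical, the toy discovery step simply counts exact substrings (its score stands in for the ZScore, and the substring length stands in for the density constraint), and masking with the character `X` is an assumption. The sketch also assumes that the 5% reduction is relative to the current minimum support and that neither setting is reset after a motif is found; the text above does not specify either point.

```python
from typing import Callable, List, Tuple

Pattern = Tuple[str, float]  # (pattern, score); the score stands in for the ZScore


def exhaustive_discovery(
    sequences: List[str],
    discover: Callable[[List[str], float, int], List[Pattern]],
    relax_density: Callable[[int], int],  # the "user-defined formula" (hypothetical)
    min_support: float,
    density: int,
    max_motifs: int,
    support_floor: float,
    mask_char: str = "X",  # assumed masking symbol
) -> List[Pattern]:
    seqs = list(sequences)
    support = float(min_support)
    motifs: List[Pattern] = []
    # Stop at the motif limit or once the support falls below the threshold.
    while len(motifs) < max_motifs and support >= support_floor:
        found = discover(seqs, support, density)
        if not found:
            # Nothing found: relax the density first, then lower the support by 5%.
            density = relax_density(density)
            support *= 0.95
            continue
        # Report the most significant pattern, then mask it everywhere it occurs.
        best = max(found, key=lambda p: p[1])
        motifs.append(best)
        seqs = [s.replace(best[0], mask_char * len(best[0])) for s in seqs]
    return motifs


def toy_discover(seqs: List[str], min_support: float, k: int) -> List[Pattern]:
    """Toy stand-in for pattern discovery: exact k-mers shared by enough sequences."""
    counts = {}
    for s in seqs:
        # Count each k-mer once per sequence, skipping masked positions.
        for kmer in {s[i:i + k] for i in range(len(s) - k + 1)}:
            if "X" not in kmer:
                counts[kmer] = counts.get(kmer, 0) + 1
    # Sorted so that ties are broken deterministically.
    return sorted((m, float(c)) for m, c in counts.items() if c >= min_support)


# Made-up example data, not real trypsin sequences.
example = ["AAGDSGGPAA", "CCGDSGGPCC", "TTGDSGGPTT", "WTAAHCWW", "YTAAHCYY"]
motifs = exhaustive_discovery(
    example,
    discover=toy_discover,
    relax_density=lambda k: max(3, k - 1),
    min_support=3,
    density=4,
    max_motifs=5,
    support_floor=1.5,
)
print(motifs)
```

Passing the discovery step and the density formula in as functions keeps the control loop separate from the pattern discovery itself, which is the part the toy function does not attempt to reproduce.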
For instance, when run against the set of 274 trypsin sequences defined by PROSITE motif PS00134, this procedure can be used to identify motifs that include each of the three catalytic residues H, S, and D. These are reported in the following table with their support and ZScore:
[Table: motifs containing the catalytic residues H, S, and D, with their support and ZScore]