| Performance |
| In Fig. 1, we report the
performance of Splash and Pratt as a function of increasing database
size. The database is produced as follows. First, an appropriate set of
sequences is sampled from the Brookhaven PDB database to obtain the
desired size. Then the positions of the amino acids are randomized. This
results in a random database of the appropriate length, with the same
amino acid frequencies as the corresponding PDB sample. Patterns that
occur in more than 20% of the sequences in the database are reported.
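
The randomization step described above can be sketched as follows. This is a minimal illustration, not the procedure actually used for Fig. 1: the function name `randomize_database` and the sample sequences are hypothetical, and the sketch assumes that residues are shuffled across the whole database while each sequence keeps its original length. Shuffling within each sequence separately would preserve the amino acid frequencies equally well.

```python
import random
from collections import Counter


def randomize_database(sequences, seed=0):
    """Return a shuffled copy of a sequence database.

    Assumption: residues are pooled over the whole database and
    shuffled, then cut back into sequences of the original lengths.
    The overall amino acid frequencies are therefore unchanged.
    """
    rng = random.Random(seed)
    # Pool every residue of every sequence into one list.
    residues = [aa for seq in sequences for aa in seq]
    rng.shuffle(residues)

    shuffled = []
    start = 0
    for seq in sequences:
        # Re-cut the shuffled pool using the original sequence lengths.
        shuffled.append("".join(residues[start:start + len(seq)]))
        start += len(seq)
    return shuffled


if __name__ == "__main__":
    # Hypothetical toy sample standing in for a PDB sequence sample.
    sample = ["MKVLAAGIVG", "ACDEFGHIKL", "MNPQRSTVWY"]
    random_db = randomize_database(sample)
    # Composition and sequence lengths are preserved by construction.
    assert Counter("".join(random_db)) == Counter("".join(sample))
    assert [len(s) for s in random_db] == [len(s) for s in sample]
```

Because the shuffle only permutes residues, any pattern found in the randomized database reflects chance co-occurrence at the given composition rather than biological signal.
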
The total size of each database is 8192 × 2^i residues. Databases for
values of i ranging from 0 to 6 are processed. The largest random
database is approximately 512,000 residues. On the left y axis,
we report the time, in seconds, required by Splash and Pratt to process
the random databases, in log scale. On the x axis, we report the
database size, also in log scale. On the right y axis, we report
the number of discovered patterns, in linear scale; this is the
curve marked with diamond symbols. The density constraints are k0=2,
l0=5. The maximum memory footprint of the program is
12 MB. The patterns discovered by Pratt are identical to those discovered
by Splash. However, Splash becomes increasingly faster than Pratt as the
database size increases. At the smallest database size, 8 KRes, it is
about 6 times faster. At 128 KRes, it is about 3 orders of
magnitude faster. Also, while the patterns reported by Pratt are limited to
a maximum span, 50 characters in this case, Splash discovers any
pattern that satisfies the density and support constraints, no matter
how long. In Fig. 2, we report similar performance measurements against a
histone I database, with 209 proteins, at increasingly higher values of
the support. This is an interesting case because this database is
pattern-rich, generating in excess of 10,000 patterns for k0=2,
l0=5, J0=100.
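
For reference, the database sizes used in the Fig. 1 experiment can be tabulated with a few lines of code. This is only an arithmetic sketch: the names `BASE_RESIDUES` and `database_sizes` are hypothetical, and it assumes that 1 KRes means 1024 residues, which is consistent with the smallest database (8192 residues) being quoted as 8 KRes.

```python
BASE_RESIDUES = 8192  # size of the smallest database (i = 0)


def database_sizes(max_exponent=6):
    """Return the database sizes 8192 * 2**i for i = 0..max_exponent."""
    return [BASE_RESIDUES * 2**i for i in range(max_exponent + 1)]


if __name__ == "__main__":
    for i, size in enumerate(database_sizes()):
        # Assumption: 1 KRes = 1024 residues.
        print(f"i={i}: {size} residues ({size // 1024} KRes)")
```

Under that assumption the largest database (i = 6) holds 524,288 residues, i.e. 512 KRes, and the 128 KRes point at which the three-orders-of-magnitude speedup is quoted corresponds to i = 4.
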
Fig. 1: Splash and Pratt: Time versus Database Size. Pattern discovery time is reported versus database size.
Fig. 2: Splash vs. Pratt: (a) versus pattern support j; (b) versus the number of discovered patterns. |