Multi Model Mode

In this mode, SPLASHSearch scans one or more sequences against several models, each built by running SPLASH on a FASTA-format training set containing members of a functional or structural protein family. These models must be precompiled using the "Compile Model" mode. A master model file must then be prepared, listing one compiled model file per line:

model1_0.motifs.bin
model2_0.motifs.bin
...
modeln_0.motifs.bin
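
As an illustration, a master model file of this shape could be generated with a short script. This is a minimal sketch, not part of SPLASH: the helper name write_master_file and the file names passed to it are hypothetical, and it assumes the master file is plain text with one compiled model file name per line, as in the listing above.

```python
from pathlib import Path


def write_master_file(model_files, master_path):
    """Write one compiled model file name per line (assumed master file layout).

    model_files: iterable of names of precompiled *.motifs.bin files.
    master_path: where to write the master model file.
    Returns the list of lines written, without trailing newlines.
    """
    lines = [str(name) for name in model_files]
    # Each model goes on its own line; the file ends with a newline.
    Path(master_path).write_text("\n".join(lines) + "\n")
    return lines


if __name__ == "__main__":
    # Hypothetical model file names, following the naming pattern shown above.
    write_master_file(
        ["model1_0.motifs.bin", "model2_0.motifs.bin"],
        "models_file",
    )
```

The resulting file can then be passed to the -multimodel option.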

A sequence database seqs.fa can then be scanned against all models by running

splashsearch 
   -multimodel models_file 
   -seqs seqs.fa 
   [-html]

The complete syntax is provided in the Search Syntax document.

The results are sent to stdout and can be redirected to a file. If -html is used, a file seqs.fa.html is also generated. This file can be viewed in any HTML browser and shows the matching motifs color coded on the corresponding sequences. The format of the results is as follows:

seq_label  db_pvalue  seq_pvalue [modelId]
...

seq_label Sequence label of the FASTA sequence.
db_pvalue Probability of an equivalent or better match to the model by a random sequence of the same length, drawn from a database of the same length.
seq_pvalue Probability of an equivalent or better match to the model by a random sequence of the same length. Typically, values lower than 1E-06 are considered good matches.
modelId Prefix of the model file name, up to the first "_" (underscore). It identifies the model that produced the match.
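
The result lines can be post-processed with a small script. The following is a minimal sketch under stated assumptions, not part of SPLASH: it assumes the fields are separated by whitespace, that sequence labels contain no whitespace, and that the brackets around modelId in the format line mark an optional field. The names Hit, parse_result_line, good_matches and model_id_from_file are hypothetical, and the sample lines are invented for illustration.

```python
import os
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    seq_label: str
    db_pvalue: float
    seq_pvalue: float
    model_id: Optional[str]


def parse_result_line(line):
    """Split one result line into its fields (whitespace-separated, assumed)."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("expected at least 3 fields: %r" % line)
    # modelId is treated as optional (assumption based on the brackets).
    model_id = fields[3] if len(fields) > 3 else None
    return Hit(fields[0], float(fields[1]), float(fields[2]), model_id)


def good_matches(lines, threshold=1e-6):
    """Keep hits whose seq_pvalue is below the threshold (1E-06 by default)."""
    hits = (parse_result_line(l) for l in lines if l.strip())
    return [h for h in hits if h.seq_pvalue < threshold]


def model_id_from_file(file_name):
    """Return the model file name prefix up to the first underscore."""
    return os.path.basename(file_name).split("_", 1)[0]


if __name__ == "__main__":
    # Invented sample lines in the documented field order.
    sample = ["seqA 1e-3 2.5e-08 model1", "seqB 0.5 0.01 model2"]
    for hit in good_matches(sample):
        print(hit.seq_label, hit.seq_pvalue, hit.model_id)
    print(model_id_from_file("model1_0.motifs.bin"))
```

The 1E-06 default mirrors the rule of thumb given for seq_pvalue; a different cutoff can be passed as threshold.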

Example:

Running

splash trypsin.fa -E 10 0 0 -mt 0 -w 12 -l 4 -f metaSPLASH

produces the file trypsin.fa_0.motifs. This model can then be compiled into a binary file, trypsin.fa_0.motifs.bin, by running

splashsearch -build 
             -model trypsin.fa_0.motifs 
             -train trypsin.fa

Then, using a master file models.txt and running

splashsearch -multimodel models.txt
             -seqs trypsin.fa 
             -o trsm.txt -html

produces the results file trsm.txt and the HTML file trypsin.fa.html.
