Single Model Mode

In this mode, SplashSearch scans one or more sequences against a single model. The model is built by running SPLASH on a FASTA-format training set containing members of a functional or structural protein family.

If family.fa contains the training set sequences, a model that SplashSearch can use is generated as follows:

splash family.fa -E 10 0 0 -w 12 -mt 0 -l 4 -f metaSPLASH

This generates a file family.fa_0.motifs, which contains the model.

A sequence database seqs.fa can then be searched against this model by running:

splashsearch -model family.fa_0.motifs 
             -train family.fa 
             -seqs  seqs.fa
             [-html]

The complete syntax is provided in the Search Syntax document.
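
For scripted runs, the command above can be assembled programmatically. The following is a minimal Python sketch, not part of the SPLASH distribution. It uses only the options shown on this page (-model, -train, -seqs, -html) and assumes that the model file is named after the training file with the suffix "_0.motifs", as in the example above. The function names model_path and splashsearch_argv are hypothetical.

```python
import shlex


def model_path(training_fasta):
    """Return the model file name expected for a training set.

    Assumption: splash writes <training file>_0.motifs, as in the
    family.fa -> family.fa_0.motifs example on this page.
    """
    return f"{training_fasta}_0.motifs"


def splashsearch_argv(training_fasta, seqs_fasta, html=False):
    """Build the splashsearch argument list for single model mode."""
    argv = [
        "splashsearch",
        "-model", model_path(training_fasta),
        "-train", training_fasta,
        "-seqs", seqs_fasta,
    ]
    if html:
        # Optional flag: requests the HTML report described on this page.
        argv.append("-html")
    return argv


if __name__ == "__main__":
    # Print the command line instead of running it, so the sketch works
    # even where splashsearch is not installed.
    print(shlex.join(splashsearch_argv("family.fa", "seqs.fa", html=True)))
```

The returned list can be passed to subprocess.run, with stdout redirected to a file to capture the results.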

The results are sent to stdout and can be redirected to a file. If -html is used, a file seqs.fa.html is generated. This file can be viewed in any web browser and shows the matching motifs color-coded on the corresponding sequences. The results are formatted as follows:

seq_label  db_pvalue  seq_pvalue []
...
seq_label   Label of the FASTA sequence.
db_pvalue   Probability that a random sequence of the same length, drawn from a database of the same size, matches the model equally well or better.
seq_pvalue  Probability that a single random sequence of the same length matches the model equally well or better. Values lower than 1E-06 are typically considered good matches.
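
The result lines can be post-processed with a short script. The following Python sketch assumes that each result line holds whitespace-separated fields in the order seq_label, db_pvalue, seq_pvalue, that the label contains no whitespace, and that any further fields can be ignored. The names Hit, parse_results and good_matches are hypothetical, and the sample lines are made-up values for illustration, not real SplashSearch output.

```python
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    seq_label: str
    db_pvalue: float
    seq_pvalue: float


def parse_results(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield one Hit per line that looks like a result line."""
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or too short to be a result line
        try:
            yield Hit(fields[0], float(fields[1]), float(fields[2]))
        except ValueError:
            continue  # second or third field is not numeric


def good_matches(hits: Iterable[Hit], threshold: float = 1e-6) -> list:
    """Keep hits whose seq_pvalue is below the threshold (default 1E-06)."""
    return [hit for hit in hits if hit.seq_pvalue < threshold]


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "seqA 2.0E-04 3.0E-09",
        "seqB 0.5 1.0E-02",
    ]
    for hit in good_matches(parse_results(sample)):
        print(hit.seq_label, hit.seq_pvalue)
```

To process a saved results file such as trs.txt, pass the open file object to parse_results in place of the sample list.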


Example:

Running

splash trypsin.fa -E 10 0 0 -mt 0 -w 12 -l 4 -f metaSPLASH

produces the file trypsin.fa_0.motifs. Then, running

splashsearch -model trypsin.fa_0.motifs 
             -train trypsin.fa 
             -seqs trypsin.fa 
             -o trs.txt -html

produces the results file trs.txt and the HTML file trypsin.fa.html.
