Exhaustive Motif Analysis Syntax
The syntax is:

splash InputFile -E maxMotifs sparseness minSupport [options] 

maxMotifs

Maximum number of non-overlapping motifs to report

sparseness

0  Only (k0, w0) is used as a density constraint
1  Both (k0, w0) and (k0, 2 w0) are used
2  (k0, w0), (k0, 2 w0), and (k0, 4 w0) are used

minSupport

Lowest motif support, as a percentage of the total number of sequences in InputFile
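As an illustration of how the two numeric arguments could be interpreted, the Python sketch below expands a sparseness level into its list of (k, w) density constraints and converts minSupport into a number of sequences. It assumes that k0 and w0 are the values of -k and -w, and that a fractional sequence count is rounded up. Both are assumptions, and density_constraints and min_supporting_sequences are hypothetical helpers, not part of SPLASH.

```python
def density_constraints(k0: int, w0: int, sparseness: int) -> list:
    """Return the (k, w) density constraints for sparseness 0, 1 or 2 (hypothetical helper)."""
    if sparseness not in (0, 1, 2):
        raise ValueError("sparseness must be 0, 1 or 2")
    # Each extra level keeps k0 tokens but doubles the window: w0, 2 w0, 4 w0.
    return [(k0, w0 * 2 ** i) for i in range(sparseness + 1)]


def min_supporting_sequences(min_support_pct: int, n_sequences: int) -> int:
    """Convert an integer percentage into a sequence count, rounding up (assumed rule)."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return (min_support_pct * n_sequences + 99) // 100


# With the default -k 3 and -w 8 (assumed to play the role of k0 and w0):
print(density_constraints(3, 8, 2))      # [(3, 8), (3, 16), (3, 32)]
print(min_supporting_sequences(25, 10))  # 3
```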

Suggested Syntax

splash file -E 10 0 0 -l 5 -f HTML

This discovers up to 10 motifs (sparseness 0, minSupport 0) with at least 5 tokens each and writes the results in HTML format.

Example:

Running splash trypsin.fa -E 10 0 0 -l 5 -f HTML produces the HTML file trypsin.fa_0.motifs_0.html.

Options for Exhaustive Pattern Discovery:

Option -f [HTML][REGEX][PSSM][HMM][MAST][metaMEME][metaSPLASH]
Default -f REGEX
Description
REGEX A single output file, InputFile_minSupport.motifs, is created. Motifs are listed as regular expressions, using the same syntax as in regular pattern discovery.
HTML This option generates a file InputFile_minSupport.motifs.html with a color-annotated representation of the discovered motifs and of where they occur in the training-set sequences.
PSSM A file InputFile_minSupport.motifs_id.pssm is created for each motif id. The file contains a PSSM with:

1) a list of amino acids
2) a row for each position in the PSSM, giving the raw frequencies of the amino acids at that position
3) a list of the raw amino acid frequencies in InputFile
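The PSSM file stores raw frequencies rather than scores. One common way to turn such counts into scores is a log-odds matrix: each position's frequencies, smoothed with a pseudocount, are divided by the background frequencies and a base-2 logarithm is taken. The sketch below assumes the rows and background counts have already been read into Python dictionaries (the file layout itself is not parsed here), and the pseudocount of 1 is an arbitrary choice. It is not necessarily how MAST or splashsearch score a PSSM.

```python
import math


def log_odds_pssm(position_counts, background_counts, pseudocount=1.0):
    """Convert raw per-position counts into log2-odds scores.

    position_counts: one dict per PSSM row, amino acid -> raw count at that position.
    background_counts: amino acid -> raw count in the whole input file.
    """
    alphabet = sorted(background_counts)
    n = len(alphabet)
    bg_total = sum(background_counts.values()) + pseudocount * n
    # Smoothed background frequencies (the pseudocount keeps them non-zero).
    background = {aa: (background_counts[aa] + pseudocount) / bg_total for aa in alphabet}
    pssm = []
    for counts in position_counts:
        total = sum(counts.get(aa, 0) for aa in alphabet) + pseudocount * n
        # Positive score: residue is enriched at this position relative to the background.
        pssm.append({aa: math.log2(((counts.get(aa, 0) + pseudocount) / total) / background[aa])
                     for aa in alphabet})
    return pssm


# Toy two-letter alphabet with made-up counts.
rows = log_odds_pssm([{"A": 3}], {"A": 1, "C": 1})
print(round(rows[0]["A"], 2), round(rows[0]["C"], 2))  # 0.68 -1.32
```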

HMM A file InputFile_minSupport.motifs_id.hmm is created for each motif id. The file contains a set of prealigned sequences in FASTA format. These files can be used directly as input to hmmbuild in the HMMER package.
MAST A file InputFile_minSupport.motifs is created that contains a series of PSSM models. This file is compliant with the MAST syntax and can be used by MAST to screen sequence databases.
metaMEME A file InputFile_minSupport.motifs is created that contains a series of PSSM models. This file is compliant with the metaMEME syntax and can be used by metaMEME to screen sequence databases.
metaSPLASH A file InputFile_minSupport.motifs is created that contains a series of PSSM models. This file is compliant with the splashsearch syntax and can be used by splashsearch to screen sequence databases.

Option -% min_percent_support 
Default -% 100
Description Lowers the initial minimum support, which can save time.

Option -w window
Default -w 8
Description In combination with -k, allows searching for patterns that are more or less dense.

Option -k min_tokens_in_window
Default -k 3
Description In combination with -w, allows searching for patterns that are more or less dense.
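The -k/-w pair can be read as a density constraint on the non-wildcard tokens of a pattern. The sketch below assumes a definition in which every run of k consecutive tokens must fit inside a window of w positions, and a pattern string in which '.' is a one-position wildcard and a bracketed class such as [ILV] counts as one token. The exact rule SPLASH applies may differ, and token_offsets and is_dense are hypothetical names.

```python
import re

# Assumed notation: '.' is a one-position wildcard, [ILV] is one class token.
TOKEN = re.compile(r"\[[A-Z]+\]|[A-Z]|\.")


def token_offsets(pattern):
    """Positions of the non-wildcard tokens in a pattern string."""
    return [i for i, tok in enumerate(TOKEN.findall(pattern)) if tok != "."]


def is_dense(offsets, k, w):
    """True if every run of k consecutive tokens spans at most w positions."""
    if len(offsets) < k:
        return False  # too few tokens to satisfy the constraint at all
    return all(offsets[i + k - 1] - offsets[i] + 1 <= w
               for i in range(len(offsets) - k + 1))


print(is_dense(token_offsets("A..G.C"), 3, 8))         # True: 3 tokens within 6 positions
print(is_dense(token_offsets("A.....G....C"), 3, 8))   # False: 3 tokens span 12 positions
print(is_dense(token_offsets("A.....G....C"), 3, 16))  # True once the window is doubled
```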

 
Option -Z minZScore
Default -Z 1000
Description  

 
Option -z
Default set
Description  

 
Option -n- min_patterns
Default -n- 10
Description Support and density are decreased until at least min_patterns patterns are found that satisfy all the other constraints. A value of at least the default (-n- 10) is suggested.
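The -% and -n- options together describe an iterative relaxation: start from an initial support, then loosen support and density until enough patterns are found. The sketch below shows one possible loop. The discover callable, the 10-point support step, and the order in which support and density are relaxed are all assumptions made for illustration; SPLASH's own schedule is not documented here.

```python
from typing import Callable, List, Sequence, Tuple


def relax_until_enough(
    discover: Callable[[int, int, int], List[str]],
    start_support_pct: int,
    density_schedule: Sequence[Tuple[int, int]],
    min_patterns: int = 10,
    support_step: int = 10,
) -> List[str]:
    """Lower support and loosen density until at least min_patterns patterns are found.

    discover(support_pct, k, w) is a hypothetical stand-in for one discovery pass.
    """
    patterns: List[str] = []
    support = start_support_pct
    while support > 0:
        # Try the densest constraint first, then progressively sparser ones.
        for k, w in density_schedule:
            patterns = discover(support, k, w)
            if len(patterns) >= min_patterns:
                return patterns
        support -= support_step
    return patterns  # last attempt, which has fewer than min_patterns patterns


# Toy discover(): lower support yields more patterns.
def fake_discover(support, k, w):
    return [f"p{i}" for i in range((100 - support) // 5)]


print(len(relax_until_enough(fake_discover, 100, [(3, 8), (3, 16)])))  # 10
```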

 
Option -FP maxFP
Default -FP inf
Description Reported patterns must have an expected number of false-positive matches of at most maxFP in a database with the same size and amino acid frequencies as SWISS-PROT Rel. 36. This expected number is computed from the product of the probabilities of matching the individual amino acids or amino acid classes contained in the pattern.

If the patterns are going to be used as regular expressions rather than to build a PSSM or HMM, a value of approximately 1/10 of the number of sequences in the input file is suggested.
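The expected number of false positives can be illustrated as the number of positions in the database times the probability that the pattern matches at a random position, where that probability is the product, over the pattern's tokens, of the summed background frequency of the residues each token accepts. In the sketch below, the uniform 5% frequencies and the 1,000,000-residue database are placeholder values, not SWISS-PROT Rel. 36 statistics, and ignoring sequence boundaries is a simplifying assumption.

```python
import math


def expected_false_positives(tokens, residue_freq, db_residues):
    """Expected chance matches of a pattern in a random database of db_residues residues.

    tokens: one set of accepted residues per non-wildcard token; wildcards match
    anything (probability 1), so they are simply left out.
    """
    # A class such as {I, L, V} matches with the summed frequency of its members.
    p_match = math.prod(sum(residue_freq[aa] for aa in token) for token in tokens)
    return db_residues * p_match


def suggested_max_fp(n_sequences):
    """Rule of thumb from the text for regular-expression use: about 1/10 of the input sequences."""
    return n_sequences / 10


uniform = {aa: 0.05 for aa in "ACDEFGHIKLMNPQRSTVWY"}  # placeholder frequencies
print(expected_false_positives([{"A"}, {"I", "L", "V"}, {"G"}], uniform, 1_000_000))
print(suggested_max_fp(200))  # 20.0
```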

 
Option -mt similarity_threshold
Default -mt 1.0
Description This option sets the threshold used to define similarity classes. -mt 0 is suggested for very sensitive searches.
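One simple way to picture how a threshold defines similarity classes: two amino acids are treated as interchangeable when their substitution score is at least the threshold, so lowering -mt merges more residues and makes the search more sensitive. The score table below is a small hypothetical example rather than the matrix SPLASH loads, and the per-residue grouping is an assumption about how classes are formed.

```python
def similarity_classes(scores, threshold):
    """Map each residue to the residues scoring at least `threshold` against it."""
    residues = sorted({aa for pair in scores for aa in pair})

    def score(a, b):
        # The table is symmetric, so look the pair up in either order.
        return scores.get((a, b), scores.get((b, a), float("-inf")))

    return {a: {b for b in residues if b == a or score(a, b) >= threshold}
            for a in residues}


# Hypothetical pairwise scores, for illustration only.
toy_scores = {("I", "V"): 3, ("I", "L"): 2, ("L", "V"): 1,
              ("K", "R"): 2, ("K", "Q"): 0, ("D", "E"): 2}

print(sorted(similarity_classes(toy_scores, 1.0)["K"]))  # ['K', 'R']
print(sorted(similarity_classes(toy_scores, 0)["K"]))    # ['K', 'Q', 'R']
```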

 
Option -b number_of_identities
Default -b 2
Description This option sets the minimum number of identical matches in the regular expression. -b 1 is suggested for very sensitive searches, although it may result in significantly longer running times.

 
Option -l minimum_number_of_matches_in_pattern
Default -l same as -k
Description Reported patterns must contain at least l tokens. A value of at least -l 4 is suggested to obtain patterns that are sufficiently specific.
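To make -l and -b concrete, the sketch below counts the tokens and the identities in a pattern and applies both minimums. It assumes a pattern string in which '.' is a wildcard, a bracketed class such as [ILV] is a non-identical token, and a single letter is an identity. This notation and the helper names are illustrative assumptions, not SPLASH's documented output format.

```python
import re

# Assumed notation: '.' wildcard, [ILV] class token, single letter = identity.
TOKEN = re.compile(r"\[[A-Z]+\]|[A-Z]|\.")


def count_tokens_and_identities(pattern):
    """Return (number of non-wildcard tokens, number of single-residue tokens)."""
    tokens = [t for t in TOKEN.findall(pattern) if t != "."]
    identities = sum(1 for t in tokens if not t.startswith("["))
    return len(tokens), identities


def passes_filters(pattern, min_tokens, min_identities):
    """Apply an -l style token minimum and a -b style identity minimum."""
    n_tokens, n_identities = count_tokens_and_identities(pattern)
    return n_tokens >= min_tokens and n_identities >= min_identities


print(count_tokens_and_identities("A..[ILV]G.[KR]"))  # (4, 2)
print(passes_filters("A..[ILV]G.[KR]", 4, 2))          # True
print(passes_filters("A..[ILV]G.[KR]", 5, 2))          # False
```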

 
Option -S max_seconds_per_run
Default -S 1800 (1/2 hour)
Description This option limits the total time allocated to each individual iteration. This is useful because very low support levels can result in extremely long runs. If the time is exceeded, the program reports the motifs discovered so far and terminates.
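The behavior of -S, stopping when the budget is spent but keeping the motifs found so far, can be sketched as a loop over the sub-steps of one iteration. In this sketch the budget is checked only between steps, so a single long step can overrun it; whether SPLASH interrupts work more finely is not stated here, and the step functions are hypothetical.

```python
import time
from typing import Callable, Iterable, List


def run_with_budget(steps: Iterable[Callable[[], List[str]]], max_seconds: float) -> List[str]:
    """Run sub-steps of one iteration until the time budget is spent; keep partial results."""
    deadline = time.monotonic() + max_seconds
    found: List[str] = []
    for step in steps:
        if time.monotonic() >= deadline:
            break  # out of time: report what has been found so far
        found.extend(step())
    return found


# Hypothetical sub-steps, each returning the motifs it discovered.
steps = [lambda: ["motif1"], lambda: ["motif2"]]
print(run_with_budget(steps, 1800))  # ['motif1', 'motif2']
```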

Other options can be used but are not recommended.
