Hierarchical Discovery Syntax

WARNING: This mode is still experimental and may end in an infinite loop. Changing the parameters, for example increasing -l or minClSize, usually eliminates the problem. This will be fixed in a later revision.

The syntax is:

splash InputFile -H [minSupport [minClSize [sparseness]]] [options] 

minSupport  lowest motif support, as a percentage of the total number of sequences in InputFile (suggested: 30)
minClSize   largest cluster size for which no further analysis is performed (suggested: 5)
sparseness  density constraints to apply:
            0  only (k0, w0) is used as a density constraint
            1  both (k0, w0) and (k0, 2 w0) are used
            2  (k0, w0), (k0, 2 w0), and (k0, 4 w0) are used
            (suggested: 1, with -k 3 and -w 8)
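The two numeric parameters above can be illustrated with a short calculation. The following Python sketch converts a minSupport percentage into a number of sequences and lists the density constraints selected by each sparseness level. It rests on two assumptions that this page does not state. First, the percentage is rounded up to a whole sequence. Second, k0 and w0 are the values passed with -k and -w, which is what the "-k 3 and -w 8" suggestion implies. The function names are hypothetical and are not part of SPLASH.

```python
from typing import List, Tuple


def min_support_count(min_support_pct: int, num_sequences: int) -> int:
    # Assumption: the percentage is rounded up to a whole number of sequences.
    return -(-min_support_pct * num_sequences // 100)  # integer ceiling division


def density_constraints(k0: int, w0: int, sparseness: int) -> List[Tuple[int, int]]:
    # sparseness 0 -> [(k0, w0)]
    # sparseness 1 -> [(k0, w0), (k0, 2*w0)]
    # sparseness 2 -> [(k0, w0), (k0, 2*w0), (k0, 4*w0)]
    if sparseness not in (0, 1, 2):
        raise ValueError("sparseness must be 0, 1 or 2")
    return [(k0, w0 * 2 ** i) for i in range(sparseness + 1)]
```

With the suggested -k 3, -w 8 and sparseness 1, density_constraints(3, 8, 1) yields the two constraints (3, 8) and (3, 16).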

Suggested Syntax

splash file -H 5 30 0 -l 5
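The brackets in the syntax nest, so each positional value after -H can only be given if all the values before it are also given. The following Python sketch assembles such a command line as an argument list and enforces that ordering. The function name build_hierarchical_argv is hypothetical and not part of SPLASH. The function only builds the list; it does not run splash.

```python
from typing import List, Optional, Sequence


def build_hierarchical_argv(input_file: str,
                            min_support: Optional[int] = None,
                            min_cl_size: Optional[int] = None,
                            sparseness: Optional[int] = None,
                            options: Sequence[str] = ()) -> List[str]:
    # Mirrors: splash InputFile -H [minSupport [minClSize [sparseness]]] [options]
    if sparseness is not None and sparseness not in (0, 1, 2):
        raise ValueError("sparseness must be 0, 1 or 2")
    argv = ["splash", input_file, "-H"]
    missing_earlier = False
    for value in (min_support, min_cl_size, sparseness):
        if value is None:
            missing_earlier = True
        elif missing_earlier:
            # The brackets nest, so e.g. minClSize cannot be given without minSupport.
            raise ValueError("-H values must be supplied in order")
        else:
            argv.append(str(value))
    argv.extend(options)
    return argv
```

For example, build_hierarchical_argv("file", 5, 30, 0, ["-l", "5"]) reproduces the suggested command line above. The resulting list can be passed unchanged to Python's subprocess.run.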

Example:

Running splash trypsin.fa -H 20 30 0 -l 5 produces the taxonomy shown in trypsin_Browser.html.

Options for Pattern Discovery:

Option -% min_percent_support 
Default -% 100
Description Allows you to save time by reducing the initial minimum support.

Option -w window
Default -w 8
Description In combination with -k, allows searching for patterns that are more or less dense.

Option -k min_tokens_in_window
Default -k 3
Description In combination with -w, allows searching for patterns that are more or less dense.
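This page does not spell out the exact density rule behind -k and -w. The following Python sketch uses one common formulation as an explicit assumption: any k consecutive tokens (non-wildcard positions) must fall within a window of w positions. The pattern format is also hypothetical. Each pattern has one character per position, and "." marks a wildcard.

```python
from typing import List


def token_positions(pattern: str, wildcard: str = ".") -> List[int]:
    # Hypothetical pattern format: one character per position, '.' marks a wildcard.
    return [i for i, ch in enumerate(pattern) if ch != wildcard]


def is_dense(pattern: str, k: int, w: int) -> bool:
    """Assumed rule: every run of k consecutive tokens spans at most w positions."""
    pos = token_positions(pattern)
    if len(pos) < k:
        # Too few tokens to form even one run of k; treated as not dense here.
        return False
    return all(pos[i + k - 1] - pos[i] + 1 <= w for i in range(len(pos) - k + 1))
```

Under this assumed rule, "A..C..G" satisfies (3, 8). "A....C....G" spans 11 positions, so it fails (3, 8) but passes (3, 16). This shows how a larger window admits sparser patterns.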

Option -Z minZScore
Default -Z 1000
Description  

Option -z
Default set
Description  

Option -n- min_patterns
Default -n- 10
Description Support and density are decreased until at least min_patterns patterns are found that satisfy all the other constraints. Using at least the default, -n- 10, is suggested.
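The relaxation behind -n- can be pictured as a loop that keeps loosening constraints until enough patterns turn up. This page does not say in which order or by how much SPLASH lowers support and density. The following Python sketch therefore assumes a simple schedule: first lower the support one sequence at a time down to a floor, then lower the density by widening the window w. The find callable is a hypothetical stand-in for one discovery pass. The names discover_at_least, min_support and max_w are illustrative only.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for one discovery pass: (support, k, w) -> patterns found.
Finder = Callable[[int, int, int], List[str]]


def discover_at_least(find: Finder, support: int, k: int, w: int,
                      min_patterns: int = 10, min_support: int = 2,
                      max_w: int = 32) -> Tuple[List[str], int, int]:
    """Relax constraints until at least min_patterns patterns are found.

    Assumed schedule: lower the support one sequence at a time down to
    min_support, then lower the density by widening the window w.
    """
    while True:
        found = find(support, k, w)
        if len(found) >= min_patterns:
            return found, support, w
        if support > min_support:
            support -= 1
        elif w < max_w:
            w += 1
        else:
            # Nothing left to relax: return what the loosest setting found.
            return found, support, w
```

The function returns the support and window at which it stopped, so a caller can see how far the constraints had to be relaxed.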

Option -mt similarity_threshold
Default -mt 1.0
Description Sets the threshold used to define similarity classes. -mt 0 is suggested for very sensitive searches.
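One way to read "the threshold used to define similarity classes" is that a residue's class contains every residue it scores at least the threshold against in the similarity matrix. The following Python sketch implements that reading as an assumption; SPLASH's exact rule and score scale may differ. The three scores in TOY_SCORES are invented for illustration and are not taken from any real substitution matrix.

```python
from typing import Dict, FrozenSet, Tuple

# Symmetric scores stored once per pair; these toy values are invented for illustration.
TOY_SCORES: Dict[Tuple[str, str], float] = {("I", "L"): 2.0, ("I", "V"): 3.0, ("L", "V"): 1.0}


def score(a: str, b: str, matrix: Dict[Tuple[str, str], float]) -> float:
    # Look the pair up in either order; unknown pairs are treated as dissimilar.
    return matrix.get((a, b), matrix.get((b, a), float("-inf")))


def similarity_class(residue: str, alphabet: str,
                     matrix: Dict[Tuple[str, str], float],
                     threshold: float) -> FrozenSet[str]:
    """Residues whose score with `residue` is at least `threshold` (assumed rule)."""
    return frozenset(b for b in alphabet
                     if b == residue or score(residue, b, matrix) >= threshold)
```

With a threshold of 2.0, the class of L is {I, L}. Lowering the threshold to 1.0 adds V. Under this reading, a lower threshold produces larger classes and so a more sensitive search.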

Option -b number_of_identities
Default -b 2
Description Sets the minimum number of identical matches in the regular expression. Using -b 1 is suggested for very sensitive searches, although it may result in significantly longer running times.

Option -l minimum_number_of_matches_in_pattern
Default -l same as -k
Description Only patterns with at least l tokens are reported. Using at least -l 4 is suggested to obtain sufficiently specific patterns.
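Both -b and -l limit which patterns are reported. The following Python sketch checks a pattern against both limits. It assumes a hypothetical regular-expression-like pattern syntax that this page does not document: a literal residue such as "A", a similarity class such as "[ILV]", or "." for a wildcard. Under this assumption, tokens are all non-wildcard positions, checked against -l. Identities are the positions that match exactly one residue, checked against -b. The defaults mirror -b 2 and -l equal to -k (3).

```python
import re
from typing import List

# Hypothetical pattern syntax: literal residue, bracketed class, or '.' wildcard.
# Characters outside this syntax are silently skipped by findall.
TOKEN_RE = re.compile(r"\[[A-Z]+\]|[A-Z]|\.")


def split_positions(pattern: str) -> List[str]:
    return TOKEN_RE.findall(pattern)


def count_tokens(pattern: str) -> int:
    # Non-wildcard positions (literals and classes), checked against -l.
    return sum(1 for t in split_positions(pattern) if t != ".")


def count_identities(pattern: str) -> int:
    # Positions matching exactly one residue, checked against -b.
    return sum(1 for t in split_positions(pattern) if len(t) == 1 and t != ".")


def passes_filters(pattern: str, min_tokens: int = 3, min_identities: int = 2) -> bool:
    return count_tokens(pattern) >= min_tokens and count_identities(pattern) >= min_identities
```

In this syntax, "A[ILV].G" has three tokens and two identities. It therefore passes the defaults but would be rejected by the suggested -l 4.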

Other options can be used but are not recommended.
