|
WARNING: This mode is still experimental and may
end in an infinite loop. Changing the parameters, e.g., increasing -l or
minClSize, usually eliminates the problem. This will be fixed in a
later revision.
Syntax is:
splash InputFile -H [minSupport [minClSize [sparseness]]] [options]
| minSupport |
lowest motif support as a percentage of the total
number of sequences in InputFile (Suggested 30) |
| minClSize |
largest cluster size for which no further analysis
is performed (Suggested 5) |
| sparseness |
0: only (k0, w0) is used as a density constraint
1: both (k0, w0) and (k0, 2 w0) are used
2: (k0, w0), (k0, 2 w0), and (k0, 4 w0) are used
(Suggested 1, with -k 3 and -w 8) |
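The two numeric parameters above can be made concrete with a short Python sketch. It is illustrative only and not part of SPLASH: the function names are hypothetical, the rounding of the support percentage (rounded up here) is an assumption, and the sparseness levels follow the three cases listed in the table.

```python
import math


def min_support_count(min_support_percent, num_sequences):
    """Turn a percentage support into a number of sequences.

    Rounding up is an assumption; the SPLASH documentation does not say
    how fractional counts are handled.
    """
    if not 0 <= min_support_percent <= 100:
        raise ValueError("minSupport is a percentage between 0 and 100")
    return math.ceil(min_support_percent * num_sequences / 100)


def density_constraints(k0, w0, sparseness):
    """List the (k, w) density constraints implied by a sparseness level.

    Only levels 0, 1 and 2 are documented, so anything else is rejected.
    """
    if sparseness not in (0, 1, 2):
        raise ValueError("sparseness must be 0, 1 or 2")
    # Level n adds windows w0, 2*w0, ..., (2**n)*w0, all with the same k0.
    return [(k0, w0 * 2 ** i) for i in range(sparseness + 1)]


if __name__ == "__main__":
    print(min_support_count(30, 50))
    print(density_constraints(3, 8, 1))
```

With the suggested settings (-k 3, -w 8, sparseness 1), the second function yields the pairs (3, 8) and (3, 16).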
Suggested Syntax:
splash file -H 5 30 0 -l 5
Example:
Running splash trypsin.fa -H 20 30 0 -l 5 produces the
following taxonomy: trypsin_Browser.html
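Because minSupport, minClSize and sparseness are nested optional arguments, a later one can only be given when the earlier ones are present. The following Python sketch assembles an argument list that respects this rule. The helper name `build_hierarchical_command` is hypothetical, and the sketch assumes the executable is called `splash` as in the example above.

```python
def build_hierarchical_command(input_file, min_support=None, min_cl_size=None,
                               sparseness=None, options=()):
    """Build the argument list for: splash InputFile -H [minSupport
    [minClSize [sparseness]]] [options].

    The three values after -H are positional, so a value may only be
    supplied if all values before it are supplied too.
    """
    args = []
    gap_seen = False
    for value in (min_support, min_cl_size, sparseness):
        if value is None:
            gap_seen = True
        elif gap_seen:
            raise ValueError("a positional value was given after an omitted one")
        else:
            args.append(str(value))
    return ["splash", input_file, "-H", *args, *(str(o) for o in options)]


if __name__ == "__main__":
    cmd = build_hierarchical_command("trypsin.fa", 20, 30, 0, ["-l", "5"])
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` if a `splash` binary is available on the PATH.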
Options for Pattern Discovery:
| Option |
-% min_percent_support |
| Default |
-% 100 |
| Description |
Saves time by reducing the initial minimum
support |
| Option |
-w window |
| Default |
-w 8 |
| Description |
In combination with -k, allows searching for
patterns that are more or less dense |
| Option |
-k min_tokens_in_window |
| Default |
-k 3 |
| Description |
In combination with -w, allows searching for
patterns that are more or less dense |
| Option |
-Z minZScore |
| Default |
-Z 1000 |
| Description |
|
| Option |
-z |
| Default |
set |
| Description |
|
| Option |
-n- min_patterns |
| Default |
-n- 10 |
| Description |
Support and density are decreased until at least min_patterns
patterns are found that satisfy all the other constraints. Using at
least the default, -n- 10, is suggested. |
| Option |
-mt similarity_threshold |
| Default |
-mt 1.0 |
| Description |
This option sets the threshold used to define similarity
classes. It is suggested to use -mt 0 for very sensitive
searches |
| Option |
-b number_of_identities |
| Default |
-b 2 |
| Description |
This option sets the minimum number of identical
matches in the regular expression. It is suggested to
use -b 1 for very sensitive searches. This may result in
significantly longer running times. |
| Option |
-l minimum_number_of_matches_in_pattern |
| Default |
-l same as -k |
| Description |
This option restricts reported patterns to those that
have at least l tokens. It is suggested to use at least -l
4 to get patterns that are sufficiently specific |
Other options can be used but are not
suggested |
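To illustrate how -k, -w and -l constrain a pattern, here is a small Python sketch. It is not SPLASH code and rests on several assumptions: patterns are written as strings with "." as the wildcard, a token is any non-wildcard position, and the density rule is read as "every full window of w positions that starts at a token contains at least k tokens". The actual definition used by SPLASH may differ.

```python
WILDCARD = "."  # assumed wildcard symbol, not taken from the SPLASH docs


def count_tokens(pattern):
    """Count the non-wildcard positions (tokens) in a pattern."""
    return sum(1 for ch in pattern if ch != WILDCARD)


def meets_min_tokens(pattern, l):
    """Mimic the -l filter: keep patterns with at least l tokens."""
    return count_tokens(pattern) >= l


def is_dense(pattern, k, w):
    """Check an assumed (k, w) density rule.

    Every window of exactly w positions that starts at a token must hold
    at least k tokens. Windows that would run past the end of the pattern
    are skipped, so patterns shorter than w pass trivially.
    """
    for start, ch in enumerate(pattern):
        if ch == WILDCARD or start + w > len(pattern):
            continue
        if count_tokens(pattern[start:start + w]) < k:
            return False
    return True


if __name__ == "__main__":
    for p in ("A.C.G..T", "A......T"):
        print(p, count_tokens(p), meets_min_tokens(p, 4), is_dense(p, 3, 8))
```

Under these assumptions, a larger w or a smaller k admits sparser patterns, which is the sense in which -k and -w together control density.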
|
|