4- How Good is the Algorithm?

We have used two methods to assess the quality of the signatures obtained by the extraction algorithm described in the previous section. The first is to generate random sequences and compare their estimated probabilities with their actual, measured probabilities. The resulting plot of estimated vs. actual probabilities provides a clear picture of the algorithm's effectiveness. The second is to take signatures extracted from viruses, either by the algorithm itself or by a human expert, and observe the number of false positives as a function of their estimated probability. The remainder of this section treats both techniques in some detail.
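As a rough illustration of the first technique, the following Python sketch pairs an estimated probability with a measured frequency for each of a set of randomly drawn sequences. Everything in it is an assumption made for illustration: the function names are hypothetical, the estimator is a simple smoothed single-byte frequency model standing in for the estimator of the previous section, and the random sequences are drawn as substrings of the test corpus so that each has a nonzero measured frequency.

```python
import math
import random
from collections import Counter
from typing import List, Optional, Tuple


def unigram_log10_prob(seq: bytes, counts: Counter, total: int) -> float:
    """Stand-in estimator (an assumption, not the paper's model):
    log10 of the product of add-one-smoothed single-byte frequencies."""
    return sum(math.log10((counts[b] + 1) / (total + 256)) for b in seq)


def measured_log10_freq(seq: bytes, corpus: bytes) -> Optional[float]:
    """log10 of the fraction of corpus positions at which seq occurs,
    or None if seq never occurs in the corpus."""
    positions = len(corpus) - len(seq) + 1
    hits = sum(1 for i in range(positions) if corpus[i:i + len(seq)] == seq)
    return math.log10(hits / positions) if hits else None


def estimated_vs_measured(
    train: bytes, test: bytes, n_samples: int, length: int, seed: int = 0
) -> List[Tuple[float, Optional[float]]]:
    """Return (estimated, measured) log10 probability pairs for random
    substrings of `test`, with the estimator trained on `train`."""
    rng = random.Random(seed)
    counts = Counter(train)  # byte value -> number of occurrences
    pairs = []
    for _ in range(n_samples):
        start = rng.randrange(len(test) - length + 1)
        seq = test[start:start + length]
        estimate = unigram_log10_prob(seq, counts, len(train))
        pairs.append((estimate, measured_log10_freq(seq, test)))
    return pairs
```

Plotting the resulting pairs against each other would give a scatter plot of the kind described above; points far from the diagonal indicate sequences whose probability the estimator misjudges.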