IBM®
Skip to main content
    Country/region [change]    Terms of use
 
 
 
    Home    Products    Services & solutions    Support & downloads    My account    

IBM Technical Journals

Special report: Celebrating 50 years of the IBM Journals
All topics > Computing Methodologies >

Efficient randomized pattern-matching algorithms

Award plaque by R. M. Karp
and M. O. Rabin

We present randomized algorithms to solve the following string-matching problem and some of its generalizations: Given a string X of length n (the pattern) and a string Y (the text), find the first occurrence of X as a consecutive block within Y. The algorithms represent strings of length n by much shorter strings called fingerprints, and achieve their efficiency by manipulating fingerprints instead of longer strings. The algorithms require a constant number of storage locations, and essentially run in real time. They are conceptually simple and easy to implement. The method readily generalizes to higher-dimensional pattern-matching problems.

Originally published:

IBM Journal of Research and Development, Volume 31, Issue 2, pp. 249-260 (1987).

Significance:

In this seminal paper, Karp and Rabin invented a randomized string-matching algorithm in which short patterns, known as fingerprints, are used instead of the given pattern in order to speed up the matching process. Although the algorithm runtime is quadratic in the worst case, the average runtime is close to linear. This property, together with the fact that the algorithm can be used to search at the same time for any of a large number of patterns, makes it very effective for multiple pattern matching, for example, in the detection of plagiarism. The algorithm can be extended to solve related problems, such as two-dimensional string matching.

Comments:


    About IBMPrivacyContact