Abstract

We present an algorithm, RQ for finding all occurrences of a fixed-length pattern, Pi,J?2> * * * »Pp , in a text string, where each p,- can match an arbitrary set of characters. Our algorithm is optimal in that it examines the minimum average number of text characters, which is not necessarily the same as being optimal in running time. This paper answers the question of optimal string searching put forth in [KMP77]. Let a = the alphabet size, P= the length of the string matched by the pattern, T= the length of the text, W= the word size in bits of the underlying machine, and $(i?Q) = the average number of text characters examined RQvWe derive an asymptotic approximation for $(RQ) when P< a. We also show that &(RQ) < (4 Ioga P/3)(T/P), when P > a. In the worst case, RQ examines T characters. Our algorithm requires space 0(||II|| |P/W|). In addition, our method of analysis is applicable to other algorithms modeled by a finite automaton. We present an efficient implementation of our algorithm when P < W. In practice, compared to the Boyer-Moore algorithm, RQ requires slightly more space, accepts a more general range of patterns, and runs in comparable time.

Date of this Version

2-1-1989

Share

COinS