The Boyer-Moore algorithm (BM) is a fast, compact algorithm for finding all occurrences of a pattern string in a text string. Previous papers have addressed the worst-case running time of BM, which occurs rarely in practice. In this paper, we derive an approximation to Φ (BM) the average number of character probes made by BM. Let M = pattern length, N = text string length, α = the alphabet size, q = 1 /α and q= I — q. By modeling BM as a probabilistic finite automaton, we show that Φ(BM) h when M < α and that Φ(BM ) N q(l + g V ) when M > α. An immediate consequence is that Φ(BM) is O(N/ log α M) as M -> \infty The above formulas match well with measured data.
Algorithm, average-case complexity, pattern, pattern matching, string, string searching
Date of this Version