Analysis of some trie parameters under probabilistic models

Bonita Marie Rais, Purdue University

Abstract

A word is a string of symbols, finite or infinite in length, from a finite alphabet. Many algorithms in computer science, such as pattern matching, data compression, searching, hashing, and conflict resolution algorithms, require that a set of words be efficiently stored and analyzed. However, most algorithms are designed to optimize asymptotic worst-case performance. Often this approach targets unrealistic, even pathological, inputs and neglects the possibility that a simpler algorithm might perform just as well, or even better, in practice. The task is to determine whether such algorithms for words exist. Since the efficiency of any algorithm depends heavily on its underlying data structures, it is necessary to analyze the characteristics of these structures under a probabilistic framework. The data structures most frequently used in algorithms on words are digital trees: in particular, the trie and its variants, namely the PATRICIA trie, which has no one-way branches; the suffix tree, whose keys are suffixes of a particular string; and the compact suffix (PAT) tree. This research focuses on the suffix tree and the PAT tree, since little is known about either. Initially, the PATRICIA trie is examined because the typical behavior of PATRICIA tries and PAT trees should not differ much, even though the PATRICIA trie is constructed over statistically independent keys. The limiting distribution for the depth of the PATRICIA trie under symmetric and asymmetric alphabets is computed. In the asymmetric case, the limiting distribution for the depth in a PATRICIA trie storing n keys is normal, whereas the results for the symmetric case are quite different. In either case, however, the results lead to the conclusion that the PATRICIA trie is well balanced. Using techniques and information gained from analyzing the PATRICIA trie, we compute the asymptotic height of a suffix tree. Finally, the results we obtain for the depth of PAT trees confirm the expectation of similar behavior with PATRICIA tries; that is, the limiting distribution for the depth in a PAT tree is again normal with similar mean and variance.
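
To make the depth parameters named above concrete, the following is a minimal illustrative sketch and not the dissertation's method. It assumes distinct keys, none of which is a prefix of another, and computes leaf depths from longest common prefixes instead of building the trees explicitly. The function names (lcp, trie_depth, patricia_depth, suffix_keys) and the "$" end marker are hypothetical choices made for this example.

```python
def lcp(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def trie_depth(key, keys):
    """Depth of the leaf holding `key` in a trie built over `keys`.

    A key sits one level below its longest shared prefix with any
    other key. Assumes at least two distinct, prefix-free keys.
    """
    return 1 + max(lcp(key, other) for other in keys if other != key)


def patricia_depth(key, keys):
    """Depth of `key` once one-way branches are removed.

    A node on the key's path branches exactly when some other key
    diverges there, so the depth is the number of distinct
    common-prefix lengths. Same assumptions as trie_depth.
    """
    return len({lcp(key, other) for other in keys if other != key})


def suffix_keys(text, sentinel="$"):
    """Suffixes of `text`, each ended by a sentinel (an assumption of
    this sketch) so that no suffix is a prefix of another."""
    s = text + sentinel
    return [s[i:] for i in range(len(s))]


if __name__ == "__main__":
    keys = ["0000", "0001", "1"]
    for k in keys:
        print(k, trie_depth(k, keys), patricia_depth(k, keys))

    suffixes = suffix_keys("abab")
    for k in suffixes:
        print(k, trie_depth(k, suffixes), patricia_depth(k, suffixes))
```

Applying trie_depth to suffix_keys(text) gives depths in a suffix tree in the sense used above (a trie over suffixes), while patricia_depth on the same keys corresponds to the compact suffix (PAT) tree. The suffixes of one string overlap, so they are not statistically independent, which is the distinction the abstract draws from a PATRICIA trie built over independent keys.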

Degree

Ph.D.

Advisors

Szpankowski, Purdue University.

Subject Area

Computer science
