Probabilistic analysis of digital search trees

Jing Tang, Purdue University

Abstract

A generalized Digital Search Tree (in short: b-DST), built from strings over a V-ary alphabet $\mathcal{A}$, is a data structure that makes use of digital properties of the strings. The root contains b strings and V links, each pointing to a node that is itself the root of a b-DST. The k-th symbol of a string determines, at level k of the tree, which subtree should be chosen to search or store the string. A b-DST can be characterized by depth, internal path length, and number of nodes. The depth of a b-DST is defined as the number of nodes in a path from the root to a randomly selected string stored in the tree. We assume that the strings are statistically independent and that the number of strings is fixed and equal to m. In the Bernoulli model, we assume the symbols in a string constitute an independent sequence of Bernoulli trials. If all symbols are equally likely to occur, the model is called the symmetric Bernoulli model; otherwise we deal with the asymmetric Bernoulli model. In another probabilistic model considered in this thesis, we postulate that the next symbol in a string depends on a finite number of previous ones. This is called the Markovian model. This thesis contains two new results: the first concerns the b-DST under the asymmetric Bernoulli model, and the second concerns the b-DST with b = 1 under the Markovian model. In both cases, we obtain asymptotic expansions of the mean and the variance of the depth, as well as its limiting distribution. The proofs use several analytical techniques: (i) Poissonization (a technique that replaces the original input by a Poisson process in order to take advantage of several unique properties of this process); (ii) the Mellin transform and its inverse (used to obtain asymptotics of Poisson generating functions); and (iii) depoissonization (which translates the results under the Poisson model into those under the Bernoulli model).
A numerical method is also developed to evaluate a constant in our formula for the depth of a b-DST, which is important in some applications. We should point out that the results and techniques proposed here can be used to analyze other digital data structures under Bernoulli and Markovian models.
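
The construction described in the abstract can be illustrated with a minimal Python sketch. It is not code from the thesis. The names `Node`, `insert`, and `sample_depths` are hypothetical, symbols are encoded as integers 0..V-1, the root is taken to be level 1, and every string is assumed to be long enough to be placed in the tree. The sketch stores m random strings drawn under the asymmetric Bernoulli model and records the depth of each, counted in nodes from the root.

```python
import random


class Node:
    """One b-DST node: up to b stored strings and V child links."""

    def __init__(self, V):
        self.strings = []
        self.children = [None] * V


def insert(root, s, b, V):
    """Store s (a sequence of ints in range(V)) and return its depth.

    Depth is counted in nodes on the path from the root, so a string
    stored in the root has depth 1. Assumption: s is long enough;
    otherwise indexing s raises IndexError.
    """
    node, depth = root, 1
    while len(node.strings) >= b:
        symbol = s[depth - 1]  # the k-th symbol steers the choice at level k
        if node.children[symbol] is None:
            node.children[symbol] = Node(V)
        node = node.children[symbol]
        depth += 1
    node.strings.append(s)
    return depth


def sample_depths(m, b, probs, length=64, seed=0):
    """Build a b-DST from m independent random strings; return their depths.

    probs[i] is the probability of symbol i (Bernoulli model); unequal
    entries give the asymmetric case. length and seed are illustration
    parameters, not quantities from the thesis.
    """
    rng = random.Random(seed)
    V = len(probs)
    root = Node(V)
    depths = []
    for _ in range(m):
        s = rng.choices(range(V), weights=probs, k=length)
        depths.append(insert(root, s, b, V))
    return depths


if __name__ == "__main__":
    depths = sample_depths(m=1000, b=2, probs=[0.7, 0.3])
    # Empirical mean depth of a randomly selected stored string.
    print(sum(depths) / len(depths))
```

Averaging the returned depths gives an empirical estimate of the mean depth of a randomly selected string, which can be compared informally against an asymptotic formula for large m.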

Degree

Ph.D.

Advisors

Szpankowski, Purdue University.

Subject Area

Mathematics|Computer science
