Profiles of Patricia Tries
Digital trees are data structures that represent sets of strings according to their shared prefix structure. In the most fundamental of such trees, a trie, each string in the set is represented by a sequence of edges, each representing a single letter of the string, starting at the root of the tree and ending at a leaf, the parent edge of which corresponds to the last letter of the longest prefix that the string shares with any other string in the set. A PATRICIA trie is a trie in which each non-branching path is compressed into a single edge. The external profile B_{n,k}, defined to be the number of leaves at level k of a PATRICIA trie on n strings, is an important "summarizing" parameter, in terms of which several other parameters of interest can be formulated. Here we derive precise asymptotics for the expected value and variance of B_{n,k}, as well as a central limit theorem with error bound on the characteristic function, for PATRICIA tries on n infinite binary strings generated by a memoryless source with bias p > 1/2, for k ∼ α log n with α ∈ (1/log(1/q) + ε, 1/log(1/p) − ε) for any fixed ε > 0, where q = 1 − p. In this range, E[B_{n,k}] = Θ(Var[B_{n,k}]), and both are of the form Θ(n^{β(α)}/√(log n)), where the Θ hides bounded, periodic functions of log n whose Fourier series we explicitly determine. The compression property leads to extra terms in the Poisson functional equations for the profile which are not seen in tries or digital search trees, resulting in Mellin transforms which are only implicitly given in terms of the moments of B_{m,j} for various m and j. Thus, the proofs require information about the profile outside the main range of interest. We then extend our results to the boundaries of the central region, allowing analyses of the typical height and fillup level, both of which exhibit a surprising phase transition with respect to p.
Our derivations rely on analytic techniques, including Mellin transforms, analytic de-Poissonization, the saddle point method, and careful bounding of complex functions.
Szpankowski, Purdue University.
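The definition of the external profile in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the method of the dissertation: the function name `patricia_external_profile`, the `seed` parameter, and the lazy generation of bits are assumptions made for illustration, and the choice that the symbol 1 is emitted with probability p is an arbitrary convention. The sketch never builds the tree explicitly. It splits the set of strings on successive bits and increases the level only when both sides of a split are non-empty, which has the same effect as compressing non-branching paths.

```python
import random
from collections import Counter


def patricia_external_profile(n, p, seed=0):
    """Simulate the external profile of a PATRICIA trie.

    Returns a Counter mapping level k to the number of leaves at level k,
    for n binary strings whose bits are independently 1 with probability p
    (hypothetical convention). Requires 0 < p < 1 so that the strings are
    distinct with probability 1.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    profile = Counter()
    if n == 0:
        return profile

    rng = random.Random(seed)
    strings = [[] for _ in range(n)]  # bits are generated only when needed

    def bit(i, d):
        # Extend string i until position d exists, then return that bit.
        s = strings[i]
        while len(s) <= d:
            s.append(1 if rng.random() < p else 0)
        return s[d]

    # Each stack entry: (indices of strings, next bit position, trie level).
    stack = [(list(range(n)), 0, 0)]
    while stack:
        group, d, level = stack.pop()
        if len(group) == 1:
            profile[level] += 1  # a single string: a leaf at this level
            continue
        zeros = [i for i in group if bit(i, d) == 0]
        ones = [i for i in group if bit(i, d) == 1]
        if zeros and ones:
            # A genuine branching node: both children sit one level lower.
            stack.append((zeros, d + 1, level + 1))
            stack.append((ones, d + 1, level + 1))
        else:
            # Non-branching step: compressed away, so the level is unchanged.
            stack.append((group, d + 1, level))
    return profile


if __name__ == "__main__":
    prof = patricia_external_profile(1000, 0.7, seed=1)
    for k in sorted(prof):
        print(k, prof[k])
```

Because every internal node of a PATRICIA trie has exactly two children, the leaf counts returned for n ≥ 1 sum to n and satisfy the Kraft equality, that is, the sum over k of B_{n,k} · 2^(−k) equals 1. Both facts give a quick sanity check on the simulation.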