Variance of the internal profile in suffix trees

Jeffrey B Gaither, Purdue University

Abstract

A suffix tree is a data structure used for storing, sending, and analyzing words. Suffix trees are particularly useful for identifying or counting repeated occurrences of a pattern in a long string, for example counting the occurrences of a pattern in a string of DNA. In this thesis we examine the internal profile of a suffix tree, which is the number of internal nodes the suffix tree has at a given level. Our goal is to analyze the variance of the internal profile as the tree grows large. It is known that the expected internal profile of a suffix tree agrees asymptotically with that of a trie (a simpler cousin of the suffix tree that is much easier to analyze). The variances of the internal profiles of suffix trees and tries, however, are known on the basis of experiments to be different. We use combinatorics on words and residue calculus to derive an exact expression for the variance, and then show that this expression can be replaced with an approximation that is tractable via known methods. We then apply the Mellin transform and saddle point analysis to calculate asymptotic properties (matched by empirical results) for the dominant term of our approximation. Our methodology should extend to the other terms as well, and should therefore suffice to derive an asymptotic expression for the variance in full.
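To make the notion of internal profile concrete, the following is a minimal sketch and not the thesis's own construction. It assumes the uncompacted view of the suffix tree (a trie built on the suffixes of a finite string), in which a node at level k is internal exactly when at least two suffixes share the same length-k prefix. The function name `internal_profile` and the choice to ignore suffixes shorter than k are assumptions made for illustration.

```python
from collections import Counter


def internal_profile(text: str, k: int) -> int:
    """Count internal nodes at level k in the (uncompacted) trie of suffixes.

    Assumption: a level-k node is internal when at least two suffixes of
    `text` begin with the same word of length k. Suffixes shorter than k
    are ignored in this sketch.
    """
    if k < 0:
        raise ValueError("level k must be non-negative")
    # Every length-k window of the text is the length-k prefix of one suffix.
    prefix_counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    # A prefix shared by two or more suffixes corresponds to a branching
    # (internal) node at depth k.
    return sum(1 for count in prefix_counts.values() if count >= 2)


if __name__ == "__main__":
    word = "banana"
    for level in range(5):
        print(level, internal_profile(word, level))
```

Running the sketch over many random strings of the same length and taking the sample variance of the returned counts at a fixed level gives an empirical version of the quantity the abstract studies.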

Degree

Ph.D.

Advisors

Bell, Purdue University.

Subject Area

Mathematics|Statistics|Computer science
