Analysis of an error resilient Lempel-Ziv algorithm via suffix trees

Mark Daniel Ward, Purdue University

Abstract

In a suffix tree, the multiplicity matching parameter (MMP) M_n is the number of leaves in the subtree rooted at the branching point of the (n + 1)st insertion. Equivalently, the MMP is the number of pointers into the database in the Lempel-Ziv '77 data compression algorithm. We prove that the MMP asymptotically follows the logarithmic series distribution plus some fluctuations. In the proof we compare the distribution of the MMP in suffix trees to its distribution in tries built over independent strings. Our results are derived by both probabilistic and analytic techniques of the analysis of algorithms. In particular, we utilize combinatorics on words, bivariate generating functions, pattern matching, recurrence relations, analytical poissonization and depoissonization, the Mellin transform, and complex analysis.
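
As an illustration of the definition only, the MMP can be computed by brute force without building a suffix tree. The sketch below is not taken from the dissertation. It assumes a finite string in place of the infinite sequence used in the analysis, and the names common_prefix_len and mmp are hypothetical. It finds the longest prefix that the (n + 1)st suffix shares with any of the first n suffixes, then counts how many of the first n suffixes attain that length. That count equals the number of leaves below the branching point.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    k = 0
    for x, y in zip(a, b):
        if x != y:
            break
        k += 1
    return k


def mmp(text: str, n: int) -> int:
    """Brute-force MMP for the (n + 1)st suffix of text (hypothetical helper).

    Assumes 1 <= n < len(text). Suffixes are indexed from 0, so the
    (n + 1)st suffix is text[n:].
    """
    if not 1 <= n < len(text):
        raise ValueError("need 1 <= n < len(text)")
    new_suffix = text[n:]
    # Match length between the new suffix and each earlier suffix.
    match_lens = [common_prefix_len(text[i:], new_suffix) for i in range(n)]
    # The branching point sits at the depth of the longest match.
    longest = max(match_lens)
    # Leaves below the branching point: earlier suffixes with that match length.
    return sum(1 for m in match_lens if m == longest)
```

For example, with text "abcabcabd" and n = 6 the new suffix is "abd". Its longest match with an earlier suffix is "ab", which occurs at two earlier positions, so mmp returns 2. In Lempel-Ziv '77 terms, these are the two candidate pointers into the database.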

Degree

Ph.D.

Advisors

Szpankowski, Purdue University.

Subject Area

Mathematics|Computer science

