ON THE FEASIBILITY OF A METHOD FOR RECOGNIZING ELEMENTS OF A SET OF DATA KEYS
Abstract
As Knuth noted, a hashing function that is one-to-one on the set of keys in the hash table is highly desirable. If the set of keys does not change often, then the decreased lookup time may repay the cost of finding a one-to-one hashing function. Sprugnoli called one-to-one hashing functions "perfect hashing functions" (phfs), and suggested developing algorithms that find a phf for any given key-set. The only completely general perfect hashing scheme known, Jaeschke's reciprocal hashing, selects values for numerical parameters in a hashing function so that the resulting function is one-to-one on T, the set of keys in the table. However, the length of the parameters may, according to Jaeschke's theorems, be proportional to k log(U), where k is the number of keys in T, and U is the size of the universal set from which the keys in T are chosen; and his experiments show parameter lengths linear in k. This suggests that the practicality of this method is limited, since phfs computing with a parameter of size k can be faster than binary search only for bounded values of k. We study whether, theoretically, parameter growth proportional to k is inherent in perfect hashing methods similar to Jaeschke's. By combinatorial analyses of the minimal size for a collection of functions that contains a phf for every key-set of size k, we prove that the theoretical minimum parameter size is at least proportional to k, and at most proportional to k + log log(U). Thus, the observed parameter growth in Jaeschke's method is unavoidable. That result assumes the hash table is of size k, which is minimal. Generalizing our arguments to the case of larger tables, we prove that if table size is a constant multiple of k, then the parameter size remains linear in k, but even modest increases in table size can substantially reduce the coefficient of proportionality. This allows the possibility of perfect hashing methods useful for key-sets that are substantially larger than those acceptable for methods using minimal-size tables, but not arbitrarily large.
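The idea of selecting a numerical parameter so that a hashing function becomes one-to-one on T can be illustrated with a minimal sketch. It does not use Jaeschke's reciprocal hashing. It assumes a simpler, hypothetical family h(x) = ((a * x) mod p) mod m, where the multiplier a is the stored parameter, p is an assumed prime larger than every key, and m is the table size. The names is_perfect and find_parameter, the key-set, and the value of p are illustrative choices that do not come from the dissertation.

```python
from typing import Optional, Sequence


def is_perfect(keys: Sequence[int], a: int, p: int, m: int) -> bool:
    """Return True if h(x) = ((a * x) mod p) mod m is one-to-one on keys."""
    slots = [((a * x) % p) % m for x in keys]
    return len(set(slots)) == len(slots)


def find_parameter(keys: Sequence[int], p: int, m: int) -> Optional[int]:
    """Return the smallest multiplier a in [1, p) that makes h perfect, or None."""
    for a in range(1, p):
        if is_perfect(keys, a, p, m):
            return a
    return None


keys = [3, 8, 15, 22]  # illustrative key-set T with k = 4 keys
p = 31                 # assumed prime larger than every key

a_min = find_parameter(keys, p, m=4)  # minimal table: m = k
a_big = find_parameter(keys, p, m=8)  # table twice as large: m = 2k

print(a_min, a_big)  # prints "8 1"
```

For this particular key-set the exhaustive search needs the multiplier 8 when the table has exactly k = 4 slots, but already succeeds with multiplier 1 when the table has 8 slots. This is a single toy instance, not evidence for the bounds stated in the abstract; it only shows what a smaller parameter for a larger table can look like in practice.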
Degree
Ph.D.
Subject Area
Computer science