Nonparametric estimation of the Bayes error

Donald Michael Hummels, Purdue University

Abstract

This thesis is concerned with the performance of nonparametric classifiers and their application to the estimation of the Bayes error. Although the behavior of these classifiers as the number of preclassified design samples becomes infinite is well understood, very little is known regarding their finite-sample error performance. Here, we examine the performance of Parzen and k-nearest neighbor (k-NN) classifiers, relating the expected error rates to the size of the design set and the various design parameters (kernel size and shape, value of k, distance metric for nearest neighbor calculation, etc.). These results lead to several significant improvements in the design procedures for nonparametric classifiers, as well as improved estimates of the Bayes error rate. Our results show that increasing the sample size is often not an effective practical means of improving classifier performance. Rather, careful attention must be paid to the decision threshold, selection of the kernel size and shape (for Parzen classifiers), and selection of k and the distance metric (for k-NN classifiers). Guidelines are developed for the proper selection of each of these parameters. The use of nonparametric error rates for Bayes error estimation is also considered, and techniques are given that reduce or compensate for the biases of the nonparametric error rates. A bootstrap technique is also developed that allows the designer to estimate the standard deviation of a nonparametric estimate of the Bayes error.
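
To make the quantities discussed in the abstract concrete, the following is a minimal sketch, not the procedure developed in the thesis. It computes a k-NN error rate on a held-out test set and then uses a generic bootstrap (resampling design and test samples with replacement) to estimate the standard deviation of that error rate. Everything in it is an assumption made for illustration: the function names, the Euclidean metric, the choice k = 3, the number of bootstrap replicates, and the synthetic two-class Gaussian data. For that synthetic data (identity covariances, means separated by a distance of 2) the Bayes error is Phi(-1), roughly 0.159, which gives a reference point for the finite-sample k-NN error.

```python
import numpy as np


def knn_error(design_x, design_y, test_x, test_y, k=3):
    """Fraction of test points misclassified by a k-NN vote.

    Assumes labels are 0/1 and a Euclidean metric (an illustrative choice).
    With an even k, tied votes are assigned to class 0.
    """
    # Pairwise squared distances, shape (n_test, n_design).
    d2 = ((test_x[:, None, :] - design_x[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = design_y[nearest].mean(axis=1)  # fraction of class-1 neighbors
    predicted = (votes > 0.5).astype(int)
    return float(np.mean(predicted != test_y))


def bootstrap_std(design_x, design_y, test_x, test_y, k=3, n_boot=100, seed=0):
    """Generic bootstrap estimate of the standard deviation of the k-NN error.

    This is a plain resampling scheme chosen for illustration; it is not
    claimed to match the bootstrap technique developed in the thesis.
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_boot):
        di = rng.integers(0, len(design_x), len(design_x))
        ti = rng.integers(0, len(test_x), len(test_x))
        estimates.append(
            knn_error(design_x[di], design_y[di], test_x[ti], test_y[ti], k)
        )
    return float(np.std(estimates, ddof=1))


def make_data(n_per_class, rng):
    """Hypothetical test problem: two unit-covariance Gaussians in 2-D."""
    x0 = rng.normal(0.0, 1.0, size=(n_per_class, 2))
    x1 = rng.normal(0.0, 1.0, size=(n_per_class, 2)) + np.array([2.0, 0.0])
    x = np.vstack([x0, x1])
    y = np.repeat([0, 1], n_per_class)
    return x, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    design_x, design_y = make_data(100, rng)
    test_x, test_y = make_data(100, rng)
    err = knn_error(design_x, design_y, test_x, test_y, k=3)
    std = bootstrap_std(design_x, design_y, test_x, test_y, k=3)
    print(f"k-NN error estimate: {err:.3f}, bootstrap std: {std:.3f}")
```

Rerunning the sketch with different values of k or different design-set sizes gives a rough feel for the sensitivity to design parameters that the abstract describes, though it makes no attempt to reproduce the bias corrections or parameter-selection guidelines of the thesis.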

Degree

Ph.D.

Advisors

Fukunaga, Purdue University.

Subject Area

Electrical engineering
