Supported by the National Science Foundation under Grant ECS8513720.


This thesis is concerned with the performance of nonparametric classifiers and their application to the estimation of the Bayes error. Although the behavior of these classifiers as the number of preclassified design samples becomes infinite is well understood, very little is known regarding their finite sample error performance. Here, we examine the performance of Parzen and k-nearest neighbor (k-NN) classifiers, relating the expected error rates to the size of the design set and the various design parameters (kernel size and shape, value of k, distance metric for nearest neighbor calculation, etc.). These results lead to several significant improvements in the design procedures for nonparametric classifiers, as well as improved estimates of the Bayes error rate.

Our results show that increasing the sample size is in many cases not an effective practical means of improving the classifier performance. Rather, careful attention must be paid to the decision threshold, selection of the kernel size and shape (for Parzen classifiers), and selection of k and the distance metric (for k-NN classifiers). Guidelines are developed toward proper selection of each of these parameters. The use of nonparametric error rates for Bayes error estimation is also considered, and techniques are given which reduce or compensate for the biases of the nonparametric error rates. A bootstrap technique is also developed which allows the designer to estimate the standard deviation of a nonparametric estimate of the Bayes error.
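The two ideas the abstract pairs — a nonparametric error rate and a bootstrap estimate of its standard deviation — can be illustrated with a minimal sketch. This is a hypothetical one-dimensional example on synthetic Gaussian data, not the thesis's experiments; the function names, the choice k = 5, and the resampling count are illustrative assumptions.

```python
# Sketch (assumed setup): k-NN error rate on a held-out test set, plus a
# bootstrap estimate of the error rate's standard deviation obtained by
# resampling the design set with replacement.
import random
import statistics

random.seed(0)

def knn_predict(design, x, k):
    # Classify x by majority vote among its k nearest design samples
    # (absolute distance serves as the metric in one dimension).
    neighbors = sorted(design, key=lambda s: abs(s[0] - x))[:k]
    votes = sum(label for _, label in neighbors)
    return 1 if 2 * votes > k else 0

def error_rate(design, test, k):
    # Fraction of test samples the k-NN rule misclassifies.
    wrong = sum(knn_predict(design, x, k) != y for x, y in test)
    return wrong / len(test)

# Two overlapping one-dimensional classes (synthetic stand-in data).
design = [(random.gauss(0.0, 1.0), 0) for _ in range(100)] + \
         [(random.gauss(1.5, 1.0), 1) for _ in range(100)]
test = [(random.gauss(0.0, 1.0), 0) for _ in range(200)] + \
       [(random.gauss(1.5, 1.0), 1) for _ in range(200)]

err = error_rate(design, test, k=5)

# Bootstrap: redesign the classifier on resampled design sets and watch
# how the resulting test error varies across replications.
boot_errs = []
for _ in range(50):
    resample = [random.choice(design) for _ in design]
    boot_errs.append(error_rate(resample, test, k=5))
std = statistics.stdev(boot_errs)

print(f"k-NN error rate: {err:.3f}, bootstrap std: {std:.3f}")
```

The bootstrap standard deviation here quantifies only the variability due to the finite design set, which is the quantity the abstract says the designer needs when judging how much to trust a nonparametric Bayes error estimate.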
