A central problem in classifier design is the estimation of classification error. The problem is hardest when the sample distribution is unknown and only a limited number of training samples is available. In this paper, we present a new approach to this problem. In our model, classification error has two components: approximation error and generalization error. The former is due to imperfect knowledge of the underlying sample distribution, while the latter results mainly from inaccuracies in parameter estimation, which are a consequence of the small number of training samples. We therefore propose a criterion for optimal classifier selection, called the Generalized Minimum Empirical Error (GMEE) criterion. The GMEE criterion consists of two terms, corresponding to estimates of the two types of error. The first term is the empirical error, that is, the classification error observed on the training samples. The second is an estimate of the generalization error, which is related to classifier complexity. We use the Vapnik-Chervonenkis dimension (VCdim) as the measure of classifier complexity. Hence, the classifier that minimizes the criterion is the one with minimal error probability. Bayes consistency of the GMEE criterion is proven. As an application, the criterion is used to design the optimal neural network classifier, and a corollary on the Bayes optimality of neural network-based classifiers is also proven. Thus, our approach provides a theoretical foundation for the connectionist approach to optimal classifier design. Experimental results are given to validate the approach, followed by discussion and suggestions for future research.
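The two-term structure of the criterion can be illustrated with a minimal sketch. The abstract does not give the exact form of the generalization term, so the sketch below assumes a classical VC-style confidence term as a stand-in; the function names, the `delta` parameter, and the candidate classifiers with their error rates and VC dimensions are all hypothetical and are not taken from the paper.

```python
import math


def vc_penalty(vc_dim, n_samples, delta=0.05):
    """Classical VC-style confidence term.

    ASSUMPTION: this is a textbook stand-in for the generalization-error
    estimate; the paper's actual second term may differ.
    """
    h, n = vc_dim, n_samples
    return math.sqrt((h * (math.log(2 * n / h) + 1) + math.log(4 / delta)) / n)


def select_classifier(candidates, n_samples, delta=0.05):
    """Return the candidate minimizing empirical error + complexity penalty.

    candidates: list of (name, empirical_error, vc_dim) tuples.
    """
    def score(candidate):
        _, empirical_error, vc_dim = candidate
        return empirical_error + vc_penalty(vc_dim, n_samples, delta)

    return min(candidates, key=score)


# Hypothetical candidates: richer classifiers fit the training set better
# (lower empirical error) but have a larger VC dimension.
candidates = [
    ("small_net", 0.20, 5),
    ("medium_net", 0.10, 50),
    ("large_net", 0.02, 500),
]

for n in (1_000, 100_000, 10_000_000):
    best = select_classifier(candidates, n_samples=n)
    print(n, best[0])
```

Under these assumed numbers, the selected classifier grows in complexity as the training set grows: with few samples the penalty dominates and the simplest candidate wins, while with many samples the empirical error dominates.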