DENSITY IDENTIFICATION AND RISK ESTIMATION

THOMAS EDWARD FLICK, Purdue University

Abstract

This study investigates two related facets of statistical pattern recognition: density identification and risk estimation. Density identification is accomplished by density testing and parameter estimation. Successful density identification leads to efficient and accurate methods of risk estimation; otherwise, risk estimation must be done with less desirable nonparametric methods. In addition to density identification, both parametric and nonparametric risk estimation are studied.

A density test is proposed to determine whether a data set is Gaussian. The test is based on the "magnifying glass" method of clustering, which examines regions of a distribution as if through a magnifying glass. The method works best on Gaussian distributions, so the overall Gaussianity of a distribution is indicated by its tightness after clustering. The problem of parametrizing an undifferentiated bimodal Gaussian mixture is also addressed, in the general n-dimensional case with all parameters unknown. The problem is solved by an improved version of the well-known moment method. Moments are also suggested as a means of counting the modes of a Gaussian mixture with equal covariances.

In an investigation of parametric risk estimation, classification error is analyzed for a situation with hundreds of classes. This error (called single-class error) is found to depend on only a few parameters: the average nearest-neighbor distance between class means, the effective dimensionality of their distribution (an estimate is proposed), and the Gaussian noise level. Classification error can be estimated from the standard error curves provided.

Improvements in nonparametric risk estimation are possible with the 2-NN rule. This procedure reduces the effects of sample size by eliminating the first-order deviation of the finite-sample risk from its asymptotic value. Since the asymptotic 2-NN risk is related to the common 1-NN risk by a factor of two, the 2-NN rule can be substituted for the 1-NN rule for more accurate risk estimation. The 1-NN risk estimate may be further improved by an optimal global metric; this is proposed mainly as a theoretical basis for nonparametric feature extraction. The approach is to reduce the mean-squared error between the finite-sample risk and its asymptotic value by estimating the optimal weighting matrix A₀ for a quadratic NN metric. The use of A₀ in nonparametric feature extraction is suggested, and a parametrically directed NN selection scheme is described.
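The moment-method idea for mixture parametrization can be illustrated in its simplest special case: a one-dimensional, equal-weight mixture of two Gaussians with a common variance, symmetric about its mean. This is a hedged sketch, not the dissertation's general n-dimensional procedure; the function name and the restriction to the symmetric case are assumptions made here for illustration. In this case, matching the second and fourth central moments yields a closed form.

```python
def fit_symmetric_mixture(xs):
    """Method-of-moments fit for the equal-weight mixture
    0.5*N(mu - d, s2) + 0.5*N(mu + d, s2)  (1-D symmetric special case).

    The central moments of this mixture are
        m2 = s2 + d**2
        m4 = 3*s2**2 + 6*s2*d**2 + d**4
    which combine to the closed form  d**4 = (3*m2**2 - m4) / 2.
    Returns (mu, d, s2).
    """
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    d4 = (3 * m2 ** 2 - m4) / 2
    if d4 <= 0:
        # Sample kurtosis exceeds what this two-component model allows;
        # fall back to a single Gaussian (d = 0).
        return mu, 0.0, m2
    d = d4 ** 0.25
    s2 = max(m2 - d * d, 0.0)  # clamp against sampling noise
    return mu, d, s2
```

A moment fit like this is sensitive to the sampling variability of the fourth moment, which is one reason the dissertation pursues an improved version of the method for the general, undifferentiated case.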
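The factor-of-two relation between the NN rules can be sketched with leave-one-out error counting. In the variant assumed here (one common formulation, not necessarily the dissertation's exact rule), the 2-NN tally counts an error only when both nearest neighbors carry the wrong label, and agreeing-label ties are rejected rather than counted; asymptotically that tally is half the 1-NN risk, so doubling it gives an alternative estimate of the 1-NN asymptotic error. The function name and the brute-force distance search are illustrative choices.

```python
def loo_nn_risk_estimates(points, labels):
    """Leave-one-out 1-NN and 2-NN error rates by brute-force search.

    points : list of equal-length coordinate tuples
    labels : list of class labels, parallel to points

    The 2-NN count here registers an error only when BOTH nearest
    neighbors disagree with the true label (disagreeing neighbors are
    treated as a reject, not an error), so err2 <= err1 always holds.
    """
    n = len(points)
    err1 = err2 = 0
    for i in range(n):
        # Squared Euclidean distance from sample i to every other sample.
        ranked = sorted(
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n) if j != i
        )
        nn1, nn2 = ranked[0][1], ranked[1][1]
        if labels[nn1] != labels[i]:
            err1 += 1
            if labels[nn2] != labels[i]:
                err2 += 1
    return err1 / n, err2 / n
```

On a well-separated two-class sample, `2 * err2` and `err1` estimate the same asymptotic quantity, with the 2-NN count expected (per the abstract) to suffer less first-order finite-sample deviation.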

Degree

Ph.D.

Subject Area

Electrical engineering
