Classification of high dimensional data with limited training samples

Saldju Tadjudin, Purdue University

Abstract

An important problem in pattern recognition is the effect of limited training samples on classification performance. When the ratio of the number of training samples to the dimensionality is small, parameter estimates become highly variable, which degrades classification performance. This problem has become more prevalent in remote sensing with the emergence of a new generation of sensors. While the new sensor technology provides higher spectral and spatial resolution, enabling a greater number of spectrally separable classes to be identified, the labeled samples needed to design the classifier remain difficult and expensive to acquire. In this thesis, several issues concerning the classification of high dimensional data with limited training samples are addressed. First, better parameter estimates can be obtained under the mixture model by using a large number of unlabeled samples in addition to the training samples. However, this estimation method is sensitive to the presence of statistical outliers. In remote sensing data, classes with few samples are difficult to identify and may constitute statistical outliers. Therefore, a robust parameter estimation method for the mixture model is introduced. Second, motivated by the fact that covariance estimates become highly variable with limited training samples, a covariance estimator is developed using a Bayesian formulation. The proposed covariance estimator is advantageous when the training set size varies and reflects the prior of each class. Finally, a binary tree design is proposed to deal with the problem of varying training sample size. The proposed binary tree can function as both a classifier and a feature extraction method. The benefits and limitations of the proposed methods are discussed and demonstrated with experiments.

Degree

Ph.D.

Advisors

Landgrebe, Purdue University.

Subject Area

Electrical engineering; Remote sensing
