Feature extraction and classification algorithms for high-dimensional data
Abstract
In this research, feature extraction and classification algorithms for high-dimensional data are investigated. Sensors for Earth observation are moving toward providing multispectral imagery of much higher dimensionality than is now possible. In analyzing such high-dimensional data, processing time becomes an important factor: with large increases in dimensionality and in the number of classes, processing time will increase significantly. To address this problem, a multistage classification scheme is proposed that reduces processing time substantially by eliminating unlikely classes from further consideration at each stage. Several truncation criteria are developed, and the relationship between the thresholds and the error caused by truncation is investigated. Next, a novel approach to feature extraction for classification is proposed, based directly on the decision boundaries. It is shown that all the features needed for classification can be extracted from decision boundaries. Because only a portion of the decision boundary is effective in discriminating between classes, the concept of the effective decision boundary is introduced. The proposed feature extraction algorithm has several desirable properties: (1) it predicts the minimum number of features necessary to achieve the same classification accuracy as in the original space for a given pattern recognition problem; (2) it finds the necessary feature vectors. Unlike some previous algorithms, the proposed algorithm does not deteriorate when classes have equal means or equal covariances. In addition, the decision boundary feature extraction algorithm can be used with both parametric and non-parametric classifiers. Finally, we study some problems encountered in analyzing high-dimensional data and propose possible solutions.
We first recognize the increased importance of second-order statistics in analyzing high-dimensional data. By investigating the characteristics of high-dimensional data, we suggest why second-order statistics must be taken into account. Given this importance, a way to represent second-order statistics is needed, and we propose a method to visualize statistics using a color code. With this color-coded representation, one can easily extract and compare first- and second-order statistics.
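The multistage classification idea summarized in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the abstract does not state its truncation criteria, so the rule used here (drop any class whose log-likelihood falls more than a fixed `threshold` below the best class at that stage) is an assumption, as are the diagonal-covariance Gaussian class models and the hypothetical names `log_gauss_diag` and `multistage_classify`.

```python
import math


def log_gauss_diag(x, mean, var, dims):
    """Log-density of a diagonal-covariance Gaussian over the given feature indices.

    Assumption: classes are modeled as Gaussians with diagonal covariance.
    """
    total = 0.0
    for d in dims:
        total += -0.5 * (math.log(2 * math.pi * var[d])
                         + (x[d] - mean[d]) ** 2 / var[d])
    return total


def multistage_classify(x, classes, stages, threshold):
    """Classify x in stages, discarding unlikely classes at each stage.

    classes:   dict mapping class name -> (mean list, variance list)
    stages:    list of feature-index lists, from cheap subset to full set
    threshold: assumed truncation rule; a class is dropped when its
               log-likelihood is more than `threshold` below the stage's best
    Returns (predicted class, classes that survived the last stage run).
    """
    candidates = list(classes)
    scores = {}
    for dims in stages:
        # Score only the classes that survived earlier stages.
        scores = {c: log_gauss_diag(x, classes[c][0], classes[c][1], dims)
                  for c in candidates}
        best = max(scores.values())
        candidates = [c for c in candidates if scores[c] >= best - threshold]
        if len(candidates) == 1:
            break  # nothing left to discriminate between
    return max(candidates, key=lambda c: scores[c]), candidates


if __name__ == "__main__":
    toy_classes = {
        "A": ([0.0, 0.0, 0.0], [1.0, 1.0, 1.0]),
        "B": ([5.0, 5.0, 5.0], [1.0, 1.0, 1.0]),
        "C": ([0.0, 0.0, 3.0], [1.0, 1.0, 1.0]),
    }
    # Stage 1 uses one feature; stage 2 uses all three.
    print(multistage_classify([0.1, -0.2, 2.8], toy_classes,
                              [[0], [0, 1, 2]], threshold=3.0))
```

In this toy setup the first stage evaluates a single feature for every class, and only the survivors are scored on the full feature set, which is where the time saving would come from. A larger `threshold` keeps more classes per stage and so trades speed for a lower risk of discarding the correct class.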
Degree
Ph.D.
Advisors
Landgrebe, Purdue University.
Subject Area
Electrical engineering