New covariance-based feature extraction methods for classification and prediction of high-dimensional data
Abstract
When analyzing high-dimensional data sets, it is often necessary to apply feature extraction methods to capture the discriminating information relevant for classification and prediction. This information can typically be represented in lower-dimensional feature spaces, and a widely used approach for doing so is principal component analysis (PCA). PCA efficiently compresses information into lower dimensions; however, studies indicate that it is not optimal for feature extraction, especially in classification problems. Furthermore, for high-dimensional data with limited observations, as is typically the case with remote sensing data and nonstationary data such as financial data, covariance matrix estimation becomes unreliable, and this adversely affects the representation of the data in the PCA domain. In this thesis, we first introduce a new feature extraction method called summed component analysis (SCA), which uses the structure of the eigenvectors of the common covariance matrix to generate new features as sums of certain original features. Second, we present a variation of SCA, known as class summed component analysis (CSCA). CSCA takes advantage of the relative ease of computing the class covariance matrices and uses them to determine the data transformations. Since the new features are simple sums of the original features, the new representation of the data retains a conceptual meaning, which is appealing for man-machine interfaces. We evaluate these methods on data sets with varying sample sizes and on financial time series, and show improved classification and prediction accuracies.
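The abstract describes SCA only at a high level, so the following is a minimal illustrative sketch, not the thesis's actual algorithm: it assumes that original features are grouped by the sign of their loadings on the leading eigenvectors of the common covariance matrix, and each group is then summed to form a new feature. The function name, the sign-based grouping rule, and the threshold parameter are assumptions introduced here for illustration only.

```python
import numpy as np

def summed_components(X, n_components=2, threshold=0.0):
    """Illustrative SCA-style transform (assumed grouping rule): group original
    features by the sign of their loadings on each leading eigenvector of the
    covariance matrix, then sum each group to form a new feature."""
    cov = np.cov(X, rowvar=False)               # common covariance estimate
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    new_features = []
    for idx in order:
        v = eigvecs[:, idx]
        pos = v > threshold                     # features loading positively
        neg = v < -threshold                    # features loading negatively
        if pos.any():
            new_features.append(X[:, pos].sum(axis=1))
        if neg.any():
            new_features.append(X[:, neg].sum(axis=1))
    return np.column_stack(new_features)

# Example: high-dimensional data with few observations (30 samples, 100 features)
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 100))
Z = summed_components(X, n_components=3)
print(Z.shape)
```

Because each derived feature is just a sum of a named subset of the original features, its meaning can be read directly from that subset, which is the interpretability advantage the abstract points to.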
Degree
Ph.D.
Advisors
Ersoy, Purdue University.
Subject Area
Finance; Electrical engineering