Robust clustering and regression using generalized maximum likelihood subset least-squares with applications

Jin-Nan Liaw, Purdue University

Abstract

We consider the use of subset least squares as a robust estimator for clustering and regression. Subset least squares is a one-dimensional method that applies generalized maximum likelihood principles to extract a properly estimated set of good data from a contaminated data set; it can be categorized as an asymmetrically trimmed mean estimator. For clustering, we use subset least squares as a robust classifier: several classifiers are designed to simultaneously estimate the number of observations within each cluster, the center of each cluster, and the covariance of each cluster. For regression, we develop a new method to detect outliers and perform robust regression. Most current robust regression methods are computationally expensive; in contrast, the proposed algorithm has computational complexity of order O(N^2), where N is the number of observations. We run our algorithm on ten well-known data sets from the literature, and all the significant outliers in each data set are detected. The results demonstrate that the new algorithm is superior to current robust regression algorithms in both computational speed and outlier detection. A robust linear prediction of speech using this algorithm is also considered. Future work will focus on applying this algorithm to practical problems such as image restoration, robust surface reconstruction, robust system modeling, robust trajectory estimation, and robust nonlinear regression.
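To illustrate the "asymmetrically trimmed mean" idea mentioned above, the following is a minimal one-dimensional sketch: among all contiguous subsets of a fixed size h of the sorted data, it keeps the subset with the smallest within-subset sum of squares and returns that subset's mean. Because the retained window need not be centered on the median, the trimming is asymmetric. The fixed subset size h and the minimum-sum-of-squares selection rule are simplifying assumptions for illustration, not the dissertation's generalized maximum likelihood criterion.

```python
def subset_least_squares_1d(x, h):
    """Robust 1-D location estimate via subset least squares (sketch).

    Sorts the data, scans every contiguous window of size h, and returns
    the mean of the window with the smallest sum of squared deviations.
    Outliers fall outside the chosen window, so they are trimmed away
    (possibly asymmetrically, since the window can sit off-center).
    """
    xs = sorted(x)
    n = len(xs)
    best_ss = float("inf")
    best_mean = None
    # O(N * h) scan over contiguous windows of the sorted data.
    for i in range(n - h + 1):
        window = xs[i:i + h]
        m = sum(window) / h
        ss = sum((v - m) ** 2 for v in window)
        if ss < best_ss:
            best_ss, best_mean = ss, m
    return best_mean


# Example: a single gross outlier (100) barely moves the estimate,
# whereas the ordinary mean of this data would be 21.2.
estimate = subset_least_squares_1d([0, 1, 2, 3, 100], h=4)
```

With the data above the selected window is [0, 1, 2, 3], so the estimate is 1.5 rather than the contaminated sample mean of 21.2.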

Degree

Ph.D.

Advisors

Kashyap, Purdue University.

Subject Area

Electrical engineering|Statistics
