Feature selection for unsupervised learning applied to content-based image retrieval

Jennifer Guani Dy, Purdue University

Abstract

This thesis explores the problem of feature selection for unsupervised learning. We investigate the problem through our algorithm, FSSEM (Feature Subset Selection wrapped around Expectation-Maximization clustering), and through two performance criteria for evaluating candidate feature subsets: maximum likelihood and scatter separability. We identify two issues: the need to select the number of clusters, and the need to normalize the bias of feature selection criteria with respect to dimension. We give theoretical proofs of these dimensionality biases and present a normalization scheme that can be applied to any criterion to ameliorate them. In addition to our automated algorithm, we developed Visual-FSSEM, which incorporates visualization and feature selection in an interactive environment. We apply FSSEM to a medical content-based image retrieval system using our customized-queries approach (CQA) for retrieval. CQA first classifies a query using the features that best differentiate the major classes, and then customizes the query to that class by using the features that best distinguish the images within the chosen major class. Our experiments show that our system improves doctors' diagnoses and that CQA increases retrieval precision over the traditional single-feature-vector approach.
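The two-stage retrieval that CQA performs can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the thesis: the toy database, the feature indices in `CLASS_FEATURES` and `WITHIN_FEATURES`, the nearest-centroid classifier in the first stage, and the Euclidean nearest-neighbor ranking in the second stage. The abstract specifies only that different feature subsets are used for the two stages.

```python
from math import dist

# Hypothetical toy database: id, major class, and a 3-dimensional feature vector.
DATABASE = [
    {"id": "a1", "cls": "A", "x": (0.0, 0.1, 1.0)},
    {"id": "a2", "cls": "A", "x": (0.2, 0.0, 5.0)},
    {"id": "a3", "cls": "A", "x": (0.1, 0.2, 9.0)},
    {"id": "b1", "cls": "B", "x": (5.0, 5.1, 1.0)},
    {"id": "b2", "cls": "B", "x": (5.2, 4.9, 5.0)},
    {"id": "b3", "cls": "B", "x": (4.9, 5.0, 9.0)},
]

# Assumed results of a prior feature selection step (illustrative indices).
CLASS_FEATURES = (0, 1)                      # separate the major classes
WITHIN_FEATURES = {"A": (2,), "B": (1, 2)}   # separate images inside a class


def project(x, idx):
    """Keep only the features listed in idx."""
    return tuple(x[i] for i in idx)


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def classify(query):
    """Stage 1: assign the query to the class with the nearest centroid,
    measured on the class-level features only (assumed classifier)."""
    q = project(query, CLASS_FEATURES)
    best_cls, best_d = None, float("inf")
    for cls in sorted({img["cls"] for img in DATABASE}):
        members = [project(img["x"], CLASS_FEATURES)
                   for img in DATABASE if img["cls"] == cls]
        d = dist(q, centroid(members))
        if d < best_d:
            best_cls, best_d = cls, d
    return best_cls


def retrieve(query, k=2):
    """Stage 2: rank images of the predicted class using that class's
    own feature subset, and return the k closest ids."""
    cls = classify(query)
    idx = WITHIN_FEATURES[cls]
    q = project(query, idx)
    candidates = [img for img in DATABASE if img["cls"] == cls]
    ranked = sorted(candidates, key=lambda img: dist(project(img["x"], idx), q))
    return cls, [img["id"] for img in ranked[:k]]
```

In this sketch a query such as `(0.1, 0.1, 8.0)` is first assigned to class `A` on features 0 and 1, and is then compared only against class `A` images on feature 2. A single-feature-vector approach would instead rank the whole database on one fixed set of features.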

Degree

Ph.D.

Advisors

Brodley, Purdue University.

Subject Area

Electrical engineering, Computer science
