Date of Award

Winter 2014

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Engineering

First Advisor

Mireille Boutin

Committee Chair

Mireille Boutin

Committee Member 1

David J. Love

Committee Member 2

Edward J. Delp

Committee Member 3

Xiaojun Lin

Abstract

Clustering high-dimensional data is more difficult than clustering low-dimensional data. The problem is twofold. First, there is an efficiency problem related to the data size, which increases with the dimensionality. Second, there is an effectiveness problem related to the fact that the mere existence of clusters in sample sets of high dimensions is questionable, as empirical samples hardly tend to cluster together in a meaningful fashion. The current approach to addressing this issue is to seek clusters in embedded subspaces of the original space. However, as dimensionality increases, a naive exhaustive search among all subspaces becomes exponentially more complex, which leads to an overwhelming time complexity. We propose an alternative approach for high-dimensional data clustering. Our solution is a top-down hierarchical clustering method using a binary tree of 1D random projections. As real data tends to have a lot of structures, we show that a 1D random projection of real data captures some of that structure with a high probability. More specifically, the structure manifests itself as a clear binary clustering in the projected data (1D). Our approach is efficient because most of the computations are performed in 1D. To increase efficiency of our method even further, we propose a fast 1D 2-means clustering method, which takes advantage of the 1D space. Our method achieves a better quality of clustering as well as a lower run-time compared to existing high-dimensional clustering methods.

Share

COinS