Abstract

Clustering high-dimensional data is more difficult than clustering low-dimensional data. The problem is twofold. First, there is an efficiency problem related to the data size, which increases with the dimensionality. Second, there is an effectiveness problem related to the fact that the mere existence of clusters in sample sets of high dimensions is questionable, as empirical samples hardly tend to cluster together in a meaningful fashion. The current approach to addressing this issue is to seek clusters in embedded subspaces of the original space. However, as dimensionality increases, a naive exhaustive search among all subspaces becomes exponentially more complex, which leads to an overwhelming time complexity. We propose an alternative approach for high-dimensional data clustering. Our solution is a top-down hierarchical clustering method using a binary tree of 1D random projections. As real data tends to have a lot of structures, we show that a 1D random projection of real data captures some of that structure with a high probability. More specifically, the structure manifests itself as a clear binary clustering in the projected data (1D). Our approach is efficient because most of the computations are performed in 1D. To increase efficiency of our method even further, we propose a fast 1D 2-means clustering method, which takes advantage of the 1D space. Our method achieves a better quality of clustering as well as a lower run-time compared to existing high-dimensional clustering methods.

Disciplines

Computer Engineering

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Engineering

First Advisor

Mireille Boutin

Committee Chair

Mireille Boutin

Committee Member 1

David J. Love

Committee Member 2

Edward J. Delp

Committee Member 3

Xiaojun Lin

Date of Award

Winter 2014

Recommended Citation

Han, Sangchun, "A Method for Clustering High-Dimensional Data Using 1D Random Projections" (2014). Open Access Dissertations. 280.
https://docs.lib.purdue.edu/open_access_dissertations/280

Download

Included in

Computer Engineering Commons

COinS

Open Access Dissertations

A Method for Clustering High-Dimensional Data Using 1D Random Projections

Abstract

Disciplines

Degree Type

Degree Name

Department

First Advisor

Committee Chair

Committee Member 1

Committee Member 2

Committee Member 3

Date of Award

Recommended Citation

Included in

Search

Links

Links for Authors

Browse

Open Access Dissertations

A Method for Clustering High-Dimensional Data Using 1D Random Projections

Author

Abstract

Disciplines

Degree Type

Degree Name

Department

First Advisor

Committee Chair

Committee Member 1

Committee Member 2

Committee Member 3

Date of Award

Recommended Citation

Included in

Share

Search

Links

Links for Authors

Browse