Date of Award
12-2016
Degree Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Computer Science
First Advisor
David F. Gleich
Committee Chair
David F. Gleich
Committee Member 1
Alex Pothen
Committee Member 2
Ahmed Sameh
Committee Member 3
Xavier Tricoche
Abstract
Complex optimization models and problems in machine learning often have the majority of information in a low rank subspace. By careful exploitation of these low rank structures in clustering problems, we find new optimization approaches that reduce the memory and computational cost.
We discuss two cases where this arises. First, we consider the NEO-K-Means (Non-Exhaustive, Overlapping K-Means) objective as a way to address overlapping and outliers in an integrated fashion. Optimizing this discrete objective is NP-hard, and even though there is a convex relaxation of the objective, straightforward convex optimization approaches are too expensive for large datasets. We utilize low rank structures in the solution matrix of the convex formulation and use a low-rank factorization of the solution matrix directly as a practical alternative. The resulting optimization problem is non-convex, but has a smaller number of solution variables, and can be locally optimized using an augmented Lagrangian method. In addition, we consider two fast multiplier methods to accelerate the convergence of the augmented Lagrangian scheme: a proximal method of multipliers and an alternating direction method of multipliers. For the proximal augmented Lagrangian, we show a convergence result for the non-convex case with bound-constrained subproblems. When the clustering performance is evaluated on real-world datasets, we show this technique is effective in finding the ground-truth clusters and cohesive overlapping communities in real-world networks.
The second case is where the low-rank structure appears in the objective function. Inspired by low rank matrix completion techniques, we propose a low rank symmetric matrix completion scheme to approximate a kernel matrix. For the kernel k-means problem, we show empirically that the clustering performance with the approximation is comparable to the full kernel k-means.
Recommended Citation
Hou, Yangyang, "Low rank methods for optimizing clustering" (2016). Open Access Dissertations. 935.
https://docs.lib.purdue.edu/open_access_dissertations/935