Cyber Center Publications

Document clustering with universum

Dan Zhang
Jingdong Wang
Luo Si, Purdue University

Abstract

Document clustering is a popular research topic that aims to partition documents into groups of similar objects (i.e., clusters), and it has been widely used in applications such as automatic topic extraction, document organization, and filtering. A recently proposed concept, the Universum is a collection of "non-examples" that do not belong to any concept or cluster of interest. This paper proposes a novel document clustering technique, Document Clustering with Universum, which uses Universum examples to improve clustering performance. The intuition is that Universum examples can serve as supervised information, since they are known not to belong to any meaningful concept or cluster in the target domain. In particular, a maximum margin clustering method is proposed to model both target examples and Universum examples for clustering. An extensive set of experiments is conducted to demonstrate the effectiveness and efficiency of the proposed algorithm.

Keywords

Algorithms, clustering, constrained concave-convex procedures, margin clustering, universum

Date of this Version

2011

DOI

10.1145/2009916.2010033

Comments

SIGIR '11: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval.

