Generic frameworks for interactive personalized interesting pattern discovery

Mansurul Bhuiyan MD., Purdue University

Abstract

Traditional frequent pattern mining algorithms generate an exponentially large number of patterns, a substantial portion of which are not significant for many data analysis endeavours. Hence, discovering a small number of interesting patterns from this exponentially large set of frequent patterns, according to a particular user's interest, is an important task. Existing works on pattern summarization compute a summary that globally represents the entire frequent pattern set. Such a representative set solves the interesting pattern discovery problem from a global perspective, which is, more often than not, far from the personalization required to fulfill the pattern discovery criteria of a particular user. In this dissertation, we propose two interactive pattern discovery frameworks that identify a set of interesting patterns for a particular user without requiring any prior input from the user on the interestingness measure of patterns. The proposed frameworks are generic: they support the discovery of interesting set, sequence, and graph patterns. In the first framework, we solve the problem with a novel solution based on Markov Chain Monte Carlo (MCMC) sampling of patterns. Our solution allows interactive sampling so that the sampled patterns can fulfill the user's requirements effectively. Instead of returning all the patterns for feedback, the proposed paradigm sends back a small set of randomly selected patterns, so that an adversary cannot reconstruct the entire dataset with high accuracy from the released patterns; this protects the confidentiality of the dataset as a whole. This feature enables the proposed framework to mine interesting patterns from hidden datasets. In the second framework, we develop an interactive personalized interesting pattern discovery framework called PRIIME, which is based on iterative learning of a user's interestingness function.
The learning process is supervised, with the supervision provided by the user through feedback on a small set of patterns. Depending on the nature of the feedback, i.e., non-negative real-valued or discrete, PRIIME uses gradient boosted tree-based regression or softmax classification algorithms that learn the user's interestingness profile from a limited amount of interactive feedback, and then uses this profile for pattern recommendation. Both proposed frameworks are generic in nature. The first framework discovers patterns by performing a random walk over the pattern space, so it is inherently generic. PRIIME, however, performs interactive pattern discovery by learning a traditional regression or classification function, which requires a vector representation of pattern instances. For set patterns we use a bag-of-words model; for graph and sequential patterns we propose a neural network (NN) based unsupervised feature construction approach. In an interactive pattern discovery system, effectively nominating patterns for feedback to train the learning model is an important task. In both proposed frameworks, we develop efficient strategies that combine exploration and exploitation to select patterns for feedback. We show experimental results on several real-life datasets to validate the performance of the proposed frameworks. We also compare against existing interactive pattern discovery methods and show that our methods perform substantially better. Finally, to demonstrate the applicability of interactive pattern discovery, we build a new home discovery tool for home buyers called Raven. It uses interactive feedback over a collection of home feature sets to learn a buyer's interestingness profile, and then recommends a short list of homes that match the buyer's interests, eventually shortening the interval between the start of a home search and the purchase.
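A minimal Python sketch of the first framework's core idea, a random walk over the pattern space that is biased toward a user's interest, is given below for set patterns. It is not the dissertation's implementation: the add/remove-one-item neighborhood, the Metropolis-Hastings acceptance rule, and the `interest` function (a stand-in for whatever score the framework derives from feedback) are assumptions made for illustration, and the helper names `support`, `neighbors`, and `mh_sample` are hypothetical.

```python
import random
from collections import Counter


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def neighbors(pattern, items, transactions, minsup):
    """Frequent, non-empty itemsets that differ from `pattern` by one item.

    This add/remove-one-item neighborhood is an illustrative assumption.
    """
    result = []
    for item in items:
        if item in pattern:
            candidate = pattern - {item}
            if not candidate:
                continue  # keep the walk on non-empty patterns
        else:
            candidate = pattern | {item}
        if support(candidate, transactions) >= minsup:
            result.append(candidate)
    return result


def mh_sample(start, interest, items, transactions, minsup, steps, rng):
    """Metropolis-Hastings walk whose target is proportional to interest(p).

    `start` must be a frequent, non-empty itemset. `interest` must return a
    positive number; here it is a hypothetical stand-in for a user-interest
    score derived from feedback.
    """
    current = start
    current_nbrs = neighbors(current, items, transactions, minsup)
    samples = []
    for _ in range(steps):
        if current_nbrs:
            proposal = rng.choice(current_nbrs)  # uniform neighbor proposal
            proposal_nbrs = neighbors(proposal, items, transactions, minsup)
            # Acceptance ratio for a uniform-neighbor proposal:
            # pi(q) * deg(p) / (pi(p) * deg(q))
            ratio = (interest(proposal) * len(current_nbrs)) / (
                interest(current) * len(proposal_nbrs)
            )
            if rng.random() < min(1.0, ratio):
                current, current_nbrs = proposal, proposal_nbrs
        samples.append(current)
    return samples


if __name__ == "__main__":
    data = [frozenset(t) for t in ({"a", "b", "c"}, {"a", "b"}, {"a", "c"},
                                   {"b", "c"}, {"a", "b", "c", "d"})]
    items = sorted(set().union(*data))
    weights = {"a": 2.0}  # hypothetical weights learned from user feedback
    interest = lambda p: 1.0 + sum(weights.get(i, 0.0) for i in p)
    walk = mh_sample(frozenset({"a"}), interest, items, data,
                     minsup=2, steps=2000, rng=random.Random(0))
    for pattern, count in Counter(walk).most_common(3):
        print(sorted(pattern), count)
```

Because the proposal picks a neighbor uniformly at random, the acceptance ratio corrects for the neighborhood sizes; with that correction, the walk targets a distribution proportional to `interest` over the frequent patterns reachable from the starting pattern, so patterns the user likes are sampled more often.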
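For PRIIME, the following hedged Python sketch shows two ingredients mentioned above for set patterns: a binary bag-of-words vector over the item vocabulary, and a selection rule that mixes exploitation (the highest-scoring patterns) with exploration (uniformly random patterns) when nominating patterns for feedback. PRIIME itself uses gradient boosted trees or softmax classification; to keep the example self-contained, a linear score updated by a least-squares gradient step stands in for that model, and epsilon-greedy selection is an assumed strategy, not necessarily the one used in the dissertation. All function names are hypothetical.

```python
import random


def bag_of_words(pattern, vocabulary):
    """Binary bag-of-words vector: 1.0 where the item occurs in the set pattern."""
    return [1.0 if item in pattern else 0.0 for item in vocabulary]


def linear_score(weights, pattern, vocabulary):
    """Hypothetical stand-in for the learned interestingness model."""
    return sum(w * x for w, x in zip(weights, bag_of_words(pattern, vocabulary)))


def sgd_update(weights, x, y, lr):
    """One least-squares gradient step toward the user's feedback value y."""
    error = y - sum(w * xi for w, xi in zip(weights, x))
    return [w + lr * error * xi for w, xi in zip(weights, x)]


def select_for_feedback(patterns, score, k, epsilon, rng):
    """Nominate up to k distinct patterns for feedback (epsilon-greedy)."""
    remaining = sorted(patterns, key=score, reverse=True)  # best first
    chosen = []
    while remaining and len(chosen) < k:
        if rng.random() < epsilon:
            idx = rng.randrange(len(remaining))  # explore: random pattern
        else:
            idx = 0  # exploit: best-scoring remaining pattern
        chosen.append(remaining.pop(idx))
    return chosen


if __name__ == "__main__":
    vocab = ["a", "b", "c", "d"]
    candidates = [frozenset(s) for s in ({"a"}, {"b"}, {"a", "c"}, {"b", "d"})]
    weights = [0.0] * len(vocab)
    rng = random.Random(0)
    for _ in range(3):  # three interactive feedback rounds
        score = lambda p: linear_score(weights, p, vocab)
        batch = select_for_feedback(candidates, score, k=2, epsilon=0.3, rng=rng)
        for p in batch:
            rating = 1.0 if "a" in p else 0.0  # simulated user who likes item "a"
            weights = sgd_update(weights, bag_of_words(p, vocab), rating, lr=0.5)
    print(weights)
```

Setting `epsilon` to 0 reduces the rule to pure exploitation, which can keep asking about the same kind of pattern; a small positive value lets the model receive feedback on patterns it currently ranks low.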

Degree

Ph.D.

Advisors

Hasan, Purdue University.

Subject Area

Computer science
