Abstract

In recent years, a number of emerging applications, such as sensor monitoring systems, RFID networks, and location-based services, have led to the proliferation of uncertain data. However, traditional data mining algorithms are usually inapplicable to uncertain data because of its probabilistic nature. Uncertainty has to be handled carefully; otherwise, it may significantly degrade the quality of the underlying data mining applications.

Therefore, we extend traditional data mining algorithms into uncertain versions so that they can still produce accurate results. In particular, we use sequential pattern mining as a motivating example to illustrate how to incorporate uncertain information into the data mining process. We use possible world semantics to interpret two typical types of uncertainty: tuple-level existential uncertainty and attribute-level temporal uncertainty. In an uncertain database, whether a pattern is frequent is itself probabilistic; thus, we define the concept of probabilistic frequent sequential patterns. We design various algorithms to mine probabilistic frequent patterns efficiently in uncertain databases. We also implement our algorithms on distributed computing platforms, such as MapReduce and Spark, so that they can be applied to large-scale databases.
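To make the notion of a probabilistic frequent pattern concrete, the following is a minimal sketch and not the dissertation's algorithm. It assumes that each sequence in the database supports a given pattern independently with a known probability, which is a common simplification under possible world semantics. Under that assumption, the probability that the pattern's support reaches a threshold can be computed by dynamic programming instead of enumerating all possible worlds. The names `frequentness_probability`, `is_probabilistic_frequent`, `min_support`, and `min_prob` are hypothetical and chosen for illustration only.

```python
from typing import Sequence


def frequentness_probability(support_probs: Sequence[float], min_support: int) -> float:
    """Return P(support >= min_support), assuming sequence i supports the
    pattern independently with probability support_probs[i] (an assumption
    of this sketch, not a claim about the dissertation's model)."""
    if min_support <= 0:
        return 1.0
    if min_support > len(support_probs):
        return 0.0

    # dist[k] = probability that exactly k processed sequences support the
    # pattern, for k < min_support; dist[min_support] accumulates the
    # probability of "min_support or more" (an absorbing state).
    dist = [0.0] * (min_support + 1)
    dist[0] = 1.0
    for p in support_probs:
        # Update from the top down so every read sees the previous step's value.
        dist[min_support] += dist[min_support - 1] * p
        for k in range(min_support - 1, 0, -1):
            dist[k] = dist[k] * (1.0 - p) + dist[k - 1] * p
        dist[0] *= 1.0 - p
    return dist[min_support]


def is_probabilistic_frequent(
    support_probs: Sequence[float], min_support: int, min_prob: float
) -> bool:
    """A pattern is treated as probabilistically frequent when the probability
    that its support reaches min_support is at least min_prob."""
    return frequentness_probability(support_probs, min_support) >= min_prob


if __name__ == "__main__":
    # Three uncertain sequences supporting a pattern with these probabilities.
    probs = [0.9, 0.6, 0.5]
    print(frequentness_probability(probs, min_support=2))
    print(is_probabilistic_frequent(probs, min_support=2, min_prob=0.7))
```

The dynamic program runs in time proportional to the number of sequences times `min_support`, whereas direct enumeration of possible worlds grows exponentially with the number of sequences.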

Our work also covers uncertainty computation in supervised machine learning algorithms. We develop an artificial neural network to classify numerical uncertain data, and we design a Naive Bayesian classifier for classifying categorical uncertain data streams. We also propose a discretization algorithm to pre-process numerical uncertain data, since many classifiers work with categorical data only. Experimental results on both synthetic and real-world uncertain datasets demonstrate that our methods are effective and efficient.

Keywords

Applied Sciences, Data Mining, Sequential Pattern Mining, Uncertain Database

Disciplines

Computer Sciences

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor

Yuni Xia

Committee Chair

Yuni Xia

Committee Member 1

Christopher Clifton

Committee Member 2

Snehasis Mukhopadhyay

Committee Member 3

Jennifer Neville

Committee Member 4

Sunil Prabhakar

Date of Award

3-2016

Recommended Citation

Ge, Jiaqi, "Sequential pattern mining with uncertain data" (2016). Open Access Dissertations. 650.
https://docs.lib.purdue.edu/open_access_dissertations/650

Included in

Computer Sciences Commons
