Efficient query processing for uncertain data

Yinian Qi, Purdue University

Abstract

Applications with uncertain data pose many challenges for data management and query processing. This dissertation advances the state of the art in efficient query processing over uncertain data. We study three types of probabilistic queries: nearest-neighbor queries, skyline queries, and general select-project-join (SPJ) queries. All three can leverage a probability threshold for pruning, so that only results that satisfy the query with probability above the given threshold are returned. For nearest-neighbor queries, we design novel indexes and data structures to monitor the pruning status and uncover pruning opportunities. For skyline queries, we propose two filtering schemes to quickly identify interesting instances whose skyline probabilities are above the threshold: i) bounding an instance's skyline probability, and ii) comparing the instance with others based on the dominance relationship. For applications of skyline analysis where "thresholding" is not desirable, we propose the problem of computing all skyline probabilities and present, for the first time, two worst-case sub-quadratic algorithms for it. We further give an efficient algorithm for the online version of the problem. Finally, we study general SPJ queries under the Orion uncertainty model and propose optimization rules that leverage the threshold for early pruning of unqualified tuples. We also extend our study to SPJ queries with duplicate elimination; for this case we adopt a general tuple uncertainty model and design new techniques for handling duplicate elimination. Our experiments on various data sets show that our techniques are both effective and efficient.
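As a rough illustration of the skyline-probability notion in the abstract, the sketch below computes all skyline probabilities with a naive quadratic scan and then applies a threshold. It is not one of the dissertation's algorithms (the sub-quadratic ones are not reproduced here). It assumes a simple model that the abstract does not spell out: each point exists independently with a given probability, smaller coordinates are better, and a point is in the skyline if it exists and no existing point dominates it. The names `dominates`, `skyline_probabilities`, and `above_threshold` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed model: (coordinates, independent existence probability).
Point = Tuple[Sequence[float], float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every dimension and strictly
    better in at least one (smaller values are assumed to be better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probabilities(points: List[Point]) -> List[float]:
    """Naive O(n^2) baseline: a point's skyline probability is its own
    existence probability times the probability that every point
    dominating it is absent."""
    result = []
    for i, (ci, pi) in enumerate(points):
        prob = pi
        for j, (cj, pj) in enumerate(points):
            if i != j and dominates(cj, ci):
                prob *= 1.0 - pj
        result.append(prob)
    return result


def above_threshold(points: List[Point], tau: float) -> List[int]:
    """Indices of points whose skyline probability exceeds tau."""
    return [i for i, p in enumerate(skyline_probabilities(points)) if p > tau]


if __name__ == "__main__":
    data = [((1, 1), 0.5), ((2, 2), 1.0), ((0, 3), 0.8)]
    print(skyline_probabilities(data))
    print(above_threshold(data, 0.6))
```

In this toy input, the point (2, 2) certainly exists but is dominated by (1, 1), so its skyline probability depends on (1, 1) being absent. A threshold-based method would aim to discard such points without computing the full product for each of them.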

Degree

Ph.D.

Advisors

Prabhakar, Purdue University.

Subject Area

Computer science
