Efficient querying of constantly evolving data

Dmitri V Kalashnikov, Purdue University

Abstract

This thesis addresses challenges in the emerging areas of sensor (streaming data) databases and moving-object databases. It focuses on the class of applications characterized by (a) constant change in the data values; (b) long-running (continuous) queries that must be repeatedly evaluated as the data changes; (c) inherent imprecision in the data; and (d) the need for near real-time results. The thesis addresses the scalability and performance challenges these applications face. The first part studies the scalable, efficient processing of continuous range queries on moving objects. We introduce two novel, highly scalable solutions to the problem: a disk-based technique called Velocity Constrained Indexing (VCI) and an in-memory technique called grid indexing. VCI maintains an index on moving objects that remains useful without being updated constantly as the data values change. For in-memory settings, we show that our grid indexing solution is superior to other methods. The second part covers similarity joins for low- and high-dimensional data. Two new similarity join algorithms are introduced: the Grid-join for low-dimensional data and the EGO*-join for high-dimensional data. Both show substantial improvement over state-of-the-art similarity join algorithms in their respective domains. Finally, the third part analyzes, and presents novel solutions to, the problem of handling the uncertainty inherent in environments with constantly changing data. Probabilistic queries are introduced, and a classification of queries is developed based on the nature of the query result set. Algorithms are provided for solving typical probabilistic queries from each class. We show that, unlike standard queries, probabilistic queries have a notion of answer quality. We introduce several metrics for measuring this quality, as well as various update policies for improving it.
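
To make the idea of a probabilistic query concrete, the following is a minimal illustrative sketch, not the thesis's own algorithm. It assumes each object's true value is uniformly distributed over a known interval since its last update; this uncertainty model and the names UncertainValue, prob_in_range, and probabilistic_range_query are all hypothetical choices made for this example. Each object in the answer carries the probability that it satisfies the range predicate.

```python
from dataclasses import dataclass


@dataclass
class UncertainValue:
    """A value known only to lie in [low, high].

    Assumption for this sketch: the true value is uniformly
    distributed over the interval.
    """
    low: float
    high: float


def prob_in_range(v: UncertainValue, q_low: float, q_high: float) -> float:
    """Probability that the true value lies in the query range [q_low, q_high]."""
    width = v.high - v.low
    if width <= 0:
        # Degenerate interval: the value is known exactly.
        return 1.0 if q_low <= v.low <= q_high else 0.0
    # Length of the intersection of the two intervals (negative if disjoint).
    overlap = min(v.high, q_high) - max(v.low, q_low)
    return max(0.0, overlap) / width


def probabilistic_range_query(readings, q_low, q_high, threshold=0.0):
    """Return {object_id: probability} for objects whose probability
    of lying in the query range exceeds the threshold."""
    result = {}
    for oid, v in readings.items():
        p = prob_in_range(v, q_low, q_high)
        if p > threshold:
            result[oid] = p
    return result
```

Under these assumptions, for the query range [5, 15] a value known to lie in [0, 10] qualifies with probability 0.5, while a value known to lie in [20, 30] is excluded. Narrower uncertainty intervals, for example after a fresh update, push the probabilities toward 0 or 1, which gives one intuition for why update policies can affect the quality of an answer.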

Degree

Ph.D.

Advisors

Prabhakar, Purdue University.

Subject Area

Computer science
