Towards efficient processing of big spatial data

Ahmed M Aly, Purdue University

Abstract

The ubiquity of location-aware devices has resulted in a plethora of location-based services in which huge amounts of spatial data need to be processed efficiently. To cope with this proliferation of spatial data, this dissertation addresses two key issues that existing spatial-query processing platforms overlook: i) the multiplicity of predicates in spatial queries, and ii) the dynamic nature of big spatial data. A user's query can include multiple spatial and relational predicates. However, existing spatial-query processors focus only on executing queries with a single spatial predicate, e.g., range or k-nearest-neighbor (kNN, for short). Queries with multiple kNN and relational predicates raise both correctness and performance challenges. Because a kNN predicate implicitly applies a ranking operation, applying a kNN predicate before or after another (spatial or relational) predicate in a query evaluation pipeline may produce different outputs. Hence, classical query optimization heuristics, e.g., pushing selects below joins, may compromise the correctness of these queries' evaluation. This dissertation presents new algorithms and optimizations that can improve the performance of these queries while maintaining the correctness of their evaluation. Furthermore, to arbitrate among the different optimizations, it presents novel techniques for estimating the cost of kNN predicates. Experimental evaluation demonstrates that the proposed algorithms and optimizations, coupled with the cost-estimation techniques, achieve orders-of-magnitude improvements in query performance.

To process large-scale spatial data, several cluster-based spatial-query processing systems have been proposed in the literature. However, these systems employ static data-partitioning structures that cannot adapt to data changes and are insensitive to the query workload. Hence, these systems cannot consistently provide good performance. To close this gap, this dissertation presents AQWA, an adaptive and workload-aware mechanism for partitioning large-scale spatial data. AQWA does not assume prior knowledge of the data distribution or the query workload. Instead, the data partitions are incrementally updated as data is consumed and queries are processed. Experimental evaluation, based on real spatial data and various workloads of range and kNN queries, demonstrates that AQWA achieves an order-of-magnitude improvement in query performance compared to state-of-the-art systems.
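
Why a kNN predicate cannot be freely reordered with a relational select can be seen on a toy example. The sketch below is illustrative only and is not the dissertation's algorithm: it assumes an in-memory list of points, Euclidean distance, and a hypothetical `category` attribute. It evaluates the same two predicates in two orders and gets two different answers.

```python
import math

# Hypothetical toy data set: (id, x, y, category)
POINTS = [
    ("a", 1.0, 0.0, "cafe"),
    ("b", 2.0, 0.0, "restaurant"),
    ("c", 3.0, 0.0, "cafe"),
    ("d", 4.0, 0.0, "restaurant"),
    ("e", 5.0, 0.0, "restaurant"),
]

def knn(points, q, k):
    """kNN predicate: the k points closest to q (Euclidean), ties broken by id."""
    return sorted(points, key=lambda p: (math.dist((p[1], p[2]), q), p[0]))[:k]

def select(points, category):
    """Relational select: keep only points of the given category."""
    return [p for p in points if p[3] == category]

q = (0.0, 0.0)
k = 2

# Plan 1: select first, then kNN -> "the 2 nearest restaurants".
select_then_knn = [p[0] for p in knn(select(POINTS, "restaurant"), q, k)]

# Plan 2: kNN first, then select -> "restaurants among the 2 nearest places".
knn_then_select = [p[0] for p in select(knn(POINTS, q, k), "restaurant")]

print(select_then_knn)  # ['b', 'd']
print(knn_then_select)  # ['b']
```

Because the kNN step ranks and truncates its input, the position of the select in the plan changes which points survive; an optimizer that moves the select across the kNN silently changes the query's meaning.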
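
The general idea of workload-aware partitioning, splitting a partition where queries concentrate so that hot queries scan fewer points, can be sketched as follows. This is a toy illustration under stated assumptions, not AQWA's actual mechanism: the cost model (points in a region multiplied by the number of recent queries overlapping it), the vertical-only splits, and all names (`Rect`, `cost`, `best_vertical_split`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, x, y):
        # Half-open rectangle, so a point lies in exactly one side of a split.
        return self.x1 <= x < self.x2 and self.y1 <= y < self.y2

    def overlaps(self, other):
        return (self.x1 < other.x2 and other.x1 < self.x2 and
                self.y1 < other.y2 and other.y1 < self.y2)

def cost(region, points, queries):
    """Assumed cost model: (#points in region) * (#queries overlapping region)."""
    n_points = sum(1 for (x, y) in points if region.contains(x, y))
    n_queries = sum(1 for qr in queries if region.overlaps(qr))
    return n_points * n_queries

def best_vertical_split(region, points, queries):
    """Try a vertical split at each distinct point x-coordinate; return
    (x, cost) of the cheapest split that beats keeping the region whole,
    or None if no split helps."""
    base = cost(region, points, queries)
    best = None
    for x in sorted({px for (px, py) in points if region.contains(px, py)}):
        if x <= region.x1:
            continue  # splitting here would leave an empty left side
        left = Rect(region.x1, region.y1, x, region.y2)
        right = Rect(x, region.y1, region.x2, region.y2)
        c = cost(left, points, queries) + cost(right, points, queries)
        if c < base and (best is None or c < best[1]):
            best = (x, c)
    return best

region = Rect(0.0, 0.0, 10.0, 10.0)
points = [(i + 0.5, 5.0) for i in range(10)]   # 10 evenly spread points
hot_queries = [Rect(0.0, 0.0, 2.0, 10.0)] * 3  # workload focused on the left

result = best_vertical_split(region, points, hot_queries)
print(result)  # (2.5, 6)
```

Here the unsplit region costs 10 × 3 = 30, while splitting at x = 2.5 isolates the two points the queries actually touch (cost 2 × 3 = 6). Recomputing every count on each decision is quadratic; an incremental system would instead maintain point and query counts as data and queries arrive.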

Degree

Ph.D.

Advisors

Aref, Purdue University.

Subject Area

Computer science
