Rank-aware query processing and optimization

Ihab F Ilyas, Purdue University

Abstract

This dissertation focuses on supporting ranking in relational database systems through a rank-aware query processing and optimization framework. We introduce ranking algorithms and operators that can be adopted by current relational query engines, and we provide a cost-based query optimization technique that integrates the proposed operators into practical relational query processors. In particular, we introduce two rank-join algorithms. The first algorithm joins multiple ranked inputs on key attributes and is realized as the binary key rank-join query operator KRJN. The second algorithm is more general: it joins multiple ranked inputs on general join conditions and is realized in two binary query operators, HRJN and HRJN*. Both rank-join algorithms exploit the individual orders of the input relations and produce join results ordered on a user-specified scoring function. The idea is to rank the join results progressively during the join operation. We address several practical issues and optimization heuristics that arise when integrating the new join operators into practical query processors. To make these operators practically useful, we introduce a rank-aware query optimization framework that fully integrates rank-join operators into relational query engines. The framework extends System R's dynamic programming algorithm in both enumeration and pruning. We define ranking as an interesting property that triggers the generation of rank-aware query plans. We introduce a probabilistic model for estimating the input cardinality, and hence the cost, of a rank-join operator. To our knowledge, this work is the first effort to estimate the input size needed by optimal rank aggregation algorithms. Costing ranking plans, although challenging, is key to the full integration of rank-join operators into real-world query processing engines.
We experimentally evaluate our rank-join operators and optimization framework by modifying the query optimizer of an open-source database management system. The experimental evaluation of our approach compares recent algorithms for joining ranked inputs and shows superior performance for our techniques. The experiments also show the validity of our framework and the accuracy of the proposed estimation model.
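
The progressive ranking idea described in the abstract can be illustrated with a minimal sketch of a generic hash-based rank join over two inputs. This is not the dissertation's KRJN, HRJN, or HRJN* operator. It assumes an equality join on a single key, an additive (monotone) scoring function, and inputs already sorted by descending score. The names `rank_join`, `left`, `right`, and `k` are hypothetical and chosen only for illustration.

```python
import heapq
from collections import defaultdict


def rank_join(left, right, k):
    """Top-k equi-join of two ranked inputs (illustrative sketch only).

    left, right: lists of (join_key, score), sorted by score descending.
    Assumed scoring function: left_score + right_score.
    Returns (results, depths): results is a list of (total_score, join_key)
    in descending score order; depths is how many tuples were read per input.
    """
    inputs = (left, right)
    pos = [0, 0]                                   # next unread position per input
    seen = (defaultdict(list), defaultdict(list))  # hash tables: key -> scores read so far
    top = [None, None]                             # first (highest) score read per input
    last = [None, None]                            # most recent (lowest) score read per input
    heap = []                                      # max-heap of join results via negated scores
    results = []
    turn = 0

    def threshold():
        # Upper bound on the score of any join result not yet produced.
        # Such a result must use an unread tuple from some input i, whose
        # score is at most last[i], paired with a tuple scoring at most top[j].
        bound = float("-inf")
        for i in (0, 1):
            j = 1 - i
            if pos[i] >= len(inputs[i]):
                continue                           # input i has no unread tuples
            if last[i] is None or top[j] is None:
                return float("inf")                # not enough information yet
            bound = max(bound, last[i] + top[j])
        return bound

    while len(results) < k:
        t = threshold()
        # Report buffered results that no future result can beat.
        while heap and len(results) < k and -heap[0][0] >= t:
            neg_score, key = heapq.heappop(heap)
            results.append((-neg_score, key))
        if len(results) == k:
            break
        if pos[0] >= len(left) and pos[1] >= len(right):
            break                                  # both inputs exhausted
        if pos[turn] >= len(inputs[turn]):
            turn = 1 - turn                        # skip an exhausted input
        i, j = turn, 1 - turn
        key, score = inputs[i][pos[i]]
        pos[i] += 1
        if top[i] is None:
            top[i] = score
        last[i] = score
        seen[i][key].append(score)
        # Probe the other input's hash table for join partners.
        for other in seen[j].get(key, ()):
            heapq.heappush(heap, (-(score + other), key))
        turn = j                                   # alternate between inputs
    return results, tuple(pos)


# Hypothetical example data: (join_key, score), each list sorted by score.
left = [("a", 90), ("b", 85), ("c", 80), ("d", 20), ("e", 10), ("f", 5)]
right = [("b", 95), ("a", 88), ("c", 75), ("g", 15), ("h", 10), ("f", 5)]
top2, depths = rank_join(left, right, k=2)
```

On this example the sketch reports the two best join results after reading only three tuples from `left` and two from `right`. The number of tuples read from each input is the kind of quantity the abstract's input cardinality estimation targets, although the probabilistic model itself is not reproduced here.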

Degree

Ph.D.

Advisors

Aref, Purdue University.

Subject Area

Computer science
