Optimization and parallelization of database queries

Myong Hoon Kang, Purdue University

Abstract

This work develops a comprehensive approach to the optimization and automatic parallelization of multiple queries. We define four levels of multiple-query representation: command-level, operation-level, algorithm-level, and low-level. Both optimization and parallelization are considered at every level. The transformations are applied to a series of queries, to the algorithms used to encode operations within a series of queries, and to the lower-level code representing individual database operations. Although automatic parallelization of conventional-language programs is now widely accepted, relatively little emphasis has been placed on automatic parallelization of multiple queries. The first step of automatic parallelization is to detect potential parallelism. The data dependence analysis proposed here provides a method for determining precisely whether the portions of relations affected by different database operations overlap. Both inter- and intra-query parallelism can be detected using these techniques. We also discuss how the dependence information can be used to generate serializable, deadlock-free schedules, whether serial or parallel. The next step is the efficient use of inter- and intra-query parallelism. The performance of database operations depends heavily on the amount of memory allocated to each operation. We therefore investigate the memory allocation problem and examine tradeoffs involving parallelism at the different levels. Further, we propose techniques for process packaging and hybrid static/dynamic scheduling. An adaptation of the self-scheduling technique is used for scheduling within a process.
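The idea of testing whether the portions of relations touched by two operations overlap can be illustrated with a minimal sketch. It is not the dissertation's algorithm: the `Op` record, the representation of an affected portion as a half-open key range, and the `conflicts` and `parallel_levels` helpers are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical database operation touching a key range of one relation."""
    name: str
    relation: str
    lo: int        # inclusive lower bound of the affected key range (assumed)
    hi: int        # exclusive upper bound of the affected key range (assumed)
    writes: bool   # True if the operation modifies the range


def conflicts(a: Op, b: Op) -> bool:
    """Two operations are dependent if they touch overlapping portions of
    the same relation and at least one of them writes."""
    if a.relation != b.relation:
        return False
    if not (a.writes or b.writes):
        return False  # two reads never conflict
    return a.lo < b.hi and b.lo < a.hi  # half-open ranges overlap


def parallel_levels(ops):
    """Group operations into levels. Operations in one level are mutually
    independent; each level runs after the previous one, which preserves
    the original order of every conflicting pair."""
    level = {}
    for i, op in enumerate(ops):
        earlier = [level[j] for j in range(i) if conflicts(ops[j], op)]
        level[i] = 1 + max(earlier, default=-1)
    groups = {}
    for i, lv in level.items():
        groups.setdefault(lv, []).append(ops[i].name)
    return [groups[lv] for lv in sorted(groups)]


ops = [
    Op("q1", "emp", 0, 100, True),     # update emp keys 0..99
    Op("q2", "emp", 100, 200, True),   # update emp keys 100..199
    Op("q3", "emp", 50, 150, False),   # read emp keys 50..149
    Op("q4", "dept", 0, 10, False),    # read dept keys 0..9
]
schedule = parallel_levels(ops)
```

Here `q1` and `q2` write disjoint ranges and `q4` touches a different relation, so all three may run together, while `q3` reads rows that both writers modify and must wait for them. Because conflicting operations always execute in their original order, the resulting schedule is equivalent to the serial one.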

Degree

Ph.D.

Advisors

Dietz, Purdue University.

Subject Area

Electrical engineering; Computer science
