Optimization and parallelization of database queries
Abstract
In this work, we derive a complete approach to the optimization and automatic parallelization of multiple queries. We define four levels of multiple-query representation: command-level, operation-level, algorithm-level, and low-level. Both optimization and parallelization are considered at all four levels. Optimization and parallelization transformations are applied to a series of queries, to the algorithms used to encode operations within a series of queries, and to the lower-level code representing individual database operations. Although automatic parallelization of conventional language programs is now widely accepted, relatively little emphasis has been placed on automatic parallelization of multiple queries. The first step of automatic parallelization is to detect potential parallelism. The data dependence analysis proposed here provides a method for determining precisely whether the portions of relations affected by various database operations overlap. Both inter- and intra-query parallelism can be detected using these techniques. We also discuss how the dependence information can be used to generate serializable, deadlock-free schedules, whether serial or parallel. The next step is to use the detected inter- and intra-query parallelism efficiently. The performance of a database operation depends heavily on the amount of memory allocated to it. We investigate the memory allocation problem and examine tradeoffs involving parallelism at different levels. Further, we propose techniques for process packaging and hybrid static/dynamic scheduling. An adaptation of the self-scheduling technique is used for scheduling within a process.
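The idea of testing whether the portions of relations touched by two operations overlap can be illustrated with a minimal sketch. This is not the dissertation's analysis: it assumes, purely for illustration, that each operation touches one contiguous key range of one relation, and the names `Op`, `conflicts`, and `parallel_waves` are hypothetical. Operations that do not conflict are grouped into "waves" that could run in parallel while preserving the original order among conflicting operations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical summary of one database operation (illustrative only)."""
    name: str
    relation: str
    lo: int        # inclusive lower bound of the key range touched (assumption)
    hi: int        # inclusive upper bound of the key range touched (assumption)
    writes: bool   # True if the operation modifies the relation


def conflicts(a: Op, b: Op) -> bool:
    """Two operations depend on each other if they touch the same relation,
    at least one of them writes, and their key ranges overlap."""
    if a.relation != b.relation:
        return False
    if not (a.writes or b.writes):
        return False
    return a.lo <= b.hi and b.lo <= a.hi


def parallel_waves(ops: list[Op]) -> list[list[str]]:
    """Place each operation one wave after the latest earlier operation
    it conflicts with; operations in the same wave are independent."""
    level: list[int] = []
    for i, op in enumerate(ops):
        deps = [level[j] for j in range(i) if conflicts(ops[j], op)]
        level.append(1 + max(deps, default=-1))
    waves: list[list[str]] = [[] for _ in range(max(level, default=-1) + 1)]
    for op, lv in zip(ops, level):
        waves[lv].append(op.name)
    return waves


ops = [
    Op("q1", "emp", 0, 99, writes=True),
    Op("q2", "emp", 100, 199, writes=True),   # disjoint from q1
    Op("q3", "emp", 50, 150, writes=False),   # reads rows written by q1 and q2
    Op("q4", "dept", 0, 10, writes=False),    # different relation
]
print(parallel_waves(ops))
```

In this example q1, q2, and q4 land in the first wave and q3 in the second, because q3 reads rows that both q1 and q2 write. A real analysis would work on query predicates rather than fixed key ranges.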
Degree
Ph.D.
Advisors
Dietz, Purdue University.
Subject Area
Electrical engineering; Computer science