Enabling automatic parallelization of industrial-grade applications

Brian S Armstrong, Purdue University

Abstract

Automatic parallelization techniques for finding loop-based parallelism fail to find efficient parallelism in the industrial-grade applications of today's scientific computing market. The state-of-the-art techniques that achieve speedups on computational kernels fail to speed up industrial-grade applications when applied unchanged; additional enabling techniques are needed before automatic parallelizing compilers can find significant parallelism in such applications. Most of the information and transformations required to parallelize industrial-grade applications effectively are already available to the state-of-the-art compiler. Although these applications should be parallelizable, an exponential increase in compile-time complexity, together with common software engineering patterns, causes symbolic analysis to fail, which in turn prevents automatic parallelization. This thesis presents two new enabling techniques, Propagating Expression Bounds Interprocedurally (PEBIL) and the Array Containment Hierarchy (ARCH), whose goal is to keep symbolic analysis effective when the compiler performs linearization, partial program analysis, and inlining. PEBIL extracts value ranges of variables used in array reference expressions and propagates these constraints to surrounding array references, supplying information that linearization, inlining, and partial program analysis would otherwise make unavailable. Its representation of constraints retains precision when code is transformed by linearization and inlining. ARCH identifies aliasing of array variables from programming constructs, independently of symbolic analysis, and captures it in an explicit, interprocedural representation that remains precise even after inlining and linearization are applied.
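As a rough illustration of the kind of reasoning value-range propagation enables, the following sketch (hypothetical code, not the dissertation's implementation) shows an interval-based dependence disproof: once ranges for the index variables of a linearized array reference are known, a simple test can prove two references disjoint.

```python
# Hypothetical sketch: interval arithmetic on linear index expressions.
# A reference is modeled as ({variable: coefficient}, constant).

def expr_range(coeffs, const, var_ranges):
    """Value interval of the index expression sum(c * v) + const,
    given per-variable ranges {var: (lo, hi)}."""
    lo = hi = const
    for var, c in coeffs.items():
        vlo, vhi = var_ranges[var]
        if c >= 0:
            lo, hi = lo + c * vlo, hi + c * vhi
        else:
            lo, hi = lo + c * vhi, hi + c * vlo
    return lo, hi

def dependence_disproved(write_ref, read_ref, var_ranges):
    """A dependence is safely false if the two index intervals cannot meet."""
    wlo, whi = expr_range(*write_ref, var_ranges)
    rlo, rhi = expr_range(*read_ref, var_ranges)
    return whi < rlo or rhi < wlo

# A loop writing A(i) and reading A(i + m), with 0 <= i <= 99.
write_ref = ({"i": 1}, 0)           # index expression: i
read_ref = ({"i": 1, "m": 1}, 0)    # index expression: i + m

# Without a range for m the test must answer conservatively; with the
# propagated constraint m >= 100 the references provably never overlap.
print(dependence_disproved(write_ref, read_ref,
                           {"i": (0, 99), "m": (100, 200)}))  # True
print(dependence_disproved(write_ref, read_ref,
                           {"i": (0, 99), "m": (1, 200)}))    # False
```

The variable names and the `({coeffs}, const)` encoding are assumptions made for this sketch; the point is only that a propagated bound such as `m >= 100` turns an inconclusive test into a safe disproof.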
ARCH also hoists alias relationships interprocedurally, out of the loops enclosing subroutine call sites. The effectiveness of the PEBIL and ARCH techniques is evaluated by implementing them in the Polaris compiler. The compile-time metrics, measured on three applications representative of industrial-grade codes, are: the number of loops parallelized; the number of variables the compiler conservatively assumes to carry cross-iteration data dependencies; the number of variables whose value ranges the compiler cannot determine; and the number of data dependencies safely proven false by the data dependence tests. On all of these metrics the combined application of the new enabling techniques improves over the base case, to degrees that vary with the coding style of the application and the presence of array indirection. The techniques improve the compiler's ability to parallelize loops by up to 46 percent (17 percent on average), decrease the number of variables conservatively assumed to have cross-iteration dependencies by up to 13 percent (8 percent on average), and decrease the number of variables with undetermined value ranges by up to 84 percent (66 percent on average). The number of data dependencies the compiler's dependence tests successfully prove false increases by up to a factor of 5.5. Additionally, significant run-time speedups of four to eight on an eight-processor machine, and of three on a four-processor machine, are achieved for a number of loops parallelized by the PEBIL and ARCH techniques.
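A containment hierarchy of the kind ARCH maintains can be pictured as an explicit map from each array variable to the region of a parent array it overlays, so alias questions are answered structurally, once, rather than re-derived by symbolic analysis at every loop or call site. The sketch below is illustrative only; the class, its names, and its API are assumptions, not Polaris's representation.

```python
# Hypothetical sketch: an explicit array-containment hierarchy.
# Each array maps to (parent or None, element offset in parent, length).

class ContainmentHierarchy:
    def __init__(self):
        self.entries = {}

    def declare(self, name, length):
        # A root array that overlays nothing.
        self.entries[name] = (None, 0, length)

    def contain(self, child, parent, offset, length):
        # Record that child aliases parent[offset : offset + length].
        self.entries[child] = (parent, offset, length)

    def root_extent(self, name):
        """Root array and absolute [lo, hi) element extent of name."""
        offset = 0
        length = self.entries[name][2]
        while True:
            parent, off, _ = self.entries[name]
            if parent is None:
                return name, offset, offset + length
            offset += off
            name = parent

    def may_alias(self, a, b):
        # Two arrays can alias only if they share a root and their
        # absolute extents within that root intersect.
        ra, alo, ahi = self.root_extent(a)
        rb, blo, bhi = self.root_extent(b)
        return ra == rb and alo < bhi and blo < ahi

h = ContainmentHierarchy()
h.declare("BUF", 100)
h.contain("X", "BUF", 0, 50)    # X overlays the first half of BUF
h.contain("Y", "BUF", 50, 50)   # Y overlays the second half
h.contain("Z", "X", 10, 20)     # Z is a window inside X
print(h.may_alias("X", "Y"))    # False: disjoint halves of BUF
print(h.may_alias("Z", "Y"))    # False
print(h.may_alias("Z", "BUF"))  # True
```

Because the hierarchy is explicit, a query such as `may_alias("X", "Y")` can be resolved once before a loop and the answer reused across all iterations and call sites inside it, which is the sense in which alias relationships are hoisted.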

Degree

Ph.D.

Advisors

Eigenmann, Purdue University.

Subject Area

Computer Engineering | Computer Science
