Systematic optimization of basic linear algebra computations for distributed-memory systems
Abstract
Many optimizations of programs with loops that are used in parallelizing compilers and systolic array design are based on linear transformations of loop iteration spaces. Additional important optimizations and designs become possible with modular mappings, which are described by linear transformations modulo a constant vector. In this thesis, necessary and sufficient conditions for modular mappings to be one-to-one are investigated for rectangular domains of arbitrary dimension. This thesis also identifies and characterizes a class of BLAS-like algorithms that can be optimized for parallel execution by modular mappings. To reduce communication overheads, this thesis provides conditions on data alignment and partitioning that allow perfect alignment between computation and data. Subsequently, techniques are provided to generate modular mappings that satisfy these conditions. Cannon's algorithm for parallel matrix multiplication is generalized to multiply input matrices that are block-cyclically distributed across a two-dimensional processor array with an arbitrary number of processors. This generalization is based on modular mappings. Experimental results show that the generalized Cannon's algorithm performs better than previous work (SUMMA) for parallel matrix multiplication. Within the framework of modular mappings, this thesis addresses the problem of writing data-distribution-independent (DDI) programs in order to eliminate or reduce initial data redistribution overheads on distributed-memory parallel computers. The main feature of DDI programs is that the input data distribution is not fixed but is instead determined by parameters. When DDI matrix multiplication programs are used in an algorithm with multiple matrix products, half of the data redistributions otherwise required can be eliminated.
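To make the notion of a modular mapping concrete, the following Python sketch applies a mapping of the form (T i) mod m to every point of a rectangular domain and checks by exhaustive enumeration whether it is one-to-one. It is only an illustration: the thesis derives necessary and sufficient conditions, which this brute-force check does not reproduce. The function names, the matrix cannon_T (a skew in the style of the textbook Cannon's algorithm on a p x p array) and the counterexample collapsed_T are assumptions introduced here, not taken from the thesis.

```python
from itertools import product


def modular_map(T, m, point):
    """Apply the modular mapping point -> (T @ point) mod m, row by row."""
    return tuple(
        sum(t * x for t, x in zip(row, point)) % mod
        for row, mod in zip(T, m)
    )


def is_one_to_one(T, m, bounds):
    """Brute-force injectivity test on the domain 0 <= point[k] < bounds[k].

    Illustrative only: it enumerates every point instead of using
    closed-form conditions.
    """
    seen = set()
    for point in product(*(range(b) for b in bounds)):
        image = modular_map(T, m, point)
        if image in seen:
            return False
        seen.add(image)
    return True


p = 3  # hypothetical processor-array side length

# Assumed Cannon-style mapping of iteration (i, j, k) to
# (processor row, processor column, time step), with time = (k - i - j) mod p.
cannon_T = [[1, 0, 0],
            [0, 1, 0],
            [-1, -1, 1]]
cannon_ok = is_one_to_one(cannon_T, (p, p, p), (p, p, p))

# A mapping whose third row ignores k sends (i, j, 0) and (i, j, 1)
# to the same image, so it cannot be one-to-one.
collapsed_T = [[1, 0, 0],
               [0, 1, 0],
               [1, 1, 0]]
collapsed_ok = is_one_to_one(collapsed_T, (p, p, p), (p, p, p))

print(cannon_ok, collapsed_ok)
```

A one-to-one mapping assigns each iteration of the triple loop to a distinct (processor, time) pair, which is the property a schedule needs in order to avoid two computations colliding on the same processor at the same step.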
Degree
Ph.D.
Advisors
Fortes, Purdue University.
Subject Area
Computer science|Electrical engineering|Systems design