Compiler, fault tolerance, and performance prediction aspects of reconfigurable parallel processing systems

Gene Saghi, Purdue University

Abstract

Several parallel processing systems exist that can be partitioned and/or can operate in multiple modes of parallelism, and can switch between these modes in an efficient fashion. This work addresses compiler, fault tolerance, and performance prediction aspects relevant to the full exploitation of the capabilities of such machines. The contributions made by this research can be applied toward the overall goal of the efficient use of reconfigurable systems.

By overlapping the operation of the control unit and the processing elements within an SIMD machine, the execution times of SIMD programs can be reduced. An architectural model is presented for achieving overlapped operation in machines whose instruction sets consist of instructions with varying word lengths and execution times. A framework for obtaining a balanced workload between the control unit and the processing elements is provided, and compiler techniques for maximizing the overlap of control unit operation with that of the processing elements are examined.

Dynamic reconfiguration has been proposed as a means of tolerating faults in large-scale partitionable parallel processing systems. If a processor develops a permanent fault during the execution of a task on a submachine A, three recovery options are: migration of the task to another submachine, migration of the task to a subdivision of A, and redistribution of the task among the fault-free processors of A. Quantitative models of these reconfiguration schemes are developed to determine what information is needed to choose among these methods in a practical implementation. In certain situations, collecting precise values for all of the needed parameters is shown to be very difficult. Therefore, the model parameters are analyzed, together with the cost of making the wrong reconfiguration choice, to derive a useful heuristic based on the information that is available. A multistage cube or hypercube interprocessor network is assumed.

Finally, a discussion is presented of how the cyclic reduction algorithm can be mapped onto the MasPar MP-1, nCUBE 2, and PASM parallel processing systems, each of which represents a different mode of parallelism that can be used in the design of parallel machines. Cyclic reduction, a known approach for the parallel solution of tridiagonal and block tridiagonal systems of equations, is the vehicle used to explore mapping in this study. Specific issues addressed are SIMD/MIMD trade-offs, the effect on execution time of increasing the number of processors used, the impact of the interprocessor communication network on performance, the importance of predicting algorithm performance as a function of the mapping used, and the advantages of a partitionable system. Analytical results are validated by experimentation on all three machines.
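
The benefit of control unit/processing element overlap can be illustrated with a simple timing model. The sketch below is not the architectural model developed in the dissertation; it is a minimal illustration that treats an SIMD program as a sequence of instructions, each with a control unit (CU) cost and a processing element (PE) cost, and assumes an unbounded instruction queue between the two.

    def serial_time(instrs):
        """Execution time when the CU and the PE array never work
        at the same time."""
        return sum(cu + pe for cu, pe in instrs)

    def overlapped_time(instrs):
        """Execution time when the CU may run ahead of the PEs through
        an (assumed) unbounded instruction queue: the PEs start
        instruction i as soon as the CU has issued it and instruction
        i-1 has finished."""
        cu_done = 0.0   # time at which the CU finishes issuing each instruction
        pe_done = 0.0   # time at which the PE array finishes each instruction
        for cu, pe in instrs:
            cu_done += cu
            pe_done = max(pe_done, cu_done) + pe
        return pe_done

    # Example: with equal CU and PE totals (a balanced workload),
    # overlap hides most of the CU time.
    program = [(2.0, 2.0), (1.0, 3.0), (3.0, 1.0)]
    print(serial_time(program), overlapped_time(program))   # 12.0 vs 8.0

Under this model the overlapped time approaches max(total CU time, total PE time), which is why balancing the load between the CU and the PEs matters: overlap cannot hide work concentrated on one side.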
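
For the fault-recovery discussion, the decision among the three options amounts to comparing estimated completion times. The sketch below is purely illustrative: the parameter names and the linear-scaling assumptions are ours, not the quantitative models developed in the dissertation.

    def choose_recovery(t_rem, n, t_migrate, t_wait, t_subdivide, t_redistribute,
                        subdiv_slowdown=2.0):
        """Pick the recovery option with the smallest estimated completion
        time.  All parameters are hypothetical, for illustration only:
          t_rem          -- remaining execution time on the original n-PE submachine
          t_wait         -- expected wait for a free submachine of the same size
          t_migrate      -- time to move the task's state to another submachine
          t_subdivide    -- time to remap the task onto a fault-free half of A
          t_redistribute -- time to spread the task over the n-1 fault-free PEs
          subdiv_slowdown -- slowdown on a half-size submachine (2.0 if the
                             task scales linearly in the number of PEs)
        """
        estimates = {
            "migrate":      t_wait + t_migrate + t_rem,
            "subdivide":    t_subdivide + subdiv_slowdown * t_rem,
            "redistribute": t_redistribute + t_rem * n / (n - 1),
        }
        return min(estimates, key=estimates.get), estimates

    option, costs = choose_recovery(t_rem=100.0, n=16, t_migrate=5.0,
                                    t_wait=30.0, t_subdivide=8.0,
                                    t_redistribute=12.0)
    print(option)   # "redistribute" for these illustrative values

Even this toy version shows why the choice is hard in practice: it hinges on quantities such as the remaining execution time, which may be difficult to measure precisely, motivating the heuristic analysis described above.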
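
Since cyclic reduction is the vehicle algorithm of the mapping study, a minimal serial sketch may help readers unfamiliar with it. The code below is a generic textbook formulation for n = 2**k - 1 unknowns, not any of the parallel mappings developed in the dissertation; each forward level halves the number of coupled equations, and because all eliminations within a level are independent, a parallel machine can perform them concurrently in O(log n) steps.

    import numpy as np

    def cyclic_reduction(a, b, c, d):
        """Solve a tridiagonal system by serial cyclic reduction.
        a: sub-diagonal (a[0] couples a nonexistent unknown), b: diagonal,
        c: super-diagonal (c[-1] likewise), d: right-hand side.
        Requires n = 2**k - 1 unknowns."""
        a, b, c, d = (np.asarray(v, dtype=float).copy() for v in (a, b, c, d))
        n = len(b)
        k = (n + 1).bit_length() - 1
        assert n == 2**k - 1, "cyclic reduction needs n = 2**k - 1"
        # Forward elimination: at each level, fold the two neighbours at
        # distance h into every equation at an odd multiple of h, doubling
        # the coupling distance of the surviving equations.
        for level in range(k - 1):
            h = 2**level
            for i in range(2 * h - 1, n, 2 * h):
                alpha = -a[i] / b[i - h]
                beta = -c[i] / b[i + h]
                b[i] += alpha * c[i - h] + beta * a[i + h]
                d[i] += alpha * d[i - h] + beta * d[i + h]
                a[i] = alpha * a[i - h]   # now couples the unknown at distance 2h
                c[i] = beta * c[i + h]
        # The middle equation is now decoupled; back-substitute level by level.
        x = np.zeros(n)
        x[n // 2] = d[n // 2] / b[n // 2]
        for level in range(k - 2, -1, -1):
            h = 2**level
            for i in range(h - 1, n, 2 * h):
                left = x[i - h] if i - h >= 0 else 0.0
                right = x[i + h] if i + h < n else 0.0
                x[i] = (d[i] - a[i] * left - c[i] * right) / b[i]
        return x

    # Check against a dense solve on a random diagonally dominant system.
    rng = np.random.default_rng(1)
    n = 15
    a = rng.uniform(-1.0, 0.0, n); a[0] = 0.0
    c = rng.uniform(-1.0, 0.0, n); c[-1] = 0.0
    b = 2.0 + np.abs(a) + np.abs(c)
    d = rng.standard_normal(n)
    A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
    assert np.allclose(cyclic_reduction(a, b, c, d), np.linalg.solve(A, d))

The communication pattern (each level exchanges data at doubling distances) is what makes the interconnection network, multistage cube or hypercube, so influential on the measured performance of the three machines.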

Degree

Ph.D.

Advisors

Siegel, Purdue University.

Subject Area

Electrical engineering
