Dynamic reconfiguration has been proposed as a way to tolerate faults in large-scale partitionable parallel processing systems. If a processor develops a permanent fault while a task is executing on a submachine A, three recovery options are available: migrating the task to another submachine, migrating the task to a subdivision of A, and redistributing the task among the fault-free processors in A. Quantitative models of these reconfiguration schemes are developed to determine what information is needed to choose among them in a practical implementation. In certain situations, collecting precise values for all of the needed parameters is very difficult. The model parameters are therefore analyzed, together with the cost of making the wrong reconfiguration choice, to determine a useful heuristic based on the information that is available. A multistage cube or hypercube interprocessor network is assumed. PASM, an experimental SIMD/MIMD mixed-mode machine with a partitionable multistage cube communication network, and the nCUBE 2, a commercially available MIMD machine with a partitionable hypercube communication network, are used as vehicles for studying the model parameters.