Detailed modeling and reliability estimation of fault-tolerant processor arrays

Noe Lopez-Benitez, Purdue University

Abstract

Recent advances in VLSI/WSI technology have led to the design of processor arrays with a large number of processing elements confined in small areas. The use of redundancy to increase fault tolerance has the effect of reducing the ratio of the area dedicated to processing elements to the area occupied by other resources in the array. The assumption of fault-free hardware support (switches, buses, interconnection links, etc.) leads at best to conservative reliability estimates. However, detailed modeling entails not only an explosive growth in the model state space but also a difficult model construction process. To address the latter problem, a systematic method to construct Markov models for the reliability evaluation of processor arrays is proposed. This method is based on the premise that the fault behavior of a processor array can be modeled by a Stochastic Petri Net (SPN). However, in order to obtain a more compact representation, a set of attributes is associated with each transition in the Petri net model. This representation is referred to as a Modified Stochastic Petri Net (MSPN) model. An MSPN allows the construction of the corresponding Markov model as the reachability graph is being generated. The Markov model generated can include the effect of failures of several different components of the array as well as the effect of a peculiar distribution of faults when the reconfiguration occurs. Specific reconfiguration schemes such as Successive Row Elimination (SRE), Alternate Row-Column Elimination (ARCE), and Direct Reconfiguration (DR) are analyzed in detail. Randomization techniques are used to solve the inherently large models that can be generated via an MSPN representation. A model reduction technique based on the discrimination of states with low mean holding times is discussed. Finally, an analysis of hierarchical structures formed with variations of the schemes analyzed is presented.
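
As a rough illustration of building a Markov model while the reachability graph is being generated, the sketch below explores the markings of a toy net breadth-first and records a transition rate for every arc as it is discovered. It uses an ordinary stochastic Petri net with marking-dependent rates, not the MSPN attribute set of the thesis, which the abstract does not specify. The net itself (a marking of spares left and an operational flag), the names `build_markov_model`, `TRANSITIONS`, `N_ACTIVE`, and `LAM`, and all numeric values are assumptions made for the example.

```python
from collections import deque

N_ACTIVE = 4   # hypothetical number of active processing elements
LAM = 0.001    # hypothetical failure rate per element

# A marking is (spares_left, operational_flag).
# Each transition is (enabling predicate, rate function, firing function).
TRANSITIONS = [
    # An active element fails and a spare replaces it.
    (lambda m: m[1] == 1 and m[0] > 0,
     lambda m: N_ACTIVE * LAM,
     lambda m: (m[0] - 1, 1)),
    # An idle spare fails.
    (lambda m: m[1] == 1 and m[0] > 0,
     lambda m: m[0] * LAM,
     lambda m: (m[0] - 1, 1)),
    # An active element fails with no spare left: the array fails.
    (lambda m: m[1] == 1 and m[0] == 0,
     lambda m: N_ACTIVE * LAM,
     lambda m: (0, 0)),
]

def build_markov_model(initial, transitions):
    """Explore reachable markings and collect Markov rates in one pass."""
    index = {initial: 0}   # marking -> Markov state number
    rates = {}             # (from_state, to_state) -> total rate
    queue = deque([initial])
    while queue:
        marking = queue.popleft()
        for enabled, rate, fire in transitions:
            if not enabled(marking):
                continue
            successor = fire(marking)
            if successor not in index:
                index[successor] = len(index)
                queue.append(successor)
            arc = (index[marking], index[successor])
            # Transitions leading to the same marking merge into one arc.
            rates[arc] = rates.get(arc, 0.0) + rate(marking)
    return index, rates

index, rates = build_markov_model((2, 1), TRANSITIONS)
```

Because the rates are accumulated during exploration, no separate pass over the finished reachability graph is needed, and transitions that lead to the same marking collapse into a single Markov arc.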
The results reported in this thesis were obtained using MGRE (Model Generation and Reliability Evaluator), a software package designed to analyze fault-tolerant processor arrays for which an MSPN representation is given.
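
The randomization technique mentioned in the abstract (also known as uniformization) computes the transient state probabilities of a continuous-time Markov model as a Poisson-weighted sum of steps of a discrete-time chain. The sketch below is a minimal, unoptimized version under stated assumptions and is not the MGRE implementation. The function name `transient_probabilities`, the three-state generator matrix `Q`, and the rate and mission-time values are hypothetical. For large values of rate times mission time the initial Poisson weight underflows, and a production solver would need a more careful scheme.

```python
import math

def transient_probabilities(Q, p0, t, tol=1e-12, max_terms=10000):
    """Transient solution of a CTMC with generator Q by randomization."""
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))   # uniformization rate
    if lam == 0.0:
        return list(p0)                      # no transitions at all
    # Discrete-time chain P = I + Q / lam.
    P = [[Q[i][j] / lam + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    weight = math.exp(-lam * t)              # Poisson probability of 0 jumps
    v = list(p0)                             # distribution after k jumps
    result = [weight * x for x in v]
    accumulated = weight
    for k in range(1, max_terms):
        if 1.0 - accumulated <= tol:
            break                            # remaining Poisson mass is negligible
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= lam * t / k
        result = [r + weight * x for r, x in zip(result, v)]
        accumulated += weight
    return result

# Hypothetical model: state 0 = fault-free, 1 = one spare in use, 2 = failed.
A = 0.004                                    # assumed failure rate
Q = [[-A,   A, 0.0],
     [0.0, -A,   A],
     [0.0, 0.0, 0.0]]
p = transient_probabilities(Q, [1.0, 0.0, 0.0], t=100.0)
reliability = p[0] + p[1]                    # probability of an operational state
```

For this small chain the result can be checked against the closed form e^{-At}(1 + At). The method uses only non-negative terms, which is one reason it is commonly chosen for large models.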

Degree

Ph.D.

Advisors

Fortes, Purdue University.

Subject Area

Electrical engineering
