Aspects of reconfigurable parallel processing systems: Architecture, interconnection, and task allocation
Approaches for providing communications among the processors and memories of large-scale parallel processing systems are often based on the multistage cube and data manipulator topologies. One goal of this research is to provide system designers with tools and methods for deciding which of these two topologies performs better, on the basis of a set of performance and cost criteria, for a given set of implementation parameters. A technique for studying buffered multistage networks is described that is more efficient than simulation techniques and more tractable than analytical methods. The technique is applied to the dilated multistage cube and data manipulator topologies. One interesting characteristic of the data manipulator topology is the existence of multiple disjoint paths through the network for some source and destination combinations. Properties of disjoint paths are derived and used to point out the advantages and limits of this characteristic. An overview of the organization of the PASM parallel processing system is given, and an efficient masking technique for large-scale microprocessor-based SIMD architectures is presented. SIMD architectures require mechanisms that efficiently enable and disable (mask) processors to support flexible programming. Most current SIMD architectures incorporate masking logic that allows processors to disable themselves based on data-conditional results calculated at the processor level (local masking). Global processor masks, specified by the control unit, are more efficient for tasks where the masking is data independent. An efficient hybrid masking technique is proposed that supports global as well as local masking, and a design for the proposed hybrid mechanism is described.
One benefit of partitionable systems is their ability to execute independent tasks simultaneously. Previous work has identified conditions under which, when there are k tasks to be processed, partitioning the system so that all k tasks are processed simultaneously minimizes overall execution time. This result, however, assumes that execution times are data independent. It is shown that data-dependent tasks do not necessarily execute faster when processed simultaneously, even if the condition is met. A model is developed that accounts for the possible variability of a task's execution time, and it is used in a framework to study the problem of finding an optimal mapping of identical, independent tasks with data-dependent execution times onto partitionable systems.
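The distinction the abstract draws between global and local masking can be illustrated with a small sketch. It is not the dissertation's design, which the abstract does not detail. It assumes, for illustration only, that a processing element (PE) is active when both the control unit's global mask bit and the PE's own data-conditional result are true. The names `hybrid_mask` and `simd_step` are hypothetical.

```python
from typing import Callable, List


def hybrid_mask(global_mask: List[bool],
                data: List[int],
                condition: Callable[[int], bool]) -> List[bool]:
    """Combine a global mask with a local, data-dependent mask.

    Assumption (not from the source): a PE is enabled only if the
    control unit's global bit is set AND the PE's local condition holds.
    """
    if len(global_mask) != len(data):
        raise ValueError("one global mask bit is required per PE")
    return [g and condition(x) for g, x in zip(global_mask, data)]


def simd_step(data: List[int],
              enabled: List[bool],
              op: Callable[[int], int]) -> List[int]:
    """Apply one broadcast operation; disabled PEs keep their old value."""
    return [op(x) if e else x for x, e in zip(data, enabled)]


# Eight PEs, each holding one data item.
data = [3, -1, -4, 5, 9, -2, -6, 7]

# Global mask chosen by the control unit: data independent (even-numbered PEs).
global_mask = [pe % 2 == 0 for pe in range(8)]

# Local mask: each PE tests its own data (negative values only).
enabled = hybrid_mask(global_mask, data, lambda x: x < 0)

# Broadcast "negate" to all PEs; only enabled PEs execute it.
result = simd_step(data, enabled, lambda x: -x)
```

In this sketch only PEs 2 and 6 are both even-numbered and hold negative data, so only those two values change. The global part of the mask costs no per-PE computation, which is the efficiency argument the abstract makes for data-independent masking.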
Siegel, Purdue University.