Parallel computers constructed using conventional processors offer the potential to achieve large improvements in execution speed at reasonable cost; however, these machines tend to implement only coarse-grain MIMD parallelism efficiently. To achieve the best possible speedup through parallel execution, a computer must be capable of effectively using all the different types of parallelism that exist in each program. A combination of SIMD, VLIW, and MIMD parallelism, at a variety of granularity levels, exists in most applications; thus, hardware that can support multiple types of parallelism can achieve better performance with a wider range of codes. In the companion paper [CoD94], we present a new hardware barrier architecture that provides the full DBM functionality we discussed in [OKD90], but can be implemented with much simpler hardware. In this paper, we show how this mechanism can be used to efficiently support multi-mode, moderate-width parallelism with instruction-level granularity (i.e., synchronization cost is approximately that of one LOAD instruction).
Dynamic Barrier Synchronization, MIMD/VLIW/SIMD Mixed-Mode Computation, Execution Models, MIMD/SIMD Coding Strategies, Instruction-Level Parallelism