Both conventional wisdom and engineering practice hold that a massively parallel MIMD machine should be constructed using a large number of independent processors and an asynchronous interconnection network. In this paper, we suggest that it may be beneficial to implement a massively parallel MIMD using microcode on a massively parallel SIMD microengine; the synchronous nature of the system allows much higher performance to be obtained with simpler hardware. The primary disadvantage is simply that the SIMD microengine must serialize execution of different types of instructions - but again the static nature of the machine allows various optimizations that can minimize this detrimental effect. In addition to presenting the theory behind construction of efficient MIMD machines using SIMD microengines, this paper discusses how the techniques were applied to create a 16,384- processor shared memory barrier MIMD using a SIMD MasPar MP-1. Both the MIMD structure and benchmark results are presented. Even though the MasPar hardware is not ideal for implementing a MIMD and our microinterpreter was written in a high-level language (MPL), peak MIMD performance was 280 MFLOPS as compared to 1.2 GFLOPS for the native SIMD instruction set. Of course, comparing peak speeds is of dubious value; hence, we have also included a number of more realistic benchmark results.
MIMD, SIMD, Microcode, Compilers, Common Subexpression Induction.
Date of this Version