Speculative distributed shared-memory multiprocessors organized as processor-and-memory hierarchies

Renato Jansen O Figueiredo, Purdue University

Abstract

Area-efficiency arguments motivate heterogeneity in the design of future multiprocessors. This thesis proposes a novel Heterogeneous Distributed Shared-Memory (HDSM) architecture that is organized as a processor-and-memory hierarchy. The top level of the hierarchy has a few instruction-level parallel (ILP) processors with large on-chip caches for fast execution of sequential codes. Lower levels employ chip-multiprocessors (CMPs) consisting of a larger number of simpler processors with smaller individual caches, for efficient execution of parallel codes. This thesis analyzes the proposed organization quantitatively to (1) determine its performance relative to conventional machines, (2) provide HDSM design guidelines based on next-generation ILP and CMP technologies, and (3) investigate its performance under conventional and speculative programming models. Extensive simulation analyses consider 3-level, 4-node instances of the hierarchy. A comparison to a conventional DSM with equal silicon area shows that the hierarchical design outperforms the homogeneous counterpart for explicitly parallel applications (by 37% on average for 10 benchmarks from the SPLASH-2 suite) and for parallelized applications (speedups ranging from 10% to 110% for 4 benchmarks from the Spec95 and NAS suites). A sensitivity analysis shows that support for hardware multithreading in top-level processors improves the performance of parallel workloads (by 15% on average). Another analysis uses a factorial design experiment to determine the relative impact of heterogeneity on performance, and concludes that the organization has low sensitivity (15%) to the speed of memories in the bottom level. The performance analyses consider the execution of unmodified applications programmed in a model typically supported by conventional shared-memory multiprocessors: single-program, multiple-data (SPMD).
This thesis proposes three static assignment policies of SPMD tasks to heterogeneous processors and analyzes their performance for applications that exhibit (1) explicit parallelism in the form of homogeneous threads, and (2) implicit loop-level parallelism automatically detected by a compiler. In addition, this thesis proposes a novel hardware-based data-dependence speculation technique for DSMs that allows a compiler to relax the constraint of data-independence to issue SPMD tasks in parallel. It is the first coarse-grain DSM data-dependence speculation technique to extend conventional directory-based coherence protocols to support application-transparent speculative versions of shared-memory blocks. A simulation-based evaluation shows that the proposed mechanism allows for automatic extraction of thread-level parallelism from sequential programs with irregular data structures and inherent coarse-grain parallelism (windows of millions of instructions, and working sets of hundreds of KBytes) that cannot be detected statically. This analysis shows that speculatively parallelized programs from the Olden suite execute with higher performance (parallel speedups of up to 6.8) in the thread-parallel levels of the HDSM hierarchy than in aggressive instruction-parallel uniprocessors.
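The coarse-grain speculation mechanism summarized above can be illustrated with a minimal sketch. The model below is an assumption-laden simplification (not the thesis's actual hardware protocol): a directory entry per shared-memory block holds the committed version alongside per-task speculative writes and a speculative-reader set, and committing tasks in logical order exposes read-after-write dependence violations, squashing the offending later task.

```python
# Illustrative sketch of coarse-grain data-dependence speculation over a
# directory of shared-memory blocks. Class and method names are hypothetical.

class DirectoryEntry:
    def __init__(self, value=0):
        self.committed = value   # architecturally visible version of the block
        self.spec_writes = {}    # task_id -> speculative (uncommitted) value
        self.readers = set()     # task_ids that speculatively read this block

class SpeculativeDSM:
    def __init__(self):
        self.blocks = {}         # block address -> DirectoryEntry
        self.squashed = set()    # tasks that must re-execute

    def _entry(self, addr):
        return self.blocks.setdefault(addr, DirectoryEntry())

    def load(self, task, addr):
        e = self._entry(addr)
        e.readers.add(task)
        # A task first sees its own speculative write, else the committed version.
        return e.spec_writes.get(task, e.committed)

    def store(self, task, addr, value):
        # Buffer the write as a speculative version; nothing commits yet.
        self._entry(addr).spec_writes[task] = value

    def commit(self, task):
        """Commit a task's writes; tasks commit in logical (sequential) order."""
        for e in self.blocks.values():
            if task in e.spec_writes:
                # Any logically later task that already read this block consumed
                # a stale value: a dependence violation, so squash that task.
                for r in e.readers:
                    if r > task:
                        self.squashed.add(r)
                e.committed = e.spec_writes.pop(task)
            e.readers.discard(task)
```

For example, if task 1 speculatively reads a block before the logically earlier task 0 commits a write to it, task 0's commit detects the stale read and marks task 1 squashed, while later tasks observe the committed value.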

Degree

Ph.D.

Advisors

Fortes, Purdue University.

Subject Area

Electrical engineering|Computer science

