Chip multiprocessors with on-chip aggregate function network
Abstract
State-of-the-art on-chip networks and block-based cache coherence protocols used in cache-coherent shared-memory Chip MultiProcessors (CMPs) are inefficient for collective operations across cores. Performance of CMPs can be seriously degraded by the multitude of memory requests and coherence messages required to implement each collective operation. This thesis presents a CMP-AFN architecture and Instruction Set Architecture (ISA) extensions that augment a conventional shared-memory CMP with a tightly-integrated Aggregate Function Network (AFN) that implements low-latency collective operations without using or interfering with the memory hierarchy. For a modest increase in circuit complexity, traffic within a CMP’s internal network is dramatically reduced, improving the performance of caches and reducing power consumption. Full system simulations of 16-core CMPs show a CMP-AFN outperforms the reference design significantly, eliminating up to 52% of memory accesses and up to 73% of private L1 data cache misses in both the EPCC OpenMP microbenchmarks and SPEC OMP benchmarks.
Degree
Ph.D.
Advisors
Dietz, Purdue University.
Subject Area
Electrical engineering|Computer science
Off-Campus Purdue Users:
To access this dissertation, please log in to our
proxy server.