Exploiting new design tradeoffs in chip multiprocessor caches

Zeshan Ahmad Chishti, Purdue University

Abstract

The microprocessor industry has converged on the chip multiprocessor (CMP) as the architecture of choice for utilizing the numerous on-chip transistors. Multiple CMP cores substantially increase the pressure on the limited on-chip cache capacity while requiring fast data access. The lowest-level on-chip CMP cache must not only utilize its capacity effectively but also mitigate the increased latencies caused by slow wire-delay scaling. Conventional shared and private caches can provide either capacity or fast access, but not both.

To mitigate wire delays in large lower-level caches, this thesis proposes a novel technique called distance associativity, which employs non-uniform access latency across widely-spaced cache subarrays. Distance associativity enables flexible placement of a core's frequently-accessed data in the closest subarrays for fast access.

To provide both capacity and fast access in CMP caches, this thesis makes the key observation that CMPs fundamentally reverse the latency-capacity tradeoff that exists in conventional symmetric multiprocessors (SMPs) and distributed shared memory multiprocessors (DSMs). While CMPs rely on limited on-chip cache capacity but fast on-chip communication, SMPs and DSMs have virtually unlimited cache capacity but slow off-chip communication. To exploit this tradeoff reversal, this thesis proposes three novel mechanisms: (i) controlled replication, (ii) in-situ communication, and (iii) capacity stealing.

This work also observes that commercial multithreaded programs exhibit substantial variations in capacity demands and communication behaviors. Optimizations using static replication thresholds, such as controlled replication and in-situ communication, cannot adapt to these workload variations. To this end, this thesis proposes the use of dynamic replication thresholds in controlled replication and in-situ communication. Experimental results show that for a 4-core CMP with an 8 MB cache, the proposed CMP-NuRAPID cache outperforms conventional shared caches by 20% for multithreaded workloads and 33% for multiprogrammed workloads.
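As a rough illustration of the distance-associativity idea described above, the following C sketch models a single cache set whose ways are grouped by physical distance, with hits migrating a block toward the nearest (fastest) subarray group. The group count, latency values, swap-on-hit promotion, and fill-in-farthest-group policy are hypothetical simplifications for illustration, not the thesis's actual CMP-NuRAPID design.

```c
/*
 * Minimal sketch of distance-associative placement (illustrative only).
 * Assumed, not from the thesis: group count, per-group latencies, and
 * the swap-on-hit promotion policy.
 */
#include <stdio.h>
#include <string.h>

#define NUM_GROUPS 4      /* subarray groups, ordered nearest to farthest */
#define WAYS_PER_GROUP 2  /* ways mapped to each distance group */
#define NUM_WAYS (NUM_GROUPS * WAYS_PER_GROUP)

/* Hypothetical per-group access latencies in cycles (nearest is fastest). */
static const int group_latency[NUM_GROUPS] = { 13, 21, 29, 37 };

typedef struct {
    unsigned long tag;
    int valid;
} Way;

static Way set[NUM_WAYS];  /* one cache set; index / WAYS_PER_GROUP gives group */

/* Look up a tag; on a hit, promote the block one group closer by swapping. */
static int access(unsigned long tag)
{
    for (int w = 0; w < NUM_WAYS; w++) {
        if (set[w].valid && set[w].tag == tag) {
            int group = w / WAYS_PER_GROUP;
            if (group > 0) {
                /* Distance associativity: data can move between subarrays
                 * without changing its set, so frequently-accessed blocks
                 * migrate toward the nearest subarrays. */
                int victim = w - WAYS_PER_GROUP; /* same slot, one group closer */
                Way tmp = set[victim];
                set[victim] = set[w];
                set[w] = tmp;
            }
            return group_latency[group];
        }
    }
    /* Miss: fill into the farthest group (simplified placement policy). */
    set[NUM_WAYS - 1] = (Way){ .tag = tag, .valid = 1 };
    return -1;
}

int main(void)
{
    memset(set, 0, sizeof set);
    access(0xABC);                  /* miss: placed in the farthest group */
    for (int i = 0; i < 3; i++)     /* repeated hits migrate the block closer */
        printf("hit latency: %d cycles\n", access(0xABC));
    return 0;
}
```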

Degree

Ph.D.

Advisors

Vijaykumar, Purdue University.

Subject Area

Electrical engineering|Computer science
