Exploring and evaluating control-flow and thread-level parallelism

Chen-Yong Cher, Purdue University

Abstract

Simultaneous Multithreading (SMT) has been proposed for improving processor throughput by overlapping multiple threads in a wide-issue superscalar processor. While trace cache, value prediction, and prefetching have been shown to be effective in improving single-thread performance in superscalar processors, there has been no analysis of these techniques in an SMT processor. SMT brings new factors both for and against these techniques, and it is not known how they would fare in SMT. Therefore, I evaluate these techniques in an SMT processor. My key contributions are: (1) I identify a fundamental interaction between the techniques and SMT's sharing of resources among multiple threads, and (2) I quantify the impact of this interaction on SMT throughput. SMT's sharing of the instruction storage, physical registers, and issue queue impacts the effectiveness of trace cache, value prediction, and prefetching, respectively. My simulations show that (1) compared to a similar-sized i-cache, trace cache degrades throughput; (2) with a typical number of physical registers, value prediction degrades throughput; and (3) for memory-intensive workloads, prefetching improves throughput with many threads, whereas for workloads with mixed memory demand, prefetching has little opportunity and slightly degrades throughput.

Although modern superscalar processors achieve high branch prediction accuracy, certain branches either are inherently difficult to predict or incur destructive interference in prediction tables, causing significant performance loss due to mispredictions. I propose a novel micro-architecture, called Skipper, to handle such difficult branches by exploiting control-flow independence. Skipper avoids incorrect instructions altogether by skipping over, without even fetching, the control-flow dependent computation conditioned by a difficult branch. Instead, Skipper fetches and executes the control-flow independent instructions, which need to be executed irrespective of the branch's outcome. Skipper fetches the skipped control-flow dependent instructions after the difficult branch is resolved, out of program order. Because only the correct control-flow dependent instructions are then executed, Skipper conserves valuable resources. Skipper is the first proposal to exploit control-flow independence by skipping over control-flow dependent computation in a superscalar pipeline. SPECint95 simulations show that Skipper performs 10% and 8% better than superscalar and the previously proposed Polypath, respectively, when all three micro-architectures have equal i-cache bandwidth and hardware resources.
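
The fetch ordering described for Skipper can be illustrated with a toy model. The sketch below is not taken from the dissertation: the `Instr` type, the region labels, and the function names are hypothetical, and the model ignores data dependences, prediction confidence, and all pipeline detail. It only contrasts program-order fetch with an order that defers the control-flow dependent region until the branch would have resolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    # Hypothetical labels: "before", "branch", "dependent", "independent".
    region: str


def program_order(instrs):
    """Conventional fetch: instructions are fetched in program order."""
    return [i.name for i in instrs]


def skipper_like_fetch_order(instrs):
    """Toy ordering: skip the control-flow dependent region, fetch the rest,
    then fetch the skipped region (assumed to be the correct path, since the
    branch has resolved by then)."""
    fetched_first = [i.name for i in instrs if i.region != "dependent"]
    skipped = [i.name for i in instrs if i.region == "dependent"]
    return fetched_first + skipped


# A small if-then region followed by code that runs on either outcome.
hammock = [
    Instr("load x", "before"),
    Instr("if x > 0", "branch"),
    Instr("y = x + 1", "dependent"),    # runs only if the branch condition holds
    Instr("z = a * b", "independent"),  # runs regardless of the outcome
    Instr("w = z + c", "independent"),
]

print(program_order(hammock))
print(skipper_like_fetch_order(hammock))
```

In this toy ordering the two control-flow independent instructions are fetched immediately after the branch, and the dependent instruction is fetched last, out of program order.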

Degree

Ph.D.

Advisors

Vijaykumar, Purdue University.

Subject Area

Electrical engineering
