Dynamic trace analysis for performance evaluation of superscalar processors

Ray A. Kamin, Purdue University

Abstract

The key to achieving a high performance/cost ratio for microprocessors is identifying performance bottlenecks and making insightful tradeoffs early in the design process. However, as design complexity increases with each processor generation, the combinatorial explosion of tradeoffs prevents designers from fully exploring the design space. This dissertation investigates both analytic and simulation-based techniques for performance evaluation. A survey of the existing literature, techniques for exploiting fine-grain parallelism, and processor design choices are presented. Performance is evaluated within a framework of multiple instruction issue, speculative execution, dynamic scheduling, and finite scope of concurrency detection. A trace-driven superscalar simulator based on the SPARC architecture was developed to support this work. To study performance, a method of analyzing dynamic instruction traces to characterize program parallelism is introduced. This technique allows performance evaluation of many architectural variations with a single execution pass through an application or benchmark of interest. A parameter is introduced to quantify the available parallelism within programs versus the scope of concurrency detection. This dissertation also investigates the feasibility of sophisticated dynamic scheduling algorithms for finite-resource processors. The simulation tools are used to analyze the performance of five dynamic scheduling algorithms under increasing scopes of concurrency detection. A unique pipelining approach is introduced to address implementation limitations. The results for six benchmarks are presented. The trace length of many contemporary benchmarks requires an inordinate amount of computer resources to fully simulate each design variation. To reduce the compute time, researchers frequently simulate a fraction of the total instruction trace.
Currently, the minimum trace length required to adequately characterize an application is an open question. We examine the variation in simulation results produced by reduced instruction traces for many integer and floating-point SPEC92 benchmarks. We present statistical insight into minimizing simulation time while maintaining confidence that the results characterize the overall benchmark. Included are time-variant analyses of basic block length and instruction-level parallelism.
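The reduced-trace question above can be illustrated with a small hedged sketch: given a stream of basic-block lengths from a trace, find the shortest prefix whose running mean stays within a tolerance of the full-trace mean. This is an assumed, simplified convergence criterion for illustration only; the dissertation's actual statistical methodology is not reproduced here.

```python
# Hypothetical sketch of trace reduction: how long a prefix of a stream
# of basic-block lengths must be before its running mean settles near
# the full-trace mean. The tolerance and data are illustrative.
import statistics

def prefix_for_tolerance(block_lengths, rel_tol=0.05):
    """Return the shortest prefix length whose running mean stays
    within rel_tol of the full-trace mean from that point onward."""
    full_mean = statistics.mean(block_lengths)
    running = 0
    means = []
    for i, length in enumerate(block_lengths, 1):
        running += length
        means.append(running / i)
    for i, _ in enumerate(means):
        if all(abs(m - full_mean) <= rel_tol * full_mean for m in means[i:]):
            return i + 1  # 1-based prefix length
    return len(block_lengths)


# A perfectly uniform stream converges immediately; an early outlier
# forces a longer prefix before the mean stabilizes.
print(prefix_for_tolerance([4] * 100))        # → 1
print(prefix_for_tolerance([10] + [5] * 99))  # > 1
```

A plot of the running mean over the full trace is the time-variant view the abstract mentions; the prefix length where it flattens suggests how much of the trace a reduced simulation would need.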

Degree

Ph.D.

Advisors

Adams, Purdue University.

Subject Area

Electrical engineering
