Exploiting instruction level parallelism with the DS architecture

Yinong Zhang, Purdue University

Abstract

DS is a new microarchitecture that combines decoupled (DAE) and superscalar techniques to exploit instruction level parallelism. Programs for DS are compiled into two instruction substreams: the dominant substream navigates the control flow, and the rest of the computational task is shared between the dominant and subsidiary substreams. Each substream is processed by a separate superscalar core realizable with current VLSI technology. The implementation complexity usually found in a monolithic superscalar processor can thus be divided between the two superscalar cores. DS machines are binary compatible with superscalar machines having the same instruction set, and DS machines in a family are binary compatible with one another. Three special instructions, BRQ, GETQ, and PUTQ, are required to support the DS architecture and enforce correct program semantics. To coordinate the out-of-order execution in the two substreams of DS, two symmetrical indexed data queues are introduced to carry out inter-substream data communication, and a branch queue is introduced to carry out inter-substream control flow synchronization. All queues can be implemented with a moderate amount of hardware using existing technologies. The run time behavior of DS can be explained by an analytical model based on queueing theory. This model inspires the code partitioning algorithm necessary for DS to achieve its full run time performance potential. A novel technique for controlling slip between substreams is introduced. The compiler issues of instruction count balancing, residence time balancing, and data-copying instruction elimination, important to any split-stream scheme, are discussed. Other important issues, such as extra instruction insertion, FIFO enforcement, and deadlock prevention, are also examined. Performance is compared with that of an aggressive superscalar processor. On average, DS delivers approximately the same level of performance, and it does so with less complex hardware and better potential for fast clock rates.
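The queue-based coordination described above can be pictured with a small software model. The sketch below is illustrative only and is not taken from the dissertation: the abstract does not define the semantics of BRQ, GETQ, and PUTQ, so the behavior attributed to them in the comments is an assumption, the queues are plain unbounded FIFOs rather than indexed hardware queues, and all names (Queues, dominant_front, subsidiary, run) are hypothetical.

```python
from collections import deque


class Queues:
    """Toy stand-ins for the three queues; plain FIFOs, not indexed queues."""

    def __init__(self):
        self.dom_to_sub = deque()  # data queue, dominant -> subsidiary
        self.sub_to_dom = deque()  # data queue, subsidiary -> dominant
        self.branch = deque()      # branch outcomes, dominant -> subsidiary


def dominant_front(values, q):
    """Dominant substream: resolve each branch and ship operands across."""
    for v in values:
        taken = v % 2 == 0
        q.branch.append(taken)      # outcome the subsidiary reads later (assumed BRQ role)
        if taken:
            q.dom_to_sub.append(v)  # assumed PUTQ behavior


def subsidiary(q):
    """Subsidiary substream: follow recorded outcomes, compute, send results back."""
    while q.branch:
        if q.branch.popleft():          # assumed BRQ behavior
            v = q.dom_to_sub.popleft()  # assumed GETQ behavior
            q.sub_to_dom.append(v * v)  # assumed PUTQ behavior


def run(values):
    """Sum the squares of the even inputs, splitting the work across both substreams."""
    q = Queues()
    dominant_front(values, q)  # dominant runs fully ahead (unbounded slip in this toy)
    subsidiary(q)
    total = 0
    while q.sub_to_dom:
        total += q.sub_to_dom.popleft()  # dominant collects results (assumed GETQ)
    return total
```

Under these assumptions, run([1, 2, 3, 4]) returns 20. Because the dominant substream finishes before the subsidiary starts, the model shows the largest possible slip; real hardware queues are finite, which is why slip control and deadlock prevention matter in the actual design.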

Degree

Ph.D.

Advisors

Adams, Purdue University.

Subject Area

Electrical engineering|Computer science

