Implicitly multithreaded processors

Il Park, Purdue University

Abstract

Simultaneous Multithreading (SMT) has been proposed to improve pipeline throughput by overlapping the execution of multiple threads. However, SMT cannot improve single-thread performance. To improve single-thread performance, I propose the Implicitly-Multi-Threaded (IMT) architecture, which executes compiler-specified speculative threads on a modified SMT pipeline. IMT reduces hardware complexity by relying on the compiler to select suitable thread spawning points and to orchestrate inter-thread register communication. This study shows that a naive mapping of even optimized compiler-specified threads onto SMT performs only comparably to an aggressive superscalar; a naive IMT (N-IMT) shares SMT's resources among threads inefficiently, irrespective of resource availability, thread resource usage, and inter-thread dependence. Optimized IMT (O-IMT) introduces key microarchitectural optimizations to alleviate these inefficiencies in N-IMT. I propose three primary optimizations and two secondary optimizations. The three primary optimizations are: (1) a resource- and dependence-based fetch policy to fetch and execute suitable instructions, (2) context multiplexing to improve utilization by mapping as many threads to a single context as resource availability allows, and (3) early thread invocation to hide thread start-up overhead by overlapping one thread's invocation with other threads' execution. The two secondary optimizations are: (1) speculatively releasing register values to avoid the implementation and performance issues of N-IMT's thread-level squashing and (2) two-phase commit to reduce register pressure by freeing some registers at instruction commit, before the thread commits. Using SPEC2K benchmarks and execution-driven simulation, this study compares the performance of an aggressive superscalar, N-IMT, O-IMT, and the previously proposed Threaded Multipath Execution (TME) and Dynamically MultiThreaded (DMT) processors. The results indicate that N-IMT outperforms DMT, but outperforms neither an aggressive superscalar nor TME. With the three primary microarchitectural mechanisms, O-IMT achieves a considerable speed-up over both an aggressive superscalar and TME. Although the two secondary optimizations do not significantly increase O-IMT's speed-up on average, they significantly improve performance on some specific benchmarks.

Degree

Ph.D.

Advisors

Vijaykumar, Purdue University.

Subject Area

Electrical engineering; Computer science
