"Implicitly multithreaded processors" by Il Park
 


Il Park, Purdue University

Abstract

Simultaneous Multithreading (SMT) has been proposed to improve pipeline throughput by overlapping the execution of multiple threads. However, SMT does not improve single-thread performance. To improve single-thread performance, I propose the Implicitly-Multi-Threaded (IMT) architecture, which executes compiler-specified speculative threads on a modified SMT pipeline. IMT reduces hardware complexity by relying on the compiler to select suitable thread-spawning points and to orchestrate inter-thread register communication. This study shows that a naive mapping of even optimized compiler-specified threads onto SMT performs only comparably to an aggressive superscalar: a naive IMT (N-IMT) shares SMT's resources among threads inefficiently, irrespective of resource availability, thread resource usage, and inter-thread dependence. Optimized IMT (O-IMT) introduces key microarchitectural optimizations to alleviate these inefficiencies. I propose three primary optimizations and two secondary optimizations. The three primary optimizations are: (1) a resource- and dependence-based fetch policy that selects suitable instructions to fetch and execute, (2) context multiplexing, which improves utilization by mapping as many threads to a single context as available resources allow, and (3) early thread invocation, which hides thread start-up overhead by overlapping one thread's invocation with other threads' execution. The two secondary optimizations are: (1) speculative release of register values, which avoids the implementation and performance issues of N-IMT's thread-level squashing, and (2) two-phase commit, which reduces register pressure by freeing some registers at instruction commit, before the thread commits. Using SPEC2K benchmarks and execution-driven simulation, this study compares the performance of an aggressive superscalar, N-IMT, O-IMT, and the previously proposed Threaded Multipath Execution (TME) and Dynamically MultiThreaded (DMT) processors.
The results indicate that N-IMT outperforms DMT but outperforms neither an aggressive superscalar nor TME. With the three primary microarchitectural mechanisms, O-IMT achieves considerable speedups over both an aggressive superscalar and TME. Although the two secondary optimizations do not significantly increase O-IMT's average speedup, they significantly improve the performance of some specific benchmarks.
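The abstract does not describe how the resource- and dependence-based fetch policy is implemented, so the following Python sketch is only a hypothetical illustration of the general idea, not the dissertation's mechanism. It assumes that each thread exposes a count of inter-thread register inputs it is still waiting for (`pending_live_ins`), a count of its in-flight instructions (`in_flight`), and an estimate of how many shared pipeline entries its next fetch block would occupy (`est_resource_need`). All of these names, and the choice of a single pool of `free_entries`, are assumptions. Threads whose next block would not fit are skipped (the resource check); among the rest, threads with fewer unresolved inter-thread dependences are preferred, with an ICOUNT-style tie-breaker (fewest in-flight instructions) swapped in from conventional SMT fetch policies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThreadState:
    tid: int
    pending_live_ins: int   # hypothetical: register inputs still awaited from predecessor threads
    in_flight: int          # instructions currently in the pipeline (ICOUNT-style counter)
    est_resource_need: int  # hypothetical: entries the next fetch block is expected to occupy


def choose_fetch_thread(threads: List[ThreadState], free_entries: int) -> Optional[int]:
    """Return the tid to fetch from this cycle, or None if no thread fits."""
    # Resource check: skip threads whose next block would not fit in the free entries.
    candidates = [t for t in threads if t.est_resource_need <= free_entries]
    if not candidates:
        return None
    # Dependence check: prefer threads with fewer unresolved inter-thread inputs,
    # then break ties by fewest in-flight instructions, then by lowest tid.
    best = min(candidates, key=lambda t: (t.pending_live_ins, t.in_flight, t.tid))
    return best.tid


threads = [
    ThreadState(tid=0, pending_live_ins=2, in_flight=10, est_resource_need=8),
    ThreadState(tid=1, pending_live_ins=0, in_flight=20, est_resource_need=8),
    ThreadState(tid=2, pending_live_ins=0, in_flight=5, est_resource_need=40),
]
print(choose_fetch_thread(threads, free_entries=32))  # thread 2 does not fit; thread 1 has no pending inputs
```

With 32 free entries, thread 2 is excluded because its block needs 40 entries, and thread 1 is chosen over thread 0 because it has no outstanding inter-thread inputs. With 64 free entries, thread 2 becomes eligible and wins the tie with thread 1 on in-flight count. Ordering by dependences before occupancy reflects the abstract's point that N-IMT wastes resources on threads that cannot make progress. A real design would weigh several resource types separately.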

Degree

Ph.D.

Advisors

Vijaykumar, Purdue University.

Subject Area

Electrical engineering; Computer science
