Abstract

This paper presents a novel compiler algorithm for selecting program slices that prefetch load values concurrently with program execution. The algorithm is evaluated in the context of an intelligent memory system whose architecture consists of a main processor and a simple memory processor. The intelligent memory system pre-executes program slices and forwards the values of critical loads to the main processor ahead of their use. The compiler algorithm selects program slices for execution on the memory processor and inserts instructions that synchronize the main and memory processors. Experimental results for the generated code on a cycle-accurate simulator show speedups of up to 1.33 (1.13 on average) over an aggressively latency-optimized system running fully optimized code.

Date of this Version

January 2003

Department of Electrical and Computer Engineering Technical Reports

An Algorithm for Register-Synchronized Precomputation In Intelligent Memory Systems
