In modern architectures, memory access latency is an increasingly performance-limiting factor. To reduce this latency, we propose concepts and implementation of a new technique that uses an in-memory processor to precompute future, critical load addresses and forward the computed values to the main processor. The acronym for this technique is IMPT for In-Memory Precomputation-based forwarding Threads. IMPT combines the advantages of precomputationbased techniques with the low memory access latency of processing-in-memory. To evaluate IMPT, we use a cycle-accurate simulation of an aggressive out-of-order processor with accurate simulation of bus and memory contention. The results show a performance gain of up to 1.47 (1.21 on average) over an aggressive superscalar processor. The average load access latency decreases by up to 55% (32% on average).
Date of this Version