Improving cache performance by smart page mapping in application programs

Rong Xu, Purdue University


This thesis studies the use of software methods to improve memory performance in a heterogeneous cache system. In particular, it studies a cache system that consists of two data caches, the main cache and the mini cache, at the same level of the cache hierarchy. The two caches may differ in size, associativity, and replacement policy. Furthermore, this cache system allows application programs to choose one of three caching policies for each virtual page: map the page to the main cache, map it to the mini cache, or map it to neither. In the last case, memory accesses to the page bypass both caches. This process of specifying the mapping between caching policies and virtual pages is called cache mapping. This thesis investigates the problem of optimal cache mapping, assuming that the trace of memory references can be predicted in advance. The practical effectiveness of such cache mapping depends on how accurate this prediction is. However, even with perfect prediction, cache mapping raises complex and important issues, which are the focus of this thesis. On the theoretical side, we prove that finding the optimal cache mapping for an arbitrary memory trace is NP-hard. On the experimental side, we propose two techniques to perform cache mapping. The first technique is a heuristic algorithm that is applied to the entire memory trace: the cache mapping problem is reduced to an approximate problem, which is then transformed into a network flow problem that can be solved in polynomial time. The second technique is based on trace sampling. Under the proposed sampling framework, a number of important loops are automatically selected for sampling, and the optimal cache mapping for the simplified memory trace is obtained by solving an Integer Linear Programming (ILP) problem.
Both techniques are implemented for two real-world PDAs, namely the Compaq iPAQ 3650, which has an Intel StrongARM SA-1110 processor core, and the Compaq iPAQ 3950, which has an Intel XScale processor core. We report experimental results including performance improvements, memory access statistics, and energy consumption savings.
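To make the notion of cache mapping concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method used in the thesis. It simulates a two-cache system in which each virtual page is mapped to the main cache, the mini cache, or bypass, counts misses for a given address trace, and finds the best mapping by exhaustive search. The page size, line size, cache geometries, LRU replacement in both caches, and the use of miss count as the cost are all assumptions. The names `count_misses` and `best_mapping` are hypothetical. Exhaustive search examines 3^P mappings for P pages, so it is only feasible for toy traces. That exponential cost is why realistic traces call for approaches like the network-flow heuristic and the sampled ILP formulation described above.

```python
from collections import OrderedDict
from itertools import product

PAGE_SIZE = 4096  # assumed virtual page size in bytes
LINE_SIZE = 32    # assumed cache line size in bytes

MAIN, MINI, BYPASS = "main", "mini", "bypass"  # the three per-page policies


class LRUCache:
    """Set-associative cache with LRU replacement (hypothetical geometry)."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, line):
        """Return True on a hit, False on a miss; updates LRU order."""
        s = self.sets[line % self.num_sets]
        if line in s:
            s.move_to_end(line)       # mark as most recently used
            return True
        if len(s) >= self.ways:
            s.popitem(last=False)     # evict the least recently used line
        s[line] = None
        return False


def count_misses(trace, mapping, main_cfg=(8, 4), mini_cfg=(2, 2)):
    """Count memory accesses that miss (or bypass) under a page mapping.

    Assumption: pages absent from `mapping` default to the main cache,
    and every bypassed access counts as one memory access.
    """
    main = LRUCache(*main_cfg)
    mini = LRUCache(*mini_cfg)
    misses = 0
    for addr in trace:
        page = addr // PAGE_SIZE
        line = addr // LINE_SIZE
        policy = mapping.get(page, MAIN)
        if policy == BYPASS:
            misses += 1
        elif policy == MAIN:
            misses += 0 if main.access(line) else 1
        else:
            misses += 0 if mini.access(line) else 1
    return misses


def best_mapping(trace):
    """Exhaustively try all 3^P page mappings; feasible only for tiny P."""
    pages = sorted({addr // PAGE_SIZE for addr in trace})
    best = None
    for choice in product((MAIN, MINI, BYPASS), repeat=len(pages)):
        mapping = dict(zip(pages, choice))
        misses = count_misses(trace, mapping)
        if best is None or misses < best[0]:
            best = (misses, mapping)
    return best


if __name__ == "__main__":
    # Toy trace: page 0 is reused heavily, page 1 is streamed through once.
    hot = [0, 32, 64] * 4
    stream = [4096 + 32 * i for i in range(16)]
    print(best_mapping(hot + stream + hot))
```

For example, `count_misses([0, 0, 0], {})` returns 1 (one compulsory miss, then two hits in the main cache), while mapping page 0 to `BYPASS` makes all three accesses go to memory. A streamed page with no reuse costs the same number of misses whether it is cached or bypassed, but bypassing it avoids evicting reused lines. That trade-off is what makes the choice of mapping non-trivial.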




Li, Purdue University.

Subject Area

Computer science
