Generally applicable techniques for improving locality in irregular programs, which operate over pointer-based data structures such as trees and graphs, are scarce. Focusing on a subset of irregular programs, namely, tree traversal algorithms like Barnes-Hut and nearest neighbor, recent work has proposed point blocking, a technique analogous to loop tiling in regular algorithms, to improve locality. However, point blocking requires that programs be “pre-optimized” using application-specific techniques to be effective. In this work, we identify the root cause of point blocking’s poor performance on baseline irregular algorithms, and propose traversal splicing, a new, general, automatic locality optimization for irregular tree traversal codes, that addresses these drawbacks. For four benchmark algorithms, we show that traversal splicing can deliver substantial single-thread performance improvements of up to 338% (geometric mean: 138%) over baseline implementations, and up to 112% (geometric mean: 77%) over point-blocked implementations. Further, we show that in many cases, applying traversal splicing to a baseline implementation yields performance that is competitive with carefully hand-optimized implementations.
Date of this Version