Department of Electrical and Computer Engineering Technical Reports

General Transformations for GPU Execution of Tree Traversals

Abstract

With the advent of programmer-friendly GPU computing environments, there has been much interest in offloading workloads that can exploit the high degree of parallelism available on modern GPUs. Exploiting this parallelism and optimizing for the GPU memory hierarchy is well understood for regular applications that operate on dense data structures such as arrays and matrices. However, there has been significantly less work on irregular algorithms, and even less when pointer-based dynamic data structures are involved. Recently, irregular algorithms such as Barnes-Hut and kd-tree traversals have been implemented on GPUs, yielding significant performance gains over CPU implementations. However, these implementations often rely on application-specific semantics to get acceptable performance. We argue that there are general-purpose techniques for implementing irregular algorithms on GPUs that exploit similarities in algorithmic structure rather than application-specific knowledge. We demonstrate these techniques on several tree traversal algorithms, achieving speedups of up to 38× over 32-thread CPU versions.

Keywords

vectorization, tree traversals, GPU, irregular programs

Date of this Version

2013

Included in

Other Electrical and Computer Engineering Commons
