Runtime frameworks for heterogeneous parallel computing

Jacques A Pienaar, Purdue University

Abstract

Heterogeneous parallel computing platforms, which contain multiple Processing Units (PUs) with distinct architectures (such as multi-core processors and many-core accelerators), offer the potential for significant improvements in performance and power efficiency. Applications from several important domains have been ported to heterogeneous platforms with significant performance improvements, making a compelling case for heterogeneous parallel computing. However, developers often cite the difficulty of creating high-performance programs as a significant challenge that must be addressed before the performance potential of heterogeneous parallel platforms can be realized in the mainstream. The challenge of programming heterogeneous platforms is being addressed through innovations in programming models and languages, optimizing compilers, and runtime frameworks. A large portion of these efforts has focused on lowering the complexity of programming many-core accelerators, leaving open the challenge of programming an ensemble of PUs in a unified manner.

In this dissertation, we explore runtime frameworks that enable high-performance programming of heterogeneous parallel platforms. Runtime frameworks relieve the programmer of the burden of performing tasks such as partitioning, mapping, and scheduling the computations in an application to the hardware resources (PUs). At the same time, they enable high-performance execution by making these decisions intelligently, using information that is available at runtime. Thus, runtime frameworks address the dual challenges of improving performance and reducing programming effort.

This dissertation makes the following contributions to runtime frameworks for heterogeneous parallel computing:

• A unified Model Driven Runtime (MDR) for heterogeneous parallel platforms that allows programmers to develop an application without fixing the partitioning and scheduling of computations to the PUs of the target platform.
MDR dynamically partitions and schedules tasks in the application onto the PUs and automates the transfer of data across their memory spaces. We identify four key criteria that must be considered by any heterogeneous runtime framework to achieve superior performance under varying application and platform characteristics: Suitability of computations to PUs, Locality of data, Availability of PUs, and Criticality of computations (SLAC). MDR considers all of these factors and uses performance models to drive key runtime decisions such as mapping tasks to PUs, scheduling tasks, and copying data between memory spaces.

• Automatic Heterogeneous Pipelines (AHP), a framework that creates optimized software pipelines for heterogeneous parallel platforms from unpipelined programs with simple programmer-specified annotations. Pipelining involves two key steps: stage identification, which partitions the application into pipeline stages, and mapping and scheduling, which assigns the computations within each stage to the PUs of the heterogeneous platform. We show that there exists a strong cyclic interdependency between the stage identification and scheduling steps, which must be considered in order to achieve good performance. AHP addresses this interdependency by performing stage identification and scheduling in an iterative and interleaved manner, thereby enabling the creation of high-performance software pipelines.

• Heterogeneous Work Stealing (HWS), a work stealing runtime framework for heterogeneous platforms. Work stealing is a popular runtime scheduling strategy for parallel platforms; it achieves load-balanced execution by allowing parallel workers, or threads, to steal work from each other. Work stealing has been shown to be highly efficient on homogeneous parallel platforms and is widely used in commercial runtime frameworks.
For heterogeneous platforms, we show that conventional work stealing can result in poor performance, since it does not consider the suitability of tasks to the PUs on which they are executed. We propose two techniques, Task-type Balanced Stealing and Speed Proportional Stealing, that significantly improve the performance of work stealing on heterogeneous parallel platforms.

We implement the proposed runtime techniques by building upon existing parallel programming frameworks such as Intel TBB and NVIDIA CUDA. We evaluate the developed runtime frameworks on several heterogeneous platforms, ranging from netbooks to laptops and servers, all of which contain multi-core processors and many-core accelerators such as General Purpose Graphics Processing Units (GPGPUs) and Intel's Many Integrated Core (MIC) architecture. Our results establish the potential of the proposed runtime techniques to achieve superior performance while eschewing the extensive programming effort involved in manually tuning applications to heterogeneous platforms.
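To illustrate the kind of model-driven mapping decision a runtime such as MDR makes, the sketch below estimates a task's finish time on each PU from a per-PU performance model (Suitability), a data-transfer cost when the task's inputs are not resident (Locality), and the PU's earliest-free time (Availability), then maps the task to the PU with the minimum estimate. All names, models, and numbers here are hypothetical; this is a minimal sketch of the idea, not the dissertation's implementation.

```python
# Hypothetical model-driven mapping sketch in the spirit of MDR's SLAC criteria.

def estimated_finish(task, pu):
    compute = pu["model"](task["work"])          # Suitability: modeled run time on this PU
    transfer = 0.0 if task["data_on"] == pu["name"] else task["bytes"] / pu["bw"]
    return pu["free_at"] + transfer + compute    # Availability + Locality + compute

def map_task(task, pus):
    # Pick the PU with the earliest estimated finish, then reserve it.
    best = min(pus, key=lambda pu: estimated_finish(task, pu))
    best["free_at"] = estimated_finish(task, best)
    return best["name"]

cpu = {"name": "cpu", "model": lambda w: w / 2.0,  "bw": 1e9, "free_at": 0.0}
gpu = {"name": "gpu", "model": lambda w: w / 20.0, "bw": 1e8, "free_at": 0.0}

big   = {"work": 100.0, "bytes": 0.0, "data_on": "gpu"}
small = {"work": 1.0,   "bytes": 4e8, "data_on": "cpu"}
print(map_task(big, [cpu, gpu]))    # -> gpu (far better suited, data already resident)
print(map_task(small, [cpu, gpu]))  # -> cpu (transfer cost plus GPU busy time dominate)
```

Note how the small task stays on the CPU even though the GPU's compute model is faster: the locality and availability terms outweigh raw speed, which is exactly why all four SLAC criteria matter jointly.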
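The cyclic interdependency between stage identification and scheduling that AHP targets can be seen in a toy example: a pipeline's throughput is limited by its slowest stage, so the best place to split a chain of computations into stages depends on which PU each stage is mapped to, and vice versa. The sketch below (hypothetical per-task costs, not AHP's algorithm, which works iteratively rather than by brute force) simply enumerates splits and mappings jointly.

```python
# Hypothetical illustration of why stage identification and mapping interact.
from itertools import product

# Cost of each of four tasks on each PU (made-up numbers).
task_costs = {"cpu": [4, 2, 6, 2], "gpu": [1, 8, 1, 8]}

def best_pipeline(costs, n_tasks=4):
    best = None
    for split in range(1, n_tasks):                    # candidate stage boundary
        for m1, m2 in product(("cpu", "gpu"), repeat=2):
            if m1 == m2:
                continue                               # stages run on distinct PUs
            s1 = sum(costs[m1][:split])                # stage 1 service time
            s2 = sum(costs[m2][split:])                # stage 2 service time
            bottleneck = max(s1, s2)                   # pipeline period (1/throughput)
            if best is None or bottleneck < best[0]:
                best = (bottleneck, split, m1, m2)
    return best

print(best_pipeline(task_costs))  # -> (9, 2, 'cpu', 'gpu')
```

Choosing the split first (say, assuming uniform costs) and the mapping afterwards can land on a period of 10 or worse here; only considering both together reaches the period-9 pipeline, which is the interdependency AHP resolves by interleaving the two steps.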
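To see why Speed Proportional Stealing helps, consider a thief PU that is 4x faster than its victim. The function and numbers below are a hypothetical sketch for illustration, not HWS's implementation: conventional steal-half leaves the slow victim with more work than it can finish in time, while a speed-proportional split roughly equalizes finish times.

```python
# Hypothetical sketch of speed-proportional stealing.

def proportional_steal(victim_tasks, thief_speed, victim_speed):
    # Transfer a share of the victim's remaining tasks proportional to the
    # thief's relative speed, so both PUs finish at roughly the same time.
    return round(victim_tasks * thief_speed / (thief_speed + victim_speed))

tasks = 100
half = tasks // 2                                              # conventional steal-half
prop = proportional_steal(tasks, thief_speed=4.0, victim_speed=1.0)

# Finish times = tasks / speed, with a 4x-faster thief (e.g. an accelerator):
print(half / 4.0, (tasks - half) / 1.0)  # 12.5 vs 50.0: the slow victim is the bottleneck
print(prop / 4.0, (tasks - prop) / 1.0)  # 20.0 vs 20.0: balanced
```

The same reasoning extends to Task-type Balanced Stealing: beyond how much is stolen, a heterogeneity-aware thief should also prefer tasks of the types its PU executes well.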

Degree

Ph.D.

Advisors

Raghunathan, Purdue University.

Subject Area

Computer Engineering | Computer Science
