Performance Analysis and Tuning of Automatically Parallelized OpenMP Applications
Abstract
Automatic parallelization combined with tuning techniques is an alternative to manual parallelization of sequential programs for exploiting the increased computational power that current multi-core systems offer. Automatic parallelization concentrates on finding any possible parallelism in the program, whereas tuning systems help identify efficient parallel code segments and serialize inefficient ones using runtime performance metrics. In this work we study the performance gap between automatically parallelized and hand-parallelized OpenMP applications and examine whether this gap can be closed by compile-time techniques or whether it requires dynamic or user-interactive solutions. We implement an empirical tuning framework and propose an algorithm that partitions programs into sections and tunes each code section individually. Experiments show that, in the worst case, tuned applications still perform better than the original serial programs, and that they sometimes outperform hand-parallelized applications. Our work is one of the first approaches to deliver an auto-parallelization system that guarantees performance improvements for nearly all programs; hence it eliminates the need for users to “experiment” with such tools in order to obtain the shortest runtime of their applications.
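As an illustration of the per-section idea described in the abstract, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes each code section is available in a serial and a parallel variant, times both, and keeps the parallel one only when it is measurably faster. The names `measure`, `tune_sections`, and the `sections` dictionary layout are hypothetical and chosen only for this example.

```python
import time
from typing import Callable, Dict


def measure(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the best wall-clock time (seconds) of `fn` over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def tune_sections(
    sections: Dict[str, Dict[str, Callable[[], object]]],
    timer: Callable[[Callable[[], object]], float] = measure,
) -> Dict[str, str]:
    """Choose a variant for each section independently.

    `sections` maps a section name to {"serial": fn, "parallel": fn}
    (a hypothetical layout). A section is run in parallel only if its
    parallel variant is strictly faster; otherwise it is serialized.
    """
    decisions = {}
    for name, variants in sections.items():
        serial_time = timer(variants["serial"])
        parallel_time = timer(variants["parallel"])
        decisions[name] = "parallel" if parallel_time < serial_time else "serial"
    return decisions
```

Because every section falls back to its serial variant unless the parallel one wins, a program tuned this way should not run its sections slower than the serial versions, up to measurement noise. The optional `timer` argument lets a caller substitute a different cost measurement, for example a hardware-counter-based metric.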
Keywords
automatic parallelization, automatic tuning, performance evaluation
Date of this Version
2011
Comments
OpenMP in the Petascale Era, Lecture Notes in Computer Science, 2011, Volume 6665/2011, pp. 151-164