Optimizations for High Performance Computing on General-Purpose GPUs
High-performance computing is increasingly done on parallel machines such as GPUs. My work addresses two major kinds of optimization: block size tuning and mixed-precision tuning.

Block size tuning involves selecting an optimal block size for CUDA kernels, where threads of execution are grouped into blocks. Earlier techniques include autotuning, which requires multiple kernel executions, and NVIDIA's Occupancy Calculator, which returns multiple candidate block sizes, none of which may be the actual optimum. My technique uses a support vector regression (SVR) model based on static kernel features as well as dynamic features to predict an optimal block size. It is evaluated on 89 kernels from 10 different applications.

The second optimization is mixed-precision tuning, where I adjust the datatypes of variables in CUDA applications to obtain a speedup without exceeding a user-specified error threshold. Here I built a static performance model to estimate the effect of changing variable types and used it in a genetic search algorithm, in conjunction with an execution filter that reduces the number of program executions needed to determine the error. This was evaluated on two different GPU programs.
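The mixed-precision search described above can be sketched in miniature. This is a hypothetical illustration, not the dissertation's implementation: the variable count, speedup weights, and error weights are invented stand-ins, and the "execution" is simulated by a cheap error function. The sketch shows the interplay the abstract describes: a static performance model ranks candidate type assignments cheaply, a genetic search explores the space, and an execution filter runs only the candidates whose predicted speedup could beat the current best, so that costly error-measuring executions stay rare.

```python
import random

random.seed(0)

N_VARS = 6  # number of tunable variables (hypothetical)

# A candidate assigns each variable a precision: 0 = double, 1 = float.
def random_candidate():
    return [random.randint(0, 1) for _ in range(N_VARS)]

# Static performance model (assumed weights): demoting each variable to
# float adds a fixed, illustrative fraction of speedup.
SPEEDUP_WEIGHT = [0.08, 0.05, 0.12, 0.03, 0.07, 0.10]

def predicted_speedup(cand):
    return 1.0 + sum(w for w, c in zip(SPEEDUP_WEIGHT, cand) if c == 1)

# Stand-in for actually executing the program to measure numerical error;
# in the real system this is the expensive step the filter avoids.
ERROR_WEIGHT = [1e-7, 5e-7, 2e-6, 1e-8, 3e-7, 4e-6]

def measured_error(cand):
    return sum(e for e, c in zip(ERROR_WEIGHT, cand) if c == 1)

def search(threshold=1e-6, generations=20, pop_size=8):
    executions = 0
    pop = [random_candidate() for _ in range(pop_size)]
    best, best_speedup = [0] * N_VARS, 1.0  # all-double baseline
    for _ in range(generations):
        # Rank by the cheap static model, not by running the program.
        pop.sort(key=predicted_speedup, reverse=True)
        # Execution filter: only candidates whose *predicted* speedup
        # beats the current best are actually executed to check error.
        for cand in pop:
            if predicted_speedup(cand) <= best_speedup:
                break  # sorted, so no later candidate can beat it either
            executions += 1
            if measured_error(cand) <= threshold:
                best, best_speedup = cand[:], predicted_speedup(cand)
                break
        # Crossover and mutation on the top half form the next generation.
        parents = pop[:pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, N_VARS)
            child = a[:cut] + b[cut:]
            if random.random() < 0.2:  # occasional precision flip
                i = random.randrange(N_VARS)
                child[i] ^= 1
            children.append(child)
        pop = children
    return best, best_speedup, executions

best, speedup, execs = search()
print(best, round(speedup, 2), execs)
```

Under these invented weights the search returns an assignment whose measured error respects the threshold, while the filter keeps the execution count well below the number of candidates the genetic search generates.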
Bagchi, Purdue University.