Digital signal processors as HPC accelerator and performance tuning via static analysis and machine learning
Heterogeneous systems today employ GPUs to accelerate parallelizable sections of an application. Recently, multicore digital signal processors (DSPs) have been shown to be energy-efficient, low-latency accelerators. The Keystone II is a SoC containing four ARM CPU cores and eight DSP cores, offering high floating-point performance at far lower power consumption than GPUs. Programming heterogeneous systems often requires choosing the optimal block size for a kernel to get the best application performance. In this thesis, we develop STATuner, a novel model that predicts the optimal block size using static code analysis and machine learning. We identify a set of static metrics that characterize a kernel and use them to cluster kernels. We then build a classifier that predicts a kernel's optimal block size. We train the model on a set of representative kernels, after which it can predict the optimal block size for a new kernel that falls into any of the clusters in our training set. STATuner models yield predictions that are at least 20% more accurate than those of competing techniques. We also suggest techniques to accelerate real-time image analytics applications on Keystone II using MSMC memory and customised libraries. Comparing the performance per watt of Keystone II and a GPU, we achieve 5X better energy efficiency than the GPU for image processing applications. These results show that the Keystone II SoC has great potential to serve as an energy-efficient, large-scale computing unit for real-time processing of image analytics workloads.
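The core idea behind STATuner is to characterize a kernel with a vector of static metrics and predict its optimal block size from kernels with similar metrics. The abstract does not specify STATuner's actual metrics, clustering method or classifier, so the following is only a minimal sketch under stated assumptions. It substitutes a plain k-nearest-neighbour vote over z-score-normalized features. The metric names (`arith_ops`, `mem_ops`, `branches`, `loop_depth`), the training data and the function `predict_block_size` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical static metrics per kernel (illustrative only, not STATuner's
# real feature set): (arith_ops, mem_ops, branches, loop_depth).
# Each entry pairs a metric vector with that kernel's measured optimal block size.
TRAIN = [
    ((120, 40, 2, 1), 256),   # compute-heavy kernels
    ((110, 35, 3, 1), 256),
    ((130, 45, 2, 1), 256),
    ((20, 90, 8, 2), 64),     # memory/branch-heavy kernels
    ((25, 100, 10, 2), 64),
    ((30, 95, 9, 3), 64),
]


def zscore_params(rows):
    """Per-feature mean and population standard deviation over the training set."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # A constant feature has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def normalize(row, means, stds):
    """Scale each metric so that no single feature dominates the distance."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def predict_block_size(train, query, k=3):
    """Predict a block size for `query` by majority vote among the k
    training kernels closest to it in normalized static-metric space."""
    means, stds = zscore_params([metrics for metrics, _ in train])
    q = normalize(query, means, stds)
    # Sort training kernels by Euclidean distance to the query.
    ranked = sorted((math.dist(normalize(m, means, stds), q), block)
                    for m, block in train)
    votes = Counter(block for _, block in ranked[:k])
    # Ties keep first-seen order, so the closer neighbour's block size wins.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(predict_block_size(TRAIN, (115, 42, 2, 1)))  # → 256
    print(predict_block_size(TRAIN, (22, 95, 9, 2)))   # → 64
```

Normalizing before measuring distance matters because raw instruction counts and loop depths live on very different scales. The neighbour vote plays the role of "falls into a cluster of the training set". A new kernel unlike every training kernel would still receive a prediction here, which the clustering step described in the abstract is presumably meant to guard against.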
Lin, Purdue University.