Accuracy and Performance Improvements in Custom CNN Architectures
Convolutional Neural Networks (CNNs) are biologically inspired feed-forward artificial neural networks. The artificial neurons in CNNs are connected in a manner similar to the neurons in the mammalian visual system. CNNs are currently used for image recognition, semantic segmentation, natural language processing, playing video games, and many other applications. A CNN can consist of millions of neurons that require billions of computations to produce a single output. Currently, CNN workloads are accelerated by GPUs. While fast, GPUs are power hungry and are not feasible for mobile and embedded applications such as automotive and home automation systems. Interest has recently surged in developing novel FPGA/ASIC-based architectures for CNN processing in real time while keeping the power budget low. The current generation of custom architectures utilizes either single- or half-precision floating-point or 16-bit Q8.8 fixed-point processing elements. However, floating-point hardware is larger and slower than fixed-point hardware, especially in FPGAs, which have dedicated fixed-point units but no floating-point units. As a result, choosing a number format becomes a performance-versus-accuracy tradeoff. In this work, we test various number representation schemes and their effect on the bandwidth requirements and accuracy of a custom CNN architecture. We also test architectural changes that improve the throughput of the processing elements. Together, we expect to improve both the accuracy and the performance of a custom CNN accelerator architecture. The system is prototyped on Xilinx Zynq series devices.
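As a minimal sketch of the tradeoff the abstract describes, the snippet below illustrates how a 16-bit Q8.8 fixed-point format (8 integer bits, 8 fractional bits) represents a weight value. The helper names `to_q88` and `from_q88` are hypothetical and not from the dissertation; the point is only that quantizing to a 1/256 step introduces a small, bounded error relative to floating point.

```python
def to_q88(x: float) -> int:
    """Quantize a float to 16-bit Q8.8: scale by 2^8, round, saturate to int16."""
    q = int(round(x * 256))            # 8 fractional bits => multiply by 256
    return max(-32768, min(32767, q))  # saturate to the signed 16-bit range

def from_q88(q: int) -> float:
    """Recover the real value a Q8.8 integer represents."""
    return q / 256.0

# Representable range is [-128.0, +127.99609375] with a step of 1/256.
w = 0.7183                 # example floating-point weight
qw = to_q88(w)             # -> 184
print(from_q88(qw))        # -> 0.71875, quantization error ~4.5e-4
```

Values outside the Q8.8 range saturate (e.g. `to_q88(1000.0)` clamps to `32767`), which is one reason number-format choice affects accuracy in a fixed-point accelerator.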
Culurciello, Purdue University.