Fast and robust convolutional neural networks optimized for embedded platforms

Jonghoon Jin, Purdue University

Abstract

Visual intelligence is the ability to recognize objects through their internal representations; these representations are then used to produce semantics or to understand relations among objects. The human visual cortex is known to outperform any artificial vision system, and many researchers have improved machine vision by taking inspiration from it. Recently, bio-inspired vision algorithms such as convolutional neural networks (CNNs) have shown great promise in visual understanding. These algorithms hold records on many visual understanding tasks, but deploying them on embedded platforms, where processing power and energy budgets are limited, remains challenging. This proposal describes three methods that improve the accuracy and runtime of CNNs and thereby facilitate their use on embedded devices.

First, we improve the robustness of CNNs against strong noise by exploiting uncertainty information. With an uncertainty noise model added to the network, the convolution, max-pooling, and ReLU layers are modified to account for that model. Since the proposed model makes decisions based on an uncertainty region rather than a point-wise prediction, it alleviates the need to average multiple model predictions to stabilize the output.

Second, we propose a filter separation technique for CNNs that accelerates their runtime and reduces their memory usage. The redundancy of CNN parameters, especially the weights of the convolutional filters, has been studied extensively, and various heuristics have been proposed to construct a low-rank basis of the filters after training. Instead, we propose a network architecture built from consecutive sequences of one-dimensional filters applied along all directions in 3D space. The proposed network not only achieves performance comparable to conventional CNNs, but also significantly reduces parameter redundancy. Owing to the large reduction in learnable parameters, the proposed convolution pipeline runs roughly twice as fast during the feed-forward pass as the baseline model. Furthermore, once the model is trained, no additional manual tuning or post-processing is required.

Finally, a hardware implementation is proposed to accelerate CNN computation. CNNs produce hundreds of intermediate results, and the constant memory accesses these require make inefficient use of general-purpose processors; a custom coprocessor with an efficient routing strategy maximizes hardware utilization and achieves high performance in real-world applications. The proposed coprocessor reaches a peak performance of 40 G-ops/s while consuming less than 4 W of power. Together, these methods optimize CNNs for embedded devices, making them feasible to use on a wide range of mobile platforms.
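The uncertainty-aware layers summarized in the abstract can be illustrated with a common Gaussian moment-matching scheme. The sketch below is our own assumption, not necessarily the thesis's actual formulation: each activation is treated as an independent Gaussian, a linear (convolution-style) layer propagates the mean through the weights and the variance through the squared weights, and a ReLU maps the input Gaussian to the closed-form mean and variance of a rectified Gaussian. The helper names `linear_moments` and `relu_moments` are hypothetical.

```python
import math

def linear_moments(mus, variances, weights, bias=0.0):
    """Propagate (mean, variance) through a linear/convolution-style layer.
    Assumes independent inputs: mean goes through the weights,
    variance through the squared weights. (Illustrative sketch only.)"""
    mean = bias + sum(w * m for w, m in zip(weights, mus))
    var = sum(w * w * v for w, v in zip(weights, variances))
    return mean, var

def relu_moments(mu, var):
    """Mean and variance of ReLU(X) for X ~ N(mu, var), by moment matching.
    (An assumed scheme; the thesis's exact noise model may differ.)"""
    if var <= 0.0:
        return max(0.0, mu), 0.0
    sigma = math.sqrt(var)
    z = mu / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))      # P(X > 0) uses Phi(z)
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    mean = mu * cdf + sigma * pdf                          # E[max(0, X)]
    second = (mu * mu + var) * cdf + mu * sigma * pdf      # E[max(0, X)^2]
    return mean, max(second - mean * mean, 0.0)

# Example: zero-mean, unit-variance activation through the noisy ReLU
m, v = relu_moments(0.0, 1.0)   # m = 1/sqrt(2*pi) ≈ 0.3989
```

Propagating a region (mean plus variance) through every layer in this way is what lets a single forward pass stand in for averaging many noisy point predictions.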
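The filter separation idea — replacing one dense 3-D filter with consecutive 1-D convolutions along each axis — can be sketched as follows. This is a minimal plain-Python illustration under our own assumptions (zero padding, "same"-size correlation, nested-list volumes), not the trained architecture from the proposal. The parameter saving is the point: a dense k×k×k filter stores k³ weights, while the three 1-D filters store only 3k.

```python
def conv1d(seq, kernel):
    """'Same'-size 1-D correlation with zero padding (odd-length kernels)."""
    k, pad = len(kernel), len(kernel) // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [sum(padded[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq))]

def conv_along(vol, kernel, axis):
    """Apply conv1d along one axis of a 3-D nested-list volume."""
    d0, d1, d2 = len(vol), len(vol[0]), len(vol[0][0])
    out = [[[0.0] * d2 for _ in range(d1)] for _ in range(d0)]
    if axis == 0:
        for j in range(d1):
            for k in range(d2):
                line = conv1d([vol[i][j][k] for i in range(d0)], kernel)
                for i in range(d0):
                    out[i][j][k] = line[i]
    elif axis == 1:
        for i in range(d0):
            for k in range(d2):
                line = conv1d([vol[i][j][k] for j in range(d1)], kernel)
                for j in range(d1):
                    out[i][j][k] = line[j]
    else:
        for i in range(d0):
            for j in range(d1):
                out[i][j] = conv1d(vol[i][j], kernel)
    return out

def separable_conv3d(vol, ky, kx, kc):
    """Three consecutive 1-D filters (vertical, horizontal, across depth):
    3k parameters instead of the k^3 of a dense 3-D filter."""
    return conv_along(conv_along(conv_along(vol, ky, 0), kx, 1), kc, 2)

# Demo: a unit impulse at the center of a 3x3x3 volume; with rank-1
# (outer-product) kernels, the separable output factorizes per axis.
delta = [[[1.0 if (i, j, k) == (1, 1, 1) else 0.0 for k in range(3)]
          for j in range(3)] for i in range(3)]
out = separable_conv3d(delta, [1, 2, 1], [1, 2, 1], [1, 1, 1])
```

A stack of such 1-D stages only matches a dense filter exactly when that filter is rank-1, which is why the proposal trains the separable structure end to end rather than factorizing a pretrained filter bank.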

Degree

Ph.D.

Advisors

Culurciello, Purdue University.

Subject Area

Computer engineering|Electrical engineering|Computer science

