Efficient and Consistent Convolutional Neural Networks for Computer Vision
Abstract
Convolutional Neural Networks (CNNs) are machine learning models that are commonly used for computer vision tasks like image classification and object detection. State-of-the-art CNNs achieve high accuracy by using many convolutional filters to extract features from the input images for correct predictions. This high accuracy is achieved at the cost of high computational intensity. Large, accurate CNNs typically require powerful Graphics Processing Units (GPUs) to train and deploy, while attempts at creating smaller, less computationallyintense CNNs lose accuracy. In fact, maintaining consistent accuracy is a challenge for even the state-of-the-art CNNs. This presents a problem: the vast energy expenditure demanded by CNN training raises concerns about environmental impact and sustainability, while the computational intensity of CNN inference makes it challenging for low-power devices (e.g. embedded, mobile, Internet-of-Things) to deploy the CNNs on their limited hardware. Further, when reliable network is limited or when extremely low latency is required, the cloud cannot be used to offload computing from the low-power device, forcing a need to research methods to deploy CNNs on the device itself: to improve energy efficiency and mitigate consistency and accuracy losses of CNNs.This dissertation investigates causes of CNN accuracy inconsistency and energy consumption. We further propose methods to improve both, enabling CNN deployment on low-power devices. Our methods do not require training to avoid the high energy costs associated with training.To address accuracy inconsistency, we first design a new metric to properly capture such behavior. We conduct a study of modern object detectors to find that they all exhibit inconsistent behavior. That is, when two images are similar, an object detector can sometimes produce completely different predictions. Malicious actors exploit this to cause CNNs to mispredict, while image distortions caused by camera equipment and natural phenomena can also cause mispredictions. Regardless the cause of the misprediction, we find that modern accuracy metrics do not capture this behavior, and we create a new consistency metric to measure the behavior. Finally, we demonstrate the use of image processing techniques to improve CNN consistency on modern object detection datasets.To improve CNN energy efficiency and reduce inference latency, we design the focused convolution operation. We observe that in a given image, many pixels are often irrelevant to the computer vision task – if the pixels are deleted, the CNN can still give the correct prediction. We design a method to use a depth mapping neural network to identify which pixels are irrelevant in modern computer vision datasets. Next, we design the focused convolution to automatically ignore any pixels marked irrelevant outside the Area of Interest (AoI). By replacing the standard convolutional operations in CNNs with our focused convolutions, we find that ignoring those irrelevant pixels can save up to 45% energy and inference latency.Finally, we improve the focused convolutions, allowing for (1) energy-efficient, automated AoI generation within the CNN itself and (2) improved memory alignment and better utilization of parallel processing hardware. The original focused convolution required AoI generation in advance, using a computationally-intense depth mapping method. Our AoI generation technique automatically filters the features from the early layers of a CNN using a threshold. The threshold is determined using an Accuracy vs Latency curve search method. The remaining layers will apply focused convolutions to the AoI to reduce energy use. This will allow focused convolutions to be deployed within any pretrained CNN for various observed use cases. No training is required.
Degree
Ph.D.
Advisors
Lu, Purdue University.
Subject Area
Artificial intelligence|Computer science|Energy|Information Technology|Internet and social media studies
Off-Campus Purdue Users:
	To access this dissertation, please log in to our
	proxy server.
 
				