Tree-Based Unidirectional Neural Networks for Low-Power Computer Vision on Embedded Devices

Abhinav Goel, Purdue University

Abstract

Deep Neural Networks (DNNs) are a class of machine learning algorithms that are widely successful in various computer vision tasks. DNNs filter input images and videos with many convolution operations in each layer to extract high-quality features and achieve high accuracy. Although highly accurate, state-of-the-art DNNs usually require server-grade GPUs and are too energy-, computation-, and memory-intensive to be deployed on most devices. This is a significant problem because billions of mobile and embedded devices that do not contain GPUs are now equipped with high-definition cameras, and running DNNs locally on these devices enables applications such as emergency response and safety monitoring, where data cannot always be offloaded to the cloud due to latency, privacy, or network bandwidth constraints. Prior research has shown that a considerable fraction of a DNN’s memory accesses and computations are redundant when performing computer vision tasks. Eliminating these redundancies will enable faster and more efficient DNN inference on low-power embedded devices.

To reduce these redundancies, and thereby the energy consumption of DNNs, this thesis proposes a novel Tree-based Unidirectional Neural Network (TRUNK) architecture. Instead of a single large DNN, multiple small DNNs arranged as a tree work together to perform computer vision tasks. TRUNK first measures the similarity between object categories, groups similar categories into clusters, and then groups similar clusters into a hierarchy, creating a tree. The small DNN at each node of the tree classifies among the clusters at that node. During inference on an input image, once a DNN selects a cluster, another DNN further classifies among that cluster’s children (sub-clusters); the DNNs associated with the other clusters are not used for that image. Because only a small subset of the DNNs runs during inference, redundant operations, memory accesses, and energy consumption are reduced. Since each intermediate classification shrinks the search space of possible object categories in the image, the small, efficient DNNs still achieve high accuracy.

In this thesis, we identify the computer vision applications and scenarios that are well suited for the TRUNK architecture. We develop methods that use TRUNK to improve the efficiency of image classification, object counting, and object re-identification. We also present methods to adapt the TRUNK structure to different embedded/edge application contexts with different system architectures, accuracy requirements, and hardware constraints. Experiments with TRUNK on several image datasets, conducted on consumer-grade embedded systems (NVIDIA Jetson Nano, Raspberry Pi 3, and Raspberry Pi Zero), show that the proposed solution reduces memory requirements by ∼50%, inference time by ∼65%, energy consumption by ∼65%, and the number of operations by ∼45% compared with existing DNN architectures, while incurring only marginal losses in accuracy compared with state-of-the-art DNNs.
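The following is a minimal sketch, in PyTorch, of the hierarchical inference described above: a small classifier at each tree node selects a child cluster, so only the DNNs on one root-to-leaf path run for a given image. The class TrunkNode, its deliberately tiny classifier, and the infer function are illustrative assumptions, not the dissertation’s actual implementation.

```python
import torch
import torch.nn as nn


class TrunkNode(nn.Module):
    """One tree node: a small DNN that chooses among child clusters
    (internal node) or final categories (leaf node)."""

    def __init__(self, num_choices, children=None, labels=None):
        super().__init__()
        # Deliberately tiny classifier; the node DNNs in the thesis would be
        # sized per level, dataset, and hardware budget.
        self.classifier = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(8, num_choices),
        )
        # Child TrunkNodes for internal nodes, or category names for leaves.
        self.child_nodes = nn.ModuleList(children) if children is not None else None
        self.labels = labels


def infer(node, image):
    """Descend the tree; only the DNNs on one root-to-leaf path are executed."""
    with torch.no_grad():
        while True:
            choice = node.classifier(image).argmax(dim=1).item()
            if node.child_nodes is None:        # leaf: the choice is a final category
                return node.labels[choice]
            node = node.child_nodes[choice]     # internal: descend into the chosen cluster


# Illustrative two-level tree: the root separates two hypothetical clusters,
# and each leaf classifies the categories inside its cluster.
animals = TrunkNode(2, labels=["cat", "dog"])
vehicles = TrunkNode(2, labels=["car", "truck"])
root = TrunkNode(2, children=[animals, vehicles])
print(infer(root, torch.randn(1, 3, 32, 32)))   # random input, returns one of the four labels
```

Because a wrong choice at an internal node cannot be corrected further down the tree, the grouping of visually similar categories described in the abstract is what keeps each node’s decision easy and the overall accuracy high.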

Degree

Ph.D.

Advisors

Yung-Hsiang Lu, Purdue University.

Subject Area

Artificial intelligence|Energy|Logic
