Hardware/Software Co-Design for Keyword Spotting on Edge Devices

Jacob Bushur, Purdue University

Abstract

The introduction of artificial neural networks (ANNs) to speech recognition applications has sparked the rapid development and popularization of digital assistants. These digital assistants perform keyword spotting (KWS), constantly monitoring the audio captured by a microphone for a small set of words or phrases known as keywords. Upon recognizing a keyword, a larger audio recording is saved and processed by a separate, more complex neural network. More broadly, neural networks in speech recognition have popularized voice as means of interacting with electronic devices, sparking an interest in individuals using speech recognition in their own projects. However, while large companies have the means to develop custom neural network architectures alongside proprietary hardware platforms, such development precludes those lacking similar resources from developing efficient and effective neural networks for embedded systems. While small, low-power embedded systems are widely available in the hobbyist space, a clear process is needed for developing a neural network that accounts for the limitations of these resource-constrained systems. In contrast, a wide variety of neural network architectures exists, but often little thought is given to deploying these architectures on edge devices. This thesis first presents an overview of audio processing techniques, artificial neural network fundamentals, and machine learning tools. A summary of a set of specific neural network architectures is also discussed. Finally, the process of implementing and modifying these existing neural network architectures and training specific models in Python using TensorFlow is demonstrated. The trained models are also subjected to post-training quantization to evaluate the effect on model performance. The models are evaluated using metrics relevant to deployment on resource-constrained systems, such as memory consumption, latency, and model size, in addition to the standard comparisons of accuracy and parameter count. After evaluating the models and architectures, the process of deploying one of the trained and quantized models is explored on an Arduino Nano 33 BLE using TensorFlow Lite for Microcontrollers and on a Digilent Nexys 4 FPGA board using CFU Playground.

Degree

M.Sc.

Advisors

Chen, Purdue University.

Subject Area

Artificial intelligence|Computer Engineering

Off-Campus Purdue Users:
To access this dissertation, please log in to our
proxy server
.

Share

COinS