Accelerating Emerging Neural Workloads

Jacob R Stevens, Purdue University

Abstract

Due to a combination of algorithmic advances, wide-spread availability of rich data sets, and tremendous growth in compute availability, Deep Neural Networks (DNNs) have seen considerable success in a wide variety of fields, achieving state-of-the art accuracy in a number of perceptual domains, such as text, video and audio processing. Recently, there have been many efforts to extend this success in the perceptual, Euclidean-based domain to nonperceptual tasks, such as task planning or reasoning, as well as to non-Euclidean domains, such as graphs. While several DNN accelerators have been proposed in the past decade, they largely focus on traditional DNN workloads, such as Multi-layer Perceptions (MLPs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs). These accelerators are ill-suited to the unique computational needs of the emerging neural networks. In this dissertation, we aim to fix this gap by proposing novel hardware architectures that are specifically tailored to emerging neural workloads. First, we consider memory-augmented neural networks (MANNs), a new class of neural networks that exhibits capabilities such as one-shot learning and task planning that are well beyond those of traditional DNNs. MANNs augment a traditional DNN with an external differentiable memory that is used to store dynamic state. This dissertation proposes a novel accelerator that targets the main bottleneck of MANNs: the soft reads and writes to this external memory, each of which requires access to all the memory locations. We then focus on Transformer networks, which have become very popular for Natural Language Processing (NLP). A key to the success of these networks is a technique called self-attention, which employs a softmax operation. Softmax is poorly supported in modern, matrix multiply-focused accelerators since it accounts for a very small fraction of traditional DNN workloads. We propose a hardware/software co-design approach to realize softmax efficiently by utilize a suite of approximate computing techniques. Next, we address graph neural networks (GNNs). GNNs are achieving state-of-the-art results in a variety of fields such as physics modeling, chemical synthesis, and electronic design automation. These GNNs are a hybrid between graph processing workloads and DNN workloads; they utilize DNN-based feature extractors to form hidden representations for each node in a graph and then combine these representations through some form of a graph traversal. As a result, existing hardware specialized for either graph processing workloads or DNN workloads is insufficient. Instead, we design a novel architecture that balances the needs of these two heterogeneous compute patterns. We also propose a novel feature dimensionblocking dataflow to further increase performance by mitigating the memory bottleneck. Finally, we address the growing difficulty in tightly coupling new DNNs and a hardware platform. Given the extremely large DNN-HW design space consisting of DNN selection, hardware operating condition, and DNN-to-HW mapping, it is infeasible to exhaustively search this space by running each sample on a physical hardware device. This has led to the need for highly accurate, machine learning-based performance models which can predictthe latency/power/energy even faster than direct execution. We first present a taxonomy to characterize the possible approaches to these performance estimators. Based on the insights from this taxonomy, we present a new performance estimator that combines coarsegrained and fine-grained to achieve superior accuracy with a limited number of training samples.

Degree

Ph.D.

Advisors

Raghunathan, Purdue University.

Subject Area

Artificial intelligence|Energy

Off-Campus Purdue Users:
To access this dissertation, please log in to our
proxy server
.

Share

COinS