Energy Efficient Hardware for Neural Network Applications

Trishit Dutta, Purdue University

Abstract

With the explosion of AI in recent years, the demand for computing resources has risen exponentially. Although Moore's law has kept up with conventional computational demands in the past, it has become evident that the efficiency and area gains from transistor scaling are no longer exponential, but incremental. The standard von Neumann architecture imposes a limit on efficiency and latency, as data is shuttled repeatedly between the compute and memory units. AI workloads, meanwhile, rely heavily on matrix-vector multiplications, whose cost grows quadratically with vector width. In-memory and near-memory computing have emerged as promising alternatives that address both of these issues elegantly while reducing energy requirements.

A variety of neural-network models rely on fast, repeated evaluation of the exponential and other transcendental functions. In many cases, this is done through range-reduction techniques and math tables. For optimal energy efficiency and throughput, these tables should reside as close as possible to the circuits that consume them. We propose a mixed-signal macro with dual functionality: it can perform matrix-vector multiplication as well as evaluate exp(x) for 32-bit IEEE 754 floating-point numbers. The macro consists of a 64×64 array of special 8T cells that store the math tables without hindering normal SRAM functionality. The charge-based MVM engine uses two ADCs with reconfigurable precision, allowing faster throughput for sparse inputs. Because the outputs of the two operations are separate, the macro can be used flexibly in any neural-network hardware that needs either or both functions.

Spiking Neural Networks (SNNs) can perform sequential learning tasks efficiently by using the inherent recurrence of membrane-potential (Vmem) accumulation over several timesteps. However, moving Vmem creates additional memory accesses, which becomes a bottleneck. SNN input spikes are also highly sparse, which can be exploited for efficient hardware implementation. We propose an SNN accelerator based on in-memory processing that addresses both of these issues. The accelerator consists of 9 compute macros and 3 neuron macros, which can be programmed to work either serially (9 compute, 1 neuron) or as 3 parallel sets (3 compute, 1 neuron each) to support different layer sizes. Peripheral logic computes the membrane potential and stores it in the same compute macro, avoiding unnecessary data movement. The neuron macro keeps track of the final membrane potential and generates output spikes. The accelerator was designed to run at 200 MHz at 1.2 V in the TSMC 65 nm node.
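The abstract does not spell out the exact table organization, but as an illustration, a minimal software sketch of exp(x) by range reduction plus a small lookup table might look like the following. The 64-entry table size is chosen to echo the 64×64 array width; the particular table/polynomial split is an assumption, not taken from the thesis.

```python
import math

# Hypothetical 64-entry table of 2^(j/64), j = 0..63 (size chosen to
# mirror the 64-wide array in the abstract; an illustrative assumption)
TABLE_BITS = 6
TABLE = [2.0 ** (j / 2 ** TABLE_BITS) for j in range(2 ** TABLE_BITS)]

def exp_range_reduced(x: float) -> float:
    """Approximate exp(x) via range reduction and a 2^f lookup table.

    exp(x) = 2^(x/ln2) = 2^k * 2^(j/64) * 2^r,
    where k is an integer, j indexes the table, and 0 <= r < 1/64.
    """
    t = x / math.log(2.0)            # rewrite exp(x) as a power of 2
    k = math.floor(t)                # integer part -> final exponent scaling
    f = t - k                        # fractional part in [0, 1)
    j = int(f * 2 ** TABLE_BITS)     # table index from the high bits of f
    r = f - j / 2 ** TABLE_BITS      # small residual, 0 <= r < 2^-6
    rr = r * math.log(2.0)           # 2^r = exp(r*ln2), with rr < 0.011
    poly = 1.0 + rr + rr * rr / 2.0  # short Taylor polynomial suffices
    return math.ldexp(TABLE[j] * poly, k)

# e.g. exp_range_reduced(1.0) ~= 2.71828
```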
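The link between input sparsity and reconfigurable ADC precision can also be made concrete: if only k rows of the array carry an active one-bit input, the bit-line dot product lies in [0, k], so fewer conversion bits suffice and conversion finishes sooner. The helper below is a hedged sketch; the single-bit-input assumption and the function name are illustrative, not from the thesis.

```python
import math

def adc_bits_needed(num_active_rows: int) -> int:
    """With k active one-bit inputs, the column sum lies in [0, k],
    so ceil(log2(k + 1)) ADC bits are enough; sparser input -> fewer
    bits -> faster conversion (illustrative model, not the thesis design)."""
    return max(1, math.ceil(math.log2(num_active_rows + 1)))

# e.g. 63 active rows of a 64-row array need 6 bits; 3 active rows need 2
```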
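For the SNN accelerator, the core per-timestep operations are sparse input accumulation into Vmem, leaky integration, thresholding, and output-spike generation. A minimal NumPy sketch of a leaky integrate-and-fire layer under assumed parameters (the leak factor, soft reset, and all names here are illustrative, not the accelerator's actual microarchitecture):

```python
import numpy as np

def lif_layer(spikes_in, weights, v_th=1.0, leak=0.95):
    """Leaky integrate-and-fire layer: accumulate Vmem over timesteps,
    skipping work for inactive inputs (exploiting spike sparsity).

    spikes_in: (T, N_in) binary array; weights: (N_in, N_out).
    """
    T = spikes_in.shape[0]
    n_out = weights.shape[1]
    vmem = np.zeros(n_out)                    # Vmem kept in place across timesteps
    spikes_out = np.zeros((T, n_out), dtype=np.uint8)
    for t in range(T):
        active = np.flatnonzero(spikes_in[t])  # only rows with spikes contribute
        if active.size:
            vmem += weights[active].sum(axis=0)
        vmem *= leak                           # leaky integration (assumed factor)
        fired = vmem >= v_th                   # threshold -> output spikes
        spikes_out[t] = fired
        vmem[fired] -= v_th                    # soft reset after firing (assumed)
    return spikes_out
```

Keeping vmem in one array across the loop mirrors the abstract's point that storing Vmem alongside the compute macro avoids shuttling it to and from a separate memory each timestep.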

Degree

M.Sc.

Advisors

Roy, Purdue University.

Subject Area

Logic, Design, Artificial intelligence, Electrical engineering, Energy, Mathematics
