Hardware and Software Accelerators for Big Data Machine Learning Workloads

Nitin Nitin, Purdue University

Abstract

We are in a computing era of zettabyte-scale data (a.k.a. Big Data). Big Data is critical to the rich, Machine Learning based analytics applications of today and tomorrow. Such extreme data sizes demand new software (SW) and hardware (HW) solutions for high-performance systems. In this thesis, I propose two solutions to accelerate Big Data Machine Learning (BDML) processing: (1) SW acceleration via an approximate MapReduce system, and (2) HW acceleration via a Processing-Near-Memory (PNM) architecture.

In the first part, I present an approximate MapReduce framework to accelerate big data processing in the cloud. In MapReduce, input data is processed into output tuples. Many applications compute approximately by sampling over the key space, trading less precise, yet statistically bounded, results for better performance. A simple way to approximate is uniform sampling across the entire key space. Unfortunately, this approach not only oversamples popular keys but also, more perniciously, undersamples rare keys. Accordingly, I perform distributed stratified random sampling in MapReduce to achieve approximations with strong error bounds across the entire key space. Further, to avoid oversampling popular keys, I develop a distributed, feedback-driven algorithm that coordinates among map tasks to stop sampling unnecessary tuples.

In the second part of this thesis, I propose a PNM architecture for BDML. The technology push of 3D memory and the application pull of BDML have created a unique opportunity for PNM. With this opportunity in mind, I explore and identify the key characteristics of MapReduce workloads that make them amenable to bandwidth- and energy-efficient PNM. Based on these characteristics, I propose Millipede, a row-oriented PNM architecture that (pre)fetches and operates on entire memory rows to exploit the row density of BDML workloads. Millipede consists of thousands of corelets that employ MIMD execution to handle the irregularity of BDML workloads. However, under MIMD execution corelets may stray far from one another, destroying row locality and degrading performance. Hence, Millipede employs cross-corelet flow control to prevent such degradation.
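To make the stratified-sampling idea in part one concrete, the following is a minimal Python sketch of per-key (stratified) reservoir sampling within a single map task. The abstract gives no code; the function name stratified_reservoir_sample and the per_key_cap parameter are illustrative assumptions, not the dissertation's actual interface, and the feedback-driven coordination among map tasks is omitted.

    import random
    from collections import defaultdict

    def stratified_reservoir_sample(tuples, per_key_cap):
        # One bounded reservoir per key: each key (stratum) retains at most
        # per_key_cap values, drawn uniformly at random within that stratum.
        reservoirs = defaultdict(list)   # key -> sampled values
        seen = defaultdict(int)          # key -> tuples observed so far
        for key, value in tuples:
            seen[key] += 1
            r = reservoirs[key]
            if len(r) < per_key_cap:
                r.append(value)          # reservoir not yet full: keep everything
            else:
                # Classic reservoir replacement (Algorithm R): keep the new
                # value with probability per_key_cap / seen[key].
                j = random.randrange(seen[key])
                if j < per_key_cap:
                    r[j] = value
        return reservoirs

    # Example: a skewed stream. Uniform sampling over the whole stream would
    # flood the sample with "popular" and likely miss "rare" entirely;
    # stratified sampling bounds the sample size of both strata.
    stream = [("popular", i) for i in range(100000)] + [("rare", 42)]
    samples = stratified_reservoir_sample(stream, per_key_cap=8)

In the distributed setting the thesis describes, map tasks would additionally exchange feedback about keys whose strata are already sufficiently sampled elsewhere, so further tuples for those keys can be dropped at the source; that cross-task coordination is beyond this sketch.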

Degree

Ph.D.

Advisors

Vijaykumar, Purdue University.

Subject Area

Computer Engineering
