M3: Stream Processing on Main-Memory MapReduce

Ahmed Aly, Purdue University
Asmaa Sallam
Bala Gnanasekaran
Long-Van Nguyen-Dinh
Walid G. Aref, Purdue University
Mourad Ouzzani, Purdue University
Arif Ghafoor, Purdue University

IEEE 28th International conference on Data Engineering, 2012

Abstract

The continuous growth of social web applications along with the development of sensor capabilities in electronic devices is creating countless opportunities to analyze the enormous amounts of data that is continuously steaming from these applications and devices. To process large scale data on large scale computing clusters, MapReduce has been introduced as a framework for parallel computing. However, most of the current implementations of the MapReduce framework support only the execution of fixed-input jobs. Such restriction makes these implementations inapplicable for most streaming applications, in which queries are continuous in nature, and input data streams are continuously received at high arrival rates. In this demonstration, we showcase M$^3$, a prototype implementation of the MapReduce framework in which continuous queries over streams of data can be efficiently answered. M$^3$ extends Hadoop, the open source implementation of MapReduce, bypassing the Hadoop Distributed File System (HDFS) to support main-memory-only processing. Moreover, M$^3$ supports continuous execution of the Map and Reduce phases where individual Mappers and Reducers never terminate.