Accelerating Data Stream Mining using Graphics Processing Units

Chandima Hewa Nadungodage, Purdue University

Abstract

Many real-world applications can produce continuous, potentially infinite streams of data. Over the past two decades, considerable research has addressed the challenges of data stream mining. However, the data rates and volumes produced by such applications continue to grow exponentially, driven by recent advances in technology and a growing online community. Hence, there is an increasing need for more efficient and powerful methods that process such data streams and produce results in a timely manner. Graphics Processing Units (GPUs) are designed to handle highly parallel workloads and can execute thousands of concurrent threads. Over the past decade, GPU computing has become increasingly popular in general-purpose data mining applications. With recent advances in GPU technology, GPUs have become a good candidate for processing streaming data in a timely manner. In this research, we study how to use the parallel processing power of GPUs to accelerate data stream mining applications. We propose a general-purpose, GPU-based data stream mining framework named GPU-SMF. The proposed framework offers a novel operator scheduling mechanism that allocates stream mining operators to GPU kernels so as to enhance operator concurrency, even within a single GPU, and increase application throughput. Using the proposed framework, we implement two novel stream mining algorithms, one for an online recommendation system and one for an online outlier detection application. Experimental results on real-world datasets show that the proposed methods are efficient and scalable when mining continuous data streams with large volumes and high input rates.

Degree

Ph.D.

Advisors

Xia, Purdue University.

Subject Area

Computer science

