Online periodicity mining

Mohamed G Elfeky, Purdue University

Abstract

This dissertation addresses the online periodicity mining problem. Periodicity mining is the process of discovering frequent periodic patterns with the aim of predicting future behavior in time series data. The ubiquity of sensor devices that generate real-time, append-only and semi-infinite data streams has revived the need for online processing.

We define periodicity mining as a two-step process: discovering potential periodicity rates (Periodicity Detection), and discovering the frequent periodic patterns of each periodicity rate (Mining Periodic Patterns). We propose new algorithms for both online periodicity detection and online mining of periodic patterns. For the latter, the proposed algorithm incrementally maintains an efficient data structure, namely the max-subpattern tree, from which the periodic patterns are discovered. For periodicity detection, we define two types of periodicities: segment periodicity and symbol periodicity. Whereas segment periodicity concerns the periodicity of the entire time series, symbol periodicity concerns the periodicities of the various symbols or values of the time series. For each periodicity type, we propose an efficient convolution-based periodicity detection algorithm. Furthermore, we propose online periodicity mining algorithms that integrate both periodicity mining steps, and thus are able to discover the periodic patterns of unknown periods. All the proposed online algorithms require only one pass over the time series and no reprocessing of previously seen data. Finally, we address the inevitable problem of the presence of noise in real-world time series data. We propose a new online periodicity detection algorithm that deals efficiently with all types of noise. Based on time warping, the proposed algorithm warps (extends or shrinks) the time axis at various locations to optimally remove the noise.

Experimental studies for all the proposed algorithms are carried out using both synthetic and real-world data. Results show that the proposed algorithms outperform the existing periodicity mining algorithms in terms of time performance, the accuracy of the discovered periodicity rates and periodic patterns, and the resilience to noise. Real-data experiments demonstrate the practicality of the discovered periodic patterns.
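
To make the notion of symbol periodicity more concrete, the following is a minimal Python sketch. It is not the dissertation's convolution-based algorithm: it counts matches directly for a single candidate period, and the confidence measure it uses (the fraction of consecutive positions at a given offset that hold the same symbol) is an assumption made for illustration. The function name `symbol_periodicity_confidence` and the sample series are hypothetical.

```python
from collections import defaultdict


def symbol_periodicity_confidence(series, period):
    """Return {(symbol, offset): confidence} for one candidate period.

    Assumed measure: among the positions offset, offset + period,
    offset + 2 * period, ..., the fraction of consecutive pairs in
    which both positions hold the same symbol.
    """
    n = len(series)
    matches = defaultdict(int)

    # One pass: compare each element with the one `period` steps ahead.
    for i in range(n - period):
        if series[i] == series[i + period]:
            matches[(series[i], i % period)] += 1

    confidences = {}
    for (symbol, offset), count in matches.items():
        # Number of indices offset, offset + period, ... that are < n.
        positions = (n - offset + period - 1) // period
        pairs = positions - 1  # consecutive pairs at this offset
        confidences[(symbol, offset)] = count / pairs
    return confidences


if __name__ == "__main__":
    result = symbol_periodicity_confidence("abcabbabcb", 3)
    for (symbol, offset), conf in sorted(result.items()):
        print(symbol, offset, round(conf, 3))
```

For the sample series, the symbol `b` recurs at every third position starting from offset 1, while `a` recurs at offset 0 in two of the three consecutive pairs. A threshold on this confidence would decide which (symbol, period, offset) combinations count as periodic; checking every candidate period this way is the quadratic work that a convolution-based method is meant to avoid.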

Degree

Ph.D.

Advisors

Major Professors: Ahmed K. Elmagarmid, Purdue University; Walid G. Aref, Purdue University.

Subject Area

Computer Science
