Query processing in stream database systems
Abstract
The processing of data streams plays a central role in emerging applications such as pervasive computing, sensor-based environments, and online business processing. Such applications receive unbounded input streams that are processed against a set of standing queries. To cope with the unbounded nature of data streams, the queries define windows (scopes of interest) that limit access to the input data. The window queries are re-evaluated each time a new input arrives, and hence are termed sliding window queries. The straightforward application of traditional pipelined query processing techniques to sliding window queries may result in inefficient and incorrect behavior. In this thesis, I address several research challenges in building a scalable query processing engine for stream database systems. I propose scheduling techniques that guarantee the correct execution of pipelined sliding window queries. Based on these scheduling techniques, I present new algorithms for correctly evaluating complex window-based query operations. I address scalability by sharing the execution of multiple concurrent queries and propose new query evaluation strategies for shared execution. My research on shared execution opens new avenues for optimizing multiple sliding window queries, with the window size treated as an optimization parameter. I also propose new algorithms to evaluate join queries over data streams using general window constraints. Since video can be viewed as a stream of consecutive image frames, video operations may be expressed as queries over video streams. From this viewpoint, I use the proposed query engine to express and execute basic video operations, such as fast forward and region-based blurring, as queries over video streams.
I have studied the performance of the proposed techniques both analytically and experimentally in the context of a prototype stream database system, using real streams of retail transactions, medical video data, and synthetic data streams. The performance study demonstrates the superiority and practicality of the proposed techniques in terms of response time and throughput.
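As an illustration of the sliding window idea described in the abstract, the following minimal Python sketch joins two streams on a key over a time-based window. It is not the thesis's algorithm: the class name SlidingWindowJoin, the tuple layout (timestamp, key, value), and the rule that a tuple expires once it is at least `window` time units older than the newest arrival are all assumptions made for this example. It also assumes timestamps arrive in non-decreasing order across both streams.

```python
from collections import deque


class SlidingWindowJoin:
    """Hypothetical symmetric equi-join of two streams over a time-based sliding window."""

    def __init__(self, window):
        self.window = window
        # One buffer per input stream; each holds (timestamp, key, value) in arrival order.
        self.buffers = (deque(), deque())

    def _expire(self, now):
        # Drop tuples that are at least `window` time units older than `now`.
        for buf in self.buffers:
            while buf and buf[0][0] <= now - self.window:
                buf.popleft()

    def insert(self, side, ts, key, value):
        """Process a tuple arriving on stream `side` (0 or 1); return the new join results."""
        self._expire(ts)
        other = self.buffers[1 - side]
        # Probe the opposite window; output is always (key, stream-0 value, stream-1 value).
        if side == 0:
            results = [(key, value, v) for (_, k, v) in other if k == key]
        else:
            results = [(key, v, value) for (_, k, v) in other if k == key]
        self.buffers[side].append((ts, key, value))
        return results


join = SlidingWindowJoin(window=10)
r1 = join.insert(0, 1, "a", "x")   # nothing in the other stream yet
r2 = join.insert(1, 5, "a", "y")   # matches the tuple that arrived at time 1
r3 = join.insert(1, 12, "a", "z")  # the time-1 tuple has expired by now
r4 = join.insert(0, 14, "a", "w")  # matches both live tuples of stream 1
```

Each arrival triggers both expiration and a probe of the opposite window, which is why the result of a sliding window query changes as new input arrives even though the query itself never changes.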
Degree
Ph.D.
Advisors
Aref, Purdue University.
Subject Area
Computer science