Efficient query processing for rich and diverse real-time data

Rimma Nehme, Purdue University

Abstract

In recent years, data streams have become ubiquitous as technology improves and the prices of sensors, location-tracking devices, and portable devices fall. Examples of streaming data include observations from sensor networks, location updates from GPS devices, measurements from health-monitoring devices, and status updates, comments, and self-expressions from users on the web, e.g., by “twittering”. Current state-of-the-art Data Stream Management Systems (DSMSs for short) typically consider a very simple streaming environment, in which data streams transmit only data tuples and continuous queries are evaluated on the server against these arriving tuples. A continuous query is typically executed with a single execution plan (based on the latest overall statistics) that processes all arriving data. We believe that such “first-generation” DSMSs (which we also refer to as “Streams 1.0” systems), while enabling users and applications to pose queries over data streams, are ill-equipped to support many of the functionalities crucial to newly emerging streaming applications, e.g., ubiquitous healthcare or geo-social networking. Motivated by the growing trend of such new stream-based applications, in this thesis we propose to equip DSMSs with several important functionalities, namely: (1) access control enforcement for the security of streaming data, (2) tagging of streaming data to produce “richer” and more meaningful results, and (3) diversity-aware query processing for efficient processing of queries where subsets of data may exhibit distinct statistical properties. For each of these features, we provide a concrete problem definition and motivating examples, develop and analyze algorithms, and present experimental results obtained with a general-purpose DSMS prototype.
We believe that the ideas presented in this thesis can contribute significantly to the development of the “next generation” of DSMSs, the so-called “Streams 2.0” systems.

Degree

Ph.D.

Advisors

Rundensteiner, Purdue University.

Subject Area

Computer science
