Multi-route query processing and optimization
A modern query optimizer typically picks a single query plan for all data based on overall data statistics. However, many have observed that real-life datasets tend to have non-uniform distributions. Selecting a single query plan may result in ineffective query execution for possibly large portions of the actual data. In addition most stream query processing systems, given the volume of data, cannot precisely model the system state much less account for uncertainty due to continuous variations. Such systems select a single query plan based upon imprecise statistics. In this paper, we present “Query Mesh” (or QM), a practical alternative to state-of-the-art data stream processing approaches. The main idea of QM is to compute multiple routes (i.e., query plans), each designed for a particular subset of the data with distinct statistical properties. We use terms “plans” and “routes” interchangeably in our work. A classifier model is induced and used to assign the best route to process incoming tuples based upon their data characteristics. We formulate the QM search space and analyze its complexity. Due to the substantial search space, we propose several cost-based query optimization heuristics designed to effectively find nearly optimal QMs. We propose the Self-Routing Fabric (SRF) infrastructure that supports query execution with multiple plans without physically constructing their topologies nor using a central router like Eddy. We also consider how to support uncertain route specification and execution in QM which can occur when imprecise statistics lead to more than one optimal route for a subset of data. Our experimental results indicate that QM consistently provides better query execution performance and incurs negligible overhead compared to the alternative state-of-the-art data stream approaches.
Data stream database systems; Adaptive query processing
Date of this Version