Rank aggregation techniques for context-aware database management systems

Hicham Galal Elmongui, Purdue University

Abstract

This dissertation addresses rank aggregation in context-aware database management systems (DBMSs). Given multiple orderings on a set of entities, the rank aggregation problem is to find an "enhanced" ordering on the same set. Besides being an important tool in information retrieval, rank aggregation is indispensable in context-aware DBMSs. Context-aware DBMSs account for various contexts to provide relevant information to the user. The contexts of the query issuers, as well as those of the database objects, determine the significance of the different pieces of information to be retrieved. Tailoring specialized DBMSs that manage data and answer queries related to one context type is not an easy task. We propose Chameleon, a context-aware DBMS that supports multiple contexts as well as user preferences. It has a generic interface to define and process context information. Chameleon not only eliminates the need to tailor specialized engines towards a certain context, but also enables instantiating systems that support user-defined complex and composite contexts. As a proof of concept, we present two instances of Chameleon. One instance treats identity as a context to realize a privacy-aware (Hippocratic) database server that limits the disclosure of data to authorized parties. The other instance treats space as a context to realize a spatial database server. Since rank aggregation is at the core of context-aware DBMSs, we propose three methods for rank aggregation while composing contexts into a composite context: ordering, ranking, and skylining. We propose a new binary query operator, SkylineJoin, that not only joins two relations but also marks the joined tuples that belong to the skyline. SkylineJoin is a pipelined operator designed to be incorporated in a traditional query evaluation pipeline model. It is a progressive operator that returns initial results quickly.
SkylineJoin may be cascaded to compute the skyline of objects when the skyline dimensions span more than two relations. We modeled the execution of the SkylineJoin operator as a finite state machine that is built on top of an order-preserving join operator. We proved the correctness of SkylineJoin. Extensive experiments showed that SkylineJoin can exhibit an order of magnitude performance gain over the current state of the art. As the surrounding environment changes, context-aware DBMSs should adjust to dynamic changes in the contexts of the objects. We illustrate this feature as it affects rank aggregation. We propose the adaptive processing of ranking queries when the data do not reside on the node that computes the query, and the query therefore experiences delays. When the query engine is notified that the currently executing plan is sub-optimal or that a disconnection has occurred, a new execution plan is generated and the execution proceeds without reevaluating the ranking query. We propose an aggressive reuse of the old ranking state from the current plan in building the state of the new plan. The experimental evaluation shows a significant performance gain from changing sub-optimal execution strategies at run time.
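
To make the idea of marking skyline tuples in a join result concrete, the following is a minimal sketch of a naive baseline, not the SkylineJoin operator described above. It first materializes the full join and then flags each joined tuple by pairwise dominance checks, so it is blocking and quadratic, whereas the abstract describes a pipelined, progressive operator. The relations `hotels` and `flights`, the function names, and the convention that lower values are better are all assumptions made for illustration.

```python
# Hypothetical toy relations of (join_key, cost); lower cost is assumed better.
hotels = [("paris", 120), ("paris", 90), ("rome", 80)]
flights = [("paris", 300), ("rome", 350), ("rome", 500)]


def dominates(a, b):
    """Return True if a is no worse than b in every dimension and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def naive_skyline_join(left, right):
    """Equi-join on the key, then flag each joined tuple as skyline or not.

    This is a blocking baseline for illustration only: the whole join is
    materialized before any tuple is classified.
    """
    joined = [(lk, (lv, rv)) for lk, lv in left for rk, rv in right if lk == rk]
    result = []
    for key, dims in joined:
        # A tuple is in the skyline if no other joined tuple dominates it.
        in_skyline = not any(dominates(other, dims) for _, other in joined)
        result.append((key, dims, in_skyline))
    return result


if __name__ == "__main__":
    for row in naive_skyline_join(hotels, flights):
        print(row)
```

In this toy data, a joined tuple such as `("paris", (120, 300))` is not flagged because `(90, 300)` is at least as good in both dimensions and strictly better in one.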

Degree

Ph.D.

Advisors

Aref, Purdue University.

Subject Area

Computer science
