Query Processing Techniques for Compliance with Data Confidence Policies
Data integrity and quality are critical issues in many data-intensive decision-making applications. In such applications, decision makers need high-quality data on which they can rely with high confidence. A key difficulty is that obtaining high-quality data may be very expensive. We thus need flexible solutions to the problem of data integrity and quality. This paper proposes one such solution based on four key elements. The first element is the association of a confidence value with each data item in the database. The second element is the computation of the confidence values of query results by using lineage propagation. The third element is the notion of confidence policies. Such a policy restricts access to query results by specifying the minimum confidence level required for use in a certain task by a certain subject. The fourth element is an approach for dynamically incrementing the data confidence level so that the returned query results satisfy the stated confidence policies. In particular, we propose several algorithms for incrementing the data confidence level while minimizing the additional cost. Our experimental results demonstrate the efficiency and effectiveness of our approach.
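The four elements can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `Item`, `result_confidence`, `satisfies`, and `greedy_increment` are hypothetical, lineage is modeled as a set of independent base items whose confidences multiply, a policy is a mapping from a (role, purpose) pair to a minimum confidence, and the increment step is a simple greedy heuristic standing in for the paper's algorithms, which the abstract does not specify.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Item:
    """A base data item with a confidence value in [0, 1] (element 1)."""
    name: str
    confidence: float
    cost_per_step: float  # hypothetical cost of raising confidence by one step


def result_confidence(lineage):
    """Confidence of a result derived jointly from all items in its lineage.

    Assumption: the items are independent, so their confidences multiply.
    """
    return prod(item.confidence for item in lineage)


def satisfies(policy, role, purpose, confidence):
    """Check a result against a (role, purpose) -> minimum confidence policy."""
    return confidence >= policy[(role, purpose)]


def greedy_increment(lineage, threshold, step=0.05):
    """Raise item confidences in place until the result meets the threshold.

    Greedy heuristic (an illustrative stand-in, not the paper's algorithm):
    in each round, raise the item whose one-step improvement gives the
    largest confidence gain per unit cost. Returns the total cost spent,
    or None if no single step improves the result.
    """
    total_cost = 0.0
    while result_confidence(lineage) < threshold:
        current = result_confidence(lineage)
        best, best_ratio = None, 0.0
        for item in lineage:
            if item.confidence >= 1.0:
                continue  # already fully trusted, nothing to gain
            old = item.confidence
            item.confidence = min(1.0, old + step)  # trial increment
            gain = result_confidence(lineage) - current
            item.confidence = old  # undo the trial
            ratio = gain / item.cost_per_step
            if ratio > best_ratio:
                best, best_ratio = item, ratio
        if best is None:
            return None
        best.confidence = min(1.0, best.confidence + step)
        total_cost += best.cost_per_step
    return total_cost


# Usage: a result derived from two items, checked against a policy.
lineage = [Item("a", 0.9, cost_per_step=1.0), Item("b", 0.6, cost_per_step=2.0)]
policy = {("analyst", "report"): 0.7}
initial = result_confidence(lineage)
allowed_before = satisfies(policy, "analyst", "report", initial)
cost = greedy_increment(lineage, policy[("analyst", "report")])
allowed_after = satisfies(policy, "analyst", "report", result_confidence(lineage))
```

In this sketch the result is initially withheld because its propagated confidence falls below the policy threshold, and it is released only after the cheapest improvements found by the heuristic raise it above that threshold. A greedy choice like this does not guarantee minimum total cost.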
data integrity, quality, data-intensive decision-making applications, lineage propagation, dynamically increment