NADEEF: A Generalized Data Cleaning System

Abstract

We present NADEEF, an extensible, generic, and easy-to-deploy data cleaning system. NADEEF separates a programming interface from a core to achieve generality and extensibility. The programming interface allows users to specify data quality rules by writing code that implements predefined classes. These classes uniformly define what is wrong with the data and (possibly) how to fix it. We will demonstrate the following features provided by NADEEF. (1) Heterogeneity: the programming interface can express many types of data quality rules beyond the well-known conditional functional dependencies (CFDs) and functional dependencies (FDs), matching dependencies (MDs), and ETL rules. (2) Interdependency: the core algorithms can interleave multiple types of rules to detect and repair data errors. (3) Deployment and extensibility: users can easily customize NADEEF by defining new types of rules or by extending the core. (4) Metadata management and data custodians: we show a live data quality dashboard that effectively involves users in the data cleaning process.
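The abstract describes two ideas: each rule is code that implements a predefined class stating what is wrong with the data and, optionally, how to fix it, and a core that interleaves several rule types during detection and repair. The Python sketch below illustrates both ideas. It is not NADEEF's actual API. The names `Rule`, `detect`, `repair`, `Violation`, `Fix`, `FunctionalDependency`, `TitleCaseFormat`, and `clean` are hypothetical. The FD repair policy (copy the value from the first matching row) is deliberately naive.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Tuple

Row = Dict[str, str]

@dataclass(frozen=True)
class Violation:
    """Cells that jointly violate a rule, as (row index, attribute) pairs."""
    cells: Tuple[Tuple[int, str], ...]

@dataclass(frozen=True)
class Fix:
    """A candidate repair: assign `value` to the cell (row, attr)."""
    row: int
    attr: str
    value: str

class Rule(ABC):
    """Hypothetical rule interface: what is wrong, and (possibly) how to fix it."""

    @abstractmethod
    def detect(self, table: List[Row]) -> List[Violation]:
        ...

    def repair(self, table: List[Row], violation: Violation) -> List[Fix]:
        return []  # default: detection only, no automatic repair

class FunctionalDependency(Rule):
    """FD lhs -> rhs: rows that agree on lhs must agree on rhs."""

    def __init__(self, lhs: str, rhs: str):
        self.lhs, self.rhs = lhs, rhs

    def detect(self, table):
        first_seen: Dict[str, int] = {}
        found = []
        for i, row in enumerate(table):
            j = first_seen.setdefault(row[self.lhs], i)
            if table[j][self.rhs] != row[self.rhs]:
                found.append(Violation(((j, self.rhs), (i, self.rhs))))
        return found

    def repair(self, table, violation):
        (j, attr), (i, _) = violation.cells
        # Naive, hypothetical policy: copy rhs from the first row with this lhs.
        return [Fix(i, attr, table[j][attr])]

class TitleCaseFormat(Rule):
    """ETL-style formatting rule: values of `attr` are trimmed and title-cased."""

    def __init__(self, attr: str):
        self.attr = attr

    def detect(self, table):
        return [Violation(((i, self.attr),)) for i, row in enumerate(table)
                if row[self.attr] != row[self.attr].strip().title()]

    def repair(self, table, violation):
        i, attr = violation.cells[0]
        return [Fix(i, attr, table[i][attr].strip().title())]

def clean(table: List[Row], rules: List[Rule], max_rounds: int = 10):
    """Interleave all rules: detect, repair, and repeat until no rule fires."""
    table = [dict(row) for row in table]  # work on a copy of the input
    for _ in range(max_rounds):
        violations = [(r, v) for r in rules for v in r.detect(table)]
        if not violations:
            return table, True
        for rule, v in violations:
            for fix in rule.repair(table, v):
                table[fix.row][fix.attr] = fix.value
    return table, not any(r.detect(table) for r in rules)

# Usage: an FD (zip -> city) and a formatting rule applied together.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "new york"},
    {"zip": "60601", "city": "  chicago"},
]
cleaned, ok = clean(rows, [FunctionalDependency("zip", "city"), TitleCaseFormat("city")])
```

In this sketch, `clean` re-runs detection after every round because a repair from one rule can create or resolve violations of another. The `max_rounds` cap ensures termination when repairs from different rules conflict. A real system would need a principled way to resolve such conflicts. This toy loop only illustrates the interleaving.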

Keywords

data initialization, dashboard, user interaction, data auditing

Date of this Version

2013

Comments

QCR - Qatar Computing Research Institute
