Heterogeneous information integration using an object-oriented knowledge framework

Wonhee Sull, Purdue University

Abstract

Developing a system that enables intelligent access to, and integration of, information spread over various independent information sources is an important issue in data-intensive applications. While typical heterogeneous database management systems provide only for the integration of conventional databases, notably relational databases, we aim at a knowledge framework that can incorporate existing heterogeneous databases with nonconventional data types, as found in object-oriented databases or rulebases. In this thesis, we investigate the self-organizing knowledge representation aspects of schema integration involving object-oriented databases, relational databases, and rulebases. We consider a facet of self-organizability that sustains the structural semantic integrity of an integrated schema regardless of the dynamic nature of local schemata. To achieve this objective, we propose an overall scheme for schema translation and schema integration with an object-oriented data model as the common data model, and we show that integrated schemata can be maintained effortlessly by propagating updates in local schemata to integrated schemata unambiguously. As an interface to the integrated system, ODML (Object Data Manipulation Language) is proposed as the global query language; it is based on the nested relational data model and is extended with object-oriented features and quantificational tags. With these extensions, schema and data do not have to be differentiated in formulating queries. In addition, set-based path expressions can be mixed with singular path expressions. For decomposing a global query into locally executable subqueries, an algorithm is presented that guarantees that all the relevant objects within the federation of information systems are searched. This issue has seldom been considered, but it is important in environments where heterogeneous objects are integrated using generalization and aggregation abstraction mechanisms.
A large number of constraints are often associated with data or object classes. In this environment, when an integrity constraint is to be added, or when existing databases and knowledge bases are integrated, resolving conflicts among constraints from different information sources is not well understood, even though it is one of the most important problems in integration. As a first stage, we identify the need to acquire only those constraints that are necessary and consistent with respect to the current knowledge base. Then, an algorithm is presented to determine whether an incoming rule is consistent with respect to a target rulebase. For efficiency, the set of rules relevant to the new rule is collected from the rulebase; for this purpose, a data structure, the dependency graph, is defined. A logic theorem-proving method is also used for classifying the incoming rule.
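
The rule-acquisition step described above can be illustrated with a small sketch. It is not the algorithm from the thesis: it assumes propositional rules with a set of body literals and a single head literal, writes negation as a leading "~", builds the dependency graph as a map from atoms to the rules that mention them, and uses simple forward chaining as a stand-in for the theorem-proving method. The names Rule, relevant_rules, and classify, and the example rulebase, are hypothetical.

```python
from collections import defaultdict, deque
from typing import NamedTuple


class Rule(NamedTuple):
    body: frozenset  # literals that must all hold
    head: str        # literal that is concluded


def negate(lit):
    """Return the complementary literal ("p" <-> "~p")."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def atoms_of(rule):
    """Atoms mentioned by a rule, ignoring negation signs."""
    return {lit[1:] if lit.startswith("~") else lit
            for lit in rule.body | {rule.head}}


def build_dependency_graph(rules):
    """Map each atom to the indices of the rules that mention it."""
    graph = defaultdict(set)
    for i, rule in enumerate(rules):
        for a in atoms_of(rule):
            graph[a].add(i)
    return graph


def relevant_rules(new_rule, rules):
    """Collect the rules reachable from the new rule through shared atoms."""
    graph = build_dependency_graph(rules)
    seen_atoms = set(atoms_of(new_rule))
    queue = deque(seen_atoms)
    picked = set()
    while queue:
        a = queue.popleft()
        for i in graph.get(a, ()):
            if i not in picked:
                picked.add(i)
                for b in atoms_of(rules[i]) - seen_atoms:
                    seen_atoms.add(b)
                    queue.append(b)
    return [rules[i] for i in sorted(picked)]


def forward_closure(facts, rules):
    """Literals derivable from the facts by repeatedly firing rules."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.body <= known and rule.head not in known:
                known.add(rule.head)
                changed = True
    return known


def classify(new_rule, rules):
    """Classify an incoming rule against the relevant part of the rulebase."""
    closure = forward_closure(new_rule.body, relevant_rules(new_rule, rules))
    if new_rule.head in closure:
        return "redundant"      # already derivable, so not necessary
    if negate(new_rule.head) in closure:
        return "inconsistent"   # contradicts what the rulebase derives
    return "consistent"


# Hypothetical rulebase used only for illustration.
RULES = [
    Rule(frozenset({"employee"}), "person"),
    Rule(frozenset({"person"}), "has_name"),
    Rule(frozenset({"manager"}), "employee"),
    Rule(frozenset({"robot"}), "~person"),
    Rule(frozenset({"x"}), "y"),
]
```

In this sketch, an incoming rule "manager implies has_name" is classified as redundant, "manager implies ~person" as inconsistent, and "manager implies has_badge" as consistent; the unrelated rule "x implies y" is never examined because it shares no atoms with the incoming rule.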

Degree

Ph.D.

Advisors

Kashyap, Purdue University.

Subject Area

Electrical engineering|Computer science|Artificial intelligence
