Knowledge discovery in scientific databases

Vassilios Verykios, Purdue University

Abstract

Complex problems, whether scientific, engineering, or societal, are most often modeled and solved using complex scientific software systems. The efficient realization of such systems involves selecting software components from existing alternatives, along with their software/hardware parameters. In this thesis we provide a complete, open-ended realization of the selection methodology required in the solution process of scientific problems. We view the implementation of the selection methodology as a generalized Knowledge Discovery in Databases (KDD) process for scientific databases. We evaluate this process, together with the existing algorithmic infrastructure for implementing the KDD phases, on a set of software for partitioning geometric data; these partitionings are used for the parallel processing of field problems. The integrated design of high performance computing and communication systems for large-scale applications from component models (simulated, experimental, or analytical) can be obtained by end-to-end performance modeling of such systems. The selection of the component models and their parameters, as well as the classification of the proposed designs with respect to predefined performance objectives and features, is another research issue addressed in this thesis. For this, we adopt the proposed KDD process and test it by modeling the performance behavior of an ASCI application running on the SP2 and a cluster of workstations. This thesis shows the feasibility of the KDD process in discovering the behavior of software/machine pairs from performance databases. Moreover, we demonstrate that the proposed KDD process can accurately reproduce the results of human analysis and can be applied to large databases where human analysis is not feasible. The presented implementation framework of the KDD process is shown to be easily adaptable to different scientific performance databases with a variety of targeted objectives. The results of this thesis should be valuable in addressing the identification and selection of software/hardware resources in the context of the metacomputing paradigm.

Degree

Ph.D.

Advisors

Houstis, Purdue University.

Subject Area

Computer science
