Interactive visual data mining modeling to enhance understanding and effectiveness of the process

Yan Liu, Purdue University

Abstract

Data mining (DM) modeling is a process of transforming information embedded in a dataset into a form amenable to human cognition. Most current DM tools support only automatic modeling, in which users have little interaction with the computer beyond assigning some parameter values at the beginning of the process. Arbitrary selection of parameter values, however, can lead to an unproductive modeling process. Automatic modeling also downplays the key roles that humans play in current knowledge discovery systems. Within this context, this dissertation develops an interactive visual data mining modeling (IVDMM) process. It aims to facilitate DM modeling by enhancing users' understanding and improving the effectiveness of the process, combining the flexibility, creativity, and general knowledge of humans with the enormous storage capacity and computational power of computers. A conceptual model of the IVDMM process is proposed, which captures how model visualization and data visualization should be applied to support DM modeling and enable greater user involvement during the process. In this research, the IVDMM process has been developed for the two most popular DM tools, decision tree classification and association rules modeling, for which original visualization designs have been implemented: data visualization combining parallel coordinates and mosaic displays, icicle plots of decision trees, and item-to-rule matrix views of association rules. Four experiments were conducted to test four hypotheses derived from the conceptual model of the IVDMM process.
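The abstract names icicle plots of decision trees but does not say how they are laid out. As a rough illustration only, the sketch below assumes the common icicle convention: each tree level is drawn as a row, and each node's width is proportional to the number of training records that reach it, so children exactly tile their parent's span. `Node`, `Rect`, and `icicle_layout` are hypothetical names, not part of the dissertation's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A decision-tree node (hypothetical structure for illustration)."""
    label: str
    n_samples: int  # number of training records reaching this node
    children: list[Node] = field(default_factory=list)


@dataclass
class Rect:
    """One cell of the icicle plot: a row index plus a horizontal span."""
    label: str
    depth: int    # row in the plot; the root is row 0
    x: float      # left edge
    width: float  # horizontal extent


def icicle_layout(root: Node, total_width: float = 1.0) -> list[Rect]:
    """Return one Rect per node, in depth-first (pre-order) order.

    Each child's width is the parent's width scaled by the child's share
    of the samples among its siblings, so the children tile the parent's
    horizontal span on the next row.
    """
    rects: list[Rect] = []

    def place(node: Node, depth: int, x: float, width: float) -> None:
        rects.append(Rect(node.label, depth, x, width))
        total = sum(child.n_samples for child in node.children)
        if total == 0:  # leaf, or children carry no samples: nothing to split
            return
        offset = x
        for child in node.children:
            child_width = width * child.n_samples / total
            place(child, depth + 1, offset, child_width)
            offset += child_width

    place(root, 0, 0.0, total_width)
    return rects


if __name__ == "__main__":
    # Hypothetical toy tree: 100 records split on an "age" attribute.
    tree = Node("all", 100, [
        Node("age < 30", 40, [Node("yes", 30), Node("no", 10)]),
        Node("age >= 30", 60),
    ])
    for r in icicle_layout(tree):
        print(f"{'  ' * r.depth}{r.label}: x={r.x:.2f}, width={r.width:.2f}")
```

Running the demo prints one indented line per node. A renderer would draw each `Rect` at row `depth` spanning `[x, x + width]`; colouring cells by class distribution, as interactive tools often do, is a further choice left out of this sketch.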
The experimental results indicated that for both decision tree classification and association rules modeling tasks, compared to automatic model construction, interactive visual model construction through integrated visualization of the model-building process and the data significantly improved the effectiveness of modeling (by 5% and 26%, respectively), enhanced users' understanding of the applied algorithms (by 76% and 103%, respectively), and increased their satisfaction with the task (by 53% and 21%, respectively). Also, compared to nonintegrated visualizations of models and data, integrated visualizations of models and data resulted in significantly better understanding of DM models (by 83% and 43%, respectively).

Degree

Ph.D.

Advisors

Salvendy, Purdue University.

Subject Area

Industrial engineering; Computer science
