Abstract
We describe an experimental study of pruning methods for decision tree classifiers in two learning situations: minimizing loss and probability estimation. In addition to the two most common methods for error minimization, CART's cost-complexity pruning and C4.5's error-based pruning, we study the extension of cost-complexity pruning to loss and two pruning variants based on Laplace corrections. We perform an empirical comparison of these methods and evaluate them with respect to the following three criteria: loss, mean-squared error (MSE), and log-loss. We provide a bias-variance decomposition of the MSE to show how pruning affects the bias and variance. We found that applying the Laplace correction to estimate the probability distributions at the leaves was beneficial to all pruning methods, both for loss minimization and for estimating probabilities. Unlike in error minimization, and somewhat surprisingly, performing no pruning led to results that were on par with other methods in terms of the evaluation criteria. The main advantage of pruning was in the reduction of the decision tree size, sometimes by a factor of 10. No method dominated the others on all datasets; even within a single domain, different pruning mechanisms are better for different loss matrices. We show this last result using Receiver Operating Characteristic (ROC) curves.
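For reference, the Laplace correction mentioned above is a standard smoothing of the class-frequency estimates at the leaves; the abstract does not spell out the exact form used, but the common variant for a leaf covering $n$ training examples, of which $n_c$ belong to class $c$, with $C$ classes in total, is

$$\hat{p}(c \mid \text{leaf}) = \frac{n_c + 1}{n + C},$$

which pulls the raw frequency $n_c/n$ toward the uniform distribution $1/C$ and avoids zero probability estimates at small leaves.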
Date of this Version
February 1998