Iterative purification and effect size use with logistic regression for DIF detection

Brian Forrest French, Purdue University

Abstract

Educational and psychological tests are frequently used for program placement, diagnosis of cognitive impairments, certification, and as a graduation requirement. Detection of item bias or differential item functioning (DIF) in these tests is essential to ensure that ability (e.g., achievement, intelligence) is measured equally across identified subgroups. Statistical DIF detection methods require an estimate of ability for matching persons. When biased items are included in that estimate, an inaccurate, unpurified ability measure is used in the matching process. The use of unpurified ability estimates has led to inaccurate identification of DIF across many methods. However, the effect of unpurified ability with logistic regression (LR) methods is not known. As this method gains attention, it is imperative that the effects of purification be examined. Therefore, the primary purpose of this study was to evaluate whether DIF detection errors were reduced with the use of (a) an iterative purification process of the matching criterion, (b) the combination of an effect size and a statistical significance test, and (c) the combination of purification, an effect size, and the statistical significance test to identify DIF items with the logistic regression procedure. Specifically, this study evaluated (a) power and (b) Type I errors across iterative and non-iterative procedures with the manipulation of the use of an effect size criterion. The manipulated factors included (a) sample size, (b) DIF magnitude, (c) percent of DIF, and (d) ability distribution differences. The results revealed that, in general, purification may not be worth the extra time and cost, as the proportion of conditions meeting the power criterion with purification was not greater than without purification. Moreover, purification did not reduce the Type I error rate. However, purification was beneficial under certain conditions.
Evaluation of the effect size criterion revealed that it was beneficial in keeping Type I errors below the .01 criterion. However, these low Type I error rates came at a substantial loss of power. In conclusion, under these simulated conditions, the use of the statistical test alone with LR for DIF detection performed as well as the other combinations of classification criteria and is most likely the practical choice for the majority of situations.
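The abstract does not spell out the LR DIF procedure itself, but the standard formulation it builds on (Swaminathan and Rogers's logistic regression DIF test, with the Nagelkerke delta-R-squared effect size of Zumbo and Thomas) can be sketched as follows. The Python/NumPy code below is an illustrative reconstruction, not the author's simulation code: the function names, the simulated 2,000-examinee Rasch-style dataset, the 0.8-logit uniform DIF on item 0, and the simplified purification loop (flagged items are simply dropped from the matching total score) are all my own assumptions. The .01 significance criterion mirrors the one named in the abstract.

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_logistic(X, y, n_iter=50):
    """Newton-Raphson fit of a logistic regression; returns (coefs, log-likelihood)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        H = X.T @ (X * W[:, None]) + 1e-6 * np.eye(X.shape[1])  # small ridge for stability
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    ll = float(np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)))
    return beta, ll

def nagelkerke_r2(ll_model, ll_null, n):
    """Nagelkerke (max-rescaled) R^2 from model and intercept-only log-likelihoods."""
    cox_snell = 1.0 - np.exp(2.0 * (ll_null - ll_model) / n)
    return cox_snell / (1.0 - np.exp(2.0 * ll_null / n))

def lr_dif(item, total, group):
    """LR DIF test for one dichotomous item: likelihood-ratio chi-square (df = 2)
    for adding group and group-by-ability terms to the matching-only model,
    plus the delta Nagelkerke R^2 effect size between the two models."""
    n = len(item)
    p_bar = item.mean()
    ll_null = n * (p_bar * np.log(p_bar) + (1 - p_bar) * np.log(1 - p_bar))
    _, ll_compact = fit_logistic(total[:, None], item)  # matching criterion only
    _, ll_full = fit_logistic(np.column_stack([total, group, total * group]), item)
    chi2 = 2.0 * (ll_full - ll_compact)
    dr2 = nagelkerke_r2(ll_full, ll_null, n) - nagelkerke_r2(ll_compact, ll_null, n)
    return chi2, dr2

def purified_dif(responses, group, chi2_crit=9.21, max_rounds=5):
    """Simplified iterative purification: drop flagged items from the matching
    total score and retest every item until the flagged set stabilizes.
    chi2_crit = 9.21 is the df-2 chi-square critical value at alpha = .01."""
    n_items = responses.shape[1]
    flagged = set()
    for _ in range(max_rounds):
        keep = [j for j in range(n_items) if j not in flagged]
        total = responses[:, keep].sum(axis=1)
        new_flags = {j for j in range(n_items)
                     if lr_dif(responses[:, j], total, group)[0] > chi2_crit}
        if new_flags == flagged:
            break
        flagged = new_flags
    return flagged

# Illustrative data: 2,000 examinees, 10 Rasch-style items; item 0 is made
# 0.8 logits harder for the focal group (uniform DIF).
n_per, n_items = 1000, 10
theta = rng.normal(size=2 * n_per)
group = np.repeat([0.0, 1.0], n_per)
b = np.linspace(-1.0, 1.0, n_items)
logit = theta[:, None] - b[None, :]
logit[:, 0] -= 0.8 * group
responses = (rng.random((2 * n_per, n_items)) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

total = responses.sum(axis=1)
chi2_dif, dr2_dif = lr_dif(responses[:, 0], total, group)
chi2_clean, dr2_clean = lr_dif(responses[:, 5], total, group)
flagged = purified_dif(responses, group)
print(f"DIF item:   chi2={chi2_dif:.1f}, delta-R2={dr2_dif:.4f}")
print(f"clean item: chi2={chi2_clean:.1f}, delta-R2={dr2_clean:.4f}")
print("flagged after purification:", sorted(flagged))
```

Pairing the chi-square test with a delta-R-squared cutoff (e.g., treating values below roughly .035 as negligible, per Jodoin and Gierl) is one way to operationalize the combined statistical-test-plus-effect-size criterion the study evaluates.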

Degree

Ph.D.

Advisors

Maller, Purdue University.

Subject Area

Educational evaluation
