An evaluation of a post-entry test: An item analysis using Classical Test Theory (CTT)

Suthathip Thirakunkovit, Purdue University


This study analyzes the reliability of two screening tasks, the C-test and the cloze-elide, in the Assessment of College English-International (ACE-In), a post-entry test developed at Purdue University. The study uses Classical Test Theory (CTT) to evaluate these test items. CTT was selected because it is the standard, comprehensive framework for developing, evaluating, and scaling test items (DeVellis, 2006). This reliability analysis matters because it is a prerequisite for the test validation process. The study addresses three research questions: (1) What are the item characteristics of the C-test and the cloze-elide? (2) What are the average item difficulty and item discrimination values of the C-test and cloze-elide items? (3) What are the internal consistency coefficients of the C-test and the cloze-elide, and what is the correlation between them? The results of the pilot study showed that the mean C-test score was 77.8 (SD = 9.98) and the mean cloze-elide score was 36.59 (SD = 14.86). Based on their average item difficulty and item discrimination values, C-test items are generally easy (item difficulty > 0.7), while cloze-elide items are of medium difficulty (item difficulty ≈ 0.6). Although C-test items show acceptable discrimination (an average point-biserial correlation, rpb, of 0.3), cloze-elide items discriminate much better on average (rpb > 0.5). The Cronbach's alpha coefficients, a measure of internal consistency, are .88 for the C-test and .96 for the cloze-elide. A Pearson product-moment correlation analysis showed that the correlation between the C-test and the cloze-elide is high and statistically significant (r = .66, p < .01). These reliability analyses indicate that the test items measure the same underlying construct: general language proficiency.
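The CTT statistics named above (item difficulty, point-biserial discrimination, Cronbach's alpha, and Pearson's r) follow standard formulas. What follows is a minimal Python sketch, not the ACE-In analysis code. It assumes each response is scored dichotomously (1 = correct, 0 = incorrect) in an examinee-by-item matrix. The function names and the toy matrix are hypothetical, and the sketch does not compute a significance test for r.

```python
from math import sqrt
from statistics import mean, pvariance

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either variable is constant
    return sxy / sqrt(sxx * syy)

def item_difficulty(matrix):
    """CTT item difficulty: proportion of examinees answering each item correctly."""
    return [mean(col) for col in zip(*matrix)]

def point_biserial(matrix):
    """Item discrimination: correlation of each 0/1 item with the total score.

    Uses the uncorrected total (the item is included in it); a corrected
    item-total correlation would subtract the item from each total first.
    """
    totals = [sum(row) for row in matrix]
    return [pearson_r(list(col), totals) for col in zip(*matrix)]

def cronbach_alpha(matrix):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(matrix[0])  # number of items; must be at least 2
    item_vars = sum(pvariance(col) for col in zip(*matrix))
    total_var = pvariance([sum(row) for row in matrix])
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical 5-examinee x 4-item matrix (rows = examinees, 1 = correct).
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
p = item_difficulty(toy)
r_pb = point_biserial(toy)
print("difficulty:", [round(v, 2) for v in p])
print("discrimination:", [round(v, 2) for v in r_pb])
print("mean p:", round(mean(p), 2), "mean r_pb:", round(mean(r_pb), 2))
print("alpha:", round(cronbach_alpha(toy), 2))  # 0.8 for this toy matrix
# pearson_r can also be applied to the two tasks' total scores (research question 3).
```

Higher difficulty values mean easier items, which matches the abstract's reading of p > 0.7 as "easy." Averaging the per-item values gives the summary figures that the second research question asks about.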
Although the item analyses showed that the C-test did not meet the standards for item difficulty and discrimination, this does not necessarily mean that the C-test cannot serve its intended purpose as a preliminary screening tool. The score distributions show that scores on both tasks range widely. Given these fairly large standard deviations, the scores of the two screening tasks could potentially be combined to identify students who performed uniformly poorly on both tasks.
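One hedged way to act on the suggestion of combining the two screening scores is to standardize each task first, because the tasks sit on different scales (means of 77.8 and 36.59, SDs of 9.98 and 14.86). The following is a minimal Python sketch that assumes z-score standardization within the cohort, an equal-weight composite, and a cutoff of -1.0 SD on both tasks. All three choices, the function names, and the raw scores are hypothetical and do not come from the study.

```python
from statistics import mean, pstdev

def z_scores(scores):
    """Standardize raw scores within the cohort (mean 0, population SD 1)."""
    m, sd = mean(scores), pstdev(scores)
    return [(s - m) / sd for s in scores]

def flag_uniformly_low(ctest, cloze, cutoff=-1.0):
    """Indices of students whose z-score is below `cutoff` on BOTH tasks.

    cutoff=-1.0 is a hypothetical placeholder, not a value from the study.
    """
    zc, zz = z_scores(ctest), z_scores(cloze)
    return [i for i, (a, b) in enumerate(zip(zc, zz)) if a < cutoff and b < cutoff]

def composite(ctest, cloze):
    """Equal-weight composite of the two standardized task scores (hypothetical weighting)."""
    return [(a + b) / 2 for a, b in zip(z_scores(ctest), z_scores(cloze))]

# Hypothetical raw scores for five students (not ACE-In data).
ctest = [90, 85, 80, 75, 50]
cloze = [60, 45, 40, 35, 10]
print(flag_uniformly_low(ctest, cloze))  # [4]
print([round(c, 2) for c in composite(ctest, cloze)])
```

Standardizing first keeps the task with the larger raw-score spread from dominating an unweighted sum. Requiring both z-scores to fall below the cutoff matches the idea of uniformly low performance across both tasks. The composite could instead be compared against a single cutoff if a compensatory rule were preferred.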




Ginther, Purdue University.
