Understanding and Addressing Misconceptions in Introductory Programming: A Data-Driven Approach

Yizhou Qian, Purdue University


With the expansion of computer science (CS) education, CS teachers in K-12 schools should be cognizant of student misconceptions and be prepared to help students establish accurate understanding of computer science and programming. This exploratory design-based research (DBR) study implemented a data-driven approach to identify secondary school students' misconceptions using both their compilation and test errors and provide targeted feedback to promote students' conceptual change in introductory programming. Research subjects were two groups of high school students enrolled in two sections of a Java-based programming course in a 2017 summer residential program for gifted and talented students. This study consisted of two stages. In the first stage, students of group 1 took the introductory programming class and used an automated learning system, Mulberry, which collected data on student problem-solving attempts. Data analysis was conducted to identify common programming errors students demonstrated in their programs and relevant misconceptions. In the second stage, targeted feedback to address these misconceptions was designed using principles from conceptual change and feedback theories and added to Mulberry. When students of group 2 took the same introductory programming class and solved programming problems in Mulberry, they received the targeted feedback to address their misconceptions. Data analysis was conducted to assess how the feedback affected the evolution of students' (mis)conceptions. Using students' erroneous solutions, 55 distinct compilation errors were identified, and 15 of them were categorized as common ones. The 15 common compilation errors accounted for 92% of all compilation errors. Based on the 15 common compilation errors, three underlying student misconceptions were identified, including deficient knowledge of fundamental Java program structure, misunderstandings of Java expressions, and confusion about Java variables. In addition, 10 common test errors were identified based on nine difficult problems. The results showed that 54% of all test errors were related to the difficult problems, and the 10 common test errors accounted for 39% of all test errors of the difficult problems. Four common student misconceptions were identified based on the 10 common test errors, including misunderstandings of Java input, misunderstandings of Java output, confusion about Java operators, and forgetting to consider special cases. Both quantitative and qualitative data analysis were conducted to see whether and how the targeted feedback affected students' solutions. Quantitative analysis indicated that targeted feedback messages enhanced students' rates of improving erroneous solutions. Group 2 students showed significantly higher improvement rates in all erroneous solutions and solutions with common errors compared to group 1 students. Within group 2, solutions with targeted feedback messages resulted in significantly higher improvement rates compared to solutions without targeted feedback messages. Results suggest that with targeted feedback messages students were more likely to correct errors in their code. Qualitative analysis of students' solutions of four selected cases determined that students of group 2, when improving their code, made fewer intermediate incorrect solutions than students in group 1. The targeted feedback messages appear to have helped to promote conceptual change. The results of this study suggest that a data-driven approach to understanding and addressing student misconceptions, which is using student data in automated assessment systems, has the potential to improve students' learning of programming and may help teachers build better understanding of their students' common misconceptions and develop their pedagogical content knowledge (PCK). The use of automated assessment systems with misconception identification components may be helpful in pre-college introductory programming courses and so is encouraged as K-12 CS education expands. Researchers and developers of automated assessment systems should develop components that support identifying common student misconceptions using both compilation and non-compilation errors. Future research should continue to investigate the use of targeted feedback in automated assessment systems to address students' misconceptions and promote conceptual change in computer science education.




Lehman, Purdue University.

Subject Area

Educational technology|Computer science

Off-Campus Purdue Users:
To access this dissertation, please log in to our
proxy server