Multiple-Choice Testing Using Immediate Feedback Assessment Technique (IF AT®) Forms: Second-Chance Guessing vs. Second-Chance Learning?

Multiple-choice testing is a common but often ineffective method for evaluating learning. A newer approach using Immediate Feedback Assessment Technique (IF AT, Epstein Educational Enterprise, Inc.) forms offers several advantages. In particular, a student learns immediately whether his or her answer is correct and, in the case of an incorrect answer, has an opportunity to provide a second response and receive partial credit for a correct second attempt. For a multiple-choice question with five possible answers, the IF AT form covers spaces labeled A through E with a thin opaque film; when the film is scratched away, a star indicates the correct answer. This study was conducted to assess learning after an initial incorrect answer. By random chance alone, students would have a 25% chance of guessing a correct second answer (i.e., 1 of the 4 remaining answers on the IF AT form). Analysis of second responses for 8,775 questions on IF AT forms in 22 classes over 3 years showed that 44.9% of second answers were correct, significantly higher than one might expect from random guessing. This indicates that students learned from an incorrect answer and, possibly by re-reading the problem, were able to demonstrate some level of mastery of the material. These data lead us to conclude that IF AT forms are useful assessment tools.


Introduction
Multiple-choice exams are advantageous when class sizes are large, when instructors wish to test frequently, and when corrected exams must be returned quickly to students. Multiple-choice exams, however, have several disadvantages for testing mastery of course material, including offering only one opportunity to respond to a question. The Immediate Feedback Assessment Technique (IF AT®) form, introduced by M. Epstein, B. Epstein and Brosvic (2001), attempted to circumvent these issues. When using the form, a student scratches off an opaque film corresponding to the answer to a multiple-choice question and observes either a star for a correct answer or a blank for an incorrect answer; instructors have the codes for different IF AT® forms so that they can write multiple-choice questions whose correct answers correspond to the positions of the stars on the form. Thus, with the IF AT® form, a student receives an immediate response for correct answers; well-prepared students are particularly encouraged by this procedure (Dibattista & Gosse, 2006). If the area under the film is blank (an incorrect answer), the student can immediately reconsider the question and give a second-chance answer, for which some partial credit may be awarded. More information about IF AT® forms can be found on the Epstein Educational Enterprise, Inc. web pages (http://www.epsteineducation.com/home/) and in an excellent overview by Smith (2013).
Nicol and Macfarlane-Dick (2006) explore the importance of learning through assessment and explain the need for a shift in focus from simply assessing a student's knowledge to fostering a lifetime of learning. Fischer (1999) advocates that lifelong learning is best fostered in self-directed learning environments. Fischer (1999) further explains that this self-directed learning should use authentic, complex problems and be embedded in a rewarding endeavor. For students to become self-directed learners, they must monitor and adjust their approaches to learning (Ambrose, Bridges, DiPietro, Lovett, & Norman, 2010). The Immediate Feedback Assessment Technique allows students to monitor and adjust their approach as they solve a problem (if they get their first answer wrong) and to do so under the umbrella of a rewarding endeavor.
Although students benefit from knowing immediately whether a mistake has been made in answering a question and from having the opportunity to receive partial credit for a correct second answer (Epstein et al., 2002; Dihoff, Brosvic, Epstein, & Cook, 2004), it is less clear whether a correct second answer is indeed the result of learning from a mistake or of good guessing. This leads to our primary research questions: How accurately can students respond to a question to which they have already responded incorrectly? Can we ascertain whether students, when given a second chance, are merely guessing or are using the learning triggered by the knowledge of an incorrect answer? In a problem-solving situation, immediate knowledge of an incorrect response should trigger a student's reappraisal of the information explicitly and implicitly embedded in the question. Ideally, this should result in a cognitive reevaluation of the question, helping the student find a more plausible answer to the same question on the second attempt. Whether the credits earned through second attempts are attributable to learning from mistakes or to random guessing remains an open question. Starting from the assumption that informed second-chance responses would gain more partial-credit points than random guesses, we attempted to examine empirically whether the additional credits students earned in the present study were significantly greater than the points they might have earned by blind guessing on their second-chance responses. This statistical study was undertaken to assess learning after an initial incorrect answer using the IF AT® format.

Methods
IF AT® forms were purchased from Epstein Educational Enterprise, Inc. (Cincinnati, OH, http://www.epsteineducation.com/home/). Of the several types of forms available, this study used forms with 5 possible answers. Different versions of the forms come with instructor-only answer keys so that tests can be constructed with the correct answers in the correct positions. Newer forms come with an opaque backing to ensure even greater resistance to any possible "see through".
The study was conducted over a 3-year period in second-year Organic Chemistry courses. A total of 22 classes used the IF AT® forms for three hourly exams and a final given to a total of 1,449 students. The multiple-choice portion of each test consisted of between 15 and 25 questions with 5-part answers. There were a total of 26,175 questions, of which 8,775 were answered incorrectly on the first attempt; the latter were evaluated for correct second attempts. P-values were obtained to determine whether students used the opportunity for partial credit at a frequency greater than random chance. These p-values were obtained for each individual test as well as for all tests combined into one large pool. Each p-value was obtained by subtracting the value of the normal distribution from 1.0.
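One way to read the p-value computation described above is as a one-sided, one-sample z-test for a proportion, in which the p-value is 1.0 minus the standard normal cumulative distribution evaluated at the test statistic. The sketch below is written under that assumption; the normal approximation and the function names `upper_tail` and `one_sided_p_value` are illustrative and are not confirmed details of the study.

```python
import math


def upper_tail(z: float) -> float:
    """Return 1.0 - Phi(z) for the standard normal distribution.

    Written with erfc so the far upper tail does not lose precision.
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def one_sided_p_value(correct: int, attempts: int, chance: float = 0.25) -> float:
    """Upper-tail p-value for a success rate at least as high as observed.

    Assumption: a normal approximation to the binomial under the null
    hypothesis that every second attempt is a blind guess.
    """
    observed = correct / attempts
    standard_error = math.sqrt(chance * (1.0 - chance) / attempts)
    z = (observed - chance) / standard_error
    return upper_tail(z)


# Pooled counts reported in this study: 3938 correct second attempts
# out of 8775 first-attempt misses.
pooled_p = one_sided_p_value(correct=3938, attempts=8775)
```

With the pooled counts, the z statistic under this reading is above 40, so the p-value is far below any conventional significance level and underflows to zero in double precision.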
Further analysis was conducted using a t-test to determine whether there was a significant difference between the total amount of partial credit actually earned throughout the study and the total amount of partial credit that might have been earned had students earned partial credit only through random chance. Finally, a Pearson correlation was computed to show the strength of the relationship between the score of a student who earned partial credit and the amount of partial credit that student earned.

Results
Using the 5-answer IF AT® forms, the likelihood of a student getting a question correct on the second try by random guessing would be 25% (1 correct answer/4 remaining choices). In the current study, with 8,775 such questions, random guessing would be expected to give a correct answer 2194 times (see Figure 1). In reality, students answered correctly on a second attempt a total of 3938 times, or 44.9% of the time. While it is not possible to rule out that guessing had any effect on this number, our data suggest that students used some technique and prior knowledge to decipher the correct answers. It is important to note, however, that our data are only indicative of learning and not a confirmation that second-chance correct responses are learning-driven.
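The baseline arithmetic above can be reproduced with a few lines of code. This is a minimal sketch; the helper name `second_attempt_baseline` is introduced here for illustration, and it assumes that a blind second guess is spread evenly over the options that remain after one wrong answer has been revealed.

```python
def second_attempt_baseline(options: int, first_misses: int) -> tuple[float, float]:
    """Return (chance rate, expected correct count) for blind second guesses."""
    remaining = options - 1  # one wrong option has already been scratched off
    chance_rate = 1.0 / remaining
    return chance_rate, chance_rate * first_misses


# Counts reported in this study.
chance_rate, expected_correct = second_attempt_baseline(options=5, first_misses=8775)
observed_rate = 3938 / 8775

print(f"chance rate: {chance_rate:.1%}")            # chance rate: 25.0%
print(f"expected correct: {expected_correct:.0f}")  # expected correct: 2194
print(f"observed rate: {observed_rate:.1%}")        # observed rate: 44.9%
```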

Figure 1. Rate of accuracy of response actually obtained versus random chance.
A t-test was run to determine the level of difference between the total partial credit actually earned (N=8775, Mean=179.00, SD=46.18, CI=1.27) and the value for 25% of the total possible credit that would hypothetically be obtained through random guessing (N=8775, Mean=99.72, SD=20.53, CI=0.56). The t-test returned a value of 9.08 × 10⁻¹² (N=8775, SD=53.44, CI=1.47). This value suggests that students were using some form of discernment to determine which of their remaining possibilities was the correct choice.
Of the 22 tests, the greatest value for p-hat obtained was 4.117 × 10⁻⁸. For this specific test (referred to as exam F), there were 15 multiple-choice questions; students were given the opportunity to earn 3 points for a correct first answer, 1 point for a correct second answer, or zero points for two incorrect attempts. That would result in a total possible score of 45 points. Exam F had an average score without partial credit of 21.34 and a median score without partial credit of 21. When partial credit was included for exam F, the average score was 24.30 and the median score was 25.00. The close proximity between the average and the median suggests that there are no dramatic outliers affecting the calculations. The average test grade without the addition of any partial credit was 47.42%. The average test grade after the addition of partial credit was 53.99%. Partial credit alone thus raised the average grade by 6.57 percentage points, which is over half of a letter grade for the multiple-choice portion of the exam (see Figure 2 for a comparison of average test scores without partial credit, with partial credit obtained at random chance, and with the partial credit that was actually obtained for each exam). A significance level of 0.01 was used to determine whether there was a significant difference between the scores students earned with their actual partial credit and the scores students would have earned under random chance. The t-test for exam F (which, while still significant, was the least significant of the 22 tests) displayed a value of <0.001 (N=44, SD=7.60, CI=3.09). This value strongly suggests that students were able to obtain partial credit and earn a higher grade at a statistically significant level.
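The scoring scheme described for exam F (3 points for a correct first answer, 1 point for a correct second answer, zero otherwise) can be sketched as follows. The function names and the sample student are hypothetical and serve only to show how the scores with and without partial credit relate to the percentages reported above.

```python
FIRST_TRY_POINTS = 3   # correct on the first scratch
SECOND_TRY_POINTS = 1  # correct on the second scratch


def question_points(correct_on_attempt: int) -> int:
    """Points for one question; 0 means neither attempt was correct."""
    if correct_on_attempt == 1:
        return FIRST_TRY_POINTS
    if correct_on_attempt == 2:
        return SECOND_TRY_POINTS
    return 0


def exam_scores(outcomes: list[int]) -> tuple[int, int]:
    """Return (score without partial credit, score with partial credit)."""
    without_partial = sum(FIRST_TRY_POINTS for o in outcomes if o == 1)
    with_partial = sum(question_points(o) for o in outcomes)
    return without_partial, with_partial


def as_percent(points: float, questions: int = 15) -> float:
    """Convert raw points to a percentage of the maximum possible score."""
    return 100.0 * points / (questions * FIRST_TRY_POINTS)


# Hypothetical student on a 15-question exam: 7 correct first answers,
# 3 correct second answers, 5 questions missed twice.
hypothetical_outcomes = [1] * 7 + [2] * 3 + [0] * 5
without_partial, with_partial = exam_scores(hypothetical_outcomes)
```

Applied to the reported exam F averages, `as_percent(21.34)` and `as_percent(24.30)` give about 47.4% and 54.0%, which agree with the 47.42% and 53.99% above to within rounding of the averages.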
The Pearson correlation coefficient was -0.1703, meaning that there is a weak negative correlation between the grade a student earned and the amount of partial credit obtained by that student. This was expected, because a student can achieve a perfect score only by answering every question correctly on the first attempt and therefore earning zero partial credit. Higher grades thus require fewer incorrect initial choices.
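For readers who wish to run the same kind of check on their own class data, a Pearson correlation can be computed as sketched below. The five students in the example are hypothetical and were made up solely to exercise the function; they are not data from this study, and the helper name `pearson_r` is likewise an illustration.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical students on a 15-question exam scored 3/1/0:
# (correct first answers, correct second answers).
hypothetical_students = [(15, 0), (12, 2), (9, 4), (7, 3), (4, 2)]
grades = [3 * first + second for first, second in hypothetical_students]
partial_credit = [second for _, second in hypothetical_students]

r = pearson_r(grades, partial_credit)
```

In this toy set the perfect scorer has the highest grade and no partial credit, which pulls the coefficient below zero, in line with the reasoning given above.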
Across our complete data set, guessing alone would have earned 2194 partial-credit points. In reality, students earned a total of 3938 partial-credit points, nearly 80% more than random guessing alone would have yielded.
An analysis was completed comparing the average exam scores without partial credit, with partial credit obtained at random chance, and with the partial credit that was actually obtained for each exam (see Figure 2 for a visual comparison). This analysis shows that the average grades obtained with no partial credit, with the partial credit actually earned, and with partial credit at random chance are extremely close to one another for exams F and Q: exam F averaged 21.34, 24.30, and 23.31, respectively, while exam Q averaged 21.17, 24.58, and 23.16, respectively. Although these differences are small, on both exams partial credit was earned in statistically significant excess of random chance; a t-test for exam Q resulted in a significance of 2.3 × 10⁻¹¹ (N=52, SD=6.22, CI=2.31).

Discussion and Conclusion
The goals of teaching are to facilitate learning, to increase retention of knowledge, and to expand understanding. Ideally, testing should be designed to assess learning. In today's pedagogy, established methods of lecturing and testing are being challenged by new approaches aimed at increasing students' understanding of course material. Methods that permit immediate feedback to students during lectures and tests have been shown to promote more effective long-term understanding (Roediger & Butler, 2011). Classroom response systems, for example, have gained considerable acceptance for engaging students during lectures in large classes (Schell, Lukoff, & Mazur, 2013; Heaslip, Donovan, & Cullen, 2014). In addition, continual "retrieval practice" enhanced with rapid feedback has been shown to assist persistent learning (Roediger & Butler, 2011). Clearly, rapid feedback is a critical element of all testing to ensure that students' mistakes do not persist (Attali, 2011).
Designing good tests is a challenge for instructors. With increasing class sizes or the desire to offer multiple exam opportunities, multiple-choice questions are often used to speed up grading. Little, E. Bjork, R. Bjork and Angello (2012) note that multiple-choice tests can be useful learning tools that foster productive retrieval learning. Mathematical analysis of multiple-choice tests shows that the fairest method for grading is to give credit for the number of correct answers with no penalty for a wrong or missing answer (Scharf & Baldwin, 2007); this approach, however, does not reduce the credit given for blindly guessing an answer (Bush, 2015). In most cases, multiple-choice questions pose two critical problems: first, students must choose from a fixed set of answers without displaying their learning process; and second, as with all testing methods, there is a delay in learning whether the answer is correct.
Epstein et al. designed IF AT® forms to overcome some of these issues (Epstein et al., 2001; Epstein et al., 2002; Dihoff, Brosvic, & Epstein, 2003; Brosvic, Epstein, Cook, & Dihoff, 2005). For the professor, the immediate feedback assessment technique offers the benefit that grading of the multiple-choice portion of an exam is nearly completed during the actual testing process. Grading consists of each student writing the number of points received for each question beside that question after completing the exam. In addition, many researchers have found that, when compared to other testing methods (traditional format, end-of-test feedback, delayed feedback given 24 hours after testing, and Scantron testing formats), the IF AT® provided better recall of material when it was tested again on a later final exam (Epstein & Brosvic, 2002; Dihoff et al., 2003; Dihoff et al., 2004).
IF AT® forms can be used to give partial credit for a correct answer after an initial incorrect response, and some faculty use IF AT® forms to allow students to answer until they reach a correct answer (Attali, 2011). One problem with giving partial credit is that it may promote guessing. Slepkov (2013) suggested that second answers on IF AT® forms were better than guessing. Our analysis suggests that Slepkov's assertion is valid. We found that the percent of correct second answers was 44.9%, about 20 percentage points higher than the 25% expected from random guessing. With rare exceptions, we and others (Dibattista & Gosse, 2006; Epstein & Brosvic, 2002) have found that students liked using the IF AT® forms and reported informally that the forms reflected what they had learned better than standard multiple-choice formats. We therefore conclude that IF AT® forms are an effective and practical assessment tool and encourage their broader adoption by educators. More specifically, we encourage their use by science and chemistry faculty who, in the authors' experience, typically shy away from multiple-choice assessments because they cannot award partial-credit points. Finally, we would like to caution readers that, while our data are strongly indicative that our students' second-chance responses were grounded in learning, it is important that future research probe the nature of the learning triggered by immediate feedback in a testing situation.
Figure 2. Comparison of average test percentage. Series shown: test grade without partial credit; test grade with partial credit at random chance; test grade with partial credit actually earned.