The impact of systematically repairing multiple choice questions with low discrimination on assessment reliability: an interrupted time series analysis
DOI: https://doi.org/10.36834/cmej.77596

Abstract
At our centre, we introduced a continuous quality improvement (CQI) initiative during academic year 2018-19 that targeted for repair multiple choice question (MCQ) items with a discrimination index (D) < 0.1. The purpose of this study was to assess the impact of this initiative on the reliability (internal consistency) of our assessments. Our participants were medical students during academic years 2015-16 to 2020-21, and our data were the summative MCQ assessments administered during this period. Since the goal was to systematically review and improve summative assessments in our undergraduate program on an ongoing basis, we used interrupted time series analysis to assess the impact on reliability. Between 2015-16 and 2017-18 there was a significant negative trend in the mean alpha coefficient for MCQ exams (regression coefficient -0.027 [-0.047, -0.008], p = 0.024). In the academic year following the introduction of our initiative (2018-19) there was a significant increase in the mean alpha coefficient (regression coefficient 0.113 [0.063, 0.163], p = 0.010), followed by a significant positive post-intervention trend (regression coefficient 0.056 [0.037, 0.075], p = 0.006). In conclusion, our CQI intervention resulted in an immediate and progressive improvement in the reliability of our MCQ assessments.
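The abstract rests on three standard psychometric and statistical quantities: an item discrimination index, Cronbach's alpha, and a single-group segmented regression of the kind used in interrupted time series analysis. The sketch below is ours, not the authors' code; it uses the textbook upper-minus-lower definition of D, the classical formula for alpha, and a conventional level-plus-slope-change ITS specification. The function names and the 27% grouping fraction are illustrative assumptions.

```python
import numpy as np

def discrimination_index(item_scores, total_scores, frac=0.27):
    """Classical D: proportion correct in the top scoring group minus the
    bottom group (groups of size frac * n by total exam score)."""
    item = np.asarray(item_scores, dtype=float)
    order = np.argsort(np.asarray(total_scores, dtype=float))
    g = max(1, int(round(frac * len(order))))
    return item[order[-g:]].mean() - item[order[:g]].mean()

def cronbach_alpha(score_matrix):
    """Cronbach's alpha for an (examinees x items) score matrix."""
    X = np.asarray(score_matrix, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)
    total_var = X.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

def segmented_fit(y, t0):
    """Single-group ITS fit: y ~ b0 + b1*t + b2*post + b3*(t - t0)*post.
    Returns [intercept, pre-trend, level change, trend change]."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    post = (t >= t0).astype(float)
    X = np.column_stack([np.ones_like(t), t, post, (t - t0) * post])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

In this specification the coefficient on `post` captures the immediate level change at the intervention (the 0.113 reported for 2018-19) and the coefficient on `(t - t0) * post` captures the change in slope (the 0.056 post-intervention trend); the published analysis was run in Stata and would additionally model autocorrelation, which this plain least-squares sketch omits.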
Copyright (c) 2023 Janeve Desy, Adrian Harvey, Sarah Weeks, Kevin D Busche, Kerri Martin, Michael Paget, Christopher Naugler, Kevin Mclaughlin
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.