(In)Stability of Test Scores
Keywords:
large-scale testing, G-theory, educational policy, test reliability
Abstract
School and district administrators use the results of standardized, large-scale tests to inform decisions about the need for, or success of, educational programs and interventions. However, school-level test results are subject to random fluctuations caused by changes in cohort, test items, and other factors outside the school's control. This study examined year-to-year changes in school-level results on standardized tests administered in Ontario, Canada. G-theory analyses found that test scores are not stable enough to support meaningful conclusions based on year-to-year changes in school-level results. For small and medium-sized schools, multiple years of data need to be collected before defensible decisions can be made about trends in test scores. The authors introduce a 'bounce' statistic that provides a simple, easy-to-interpret measure of test score stability.
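The cohort-size point can be illustrated with a small arithmetic sketch. It is not the authors' G-theory analysis and not their 'bounce' statistic, which this abstract does not define. It assumes only binomial sampling: two independent cohorts of equal size drawn from a population with the same true proportion of students meeting the standard. The function name `se_of_change`, the proportion 0.7, and the cohort sizes are hypothetical values chosen for illustration.

```python
import math


def se_of_change(p: float, n: int) -> float:
    """Standard error of the difference between two independent cohort
    proportions, each based on n students with true proportion p.
    (Hypothetical helper; assumes simple binomial sampling.)"""
    return math.sqrt(2 * p * (1 - p) / n)


if __name__ == "__main__":
    # Illustrative cohort sizes only; not taken from the study.
    for n in (20, 60, 200):
        se = se_of_change(0.7, n)
        print(f"cohort size {n:>3}: SE of year-to-year change = "
              f"{100 * se:.1f} percentage points")
```

Under these assumptions, chance alone produces much larger swings in the percentage meeting the standard for a small cohort than for a large one, even when nothing about the school has changed.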
Licence
© Stefan Merchant, Jessica Rich, Don Klinger 2022
This work is licensed under a Creative Commons Attribution 4.0 International licence.