(In)Stability of Test Scores


  • Stefan Merchant Queen's University
  • Jessica Rich Queen's University
  • Don Klinger Waikato University


Keywords: large-scale testing, G-theory, educational policy, test reliability


Both school and district administrators use the results of standardized, large-scale tests to inform decisions about the need for, or success of, educational programs and interventions. However, school-level test results are subject to random fluctuations caused by changes in cohort, test items, and other factors outside the school’s control. This study examined year-to-year changes in school-level results on standardized tests administered in Ontario, Canada. Generalizability theory (G-theory) analyses found that test scores are not stable enough to support meaningful conclusions based on year-to-year changes in school-level results. For small and medium-sized schools, multiple years of data must be collected before defensible decisions can be made about trends in test scores. The authors introduce a ‘bounce’ statistic that provides a simple, easy-to-interpret measure of test score stability.

Author Biography

Stefan Merchant, Queen's University

Ph.D. Candidate | Faculty of Education | Queen's University


