Bias in Student Ratings of Instruction: A Systematic Review of Research from 2012 to 2021


  • Brenda M Stoesz The Centre for the Advancement of Teaching and Learning, The University of Manitoba
  • Amy E. De Jaeger University of Manitoba
  • Matthew Quesnel University of Manitoba
  • Dimple Bhojwani University of Manitoba
  • Ryan Los University of Manitoba


gender bias, postsecondary education, student evaluation of teaching (SET), teacher evaluations


Student ratings of instruction (SRI) are commonly used to evaluate courses and teaching in higher education. Their validity as a measure of teaching is much debated because of concerns that ratings are biased by factors unrelated to teaching quality (Spooren et al., 2013). Our objective was to identify peer-reviewed original research published in English from January 1, 2012, to March 10, 2021, on potential sources of bias in SRIs. Our systematic review of 63 articles demonstrated strong support for the continued existence of gender bias favoring male instructors and of bias against faculty with minority ethnic and cultural backgrounds. These and other biases must be considered when implementing SRIs and reviewing results. Critical practices for reducing bias when using SRIs include implementing bias awareness training and avoiding the use of SRIs as the sole measure of teaching quality when making decisions for teaching development or for hiring and promotion.


*Al-Maamari, F. (2015). Response rate and teaching effectiveness in institutional student evaluation of teaching: A multiple linear regression study. Higher Education Studies, 5(6), 9–20.

*Alauddin, M., & Kifle, T. (2014). Does the student evaluation of teaching instrument really measure instructors’ teaching effectiveness? An econometric analysis of students’ perceptions in economics courses. Economic Analysis and Policy, 44(2), 156–168.

Appelbaum, M., Cooper, H., Kline, R. B., Mayo-Wilson, E., Nezu, A. M., & Rao, S. M. (2018). Journal article reporting standards for quantitative research in psychology: The APA Publications and Communications Board task force report. American Psychologist, 73(1), 3–25.

*Arnold, I. J. M., & Versluis, I. (2019). The influence of cultural values and nationality on student evaluation of teaching. International Journal of Educational Research, 98, 13–24.

Arreola, R. A. (2007). Developing a comprehensive faculty evaluation system: A guide to designing, building, and operating large-scale faculty evaluation systems. Anker Publishing Company.

*Arrona-Palacios, A., Okoye, K., Camacho-Zuniga, C., Hammout, N., Luttmann-Nakamura, E., Hosseini, S., & Escamilla, J. (2020). Does professors’ gender impact how students evaluate their teaching and the recommendations for the best professor? Heliyon, 6(10).

*Bacon, D. R., Johnson, C. J., & Stewart, K. A. (2016). Nonresponse bias in student evaluations of teaching. Marketing Education Review, 26(2), 93–104.

*Bahous, S. A., Salameh, P., Salloum, A., Salameh, W., Park, Y. S., & Tekian, A. (2018). Voluntary vs. compulsory student evaluation of clerkships: Effect on validity and potential bias. BMC Medical Education, 18.

Becker, W. E., & Watts, M. (1999). How departments of economics should evaluate teaching. American Economic Review, 89(2), 344–349.

Benton, S. L., & Cashin, W. E. (2014). Student ratings of instruction in college and university courses. In M. B. Paulsen (Ed.), Higher education: Handbook of theory & research (Vol. 29, pp. 279–326). Springer.

Benton, S. L., & Ryalls, K. R. (2016). Challenging misconceptions about student ratings of instruction. The IDEA Center, 58, 1–22.

*Bianchini, S., Lissoni, F., & Pezzoni, M. (2013). Instructor characteristics and students’ evaluation of teaching effectiveness: Evidence from an Italian engineering school. European Journal of Engineering Education, 38(1), 38–57.

*Blecich, A. A., & Zaninović, V. (2019). Insight into students’ perception of teaching: Case of economic higher education institution. Journal of Contemporary Management Issues, 24(1), 137–152.

*Boring, A. (2017). Gender biases in student evaluations of teaching. Journal of Public Economics, 145, 27–41.

*Borkan, B. (2017). Exploring variability sources in student evaluation of teaching via many-facet Rasch model. Journal of Measurement and Evaluation in Education and Psychology, 8(1), 15–33.

Braskamp, L. A., & Ory, J. C. (1994). Assessing faculty work: Enhancing individual and institutional performance. Jossey-Bass Publishers.

Carrell, S. E., Page, M. E., & West, J. E. (2010). Sex and science: How professor gender perpetuates the gender gap. Quarterly Journal of Economics, 125(3), 1101–1144.

Centra, J. A. (1976). The influence of different directions on student ratings of instruction. Journal of Educational Measurement, 13(4), 277–282.

Centra, J. A. (1993). Reflective faculty evaluation: Enhancing teaching and determining faculty effectiveness. Jossey-Bass.

Centra, J. A. (2003). Will teachers receive higher student evaluations by giving higher grades and less course work? Research in Higher Education, 44(5), 495–518.

Centra, J. A. (2009). Differences in responses to the student instructional report: Is it bias? Princeton: Educational Testing Service.

Centre for Teaching Support & Innovation. (2018). University of Toronto’s Cascaded Course Evaluation Framework: Validation Study of the Institutional Composite Mean (ICM).

*Chávez, K. (2020). Exploring bias in student evaluations: Gender, race, and ethnicity. PS: Political Science & Politics, 53(2), 270–274.

Clayson, D. E. (2009). Student evaluations of teaching: Are they related to what students learn?: A meta-analysis and review of the literature. Journal of Marketing Education, 31(1), 16–30.

*Dodeen, H. (2013). Validity, reliability, and potential bias of short forms of students’ evaluation of teaching: The case of UAE university. Educational Assessment, 18(4), 235–250.

*Esarey, J., & Valdes, N. (2020). Unbiased, reliable, and valid student evaluations can still be unfair. Assessment and Evaluation in Higher Education, 45(8), 1106–1120.

*Estelami, H. (2015). The effects of survey timing on student evaluation of teaching measures obtained using online surveys. Journal of Marketing Education, 37(1), 54–64.

*Ewing, A. M. (2012). Estimating the impact of relative expected grade on student evaluations of teachers. Economics of Education Review, 31(1), 141–154.

Fairlie, R. W., Hoffmann, F., & Oreopoulos, P. (2014). A community college instructor like me: Race and ethnicity interactions in the classroom. American Economic Review, 104(8), 2567–2591.

*Fan, Y., Shepherd, L. J., Slavich, E., Waters, D., Stone, M., Abel, R., & Johnston, E. L. (2019). Gender and cultural bias in student evaluations: Why representation matters. Public Library of Science One, 14(2).

*Fassiotto, M., Li, L., Maldonado, Y., & Kothary, N. (2018). Female surgeons as counter stereotype: The impact of gender perceptions on trainee evaluations of physician faculty. Journal of Surgical Education, 75(5), 1140–1148.

*Feistauer, D., & Richter, T. (2018a). The role of clarity about study programme contents and interest in student evaluations of teaching. Psychology Learning and Teaching, 17(3), 272–292.

*Feistauer, D., & Richter, T. (2018b). Validity of students’ evaluations of teaching: Biasing effects of likability and prior subject interest. Studies in Educational Evaluation, 59, 168–178.

Ferguson-Patrick, K. (2011). Professional development of early career teachers: A pedagogical focus on cooperative learning. Issues in Educational Research, 21(2), 109–129.

*Fischer, E., & Hänze, M. (2019). Bias hypotheses under scrutiny: Investigating the validity of student assessment of university teaching by means of external observer ratings. Assessment & Evaluation in Higher Education, 44(5), 772–786.

*Flegl, M., & Andrade Rosas, L. A. (2019). Do professor’s age and gender matter or do students give higher value to professors’ experience? Quality Assurance in Education: An International Perspective, 27(4), 511–532.

*Fogarty, T. J., Jonas, G. A., & Parker, L. M. (2013). The medium is the message: Comparing paper-based and web-based course evaluation modalities. Journal of Accounting Education, 31(2), 177–193.

Galbraith, C., Merrill, G., & Kline, D. (2012). Are student evaluations of teaching effectiveness valid for measuring student learning outcomes in business related classes? A neural network and Bayesian analyses. Research in Higher Education, 53(3), 353–374.

*Gith, E. (2020). The impact of the Israeli-Palestinian conflict on thinking biases in teaching evaluations. Peace and Conflict: Journal of Peace Psychology, 26(1), 92–95.

*Goos, M., & Salomons, A. (2017). Measuring teaching quality in higher education: Assessing selection bias in course evaluations. Research in Higher Education, 58(4), 341–364.

Gravestock, P., & Gregor-Greenleaf, E. (2008). Student course evaluations: Research, models and trends.

*Griffin, T. J., Hilton III, J., Plummer, K., & Barret, D. (2014). Correlation between grade point averages and student evaluation of teaching scores: Taking a closer look. Assessment & Evaluation in Higher Education, 39(3), 339–348.

*Gupta, A., Garg, D., & Kumar, P. (2018). Analysis of students’ ratings of teaching quality to understand the role of gender and socio-economic diversity in higher education. IEEE Transactions on Education, 61(4), 319–327.

Heffernan, T. (2021). Sexism, racism, prejudice, and bias: A literature review and synthesis of research surrounding student evaluations of courses and teaching. Assessment & Evaluation in Higher Education, 0(0), 1–11.

Hoffmann, F., & Oreopoulos, P. (2009). A professor like me: The influence of instructor gender on college achievement. Journal of Human Resources, 44(2), 479–494.

Hofstede, G. (1986). Cultural differences in teaching and learning. International Journal of Intercultural Relations, 10, 301–320.

*Jobu Babin, J., Hussey, A., Nikolsko-Rzhevskyy, A., & Taylor, D. A. (2020). Beauty premiums among academics. Economics of Education Review, 78, 102019.

*Laupper, E., Balzer, L., & Berger, J.-L. (2020). Online vs. offline course evaluation revisited: Testing the invariance of a course evaluation questionnaire using a multigroup confirmatory factor analysis framework. Educational Assessment, Evaluation and Accountability, 32(4), 481–498.

Lindqvist, A., Sendén, M. G., & Renström, E. A. (2020). What is gender, anyway: A review of the options for operationalising gender. Psychology and Sexuality, 1–13.

Linse, A. R. (2017). Interpreting and using student ratings data: Guidance for faculty serving as administrators and on evaluation committees. Studies in Educational Evaluation, 54, 94–106.

*Liu, O. L. (2012). Student evaluation of instruction: In the new paradigm of distance education. Research in Higher Education, 53(4), 471–486.

Llamas, J. D., Nguyen, K., & Tran, A. G. T. T. (2021). The case for greater faculty diversity: Examining the educational impacts of student-faculty racial/ethnic match. Race Ethnicity and Education, 24(3), 375–391.

Louie, D. W., Poitras-Pratt, Y., Hanson, A. J., & Ottmann, J. (2017). Applying Indigenizing principles of decolonizing methodologies in university classrooms. Canadian Journal of Higher Education/Revue canadienne d'enseignement supérieur, 47(3), 16–33.

*Macfadyen, L. P., Dawson, S., Prest, S., & Gašević, D. (2016). Whose feedback? A multilevel analysis of student completion of end-of-term teaching evaluations. Assessment & Evaluation in Higher Education, 41(6), 821–839.

*Magel, R. C., Doetkott, C., & Cao, L. (2017). A study of the relationship between gender, salary, and student ratings of instruction at a research university. NASPA Journal About Women in Higher Education, 10(1), 96–117.

*Maricic, M., Dokovic, A., & Jeremic, V. (2019). The validity of student evaluation of teaching: Is there a gender bias? Croatian Journal of Education, 21(3), 743–775.

Marsh, H. W. (2007). Students’ evaluations of university teaching: Dimensionality, reliability, validity, potential biases and usefulness. In The scholarship of teaching and learning in higher education: An evidence-based perspective (pp. 319–383). Springer.

*Martin, L. L. (2016). Gender, teaching evaluations, and professional success in political science. PS: Political Science & Politics, 49(2), 313–319.

McPherson, M. A., & Jewell, R. T. (2007). Leveling the playing field: Should student evaluation scores be adjusted? Social Science Quarterly, 88(3), 868–881.

Medina, M. S., Smith, T., Kolluru, S., Sheaffer, E. A., & ViVall, M. (2019). A review of strategies for designing, administering, and using student ratings of instruction. American Journal of Pharmaceutical Education, 83(5), 753–764.

*Mitchell, K. M. W., & Martin, J. (2018). Gender bias in student evaluations. PS: Political Science & Politics, 51(3), 648–652.

Moher, D., Liberati, A., Tetzlaff, J., & Altman, D. G. (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Physical Therapy, 89(9), 873–880.

*Nargundkar, S., & Shrikhande, M. (2014). Norming of student evaluations of instruction: Impact of noninstructional factors. Decision Sciences Journal of Innovative Education, 12(1), 55–72.

Nicolaou, M., & Atkinson, M. (2019). Do student and survey characteristics affect the quality of UK undergraduate medical education course evaluation? A systematic review of the literature. Studies in Educational Evaluation, 62, 92–103.

*Okoye, K., Arrona-Palacios, A., Camacho-Zuniga, C., Hammout, N., Nakamura, E. L., Escamilla, J., & Hosseini, S. (2020). Impact of students evaluation of teaching: A text analysis of the teachers qualities by gender. International Journal of Educational Technology in Higher Education, 17(1).

Ouzzani, M., Hammady, H., Fedorowicz, Z., & Elmagarmid, A. (2016). Rayyan-a web and mobile app for systematic reviews. Systematic Reviews, 5(1), 1–10.

*Palali, A., van Elk, R., Bolhaar, J., & Rud, I. (2018). Are good researchers also good teachers? The relationship between research quality and teaching quality. Economics of Education Review, 64, 40–49.

*Park, E., & Dooris, J. (2020). Predicting student evaluations of teaching using decision tree analysis. Assessment & Evaluation in Higher Education, 45(5), 776–793.

*Park, H.-S., & Cheong, Y. F. (2018). Correlates of monotonic response patterns in online ratings of a university course. Higher Education, 76(1), 101–113.

*Peterson, D. A. M., Biederman, L. A., Andersen, D., Ditonto, T. M., & Roe, K. (2019). Mitigating gender bias in student evaluations of teaching. Public Library of Science One, 14(5).

*Protogerou, C., & Hagger, M. S. (2020). A checklist to assess the quality of survey studies in psychology. Methods in Psychology, 3(July), 100031.

*Punyanunt-Carter, N., & Carter, S. L. (2015). Students’ gender bias in teaching evaluations. Higher Learning Research Communications, 5(3), 28.

*Radchenko, N. (2020). Student evaluations of teaching: Unidimensionality, subjectivity, and biases. Education Economics, 28(6), 549–566.

Ray, B., Babb, J., & Wooten, C. A. (2018). Rethinking SETs: Retuning student evaluations of teaching for student agency. Composition Studies, 46(1), 34–56.

*Reisenwitz, T. H. (2016). Student evaluation of teaching: An investigation of nonresponse bias in an online context. Journal of Marketing Education, 38(1), 7–17.

*Risquez, A., Vaughan, E., & Murphy, M. (2015). Online student evaluations of teaching: What are we sacrificing for the affordances of technology? Assessment & Evaluation in Higher Education, 40(1), 120–134.

*Rodríguez, A. M., Capelleras, J.-L., & Garcia, V. M. G. (2014). Teaching performance: Determinants of the student assessment. Academia, 27(3), 402–418.

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638–641.

*Royal, K. D., & Stockdale, M. R. (2015). Are teacher course evaluations biased against faculty that teach quantitative methods courses? International Journal of Higher Education, 4(1), 217–224.

Rubin, D. B. (2009). Multiple imputation for nonresponse in surveys. Wiley.

Schiekirka, S., & Raupach, T. (2015). A systematic review of factors influencing student ratings in undergraduate medical education course evaluations. BMC Medical Education, 15(1), 1–9.

*Schönrock-Adema, J., Lubarsky, S., Chalk, C., Steinert, Y., & Cohen-Schotanus, J. (2013). “What would my classmates say?” An international study of the prediction-based method of course evaluation. Medical Education, 47(5), 453–462.

*Schueths, A. M., Gladney, T., Crawford, D. M., Bass, K. L., & Moore, H. A. (2013). Passionate pedagogy and emotional labor: Students’ responses to learning diversity from diverse instructors. International Journal of Qualitative Studies in Education, 26(10), 1259–1276.

Shamseer, L., Moher, D., Clarke, M., Ghersi, D., Liberati, A., Petticrew, M., Shekelle, P., Stewart, L. A., & PRISMA-P Group. (2015). Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: Elaboration and explanation. BMJ (Clinical Research Ed.), 349(3), g7647–g7647.

*Socha, A. (2013). A hierarchical approach to students’ assessments of instruction. Assessment & Evaluation in Higher Education, 38(1), 94–113.

Spooren, P., Brockx, B., & Mortelmans, D. (2013). On the validity of student evaluation of teaching: The state of the art. Review of Educational Research, 83(4), 598–642.

*Spooren, P., & Christiaens, W. (2017). I liked your course because I believe in (the power of) student evaluations of teaching (SET). Students’ perceptions of a teaching evaluation process and their relationships with SET scores. Studies in Educational Evaluation, 54, 43–49.

Stroebe, W. (2016). Why good teaching evaluations may reward bad teaching: On grade inflation and other unintended consequences of student evaluations. Perspectives on Psychological Science, 11(6), 800–816.

*Sulis, I., Porcu, M., & Capursi, V. (2019). On the use of student evaluation of teaching: A longitudinal analysis combining measurement issues and implications of the exercise. Social Indicators Research, 142(3), 1305–1331.

*Tarun, P., & Krueger, D. (2016). A perspective on student evaluations, teaching techniques, and critical thinking. Journal of Learning in Higher Education, 12(2), 1–13.

Theall, M., & Franklin, J. (2001). Looking for bias in all the wrong places: A search for truth or a witch hunt in student ratings of instruction? New Directions for Institutional Research, 2001(109), 45–56.

*Tomes, T., Coetzee, S., & Schmulian, A. (2019). Prediction-based student evaluations of teaching as an alternative to traditional opinion-based evaluations. Assessment & Evaluation in Higher Education, 44(8), 1222–1236.

*Treischl, E., & Wolbring, T. (2017). The causal effect of survey mode on students’ evaluations of teaching: Empirical evidence from three field experiments. Research in Higher Education, 58(8), 904–921.

*Valencia, E. (2020). Acquiescence, instructor’s gender bias and validity of student evaluation of teaching. Assessment & Evaluation in Higher Education, 45(4), 483–495.

*Wagner, N., Rieger, M., & Voorvelt, K. (2016). Gender, ethnicity and teaching evaluations: Evidence from mixed teaching teams. Economics of Education Review, 54, 79–94.

*Wang, L., & Gonzalez, J. A. (2020). Racial/ethnic and national origin bias in SET. International Journal of Organizational Analysis, 28(4), 843–855.

*Weidman-Evans, E., Hayes, S., & Bigler, T. (2020). Relationship between course evaluations and course grades in six allied health programs. Health Professions Education, 6(4), 612–616.

*Winer, L., DiGenova, L., & Costopoulos, A. (2016). Addressing common concerns about online student ratings of instruction: A research-informed approach. Canadian Journal of Higher Education, 46(4), 115–131.

Wolbring, T. (2012). Class attendance and students’ evaluations of teaching: Do no-shows bias course ratings and rankings? Evaluation Review, 36(1), 72–96.

*Wolbring, T., & Treischl, E. (2016). Selection bias in students’ evaluation of teaching: Causes of student absenteeism and its consequences for course ratings and rankings. Research in Higher Education, 57(1), 51–71.

Wright, S. L., & Jenkins-Guarnieri, M. A. (2012). Student evaluations of teaching: Combining the meta-analyses and demonstrating further evidence for effective use. Assessment and Evaluation in Higher Education, 37(6), 683–699.

*Yueh, H.-P., Chen, T.-L., Chiu, L.-A., Lee, S.-L., & Wang, A.-B. (2012). Student evaluation of teaching effectiveness of a nationwide innovative education program on image display technology. IEEE Transactions on Education, 55(3), 365–369.