Predictive value of grade point average (GPA), Medical College Admission Test (MCAT), internal examinations (Block) and National Board of Medical Examiners (NBME) scores on Medical Council of Canada qualifying examination part I (MCCQE-1) scores.

BACKGROUND
To determine whether the pre-medical Grade Point Average (GPA), Medical College Admission Test (MCAT), Internal examinations (Block) and National Board of Medical Examiners (NBME) scores are correlated with and predict the Medical Council of Canada Qualifying Examination Part I (MCCQE-1) scores.


METHODS
Data from 392 admitted students in the graduating classes of 2010-2013 at the University of Manitoba (UofM), College of Medicine were considered. Pearson's correlation was used to assess the strength of the relationship, multiple linear regression to estimate the MCCQE-1 score, and stepwise linear regression to investigate the amount of variance explained.


RESULTS
Complete data from 367 (94%) students were studied. The MCCQE-1 had a moderate-to-large positive correlation with NBME scores and Block scores but a low correlation with GPA and MCAT scores. The multiple linear regression model gives a good estimate of the MCCQE-1 (R² = 0.604). Stepwise regression analysis demonstrated that 59.2% of the variation in the MCCQE-1 was accounted for by the NBME, but only 1.9% by the Block exams, and negligible variation came from the GPA and the MCAT.


CONCLUSIONS
Amongst all the examinations used at UofM, the NBME is most closely correlated with MCCQE-1.


Introduction
Medical educators have long debated the appropriateness of various formats of assessment of undergraduate medical students. Starting from their admission to a Canadian medical school based on MCAT and GPA scores, students undergo several pre-clerkship examinations (Year 1 and Year 2 Block examinations) and clerkship examinations (Year 3 and Year 4), followed by the final licentiate examination, the MCCQE-1. Although NBME subject examinations are administered at the end of clerkship rotations in most Canadian medical schools, many schools do not use them. The primary objective of this article is to address the important issue of how the MCCQE-1 can be explained by various examination scores, especially NBME scores, at the University of Manitoba (UofM). This study explores the relationship between the MCCQE-1 and explanatory variables such as NBME scores, Year 1 and Year 2 Block scores, GPA and MCAT scores, thereby evaluating the comparative importance of these local assessment programs against an external standardized exam, the MCCQE-1. Knowledge of this relationship will help to identify at-risk students through the assessment that is most closely related to the MCCQE-1, so that support can be provided to students at risk of failing the MCCQE-1. This study makes a strong case for introducing NBME examinations in all medical schools across Canada.
Similar to most medical schools in Canada and the United States, the UofM offers a 4-year medical curriculum which is divided into a pre-clerkship phase (first two years) and a clerkship phase (last two years). Each academic year of pre-clerkship is composed of three Blocks, at the end of which, an examination consisting of multiple choice and short answer questions is administered. The Year 1 Block average is the mean performance of Blocks 1, 2 and 3 while the Year 2 Block average is the mean performance of Blocks 4, 5 and 6. The National Board of Medical Examiners (NBME) subject examinations are administered at the end of the clerkship rotations in each of Internal Medicine, Surgery, Pediatrics, Psychiatry and Obstetrics/Gynecology. Combined NBME is the mean performance of these 5 major NBME examinations.
The MCCQE-1 is a summative examination that all medical graduates must pass as a component of obtaining licensure in Canada. Administered by the Medical Council of Canada, the MCCQE-1 is a computer-based test comprising two sections: Multiple Choice Questions (MCQ) (worth 75%) and Clinical Decision Making (CDM), which consists of short-menu and short-answer case questions (worth 25%). 1 The standard score scale is from 50 to 950, with a minimum passing score of 390.
The Medical College Admission Test (MCAT) examines the applicant's knowledge and problem solving skills in three multiple choice sections: Physical Sciences (Physics, Chemistry); Biological Sciences; and Verbal reasoning. This is a preadmission test taken by all medical school applicants at the UofM.
Grade point average (GPA) is the adjusted score for all undergraduate courses completed.
This study was conducted using the performance data of the medical students from the UofM graduating classes of 2010, 2011, 2012 and 2013. The purpose of this study was to determine whether the two yearly Block average scores and the five NBME subject scores had any significant correlation with the MCCQE-1 scores, and to determine whether the pre-admission measures, GPA and MCAT, had any significant correlation with post-admission performance on the Block, NBME and MCCQE-1 examinations. NBME subject examinations are used by many Canadian medical schools to evaluate students during their clerkship rotations. Although several studies have shown a strong correlation between NBME subject scores and United States Medical Licensing Examination (USMLE) Step 1 and Step 2 scores, 2-7 there is little published research comparing NBME with MCCQE-1.

According to the theory of "Context Specificity," academic performance in one subject/context does not necessarily correlate strongly with the performance in another subject/context for a given student. Using this framework, the following four hypotheses were stated:
H1: Individual NBME subject exam scores will have a small-to-moderate correlation with overall performance on the MCCQE-1 examination.
The combination of all NBME subject exam scores will account for a moderate-to-large amount of variance in MCCQE-1 scores, since the questions and syllabus topics of those NBME subjects match those of the MCCQE-1 to a large extent.

Setting and sample size
The initial study sample consisted of data collected from 392 students who were admitted to the graduating classes of 2010, 2011, 2012 and 2013. The final analysis was done using the complete data from only 367 students, which represented 94% of the initial population. Broken down by year, these 367 students comprise 89, 85, 95 and 98 students for the graduating classes of 2010, 2011, 2012 and 2013, respectively. The overall decrease of 25 students was due to various reasons. Some students dropped out of the program or took a leave of absence after taking some of their examinations, while others failed or skipped examination(s) due to illness or family emergency, resulting in incomplete data. Moreover, all of the recorded data are first-time scores in NBME and MCCQE-1, and no remediated scores were included. Since the missing cases account for just 6% of the initial population, and given the trend of the recorded data in those missing cases, it is assumed that they would not impact the results of this study to any considerable extent.

Study design variables and statistical analysis
Data relating to the students' MCAT, GPA plus the performance on their six Block and five NBME examinations were collected for the four graduating classes of 2010-2013. Finally MCCQE-1 scores were collected for all the students.
The primary outcome variable (dependent variable) was the MCCQE-1 score while the explanatory variables (independent variables) were the five NBME subject exam scores, Year 1 block average, Year 2 block average, MCAT and GPA. Year-wise demographic characteristics were studied to assess the gender ratio and mean MCAT and GPA scores of admitted students. Descriptive statistics depicting the minimum, maximum, mean and standard deviation (SD) scores of all the students for the study variables were determined, followed by Karl Pearson's bivariate correlation coefficient matrix between MCCQE-1 score and several explanatory variables.
The bivariate correlation matrix quantifies the degree to which two variables are related. Correlation does not fit a line through the data points. The pairwise correlation coefficient (r) tells us how much one variable tends to change when the other one does. When r is positive, performance in one variable tends to go up as performance in the other goes up. Linear regression finds the best line that predicts the dependent variable from a set of independent variables. In other words, correlation helps to determine whether students who are good at one subject tend to be good at another subject as well, while regression allows one to predict the mark in one subject from a given mark in another subject. Because the correlation matrix showed a negligible pairwise correlation of MCAT and GPA with MCCQE-1, these two variables were not considered in the multiple linear regression; the model was instead fitted to estimate the probable MCCQE-1 score from the NBME combined score and the combined yearly Block average score. R² was computed to assess the overall fit of the model. Finally, stepwise linear regression was conducted to further investigate the association between the explanatory variables and the single outcome variable, MCCQE-1.
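The distinction between correlation and regression drawn above can be illustrated with a minimal Python sketch. The scores are invented for illustration and are not study data, and `pearson_r` and `fit_line` are hypothetical helper names; the analysis in this study was run in SPSS, not with this code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical scores, invented purely for illustration (not study data).
nbme = [62.0, 70.0, 75.0, 81.0, 88.0]
mccqe = [410.0, 455.0, 500.0, 540.0, 600.0]

r = pearson_r(nbme, mccqe)               # strength of the linear association
slope, intercept = fit_line(nbme, mccqe)  # line used for prediction
predicted = intercept + slope * 78.0      # estimated MCCQE-1 for an NBME of 78
print(f"r = {r:.3f}, predicted score = {predicted:.0f}")
```

Correlation yields a single unit-free number, whereas the fitted line yields a prediction on the MCCQE-1 scale; with one predictor, the square of r equals the R² of the fitted line.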
Prior to data analysis, an examination of the regression model assumptions indicated a satisfactory level of homoscedasticity, linearity and normality. Multicollinearity is a problem that occurs in regression analysis when at least one independent variable is highly correlated with a combination of the other independent variables. One common suggestion is to inspect the correlation matrix and check whether any pair of independent variables correlates above some threshold (perhaps 0.75 or even 0.90). This approach does not go far enough to establish the presence of multicollinearity, since a correlation describes a bivariate relationship, whereas collinearity is a multivariate phenomenon.
Certainly, if two variables are correlated above 0.90, one should not include both in the regression equation. Even with values around 0.70, one should proceed with caution. Although high pairwise correlations could be the first indicator of collinearity problems, one should not confirm the presence of multicollinearity just by examining the correlation matrix. One should compute the collinearity diagnostics, comprising the Tolerance (T) statistic and the Variance Inflation Factor (VIF). To compute the Tolerance statistic for an independent variable, a multiple regression is performed with that variable as the new dependent variable and all of the other independent variables in the model as independent variables. Since R² is the proportion of variance in the dependent variable of a multiple regression that is explained by the combination of all independent variables, the tolerance statistic T is equal to 1 - R². T < 0.20 is generally considered cause for concern. It means that the multiple correlation of the other independent variables with that independent variable is roughly 0.90 or higher (R² > 0.80, and 0.9 x 0.9 = 0.81).
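The tolerance procedure described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the SPSS routine used in the study: the data are invented, the helper names `solve`, `r_squared` and `tolerance_and_vif` are hypothetical, and the VIF is taken as the reciprocal of the tolerance.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(predictors, target):
    """R^2 of a least-squares fit (with intercept) of target on the predictor columns."""
    n = len(target)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[p] * row[q] for row in rows) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y for row, y in zip(rows, target)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in rows]
    mean_y = sum(target) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(target, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in target)
    return 1.0 - ss_res / ss_tot


def tolerance_and_vif(predictors):
    """Map each predictor name to (tolerance, VIF), regressing it on all the others."""
    out = {}
    for name, column in predictors.items():
        others = [col for other, col in predictors.items() if other != name]
        t = 1.0 - r_squared(others, column)
        out[name] = (t, 1.0 / t)
    return out


# Hypothetical predictor scores, invented for illustration (not study data).
predictors = {
    "nbme": [62.0, 70.0, 75.0, 81.0, 88.0, 73.0],
    "block_year1": [70.0, 74.0, 72.0, 80.0, 85.0, 78.0],
    "block_year2": [68.0, 75.0, 71.0, 79.0, 83.0, 74.0],
}
result = tolerance_and_vif(predictors)
for name, (t, vif) in result.items():
    print(f"{name}: T = {t:.3f}, VIF = {vif:.2f}")
```

Solving the normal equations directly keeps the sketch dependency-free; for real data a numerically sturdier least-squares routine from a statistics library would be the more usual choice.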
The other statistic, the Variance Inflation Factor (VIF), is the reciprocal of the tolerance statistic. A VIF greater than 5 is generally considered evidence of multicollinearity. In this study, T > 0.2 and VIF < 5 indicated the absence of multicollinearity. Moreover, if the sole purpose of the regression analysis is prediction or forecasting (as in this study), then multicollinearity is not a serious problem, whereas when reliable estimation of parameters is required (which this study does not aim to do), higher pairwise correlation coefficients can pose serious issues.
Another important check of the regression model assumptions is residual analysis. Because a linear model is not always appropriate for the data, one should assess the appropriateness of the model by defining residuals and examining residual plots. The residual (e) is the difference between the observed value (y) and the predicted value (ŷ), i.e. e = y - ŷ; for a least-squares fit with an intercept, the residuals sum to zero (Σe = 0). A residual plot is a graph of the residual on the vertical axis and each independent variable (x) on the horizontal axis. If the points in a residual plot are randomly dispersed around the horizontal axis, a linear regression model is appropriate for the data; otherwise, a non-linear model is more appropriate. Alternatively, the Durbin-Watson test for first-order linear autocorrelation can be performed to check whether the residual terms are correlated with each other. The Durbin-Watson test statistic ranges from 0 to 4. A value close to 0 indicates strong positive correlation, while a value close to 4 indicates strong negative correlation. As a general rule, the residuals are uncorrelated (no autocorrelation) if the Durbin-Watson statistic is approximately 2 (an acceptable range is 1.50 - 2.50). For this study, the Durbin-Watson test statistic is 1.91, close to 2, indicating no serial correlation.
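As a rough illustration of the Durbin-Watson statistic discussed above, the following Python sketch applies the standard formula, the sum of squared successive differences of the residuals divided by the sum of squared residuals. The residual sequences are invented to show the two extremes and a near-neutral case, and `durbin_watson` is a hypothetical helper name, not the SPSS procedure used in the study.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("all residuals are zero")
    return numerator / denominator


# Invented residual sequences (e = y - y_hat), for illustration only.
trending = [2.0, 1.0, 0.0, -1.0, -2.0]          # neighbours alike: positive autocorrelation
alternating = [1.0, -1.0, 1.0, -1.0]            # sign flips: negative autocorrelation
mixed = [1.0, -0.5, -1.0, 0.5, 1.5, -1.0, -0.5]  # no obvious pattern

dw_trending = durbin_watson(trending)        # well below 2
dw_alternating = durbin_watson(alternating)  # well above 2
dw_mixed = durbin_watson(mixed)
print(dw_trending, dw_alternating, round(dw_mixed, 2))
```

Values drifting toward 0 or 4 signal that neighbouring residuals move together or in opposition, which is what the 1.50 to 2.50 rule of thumb quoted above is meant to catch.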
Based on the finding that NBME results were more correlated to MCCQE-1 than the yearly Block average scores, individual NBME subject exam scores were first entered in the stepwise regression model followed by yearly Block average scores. Next, the pre-admission scores MCAT, GPA were entered. Significance was set at p < 0.05. The entire statistical analysis was done using SPSS version 21.0 (IBM SPSS Statistics).
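The effect of entering predictors in a fixed order can be sketched with two predictors, using the standard closed-form expression for the R² of a two-predictor regression in terms of the three pairwise correlations. This is an illustration only: the scores are invented, the helper names are hypothetical, and SPSS's stepwise procedure also applies entry and removal significance tests that are not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r2_two_predictors(y, x1, x2):
    """R^2 of y regressed on x1 and x2, computed from pairwise correlations."""
    r_y1 = pearson_r(y, x1)
    r_y2 = pearson_r(y, x2)
    r_12 = pearson_r(x1, x2)
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def incremental_r2(y, first, second):
    """Return (R^2 from the first predictor alone, extra R^2 gained by adding the second)."""
    r2_first = pearson_r(y, first) ** 2
    r2_both = r2_two_predictors(y, first, second)
    return r2_first, r2_both - r2_first


# Hypothetical scores, invented for illustration (not study data).
mccqe = [410.0, 455.0, 500.0, 540.0, 600.0, 470.0]
nbme = [62.0, 70.0, 75.0, 81.0, 88.0, 73.0]
block = [70.0, 74.0, 72.0, 80.0, 85.0, 78.0]

nbme_first, block_gain = incremental_r2(mccqe, nbme, block)
block_first, nbme_gain = incremental_r2(mccqe, block, nbme)
print(f"NBME first: {nbme_first:.3f}, then Block adds {block_gain:.3f}")
print(f"Block first: {block_first:.3f}, then NBME adds {nbme_gain:.3f}")
```

The total R² is the same in both orders, but the share credited to each predictor depends on which one is entered first, because the first predictor absorbs the variance the two share.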

Reliability and validity of measurement instruments
An assessment of data integrity plays a significant role in determining whether the data are appropriate for statistical analysis. The reliability of a measure, understood in terms of consistency and accuracy, can be described as the extent to which the instrument yields the same results on repeated trials. The average GPA of the UofM students in the graduating classes (2010-2014) of study was consistently around 4, while their average MCAT score was around 10.50. For the Block averages, the Year 1 Block average was consistently around 76.81 and the Year 2 Block average around 75.34 over the four years of study. Although the admission criteria (a minimum score for GPA and MCAT) were consistent across the graduating classes, this analysis of the mean GPA and MCAT scores provides a measure of the fairly consistent intellectual capabilities of the admitted students in the four graduating classes, which is further reinforced by the students' consistent Block score averages. The reliability of the outcome measure "MCCQE-1" is best captured by studying the internal consistency using Cronbach's alpha, which measures the degree to which the independent variables GPA, MCAT, Block and NBME affect the dependent variable MCCQE-1. When assessing internal consistency reliability, one is not actually assessing the reliability of the measurement tool, but the reliability of the study data in the context of the measurement tool. There is general agreement that Cronbach's alpha must be at least 0.70 to be considered acceptable, [9][10][11] although many reports recommend a higher value, with acceptable alpha values ranging from 0.70 to 0.95. 12 In this study, Cronbach's alpha is 0.82, indicating acceptable internal consistency.
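For reference, the conventional formula for Cronbach's alpha over k measures is alpha = k / (k - 1) x (1 - sum of the individual variances / variance of the summed score). The sketch below applies it to invented scores on a common scale. How the measures were configured in SPSS for this study is not stated, so the data layout and the helper name `cronbach_alpha` are assumptions.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of measures, each a list of per-student scores."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two measures")
    n = len(items[0])
    if n < 2 or any(len(col) != n for col in items):
        raise ValueError("each measure needs the same number (>= 2) of scores")
    # Summed score per student across all measures.
    totals = [sum(col[i] for col in items) for i in range(n)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores for five students on three measures (not study data).
items = [
    [62.0, 70.0, 75.0, 81.0, 88.0],
    [65.0, 72.0, 70.0, 85.0, 90.0],
    [60.0, 68.0, 78.0, 80.0, 86.0],
]
alpha = cronbach_alpha(items)
print(f"Cronbach's alpha = {alpha:.2f}")
```

When measures sit on very different scales (for example a GPA near 4 and Block averages near 76), they are usually standardized first, since otherwise the measure with the largest variance dominates the result.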
An examination of a study measurement tool's validity considers whether the tool collected sufficient data to infer that the trait or construct being surveyed was the actual construct measured. GPA, MCAT and Block scores appear to have face validity since they are inherently designed to measure the traits, capabilities, and knowledge that the candidates require to complete their program of study. For the purpose of this study, however, these scores lack content validity, as they are heavily based on basic science and pre-clinical content and are therefore different from the NBME subject examinations and the MCCQE-1 examination, which measure candidates' knowledge in the specialised clerkship rotations. Therefore, by undertaking these measurements of reliability and validity, the required checks of data integrity have been satisfied.
In the current study, the Sensitivity (= a / (a + c) x 100%) is the ability of the model to correctly identify students who pass the MCCQE-1 exam. The

Results
After removing the 25 students who missed some of the requisite exams on schedule, this study was based on 367 (94%) students out of an initial sample of 392 students in the graduating classes of 2010-2013. Table 6 shows the evaluation of the prediction effects of the multiple regression model in terms of the sensitivity, specificity, positive predictive value, negative predictive value and accuracy of the model.
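The five measures named above can be computed from a 2 x 2 table of predicted versus actual pass/fail outcomes. The sketch below follows the usual layout in which a = predicted pass and actually passed, b = predicted pass but actually failed, c = predicted fail but actually passed, and d = predicted fail and actually failed. Only the role of a and c is implied by the sensitivity formula a / (a + c) given earlier; the assignment of b and d is an assumption, and the counts are invented rather than taken from Table 6.

```python
def classification_metrics(a, b, c, d):
    """Percent sensitivity, specificity, PPV, NPV and accuracy from 2x2 counts."""
    if min(a + c, b + d, a + b, c + d) == 0:
        raise ValueError("every row and column of the 2x2 table must be non-empty")
    total = a + b + c + d
    return {
        "sensitivity": 100 * a / (a + c),  # actual passes correctly predicted
        "specificity": 100 * d / (b + d),  # actual failures correctly predicted
        "ppv": 100 * a / (a + b),          # predicted passes that truly passed
        "npv": 100 * d / (c + d),          # predicted failures that truly failed
        "accuracy": 100 * (a + d) / total, # all correct predictions
    }


# Hypothetical counts, invented for illustration (not the values in Table 6).
metrics = classification_metrics(a=90, b=5, c=10, d=15)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

Because failures on a licensing exam are typically rare, accuracy alone can look high even when few at-risk students are identified, which is why specificity and negative predictive value are worth reporting alongside it.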

Discussion
Considering that passing the MCCQE-1 is a necessary component for physicians to obtain medical licensure and that many of the medical schools across Canada utilize the NBME subject examinations, there is value in demonstrating a relationship between the MCCQE-1 examinations and the NBME subject examinations. The sample data collected from 367 students in the graduating classes of 2010-2013 at the University of Manitoba suggests:
1] A moderate-to-high degree of correlation within the NBME subject exam scores.
2] A moderate-to-high degree of correlation between the NBME subject exam scores and MCCQE-1 exam scores.
3] A low degree of correlation between MCCQE-1 and MCAT/GPA.
Also, there is a relatively high correlation between MCCQE-1 and NBME combined score as compared to Block average scores.
The stepwise regression analysis was quite interesting. NBME subject exam scores explained about 60% of the variation in MCCQE-1 scores, whereas Block exam scores explained only about 2% and MCAT/GPA about 0.2%. The most likely reason for this is that the NBME and MCCQE-1 exams assess similar content, namely a candidate's knowledge in the specialized clerkship rotations, while Block examinations and MCAT/GPA essentially assess preclinical learning.
One of the strengths of this study is the large sample of students with access to each score throughout their medical school program, as well as their MCAT score and their undergraduate GPA. Moreover, the data integrity has been checked with reference to the various regression model assumptions, especially that of multicollinearity and residual analysis.
This study has several limitations. The data were collected from a single academic institution (University of Manitoba); the study would have been stronger had data from several other institutions been included. Nonetheless, similar studies can be done at other institutions. Since NBME and MCCQE-1 are national standardized exams, these correlations will likely be reproducible in other institutions. The MCAT and GPA results do not show any significant contribution to MCCQE-1, and hence they were not included in the multiple regression equation to predict MCCQE-1. Although 60% of the variance in MCCQE-1 performance is explained by NBME subject scores, it is important not to over-interpret their contribution, as nearly 40% of the variance remains unaccounted for. Other factors that could affect performance in the MCCQE-1 exam, which were not considered in this study, include test-taking strategies, fatigue, anxiety, clerkship length and/or timing in the academic year 13-18 as well as the content of electives. Furthermore, it is important to note that the specific variables and the order in which they are entered into the stepwise regression model can affect the percentage of variance in MCCQE-1 attributed to each variable. However, in order to specifically determine the variance accounted for by a combination of Block and NBME exams, the NBME variables were entered first because of their strong correlation with MCCQE-1, followed by the yearly Block averages.
In conclusion, the findings of the study strongly suggest that NBME subject exams exhibit a moderate-to-large positive correlation with MCCQE-1 scores and also explain a considerable amount of the variance in MCCQE-1 performance at the University of Manitoba. Since NBME is most closely related to MCCQE-1, undergraduate medical educators interested in predicting performance on national standardized examinations could use this information to plan remediation for students who have struggled on their NBME examinations. This study could also assist individual students in taking pro-active remediation measures, by seeking extra learning opportunities and assistance for subjects in which they had poor NBME exam results. Postgraduate medical educators or program directors may be interested in using this NBME information to aid in their selection of residents. Finally, this research initiative may serve to encourage more medical schools in Canada to adopt these standardized NBME examinations in pursuing the broader goal of improving medical education in Canada. This broad objective can be achieved specifically by using NBME scores to identify students who need extra help before they challenge the MCCQE-1, with schools adopting NBME scores as measures to predict success on the MCCQE-1.
Since the Medical Council of Canada (MCC) assesses the competence of physicians seeking to practise medicine in Canada by verifying their core skills and knowledge, patients can be confident that their physicians meet the same demanding, consistent standards across the country, and higher MCCQE-1 scores will hopefully translate into enhanced knowledge for future physicians and better patient care.