Comparing the traditional and Multiple Mini Interviews in the selection of post-graduate medical trainees.

BACKGROUND
The traditional, panel-style interview and the multiple mini interview (MMI) are two options for the selection of medical trainees, and each format has inherent advantages and disadvantages. Our aim was to compare the traditional interview and the MMI in the same cohort of postgraduate applicants to the Department of Otolaryngology - Head & Neck Surgery at the University of Toronto.


METHOD
Twenty-seven applicants from the 2010 Canadian Residency Matching Service selected for interview at the University of Toronto, Department of Otolaryngology - Head & Neck Surgery were included in the study. Each applicant participated in both a traditional interview and MMI.


RESULTS
Traditional interviews were marked out of a maximum total score of 570. Traditional interview scores ranged from 397 to 543.5 (69.6 - 95.3%), with a mean of 460.2. The MMI was marked out of a maximum score of 180. MMI scores ranged from 93 to 146 (51.7 - 81.1%), with a mean of 114.8. Traditional interview total scores were plotted against MMI total scores. The scores were moderately correlated (Pearson correlation = 0.315), and the correlation was statistically significant (p = 0.001). Inter-interview reliability for the two interview methods was 0.038, with poor overall agreement (0.07%).


CONCLUSIONS
MMI and traditional interview scores are correlated but do not reliably lead to the same rank order. We have demonstrated that these two interview formats measure different characteristics. One format may also be less reliable leading to greater variation in final rank. Further validation research is certainly required.


Introduction
A growing body of literature attempts to describe and substantiate the optimal means of selecting candidates for medical training programs. Much of the selection of medical trainees, particularly at the post-graduate level, depends on the selection committee's assessment of an applicant's so-called cognitive and non-cognitive attributes. The traditional panel-style interview (TI), long the mainstay for assessing interpersonal and behavioural attributes, has been scrutinized, with cited limitations including interviewer bias, 1 a lack of psychometric robustness, and questionable reliability and validity. 2 As a result, selection panels have sought a better platform for assessing applicants' non-cognitive traits. The first description of an alternative medical admissions interview, the Multiple Mini-Interview (MMI), comes from work done at McMaster University. 3
Cognitive attributes have traditionally been assessed through performance on written tests and grade point average, while other attributes have been assessed by means of letters of reference and traditional interviews. 4 Cognitive attributes are often difficult to assess during medical school because most Canadian medical schools have adopted pass/fail or honors/pass/fail systems. Until recently, medical educators largely relied on a panel-style interview to assess the interpersonal and behavioural attributes of an applicant. Since the MMI's inception in 2004, evidence supporting its use in place of the TI in medical school admissions has mounted. Early feasibility data demonstrated a reliability of 0.65 in a cohort of undergraduate medical applicants, a statistic consistent with other admissions criteria. 3 Evidence suggests that the MMI is a superior assessment tool to the TI because of its ability to home in on specific skills and attributes. Furthermore, it is regarded favorably by both applicants and interviewers, with significant potential for cost saving. 3,5,6 The optimal assessment of attributes other than those embodied in grades and performance reviews of medical trainees applying for residency training is not yet known.
Our objective was to compare the MMI and TI in the same cohort of applicants for postgraduate training in the Department of Otolaryngology - Head & Neck Surgery at the University of Toronto. We aimed to objectively compare the two major modalities of non-cognitive assessment of medical trainees at the postgraduate level. We focused on whether the two formats yield correlated interview scores and whether they lead to congruent rank lists.

Methods
For the 2010 Canadian Residency Matching Service (CaRMS) postgraduate application process, the Department of Otolaryngology - Head & Neck Surgery at the University of Toronto introduced for the first time a dual interview system for the selection of postgraduate trainees. Ethics approval was obtained to develop and administer the dual interview procedure. All interviews were conducted on a single day, and the order in which applicants participated in the TI and MMI was randomly assigned.
The TI consisted of three stations, each fifteen minutes in duration. There were two raters per traditional station: two stations had two faculty members each and one station had two residents (PGY3 and PGY5). Traditional interviewers were provided with the application packages of the prospective students prior to the interview. Each traditional interviewer could award an applicant a maximum score of 95 points, making the traditional interview worth a maximum of 570 points (six raters at 95 points each). The resident station also included a 5-mark surgical skills task. For the 2010 cohort, students were tasked with completing a simple interrupted suture and a horizontal mattress suture on a piece of synthetic skin. Questions in the traditional interview focused mainly on applicant background, interest and motivation to pursue a career in otolaryngology - head & neck surgery, extracurricular activities and research activities.
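The scoring arithmetic described above can be reproduced in a few lines. The following is a minimal sketch, not the authors' scoring procedure: the constants come from the text (three stations, two raters each, 95 points per rater, and the MMI maximum of 180 reported in the abstract), while the function and variable names are illustrative assumptions.

```python
# Scoring structure taken from the text; all names here are illustrative.
TI_STATIONS = 3            # traditional interview stations
RATERS_PER_STATION = 2     # two raters at each traditional station
MAX_POINTS_PER_RATER = 95  # maximum a single traditional rater could award
MMI_MAX = 180              # MMI maximum reported in the abstract

TI_RATERS = TI_STATIONS * RATERS_PER_STATION
TI_MAX = TI_RATERS * MAX_POINTS_PER_RATER


def percent_of_max(score, max_score):
    """Express a raw score as a percentage of the maximum available."""
    return 100.0 * score / max_score


def average_per_rater(ti_total):
    """Average traditional score per rater, on the 0-95 scale."""
    return ti_total / TI_RATERS


if __name__ == "__main__":
    print("TI maximum:", TI_MAX)
    print("Lowest TI score (%):", round(percent_of_max(397, TI_MAX), 1))
    print("Lowest MMI score (%):", round(percent_of_max(93, MMI_MAX), 1))
    print("Mean TI score per rater:", round(average_per_rater(460.2), 1))
```

Expressing both formats as a percentage of their maximum puts the 570-point TI and the 180-point MMI on a common scale before any comparison is made.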
The objectives of the MMI stations were similar to those previously described: evaluation of communication and presentation skills, decision making, and the ability to think critically and to debate a complex issue (skills that clearly require higher-level thinking and are not in any way "non-cognitive"). 11 Assessors were also given an opportunity to raise a "red flag", 2 that is, to express severe concerns about a candidate's suitability. Scenarios for the 2010 MMIs were based on the following themes: interprofessionalism, the ethical use of the internet, discussion of the CanMEDS competencies, managing an awkward situation, a controversial cancer drug, and preferential access to health care. The MMI portion of the interview consisted of six ten-minute stations, each with a single rater. MMI interviewers were blinded to the applicants' backgrounds and other application material and received only the name of the applicant they would be interviewing.

Results
Average traditional total scores were plotted against total MMI scores for each candidate (Figure 1). The Pearson correlation coefficient indicated a statistically significant, moderate correlation (r = 0.315; p = 0.001). Table 4 lists candidates by MMI rank in descending order alongside their respective ranks on the traditional interview. Although the two scoring methods were moderately correlated (Figure 1), inter-interview agreement on final rank was very poor (Table 4), as demonstrated by a kappa statistic of 0.038. The interview survey responses were categorized according to morning and afternoon MMI sittings and a single TI sitting (Table 1).
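To illustrate how a score correlation and a rank-agreement statistic of this kind can be computed, the sketch below implements Pearson's r and an unweighted Cohen's kappa in plain Python. It rests on several assumptions: the text does not state how kappa was calculated, so treating each rank position as its own category is only one possible reading; the six candidates and their scores are hypothetical and are not the study data; and all function names are illustrative.

```python
from collections import Counter
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks_descending(scores):
    """Rank 1 goes to the highest score; ties keep input order (assumption)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two categorical ratings of the same items."""
    n = len(labels_a)
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(labels_a) | set(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical scores for six candidates; NOT the study data.
    mmi_scores = [146, 130, 120, 110, 100, 93]
    ti_scores = [480.0, 543.5, 450.0, 470.0, 397.0, 430.0]

    print("Pearson r:", pearson_r(mmi_scores, ti_scores))
    print("Kappa on ranks:", cohen_kappa(ranks_descending(mmi_scores),
                                         ranks_descending(ti_scores)))
```

Scores can be clearly correlated while kappa on exact rank positions stays near zero, because kappa counts only exact matches and gives no credit for near misses in rank.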

Discussion
The selection of medical trainees at both the undergraduate and postgraduate levels can be challenging and is not an exact science. Educators experience great difficulty in selecting the "best" candidates from a relatively homogeneous pool of highly qualified applicants. Ultimately, selection relies on the assessment of the so-called cognitive and non-cognitive attributes of the applicant. Our study is the first to compare the MMI and TI in the same postgraduate applicant cohort. In doing so, we have observed that MMI and TI scores are correlated but do not reliably lead to the same rank order. The variation in rank order suggests that these two interview techniques measure different characteristics. Alternatively, one technique may be less reliable, leading to greater variation in final rank. Using factor analysis to correlate scores between their MMI stations, Lemay et al. 6 showed that the attributes of advocacy, ambiguity, collegiality and collaboration, empathy, ethics, honesty and integrity, responsibility and reliability, and self-assessment could be independently evaluated in the MMI setting. We believe that many of these core qualities were evaluated in our MMI as well. In the TI, candidates were also given the opportunity to discuss their educational background, extracurricular activities and desire to be an otolaryngologist - head & neck surgeon.

Figure 1. Correlation between MMI and average traditional interview scores. Fitted line: y = 0.2642x + 46.353; R² = 0.3153.
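The fitted line reported with Figure 1 (y = 0.2642x + 46.353, R² = 0.3153) can be checked against the summary statistics given in the abstract. The sketch below assumes that x is the total MMI score and y is the average traditional score per rater (total divided by six raters); the figure does not label its axes in the text, so this reading is an inference. In simple linear regression R² equals the square of Pearson's r, so an R² of 0.3153 corresponds to |r| of roughly 0.56. The value 0.315 quoted for r in the text coincides numerically with R², and readers may wish to confirm which statistic is meant.

```python
from math import sqrt

# Values reported in the text; the axis assignment below is an assumption.
SLOPE = 0.2642
INTERCEPT = 46.353
R_SQUARED = 0.3153
N_APPLICANTS = 27
MMI_MEAN = 114.8        # mean total MMI score
TI_MEAN_TOTAL = 460.2   # mean total traditional interview score
TI_RATERS = 6           # three stations with two raters each


def predict_ti_average(mmi_total):
    """Predicted average traditional score per rater from the fitted line."""
    return SLOPE * mmi_total + INTERCEPT


# A least-squares line passes through the point of means, so the prediction
# at the mean MMI score should match the mean average traditional score.
predicted_at_mean = predict_ti_average(MMI_MEAN)
observed_mean = TI_MEAN_TOTAL / TI_RATERS

# For simple linear regression, |r| is the square root of R squared;
# the sign follows the slope, which is positive here.
r_from_r_squared = sqrt(R_SQUARED)

# t statistic for testing r against zero, with n - 2 degrees of freedom.
t_statistic = r_from_r_squared * sqrt((N_APPLICANTS - 2) / (1 - R_SQUARED))

if __name__ == "__main__":
    print("Predicted at mean MMI:", round(predicted_at_mean, 2))
    print("Observed mean per rater:", round(observed_mean, 2))
    print("r implied by R squared:", round(r_from_r_squared, 3))
    print("t statistic (df = 25):", round(t_statistic, 2))
```

The close match between the predicted and observed means supports the assumed axis assignment, though it does not prove it.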
Interviewee responses to the two interview types were generally similar, with some important exceptions. All TI respondents felt that this type of interview allowed for accurate portrayal of their abilities, compared with a smaller proportion feeling this way about the MMI. In addition, most felt that the MMI was more anxiety-provoking than the TI. As this is the first such dual interview process at our institution, it is difficult to determine whether the observed differences are due to inherent differences between the interview types or to applicants' unfamiliarity and lack of experience with the MMI-style interview.
The development of the final rank order list warrants discussion. Candidates were evaluated by several mechanisms. The interview portion, as previously mentioned, consisted of the TI, MMI and surgical skills station. The final ranking committee reviewed the interviewee application files: the details of electives, medical school transcripts, reference letters, curriculum vitae and letter of intent. Following the interviews, resident and faculty comments about the interviewed candidates were reviewed. All of this information was reviewed by the rank committee, and the final rank order list was subsequently generated. Final candidate ranking is thus a combination of file reviews, interviews, comments and debate.
The present study has several limitations. Firstly, our sample size of 27 is small when compared to studies assessing undergraduate applicants. Our sample is from a single CaRMS cohort applying to a relatively small postgraduate training program. However, compared with other otolaryngology - head & neck surgery programs in Canada, this would represent the largest applicant cohort in the country. Secondly, bias likely affects the final ranking of candidates. Traditional interviewers are not blinded to candidates, in that personal letters and applications are thoroughly reviewed prior to interviews. There is likely less bias with the MMI because these interviewers are given only the candidates' names. Final rank order lists are generated through selection committee deliberation and analysis of discordant interview scores. These results are not included in our study, as our goal was not to assess whether the MMI or TI predicted final rank but rather whether the two were correlated and whether they led to the same rank order. Another weakness is the lack of objective data on how these applicants perform during residency, both on cognitive measures (for example, National In-Training Exams) and on non-cognitive measures (for example, rotation evaluations). This, however, would be very difficult to remedy because only those with the highest MMI/TI scores gained entry to the program, and thus we would lose data on the remaining 22 applicants in such an analysis.
Several questions that warrant further investigation also arise from the interview analysis. For instance, each MMI scenario evaluates specific traits and skills, but the scenarios were selected arbitrarily, without a test blueprint or overall plan. An evaluation of each scenario is warranted to determine whether all MMIs are equivalent and what happens when different MMI scenarios are used to arrive at a final MMI score. Furthermore, what effect does previous experience with the MMI have on performance? Are students who participated in an MMI for undergraduate medicine likely to do better on the postgraduate interview? In addition, it remains unclear how best to combine the data from the two interview types if candidates are asked to participate in both. More research is needed to determine which information from each interview type is important and what selection committees should do when interview scores are discordant.
Despite the above limitations, this study adds to the medical education literature. Firstly, we have replicated a correlation between the MMI and TI previously reported in the literature. Secondly, this is the first study to test the two interview types on the same postgraduate applicant cohort. Finally, we offer caution to medical educators about the appropriateness of using one interview type over the other: the two may actually be measuring different attributes and synergistically provide more information to a selection committee than either interview alone.