Sustainability of physical exam skills in a resident-led curriculum in a large internal medicine program with competency based medical education

Background Competency Based Medical Education (CBME) designates physical examination competency as an Entrustable Professional Activity (EPA). Considerable concern persists regarding the increased time burden CBME may place on educators. We developed a novel physical examination curriculum that shifted the burden of physical examination case preparation and performance assessment from faculty to residents. Our first objective was to determine if participation led to sustainable improvements in physical examination skills. The second objective was to determine if resident peer assessment was comparable to faculty assessment. Methods We selected physical exam case topics based on the Objectives of Training in the Specialty of Internal Medicine as prescribed by the Royal College of Physicians and Surgeons of Canada. Internal Medicine residents compiled evidence-based physical exam checklists that faculty reviewed before distribution to all learners. Physical exam practice sessions with whole-group demonstration followed by small-group practice sessions were performed weekly. We evaluated this pilot curriculum with a formative OSCE, during which a resident peer and a faculty member simultaneously observed and assessed examinee performance by. Results Participation in the novel curriculum practice sessions improved OSCE performance (faculty score mean 78.96 vs. 62.50, p<0.05). Peer assessment overestimated faculty scores (76.2 vs. 65.7, p<0.001), but peer and faculty assessments were highly correlated (R2 = 0.73 (95% CI 0.50-0.87). Conclusion This novel physical examination curriculum leads to sustainable improvement of physical examination skills. Peer assessment correlated well with the gold standard faculty assessment. This resident-led physical examination curriculum enhanced physical examination skills in a CBME environment, with minimal time commitment from faculty members.


Introduction
With the advent of Competency Based Medical Education (CBME), physical examination is a core skill designated as a consistent milestone of many Entrustable Professional Activities (EPA). 1 However, since the 1960s, physical examination proficiency amongst trainees continues to remain below expectation, [2][3][4][5][6][7] and its importance and emphasis have waned over the decades. 8 Numerous educational interventions have shown variable success, including structured curriculum, 8 multimedia-assisted teaching, 9 simulation training, 10 feedback, 11 and instructor variation. [12][13][14] The Objectives of Training in Specialty of Internal Medicine of the Royal College of Physicians and Surgeons of Canada (RCPSC) include that trainees should be able to perform "a focused physical examination that is relevant and accurate for... diagnosis and/or management." However, there is no standardized curriculum defining the breadth of scope and depth of knowledge. 15 Thus, trainees continue to have varying expectations for an ill-defined standard. Furthermore, creation of a new physical examination curriculum to be concordant with CBME has the practical challenge of being sustainable given limitations in manpower, organization, finances, and faculty time. Faculty remain concerned that CBME demands greater time investment, when they have other competing priorities. [16][17][18] Thus, how to teach physical examination skills effectively in a CBME environment, in a way that minimizes time commitments for faculty educators, remains uncertain yet a pressing concern.
The authors constructed, implemented, and evaluated a pilot, two-phased, structured, residentled physical examination focused curriculum for the Core Internal Medicine and the General Internal Medicine Fellowship trainees (PGY 1-4) to address CBME requirements. The curriculum was dependent on resident learners, with minimal faculty involvement.
The first objective of our study was to determine if participation in the pilot curriculum led to sustainable improvements in physical examination skills measured by performance on a formative OSCE examination. The secondary objective was to determine if peer assessments were comparable to faculty assessments, determined by simultaneous peer and faculty assessments in a formative OSCE.

Setting
We developed a physical examination curriculum for the Internal Medicine Training Program (PGY 1-4) of Queen's University (Kingston, Canada) which consists of 67 residents.

Development of physical examination curriculum
We selected physical exam topics based on the Objectives of Training in the specialty of Internal Medicine as prescribed by the RCPSC. Internal Medicine residents in their second or third postgraduate year volunteered to compile evidencebased physical exam checklists from a number of recommended physical exam references including The Rational Clinical Examination: Evidence-based clinical diagnosis, 19 Evidence-Based Physical Diagnosis, 20

and Clinical Examination: A Systematic
Guide to Physical Diagnosis. 21 They organized checklists into general inspection, then system-based examination (e.g., Cardiac, Neurological, Abdominal, etc.), and finally evidence-based special tests. Special tests were not considered part of the standard system-based physical examination. We described special tests in the checklist document, with references provided. Checklists were evaluated, refined and finalized by Internal Medicine faculty prior to distribution to all learners (Appendix A).

Implementation of physical examination curriculum
Practice physical examination sessions were incorporated into weekly academic half-day sessions. A single physical examination checklist for a specific physical exam topic (e.g., physical exam for Myasthenia Gravis) was distributed to all learners at least two days prior to the learning activity, with paper copies provided during the activity. These physical exam practice sessions were conducted first as a large group demonstration. This was facilitated by one to two faculty members who were Fellows of RCPSC in Internal Medicine. The faculty facilitator introduced a clinical scenario pertinent to the topic, then the resident checklist creator demonstrated the physical exam scenario with a resident peer as the standardized patient. Follow-up questions were then discussed with the group, and feedback was provided e80 on the performance by the faculty members. The checklist was also reviewed after the physical exam scenario was performed, to assure all physical exam maneuvers were demonstrated correctly. This was followed by small-group breakout practice sessions, subdivided to one of eight available examination rooms, that were facilitated by faculty. Twenty residents contributed one topic each, so twenty topics were taught using this pilot physical examination curriculum, from July 2016 to February 2017.

Assessment of physical examination curriculum
In March 2017, a voluntary formative Objective Structured Clinical Examination (OSCE) was organized, to evaluate physical examination performance on four of the twenty topics that had been taught and practiced during the pilot curriculum. There were 36 residents who participated in the voluntary formative OSCE. Scoring sheets for these stations aligned with the checklists developed for the practice sessions with follow-up questions added to each station to assess critical thinking and knowledge application. This checklist and follow-up questions were used to calculate a raw score. A 10point rubric was used to assign a global, general impression score, with the highest score being 10 and the lowest score being 0 (Appendix B). The raw and global scores were converted to percentages and added to determine combined scores.
Two residents were paired to complete the examination circuit, consisting of four stations. The two residents alternated between being an examiner, marking physical examination performance on the checklist, or the examinee who performed the physical exam upon the resident examiner. No standardized patients were used in any of the OSCE stations. Specific instructions were provided at each station, with two minutes to read the instructions and stem, 10 minutes for the station, and three minutes for feedback. Three identical circuits of four OSCE stations ran simultaneously. In one of the circuits, we performed faculty assessment at each station, by observing through a one-way window that had an unimpeded view of the entire room, and listening via headphones to all sounds produced within the same room. Faculty members at each examination station used the same scoring sheet as resident peer examiners; and there was no communication between faculty and resident peer examiners.
Residents who created checklists were permitted to participate in the formative OSCE. However, if a resident had previously created the checklist for an OSCE physical exam station, he or she was to be the examiner, but not to be examined in such a scenario.

Data Analysis-Objective 1
Multivariable regression models were used to determine if participation in the novel curriculum physical exam practice sessions improved examinee performance on the formative OSCE, after adjusting for the clinical scenario and PGY level of the examinee. Performance on the formative OSCE was defined as the peer raw score (regression model 1) or the faculty raw score (regression model 2).

Data Analysis-Objective 2
Raw, global and combined scores were calculated as percentages. Peer and faculty scores were reported as means and standard deviations (SD) by examinee/examinee PGY level and the station topic.
As the scores were normally distributed groups' scores were compared using paired t-test. Statistical significance was set at p<0.05. The 95% confidence intervals of these comparisons were derived. The correlation between peer and faculty scores was determined using Pearson's Correlation Coefficients.

Ethics
Ethics approval was obtained through Queen's University Health Sciences Research Ethics Board, Identification number 6022756. Informed consent was waived as per the Research Ethics Board approval.  In a multivariate model adjusted for the clinical scenario and PGY level of the examinee, residents who participated in novel curriculum practice sessions showed a non-statistically significant trend to improvement in peer raw score (p=0.06, difference = + 6.5 points). Resident participation in novel curriculum practice sessions were associated with higher faculty raw scores on the formative OSCE in another multivariate model with similar adjustments (p=0.01, difference = +19.8 points) (Table 3).

Objective 2: Peer versus Faculty Assessments
In the formative OSCE exam, peer assessment scores exceeded faculty assessment scores for raw ( (Figure 1). The peer assessment scores were not less than faculty assessment scores in any post-graduate year or clinical scenario. Faculty and peer assessment scores did not differ between post graduate year (data not shown, p>0.05 for all comparisons).
Peer and faculty raw scores were highly correlated, with intra-class correlation (ICC) of 0.73 (95% confidence interval (CI) or 0.50-0.87). Peer and faculty global scores were not well correlated (ICC = 0.39).

Discussion
The advent of CBME by RCPSC-certified internal medicine programs has made physical examination skills as EPAs. How to teach physical examination skills effectively remains a challenge with a number of techniques potentially showing promise: structured curriculum, 8 multimedia-assisted teaching, 9 simulation training, 10 feedback, 11 and instructor variation. [12][13][14] Considerable concerns persist that adoption of CBME will increase demands for faculty time, [16][17][18] and thus physical curriculum, to be compatible with CBME in a sustainable fashion, would benefit from decreasing time demands on faculty. This study evaluated the feasibility of a novel pilot curriculum designed to decrease burden on faculty. This study showed a strong correlation between peer and faculty assessments for the raw, but not global scores. The raw score markings were based upon the combination of an evidence-based physical examination checklist and questions that assessed critical thinking and knowledge application, whereas the 10-point Likert scale was a subjective global rating of the candidate. Considerable bias has been reported in peer assessment in both medical and nonmedical settings, including halo, horns, leniency, strictness, and similar-to-me biases. 13,22,23 Within the medical education literature, the halo and friendship marking effects inflate peer assessment scores by peers compared to faculty, [24][25][26] and inflated peerassessment scores may be the norm in high stakes settings such as medical schools. 24 The magnitude of peer assessment scores' inflation was greater in the subjective global (13.0 points), than the more objective raw (7.0 points) scores. This was likely because of a reduction of the halo and friendship marking effects. On the other hand, the view of the faculty assessor into the formative OSCE room was farther than, and intermittently blocked by the peer assessor, so it is possible that this impaired the faculty assessor's viewpoint to provide an accurate assessment of physical examination maneuvers. However, none of the four faculty members in this trial thought the view was ever hindered sufficiently to impair faculty assessment.
Medical literature confirms correlation between faculty and peer assessment may be low, 27,28 medium, 26,29-32 or high. 33 The extent of correlation is largely predicted by the effect of biases in the assessment tool. In this study, peer raw scores overestimated faculty raw scores to less of an extent than global scores, and peer raw scores (but not global scores) strongly correlated with the faculty scores. Thus, implementation of an evidence-based physical examination checklist may neutralize common biases associated with peer assessment. Therefore, wider adoption of this curriculum and subsequent physical examination assessment should be based upon raw, rather than global, scores. This study confirms that physical examination skills can be learned in the novel curriculum in a sustainable fashion with improved performance on a formative OSCE for those trainees who had previously been taught the physical examination topic. There are a number of reasons why trainees' performance may have improved. Firstly, trainees created the physical examination evidence based checklist. This was done to decrease the workload for faculty physicians, but this act may in itself enhance knowledge retention. 34 Secondly, the pilot curriculum sessions included both single trainee demonstration with faculty, and multiple peers practicing, a combination that has been shown to enhance physical examination skills. 13 Thirdly, physical examination teaching by persons other than faculty physicians may be equally or more effective. [35][36][37] Consequently, adoption of this curriculum should include all components so both intended and unintended benefits are mobilized. It is plausible that additional improvements in physical examination skills may be realized by supplementing this curriculum with other interventions such as physical examination videos: this remains a topic for ongoing research.
It is well established that learners tend to overestimate their skills, in self-assessment, yet the extent of this overestimation decreases with experience. [38][39][40] The same is also true for peer assessment, which overestimated faculty score by 7.0 (peer-raw), 13.0 (peer-global) and 10.5 (peercombined) scores in this study. Peer assessment tends to approximate faculty assessment scores as peer assessors become more experienced. 41,42 This has important implications for curriculum design; peer assessment alone could lead to false confirmation of physical examination competency. Two potential solutions may address this. Firstly, a correction factor can be derived to adjust the peer-

Figure 1. Peer and faculty assessment scores for consanguineous physical exam stations on formative OSCE
e84 raw score to approximate the faculty-raw score. However, this will require intermittent faculty assessment to validate the correction factor prospectively. The alternative solution is to hold frequent faculty assessment without peer assessment. The first option may be more feasible if faculty time is in short supply as is the reality in many academic institutions. However, the required frequency of intermittent validation would warrant further study.
There are a number of important strengths in this study. Firstly, this is a strong report of a physical examination curriculum that can be adopted within the CBME framework that leads to sustainable improvements in trainee physical examination skills. Secondly, this study suggests that many of the biases of peer assessment may be diminished by use of raw scores from an evidence-based checklist, rather than subjective global scores. Thirdly, the curriculum could be generalized to other academic centers within the CBME framework, without substantial impairment of academic resources, including faculty time. In terms of weaknesses, firstly, this is a single center study consisting of one internal medicine program's trainees. However, the Queen's Internal Medicine program includes residents with diverse cultural backgrounds and subspecialty interests, and, thus, the results are likely generalizable to other large internal medicine programs. Secondly, the number of formative OSCE scenarios that were evaluated by both peers and faculty was low. On the other hand, the conclusions found in this study were both meaningful, normally distributed and statistically significant, and thus this did not decrease the importance of the study. Thirdly, this study was unable to determine which of the components of the curriculum led to the improvements in physical examination skills. However, the curriculum was designed to combine multiple educational methods, (self-directed learning, small group learning, and faculty led demonstration) while maintaining longterm sustainability within restricted faculty and department resources. Thus, the determination of which specific component is responsible is not as important as knowing that the entire curriculum can be replicated and delivered in other internal medicine training programs.

Conclusion
This study confirms that peer assessments using raw scores, based on an evidence-based checklist, correlate well to faculty assessments, leaving an allowance for overestimation. A physical examination curriculum using checklist and demonstrations is sustainable in the CBME framework, with minimal faculty time commitment, leading to sustainable improvements in physical examination skills. Further research is required to determine how other cooperative peer run educational interventions could improve physical examination skills even further in this setting.
Conflicts of interest: There are no conflicts of interest for any of the authors.
Funding: There was no funding for this study.
e88 If considering pneumonia, radiographic imaging is required in addition to history and physical exam to reliably rule in/out pneumonia.