Multisource feedback to assess pediatric practice: a systematic review.

INTRODUCTION
The assessment and maintenance of competence for pediatricians has recently received increased attention. The aim of the present study was to investigate further the use of multisource feedback for assessing pediatricians in practice.


METHODS
A systematic literature review was conducted using the electronic databases EMBASE, PsycINFO, MEDLINE, PubMed, and CINAHL for English-language articles.


RESULTS
The initial search identified 762 articles, of which 756 were excluded, leaving six studies that met the inclusion criteria for this systematic review. Internal consistency reliability was reported in five studies, with α ≥ 0.95 for both subscales and full scales. Generalizability was reported in two studies, with Ep² generally ≥ 0.78; these adequate Ep² coefficients were achieved with different numbers of raters. Evidence for content, criterion-related (e.g., Pearson's r), and construct validity (e.g., principal component factor analysis) was reported in all six studies.


CONCLUSION
Multisource feedback is a feasible, reliable, and valid method to assess pediatricians in practice. The results indicate that multisource feedback systems can be used to assess key competencies such as communication skills, interpersonal skills, collegiality, and medical expertise. Further implementation of multisource feedback is desirable.


Introduction
Challenges for pediatrics as a specialty began early in the 20th century, when it was first accepted as a unique specialty, one defined and developed by physicians convinced that children and their illnesses require special attention and interest from staff who are highly skilled in their care. Pediatrics as a separate specialty has led to many advances in child health, including the eradication of serious diseases such as rickets and scurvy. The establishment of the specialty also led to recognition of the importance of high standards of child care and better medical education. 1

In recent years, pediatricians have continued to face the challenge of identifying the best method to evaluate, and provide feedback to, their trainees in order to maintain high standards among graduating pediatricians. Physicians in general, and pediatricians in particular, have very little opportunity to receive systematic feedback about their practice. This is especially true for competencies such as professionalism, communication skills, medical knowledge, and interpersonal relationships. It would be a matter of concern if underperforming pediatricians were not detected. This problem can be addressed by introducing an assessment method that identifies underperforming trainees and helps them recognize their problems and improve their performance. 2 Multisource feedback (MSF) has emerged as a common method for assessing communication, professionalism, collaboration, and competence in the workplace. 3 The feasibility, validity, and reliability of this assessment method have been demonstrated by research in both industry and healthcare. 3 MSF has gained widespread acceptance and is seen as a formative tool for reflecting on where change is required. Pediatricians complete a self-assessment instrument and receive feedback from medical colleagues (peers), co-workers (e.g., nurses, pharmacists), and patients (or patients' parents or guardians). 4,5 Because questionnaires are completed by several groups (the assessed physician as well as colleagues, peers, and clients), this feedback system provides a more global perspective than one or a few sources alone. 6 MSF can assess characteristics of health professionals such as clinical skills, personal communication, and patient or client management, as well as improvements in performance.
Multisource feedback is gaining acceptance and credibility as a means of giving pediatricians the information they need to monitor and improve their performance and maintain competence. Some studies of MSF have been conducted with pediatricians, 7 but there is not yet conclusive evidence of its effectiveness for assessing competencies such as professionalism, communication skills, medical knowledge, clinical skills, and interpersonal relationships.
The main purpose of the present study, therefore, was to conduct a systematic literature review to describe the use of MSF in pediatric settings and to determine its psychometric characteristics and evidence of its validity based on the published literature.

Methods
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed for this systematic review. 8

Information sources and search
A systematic literature search of English-language studies published from 1975 to October 2012 was conducted in the following databases: MEDLINE, EMBASE, CINAHL, PubMed, and PsycINFO. The reference lists of selected articles were also searched for potentially relevant articles on MSF. The following search terms were used: multisource feedback, multisource feedback in pediatric settings, 360 degree evaluation, and 360 degree evaluation in pediatric settings.

Study selection criteria
Studies were included if they were published in English in peer-reviewed journals; identified the factors measured by the instruments; applied to pediatricians or pediatric practice; included information on at least one of feasibility, reliability, generalizability, or validity of the MSF measure used; and described the instrument design. We excluded studies that were conducted in non-pediatric specialties (e.g., surgery, family medicine, anesthesiology), provided only general applications and guidelines for MSF without empirical data, reported only on the process of MSF, or reported only changes in performance after feedback.

Data collection process
Each article was evaluated independently by two authors (SA, AA) on the basis of its title and abstract. Disagreements between the two coders were resolved by retrieving the full article for review by a third coder (AR, SAL). Following discussion among the four coders, we achieved 100% agreement on the studies to be included.
The initial search yielded 762 articles, as shown in Figure 1. Of these, 103 were duplicates, 405 were excluded on the basis of the title, a further 176 on the basis of the abstract, and another 72 after reading the full article. Six articles were finally included in the present study.
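The screening counts reported here and in Figure 1 can be checked for internal consistency with simple arithmetic. The Python sketch below is illustrative only (the function name `remaining_after_stages` is hypothetical): it subtracts each exclusion stage from the 762 records identified (757 from databases plus 5 from reference lists) and confirms that six articles remain and that 756 were excluded in total.

```python
def remaining_after_stages(total, exclusions):
    """Return the record count before screening and after each exclusion stage."""
    remaining = [total]
    for excluded in exclusions:
        remaining.append(remaining[-1] - excluded)
    return remaining

# Counts taken from the text and Figure 1.
identified = 757 + 5  # electronic databases + reference lists
exclusions = {
    "duplicates": 103,
    "title": 405,
    "abstract": 176,
    "full text": 72,
}

stages = remaining_after_stages(identified, list(exclusions.values()))
print(stages)                    # [762, 659, 254, 78, 6]
print(sum(exclusions.values()))  # 756
```

The intermediate values (659 titles, 254 abstracts, 78 full texts) match the screening stages listed in Figure 1.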

Results
As summarized in Figure 1, of the 762 initial articles only 6 met the inclusion criteria and 756 were excluded. One study was published before 2005 (in 2004); the remaining five were published between 2005 and 2010. Two studies were conducted in the USA, two in the UK, and two in Canada (Table 1).

Type of assessment instruments
Different instruments were used across the studies. Two studies used the Physician Achievement Review (PAR) 9,10 and two used the Sheffield Peer Review Assessment Tool (SPRAT) 11,12 to assess pediatricians. The remaining two studies used single questionnaires of 10 to 14 items. 13,14 Details of the studies are summarized in Tables 1 and 2. The instruments were designed to assess a range of competencies, including communication skills, diagnostic and treatment skills, patient relationships, collegiality, leadership, decision making, systems-based practice, probity, professionalism, and knowledge and judgment (Table 1).

Feasibility
Most of the studies used the response rate as an indication of feasibility. In most studies, response rates exceeded 90%, which supports the feasibility and acceptability of this assessment method. Other studies demonstrated feasibility by the time needed to complete the MSF forms: Archer et al. 12 reported that raters took a mean of six minutes to complete the questionnaire, and that feedback analysis and report preparation took an average of 30 minutes, indicating that the tool is feasible in real practice. In several studies (especially those from Canada), participation in MSF is mandated by regulatory or licensing authorities, so all pediatricians must participate to continue their medical practice (Table 2). In other studies (e.g., in the UK and the US), MSF has been developed by licensing authorities and training programs to assess pediatric residents and pediatricians. It therefore appears feasible to employ MSF for both trainees (e.g., residents) and practicing pediatricians.

Reliability and generalizability
Reliability refers to the consistency of measurement. Reliability coefficients are typically reported as Cronbach's alpha (α), which reflects the internal consistency of the items. MSF instruments should have α > 0.90, a level achieved by most MSF instruments. Violato et al. 9 reported reliability coefficients of α = 0.98, 0.98, 0.95, and 0.99 for the self, medical colleague, co-worker, and patient instruments, respectively.
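For readers less familiar with the statistic, Cronbach's alpha for k items is α = k/(k − 1) × (1 − sum of item variances / variance of total scores). The minimal Python sketch below computes it from a respondents-by-items matrix; the ratings are hypothetical and do not come from any of the reviewed studies. Sample variances are used throughout; using population variances consistently gives the same α because the n/(n − 1) factor cancels.

```python
from statistics import variance

def cronbach_alpha(ratings):
    """Cronbach's alpha for a respondents-by-items matrix of scores.

    ratings: list of rows, one per respondent, each with k item scores.
    """
    k = len(ratings[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    items = list(zip(*ratings))  # transpose: one tuple of scores per item
    sum_item_var = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in ratings])  # variance of total scores
    return k / (k - 1) * (1 - sum_item_var / total_var)

# Hypothetical 5 raters x 3 items on a 1-5 scale (illustration only).
ratings = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]
print(round(cronbach_alpha(ratings), 2))  # 0.92
```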
Figure 1. Flow diagram of study selection: articles identified through electronic database searches (n = 757) and from reference lists (n = 5); titles screened for eligibility (n = 659); abstracts screened for eligibility (n = 254); full-text studies assessed for eligibility (n = 78); articles included (n = 6).

Professionalism covers: psychosocial skills, psychosocial management, humanistic qualities, compassion, attitude, professional development, teaching, professional responsibilities, and professional management.
Clinical competence covers: clinical care, good medical practice, patient care, safe practice, clinical performance, knowledge, critical thinking, diagnosis, and management of complex problems.
Communication covers: communication with staff and interpersonal communication skills.
Manager covers: reporting, self-management, administrative skills, office personnel, access to doctor, practice process, physical office, and physical space.
Interpersonal relationship covers: relationship with patients, with colleagues, and with family members, collegiality, collaborator, patient education, information provision, and patient interaction. The last factor is overall assessment.

Another study reported reliability coefficients of α = 0.90 and 0.96 for the parent and co-worker questionnaires, respectively. Alternatively, generalizability theory can be used to calculate a 95% CI for mean ratings across varying numbers of raters, to determine the number of raters needed to achieve a stable score when the intent is to decide whether a person's performance is satisfactory. 12 In general, to achieve a standard error of measurement (SEM) ≤ 0.40 with the SPRAT instrument, a minimum of 8 raters is required. 15 In their assessment of the SPRAT instrument for 577 pediatricians in training, Archer and associates determined that eight raters using a 24-item survey provided ratings of satisfactory precision (SEM ≤ 0.40) at a 95% CI. 11 Several researchers have investigated the number of raters and the number of items required to provide stable data on the individual being assessed. This can be achieved by employing generalizability theory to derive generalizability coefficients (Ep²). 15 Ep² measures the dependability of MSF instruments as a function of the various factors that can influence physicians' ratings. Studies have shown that an adequate Ep² > 0.78 can be achieved with a moderate number of observers. 11 Generalizability was reported in only two studies, with generalizability coefficients ranging from Ep² = 0.78 to 0.87 with a minimum of 8 peers and about 20 or more patients. The remaining studies in Table 2 did not report generalizability analyses.
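The link between the number of raters and measurement precision can be sketched with the standard result that the SEM of a mean of n independent ratings is √(σ²e / n), where σ²e is the error variance of a single rating, and that the 95% CI is the mean ± 1.96 × SEM. The Python sketch below assumes a single pooled error variance and independent rater errors. In practice σ²e would be estimated from a generalizability study; the value 1.0 used here is hypothetical and not taken from the SPRAT studies.

```python
import math

def sem_of_mean(error_variance, n_raters):
    """SEM of a mean rating, assuming independent errors with equal variance."""
    return math.sqrt(error_variance / n_raters)

def min_raters_for_sem(error_variance, target_sem, max_raters=1000):
    """Smallest number of raters whose mean rating has SEM <= target_sem."""
    for n in range(1, max_raters + 1):
        if sem_of_mean(error_variance, n) <= target_sem:
            return n
    raise ValueError("target SEM not reachable within max_raters")

def ci95(mean_rating, error_variance, n_raters):
    """Approximate 95% confidence interval for a physician's mean rating."""
    half_width = 1.96 * sem_of_mean(error_variance, n_raters)
    return mean_rating - half_width, mean_rating + half_width

# Hypothetical single-rating error variance of 1.0 (scale points squared).
n = min_raters_for_sem(1.0, 0.40)
print(n)  # 7
print(ci95(4.5, 1.0, n))
```

A larger single-rating error variance raises the number of raters needed for the same SEM, which is why noisier rater groups need more respondents.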

Validity
Of the 6 studies included in the present systematic review (Table 1), only one reported evidence of content validity, that is, whether the content of the instrument was an adequate sample of the domain it was intended to represent. Content validity (sampling of appropriate content and skills) can be enhanced by using a table of specifications based on a list of core competency areas and the methods used to assess them, and by having experts systematically review the items to ensure that each competency is adequately assessed. 7 Archer et al. 12 reported content validity evidence for the SPRAT. Two authors wrote the questions, which were field-tested in two pilot studies at Sheffield Children's Hospital. After modification based on feedback, the final form contained 24 questions covering five domains, providing evidence of content validity.

Excerpt from the study summary table (one study): the aim was to determine whether non-faculty ratings of residents' professionalism and interpersonal skills differ from faculty ratings; overall, the 360 degree evaluation ratings for the pediatric residents were high and provided guidance on their interpersonal and communication skills. NR = not reported.

Criterion-related validity was also reported. Criterion validity refers to the relationship between scores obtained using the instrument and scores obtained using one or more other instruments or measures; such evidence was reported in two studies (Table 1).

Evidence for construct validity, which refers to the nature of the psychological construct or characteristic being measured by the instrument, was reported in all of the studies. 9,14 Construct validity can be established by studying the relationships among latent variables or constructs; exploratory factor analysis, for example, can be used to determine the relationships among the variables. Violato et al. 9 conducted a principal component factor analysis that yielded a four-factor solution for the medical colleague questionnaire, accounting for 67.6% of the variance; a three-factor solution for the co-worker questionnaire, accounting for 63.8% of the variance; and a four-factor solution for the patient questionnaire, accounting for 77.6% of the variance. Lockyer et al. 10 also investigated the construct validity of the MSF instruments, with very similar results.
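In a principal component analysis of k items, the eigenvalues of the item correlation matrix sum to k, so the proportion of variance accounted for by the retained components is the sum of their eigenvalues divided by k. The stdlib-only Python sketch below illustrates this with power iteration and deflation on a small hypothetical correlation matrix. It is not the software or procedure used by Violato et al., and an orthogonal rotation such as varimax would redistribute variance among the retained components without changing their total.

```python
import math
import random

def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def leading_eigenvalues(corr, k, iters=1000, seed=0):
    """Largest k eigenvalues of a symmetric correlation matrix.

    Uses power iteration, then Hotelling deflation after each component.
    """
    n = len(corr)
    rng = random.Random(seed)  # fixed seed for reproducible start vectors
    a = [list(row) for row in corr]  # working copy, deflated after each component
    eigenvalues = []
    for _ in range(k):
        v = [rng.uniform(0.5, 1.5) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = mat_vec(a, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # remaining matrix maps v to zero
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, mat_vec(a, v)))  # Rayleigh quotient
        eigenvalues.append(lam)
        # Remove the component just found: A <- A - lam * v v^T
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return eigenvalues

def variance_explained(corr, k):
    """Share of total variance (= number of items) explained by k components."""
    return sum(leading_eigenvalues(corr, k)) / len(corr)

# Hypothetical 3-item correlation matrix with all inter-item r = 0.5.
corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
print(round(variance_explained(corr, 1), 3))  # 0.667
```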
Brinkman et al. 13 examined construct validity by comparing the mean scores of a control group and an MSF group; the group that received feedback in the form of MSF scored higher than the control group. In addition, mean co-worker ratings increased from time 1 (M = 61, SD = 5.25) to time 2 (M = 68, SD = 5.25).
Chandler et al. 14 examined construct validity differently, by comparing mean self-assessment scores with mean assessments by medical colleagues. Mean ratings on the medical colleague instrument (approximately M = 4.85, SD = 0.32) were considerably higher than self-ratings (M = 4.44, SD = 0.43), by more than one standard deviation (p < .01). This is a typical finding in MSF research, where self-ratings are usually lower than ratings by others. 9
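The size of the gap between colleague ratings and self-ratings can be expressed as a standardized mean difference (Cohen's d). The Python sketch below uses only the means and SDs reported above and assumes equal group sizes, so the pooled SD is the root mean square of the two SDs; group sizes are not given here. With these figures d is about 1.08, consistent with a difference of more than one pooled standard deviation. The same function applied to the time-one and time-two co-worker ratings reported by Brinkman et al. gives about 1.33, although a paired (within-rater) effect size would also need the correlation between occasions, which is not reported.

```python
import math

def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference, pooling SDs as if the groups were equal in size."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd

# Medical colleague ratings vs. self-ratings (Chandler et al.)
d_colleague_self = cohens_d(4.85, 0.32, 4.44, 0.43)
# Co-worker ratings at time 2 vs. time 1 (Brinkman et al.)
d_time = cohens_d(68, 5.25, 61, 5.25)

print(round(d_colleague_self, 2))  # 1.08
print(round(d_time, 2))            # 1.33
```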

Discussion
The main findings of the present study are: 1) MSF can be applied to pediatric practice both in residency and for licensing and recertification; 2) MSF can assess various competencies such as diagnostic and treatment skills, patient relationships, collegiality, leadership, decision making, systems-based practice, probity, professionalism, knowledge and judgment, and communication; 3) different raters can be employed, such as medical colleagues, co-workers, supervisors, and patients, together with self-assessment; 4) the MSF system is feasible, with typically high response rates to questionnaires that take only a brief time to complete; 5) high internal consistency reliability of the instruments can be achieved; 6) as few as 8 raters and 23 patients can achieve an Ep² coefficient ≥ 0.78; and 7) there is evidence of validity (content, criterion-related, construct) for the use of MSF in the assessment of pediatric practice.
A number of non-technical competencies such as leadership, decision making, systems-based practice, probity, professionalism, knowledge and judgment, and communication can be assessed effectively and feasibly using MSF for both pediatric trainees and independently practicing pediatricians. A full MSF model should include data from a self-assessment, medical colleagues (e.g., other pediatricians, referring physicians, anesthesiologists), co-workers (e.g., nurses, office staff), and patients (or patients' relatives or parents).
Across the studies reviewed, the reported internal consistency reliability is high, typically in excess of α = 0.98. Furthermore, around 8 peer or co-worker raters are required to assess a pediatrician. In particular, with well-designed MSF questionnaires of more than about 17 items, the accepted standard for a generalizability coefficient of Ep² ≥ 0.70 can be achieved.
By contrast, approximately 25 patients are required to achieve a similar Ep² coefficient.
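The difference between rater groups can be illustrated with a decision (D) study. In a simple persons × raters design, the generalizability coefficient for a mean over n raters is Ep² = σ²p / (σ²p + σ²δ / n), where σ²p is the variance between physicians and σ²δ is the relative error variance of a single rating. The Python sketch below finds the smallest n that reaches a target Ep². The variance components are hypothetical, chosen only to show why a noisier rater group, such as the patients described above, needs more raters for the same coefficient. Real MSF designs, in which raters are nested within physicians, require variance components estimated from the data.

```python
def g_coefficient(var_person, var_error, n_raters):
    """Ep^2 for the mean of n_raters ratings in a persons x raters design."""
    return var_person / (var_person + var_error / n_raters)

def min_raters_for_g(var_person, var_error, target=0.70, max_raters=1000):
    """Smallest number of raters for which Ep^2 reaches the target."""
    for n in range(1, max_raters + 1):
        if g_coefficient(var_person, var_error, n) >= target:
            return n
    raise ValueError("target not reachable within max_raters")

# Hypothetical variance components (illustration only).
var_person = 0.10
print(min_raters_for_g(var_person, var_error=0.50))  # 12 (less noisy raters)
print(min_raters_for_g(var_person, var_error=1.00))  # 24 (noisier raters)
```

Doubling the single-rating error variance roughly doubles the number of raters needed, which mirrors the gap between peer and patient requirements noted above.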
Evidence for several sources of validity was examined, including content, criterion-related, and construct validity. Most of the construct validity evidence comes from factor analysis studies that identify the constructs or domains (e.g., communication skills, professionalism) measured by the different MSF questionnaires. Future research may well include confirmatory factor analyses, which can provide stronger construct validity evidence. 16

The present systematic review has some limitations. MSF assessments are entirely questionnaire-based and rely on judgment and inference by assessors and respondents, which are known to be subject to a variety of influences and heuristics. 17 Therefore, criterion-related validity studies correlating direct observations of behavior or performance with MSF scores are required to add further evidence of validity. MSF approaches also fail to assess aspects of clinical competence reflecting pediatricians' knowledge and skills; these may be more accurately obtained through other methods (e.g., chart reviews, traditional examinations). This systematic review is based on a relatively small number of studies (6) published in peer-reviewed, English-language journals. Further research should be done to replicate and extend some of the empirical findings, especially the generalizability and validity evidence. Meanwhile, the current empirical evidence is promising.

Conclusion
This systematic literature review has shown that MSF is a feasible, reliable, and valid method for assessing pediatricians in practice as well as pediatric trainees. The results indicate that multisource feedback systems can be used to assess key competencies such as communication skills, interpersonal skills, collegiality, and medical expertise. This feedback system can provide information to pediatricians for future professional development beyond that which can be provided by one or a few sources alone. 6 Although reliability and validity challenges remain, MSF is a promising method for assessing pediatricians across a broad range of competencies.