Abstract
Credible methods for assessing competency in basic perioperative transesophageal echocardiography examinations have not been reported. The authors’ objective was to demonstrate the collection of real-world basic perioperative transesophageal examination performance data and establish passing scores for each component of the basic perioperative transesophageal examination, as well as a global passing score for clinical performance of the basic perioperative transesophageal examination using the Angoff method.
National Board of Echocardiography (Raleigh, North Carolina) advanced perioperative transesophageal echocardiography–certified anesthesiologists (n = 7) served as subject matter experts for two Angoff standard-setting sessions. The first session was held before data analysis, and the second session for calibration of passing scores was held 9 months later. The performance of 12 anesthesiology residents was assessed via the new passing score grading system.
The first standard-setting procedure resulted in a global passing score of 63 ± 13% on a basic perioperative transesophageal examination. The global passing score from the second standard-setting session was 73 ± 9%. Three hundred seventy-one basic perioperative transesophageal examinations from 12 anesthesiology residents were included in the analysis and used to guide the second standard-setting session. All residents scored higher than the global passing score from both standard-setting sessions.
To the authors’ knowledge, this is the first demonstration that the collection of real-world anesthesia resident basic perioperative transesophageal examination clinical performance data is possible and that automated grading for competency assessment is feasible. The authors’ findings demonstrate at least minimal basic perioperative transesophageal examination clinical competency of the 12 residents.
The American Society of Echocardiography (ASE; Morrisville, North Carolina) and the Society of Cardiovascular Anesthesiologists (SCA; Chicago, Illinois) have established pathways to achieve basic perioperative transesophageal echocardiography (TEE) certification.1 In addition to passing a written test, a clinician can take one of three pathways to fulfill the basic TEE certification requirements: the Supervised Training Pathway, the Practice Experience Pathway, or the recently introduced Extended Continuing Medical Education Pathway. Each pathway requires the practitioner to interpret 150 TEE studies, at least 50 of which must be basic TEE exams personally performed under appropriate supervision.2 At present, methods to measure basic TEE clinical competency have not been reported.
As such, there is a need to create a valid and credible model of basic TEE competency, as has been done for other clinical skills.3,4 Such a model would allow the practice performance component of basic TEE certification to be competency based rather than number based, and it would also clearly identify trainees who may require further instruction before they embark on independent practice. Assessment of procedural competency can be both formative and summative, and the need for it is supported by evidence that knowledge acquisition measured on written tests does not reflect clinical competency.5,6 The literature examining basic TEE competency is sparse. Recent reports describe the development, implementation, and efficacy of multimodal nonclinical basic TEE educational strategies for anesthesia residents.7–9 The next step is to complement these educational developments with clinical competency assessment.
Therefore, development of defensible educational policy and a practical, credible strategy to assess competency are warranted. Such a strategy should include the elements of a formative competency assessment that provides feedback to the trainee, as well as a summative assessment that marks the achievement of basic TEE competency.9 Given the significant commitment required to implement a substantive basic TEE training program, the assessment method must be pragmatic and must not add excessive administrative burden. We chose the Angoff method because it has the “longest history of successful use, even in high-stakes testing situations.”10,11
The Angoff method is commonly used to set passing scores (PSs) for examinations. In this method, as illustrated in figure 1, a group of subject matter experts discusses the characteristics of the minimally competent candidate. Each expert then independently estimates what percentage of minimally competent candidates would correctly answer each item of an examination; this estimate is that expert’s PS for the item. In the original Angoff method, all items are weighted equally; thus, the mean of an expert’s item PSs is that expert’s PS for the examination.10 The mean of the experts’ examination PSs is the final PS.10,11
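The calculation described above can be sketched in a few lines of Python. This is a minimal illustration of the original, equal-weighting Angoff rule only; the function name, expert labels, and ratings are hypothetical and are not data from this study.

```python
from statistics import mean


def angoff_passing_score(ratings):
    """Return (per-expert examination PSs, final PS) under the original Angoff method.

    `ratings` maps each expert to a list of item-level estimates: the percentage
    of minimally competent candidates expected to answer each item correctly.
    All items are weighted equally.
    """
    # Each expert's examination PS is the mean of that expert's item PSs.
    expert_ps = {expert: mean(items) for expert, items in ratings.items()}
    # The final PS is the mean of the experts' examination PSs.
    final_ps = mean(expert_ps.values())
    return expert_ps, final_ps


# Hypothetical ratings (%) from three experts on a four-item examination.
ratings = {
    "expert_A": [60, 70, 80, 50],
    "expert_B": [55, 65, 75, 45],
    "expert_C": [70, 80, 90, 60],
}
expert_ps, final_ps = angoff_passing_score(ratings)
```

Here the three examination PSs are 65%, 60%, and 75%, so the final PS is their mean, about 66.7%.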
The standard error of the experts’ PSs is the SD of those PSs divided by the square root of the number of experts.10 A low standard error denotes better agreement among the judges and less uncertainty about where the true PS should lie.10 One common modification of the Angoff method, which typically improves interexpert agreement, adds a second session in which the experts receive data on real-world performance on the examination and may revise their estimates of how minimally competent candidates would perform on each item; the final PS is then recalculated as above.10
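A minimal sketch of these summary statistics follows, assuming the conventional standard error of a mean (sample SD divided by the square root of the number of experts). The function name and the two rounds of seven examination-level PSs are hypothetical values, chosen only to show a calibration round that tightens agreement.

```python
from math import sqrt
from statistics import mean, stdev


def ps_summary(expert_ps):
    """Summarize experts' examination-level PSs: mean, sample SD, and SE of the mean."""
    n = len(expert_ps)
    sd = stdev(expert_ps)  # sample SD (n - 1 denominator)
    se = sd / sqrt(n)      # standard error of the mean PS
    return mean(expert_ps), sd, se


# Hypothetical examination-level PSs (%) from seven experts, before and after a
# calibration round in which they reviewed real-world performance data.
round_1 = [50, 55, 60, 65, 70, 75, 80]
round_2 = [66, 68, 70, 72, 74, 76, 78]

summary_1 = ps_summary(round_1)
summary_2 = ps_summary(round_2)
```

In this illustration the second round has both a higher mean PS (72% vs. 65%) and a smaller standard error, that is, closer agreement among the judges.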
Materials and Methods
The Vanderbilt University Medical Center Institutional Review Board (Nashville, Tennessee) has determined that this investigation did not qualify as research and therefore did not require approval (status of exempt, considered quality improvement project, IRB 140356).
Echocardiography Reporting, Mapping, and Grading
In order to collect anesthesiology resident basic TEE performance data, we created an electronic training report using the Vanderbilt Research Electronic Data Capture (REDCap; Vanderbilt University, USA) database platform (appendix 1).12 The training report is composed of categorical structured data elements, all of which must be complete for successful submission of the report. For each basic TEE exam performed, the resident completes a REDCap report. The resident enters the report during the completion of a full basic TEE exam, without faculty clinical instruction or intervention. Separately, an advanced TEE-certified faculty member completes an advanced clinical TEE report composed of required structured data elements. The advanced report is maintained as a part of the medical record (appendix 2).
To allow grading of resident basic TEE reports, a mapping process was necessary whereby the components of the basic TEE report were matched to the components of the more granular advanced TEE report that our advanced TEE-certified faculty complete for all echocardiographic exams that they perform (table 1). The findings recorded in the basic TEE report are simplified relative to those in the advanced TEE report. In the basic TEE report, for example, practitioners characterize global ventricular function qualitatively as normal, mildly reduced, or severely reduced rather than quantifying it, and they report regional left ventricular function for the midpapillary segments only. The basic TEE practitioner is expected to recognize moderate or severe valve lesions, which are recorded simply as significant in the training report. The ASE and SCA consensus statement clearly justifies this reporting paradigm for basic TEE.1 The REDCap system compares each assessment in the resident’s echocardiography report with the corresponding assessment in the advanced clinical TEE report and automatically grades it as correct or incorrect according to whether it agrees with the faculty member’s assessment. Although the consensus statement also discusses the assessment of hypovolemia, venous air embolism, and pulmonary embolism, the advanced TEE report does not routinely capture these assessments;1 therefore, they are not included in this analysis. The system then calculates the global percentage of correct assessments, which a learner dashboard in Tableau (Tableau Software, USA; fig. 2) displays. The system updates this dashboard daily, so the displayed values reflect a resident’s cumulative averages over time rather than the values for an individual exam. This helps learners identify areas where they may need to invest more effort or seek further instruction as they progress through TEE training.
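The mapping and grading logic described above can be sketched as follows. This is a minimal, hypothetical Python illustration, not the REDCap configuration used in the study: the component names, the advanced-report grade labels, and the converter functions are assumptions, and only four of the 21 components are shown. The one rule taken from the text is that moderate or severe valve lesions are recorded as significant. Cumulative percentages are kept per resident, mirroring the dashboard’s running averages.

```python
from collections import defaultdict

# Assumption: advanced-report valve grades that the basic report records as "significant".
SIGNIFICANT_GRADES = {"moderate", "severe"}


def valve_to_basic(advanced_grade):
    """Collapse a graded advanced-report valve finding to the basic report's categories."""
    return "significant" if advanced_grade in SIGNIFICANT_GRADES else "not significant"


def same_category(advanced_value):
    """For items assumed to use identical categories in both reports."""
    return advanced_value


# Hypothetical component names and converters (the study's mapping is in its table 1).
MAPPING = {
    "aortic_stenosis": valve_to_basic,
    "mitral_regurgitation": valve_to_basic,
    "lv_global_function": same_category,
    "pericardial_effusion": same_category,
}


def grade_exam(resident_report, advanced_report):
    """Mark each component correct (True) if the resident matches the mapped faculty finding."""
    return {
        component: resident_report[component] == convert(advanced_report[component])
        for component, convert in MAPPING.items()
    }


class CumulativeScores:
    """Running per-component and global percentages for one resident."""

    def __init__(self):
        self.correct = defaultdict(int)
        self.total = defaultdict(int)

    def add_exam(self, graded):
        for component, is_correct in graded.items():
            self.correct[component] += is_correct  # True counts as 1
            self.total[component] += 1

    def component_pct(self, component):
        return 100 * self.correct[component] / self.total[component]

    def global_pct(self):
        # Mean of component percentages; because every exam grades every
        # component, this equals the overall proportion of correct assessments.
        pcts = [self.component_pct(c) for c in self.total]
        return sum(pcts) / len(pcts)


scores = CumulativeScores()
scores.add_exam(grade_exam(
    {"aortic_stenosis": "significant", "mitral_regurgitation": "not significant",
     "lv_global_function": "normal", "pericardial_effusion": "absent"},
    {"aortic_stenosis": "severe", "mitral_regurgitation": "mild",
     "lv_global_function": "normal", "pericardial_effusion": "present"},
))
scores.add_exam(grade_exam(
    {"aortic_stenosis": "not significant", "mitral_regurgitation": "significant",
     "lv_global_function": "mildly reduced", "pericardial_effusion": "absent"},
    {"aortic_stenosis": "none", "mitral_regurgitation": "moderate",
     "lv_global_function": "severely reduced", "pericardial_effusion": "absent"},
))
```

After these two hypothetical exams, the resident has 100% for both valve items, 50% for global left ventricular function and pericardial effusion, and a cumulative global score of 75%.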
All trainees were senior residents who had previously completed at least two rotations of adult cardiac anesthesia. Each received an introduction to the basic TEE components and views, as well as access to a TEE simulator (Heartworks, Inventive Medical, United Kingdom) that was available throughout the rotation. In addition, the residents were given access to a basic TEE textbook and an online self-paced TEE learning platform.13,14 Residents also attended a weekly echocardiography conference where three to four cases with notable echocardiographic findings from the previous week were reviewed.
Residents performed basic TEE on patients with diverse pathology undergoing a wide variety of procedures, including coronary revascularization, valve repair and replacement via open and transcatheter approaches, heart transplantation, ventricular assist device insertion, and postoperative reexploration. These exams were performed under the supervision of advanced TEE diplomates, some of whom were subject matter experts for this study. After submitting each basic TEE report, the resident received clinical instruction from the faculty concerning the findings on the exam.
Development of PS Standards
In order to determine the PS for the 21 itemized assessments, as well as the global percentage of correct assessments, we convened two Angoff standard-setting sessions. We held the initial PS session before any data analysis of resident performance and the follow-up PS session 9 months later. This two-stage process allowed for further calibration of the initial standard-setting results with actual learner performance. Faculty (Drs. Bick and McEvoy) familiar with the Angoff methodology moderated the sessions.
Seven anesthesiologists who are diplomates of the National Board of Echocardiography with advanced TEE board certification volunteered as subject matter experts. The goal of the initial standard-setting session was to establish an initial PS where the subject matter experts had no access to resident basic TEE performance data. The follow-up session aimed to review and calibrate the initial PS standards in a setting where the subject matter experts had the opportunity to review the performance of 12 anesthesiology residents who performed a total of 371 basic TEE exams.
A follow-up standard-setting session, as in our case, is generally necessary when using the Angoff method; as others have noted, “absent all performance data, [subject matter experts] tend to set unrealistically high PSs, which will fail an unreasonably high proportion of students.”11 The follow-up session allows subject matter experts to compare the residents’ performance with the initial PS.
Both sessions began with a formal overview of basic TEE as described by the ASE and SCA consensus statement.1 Next, the standard-setting procedure began with a group discussion of the characteristics of the borderline or minimally competent basic TEE practitioner9 (table 2). Then the subject matter experts completed individual surveys in which they estimated the percentage of assessments the borderline or minimally competent basic TEE practitioner would obtain and interpret correctly while performing the elements of a basic TEE exam (table 3). Each subject matter expert was given time to ask any questions concerning the Angoff method. The follow-up session was essentially identical to the first, except that resident performance data were presented to the subject matter experts for review in tabular form (table 3, columns labeled Percentage Correct Assessments ± SD and Initial PS ± SD [%]). The subject matter experts then completed individual surveys about the borderline practitioner, identical to those of the initial session, with the option of revising their PS estimates. No further parameters were given regarding revisions. We used this method to define minimal competency, not proficiency or expertise. Because of scheduling conflicts, individual follow-up with some subject matter experts was required.
Statistical Analysis
We compared the data generated in the two standard-setting sessions using a paired Student’s t test for both the global score and the individual components. The numbers of basic TEE component failures (percentage of correct assessments for a given component less than the PS percentage for that component) and global failures (average percentage correct of all component assessments for a given resident less than the global PS) under the initial and follow-up PS standards were compared using the Fisher exact test. P < 0.05 was considered significant. We report data as mean ± SD unless otherwise noted. We calculated survey averages, SDs, and paired Student’s t test P values with Microsoft Excel 2011 (Microsoft Corporation, USA).
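The paired comparison of initial and follow-up PSs can be illustrated without external libraries. This sketch computes the paired t statistic and approximates the two-sided P value by numerically integrating the Student t density with Simpson’s rule; the function names and the seven pairs of item PSs are hypothetical. In practice, a spreadsheet (as used here) or a statistics package such as SciPy’s `scipy.stats.ttest_rel` would normally be used.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev


def t_density(x, df):
    """Probability density of Student's t distribution (fine for small df; gamma overflows for very large df)."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=10_000):
    """Approximate two-sided P = 1 - 2 * (integral of the density from 0 to |t|).

    Uses composite Simpson's rule; `steps` must be even.
    """
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_density(0.0, df) + t_density(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_density(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def paired_t_test(first, second):
    """Paired t test on matched ratings, e.g., each expert's PS in session 1 vs. session 2."""
    diffs = [y - x for x, y in zip(first, second)]
    n = len(diffs)
    # Undefined if all differences are identical (SD of the differences is zero).
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1, two_sided_p(t, n - 1)


# Hypothetical item PSs (%) from seven experts in the initial and follow-up sessions.
initial = [60, 65, 70, 55, 75, 60, 65]
follow_up = [70, 70, 75, 65, 80, 70, 70]
t, df, p = paired_t_test(initial, follow_up)
```

With seven experts there are 6 degrees of freedom, so a difference must produce |t| above about 2.45 to reach P < 0.05; small expert panels therefore need fairly consistent shifts for significance.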
Results
Our automated system assessed a total of 7,791 individual basic TEE components from 371 basic TEE exams twice (15,582 total comparisons), using the two PS standards from the two Angoff sessions. The itemized mapping scheme for grading is displayed in table 1.
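The component and comparison totals follow from grading all 21 components of every exam and then checking each result against both PS standards:

```latex
371 \text{ exams} \times 21 \text{ components per exam} = 7{,}791 \text{ component assessments},
\qquad 7{,}791 \times 2 \text{ PS standards} = 15{,}582 \text{ comparisons}
```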
The residents’ mean global percentage of correct assessments was higher than both the initial and follow-up global PSs (table 3). Anesthesia residents performed between 9 and 54 basic TEE exams; the mean number of basic TEE exams performed per resident was 31 ± 13. Table 3 shows the itemized mean Angoff percentages from both standard-setting sessions, as well as the itemized resident mean percentage of correct assessments and the calculated global PSs.
Initial PSs
The initial mean global PS for the basic TEE was 63 ± 13%, which corresponds to correct assessment of approximately 13 of 21 items on a basic TEE. Items with a PS more than 10 percentage points above the mean were aortic valve stenosis, aortic valve regurgitation, mitral valve regurgitation, aortic dissection, and circumferential pericardial effusion. Assessment components with the least variability (SD less than or equal to 15%) were left ventricular ejection fraction, aortic stenosis, aortic regurgitation, mitral valve stenosis, mitral valve regurgitation, tricuspid regurgitation, and circumferential pericardial effusion. Components with a PS more than 10 percentage points below the mean were pulmonic valve stenosis and pulmonic valve regurgitation.
Follow-up PSs
The mean follow-up global PS increased to 73 ± 9%, which corresponds to correct assessment of approximately 15 of 21 items. This increase reflects greater stringency in the requirements for passing, not a change in resident performance (the raw number of correct basic TEE assessments). The follow-up PS exceeded the initial PS by more than 10 percentage points for a number of components, including motion of each of the six midpapillary ventricular segments, systolic anterior motion of the mitral valve, tricuspid regurgitation, pulmonic stenosis, pulmonic regurgitation, chamber impingement, and atrial shunt.
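As a worked check of the item equivalents quoted for the initial (63%) and follow-up (73%) global PSs, each PS can be multiplied by the 21 assessed items. The rounding-up column is an added illustration for a single exam only; the study itself graded cumulative averages, not single exams.

```latex
\begin{aligned}
\text{Initial PS: } & 0.63 \times 21 = 13.23 \approx 13 \text{ items}, & \lceil 13.23 \rceil &= 14 \text{ items } (14/21 \approx 66.7\%) \\
\text{Follow-up PS: } & 0.73 \times 21 = 15.33 \approx 15 \text{ items}, & \lceil 15.33 \rceil &= 16 \text{ items } (16/21 \approx 76.2\%)
\end{aligned}
```

Because 13/21 (61.9%) and 15/21 (71.4%) fall just short of the respective PSs, a single exam would need 14 and 16 correct items, respectively, to meet them.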
Of note, when comparing the initial and follow-up PSs, there was no statistically significant difference for any of the 21 components of the basic TEE or for the global PS set by the group (P > 0.05). This comparison involved the PSs of the seven raters in session 1 versus session 2. However, when we graded resident basic TEE performance against the initial PS and then against the follow-up PS, the number of individual component failures increased significantly (table 4; P = 0.01). Specifically, with an average of 31 basic TEE exams per resident, the cumulative mean of resident performance was graded as failing in 2 of 252 components (12 residents × 21 basic TEE components) with the initial PS and in 12 of 252 with the follow-up PS.
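The component-failure comparison can be sketched as a two-sided Fisher exact test on a 2 × 2 table of failing versus passing component results under each PS standard. The layout assumed here (2 of 252 vs. 12 of 252, where 252 = 12 residents × 21 components) is taken from the counts above, but the exact construction of table 4 is not shown in this section, so this is an illustration rather than a reproduction of the reported analysis. The function name is hypothetical; SciPy’s `scipy.stats.fisher_exact` is a standard alternative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    whose probability does not exceed that of the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


# Rows: initial PS, follow-up PS; columns: failing, passing component results.
p_components = fisher_exact_two_sided(2, 250, 12, 240)
```

With these counts the computed P value falls close to the reported P = 0.01.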
Discussion
Establishing a valid, reliable, and credible PS for assessment of clinical competency in basic TEE is an important step in the evolution of echocardiography training and evaluation. Furthermore, the generation of near real-time anesthesia resident basic TEE performance data is an additional imperative to facilitate clinical competency assessment. Accordingly, we performed a two-stage Angoff standard-setting procedure with seven advanced TEE National Board of Echocardiography-certified subject matter experts. Our study produced credible itemized and global PSs for basic TEE competency assessment for anesthesiology residents. This approach has been successful in several other medical disciplines for standard setting in clinical skills assessments.4,15,16 We have several important findings, which we discuss below and place in the context of the current literature.
First, as described by others, the subject matter experts in our study set a higher bar for passing in their follow-up PS standard-setting session than in the initial session.4,15,16 Because the Angoff method is normally performed by a small number of experts (typically 6 to 10; we had 7), the session-to-session differences were not statistically significant for any individual component. However, when all of the component assessments were considered together, the follow-up PSs produced significantly more component failures than the initial PSs (table 4), representing a more stringent standard for passing.
Additionally, the PS for each of the six components of the basic TEE report that correspond to regional wall motion assessment increased by 13 to 14 percentage points from the first standard-setting session, albeit with P values between 0.05 and 0.055, suggesting a trend toward higher performance expectations from our experts for these components. In other studies, such changes have been attributed to increased familiarity with the evaluation scheme.3 We believe that increased familiarity, together with knowledge of the residents’ actual graded clinical performance on several hundred basic TEE exams, led to a clearer characterization of the borderline resident, which is a key component of the Angoff method and a cornerstone of this demonstration project. Each subject matter expert had been training residents and fellows for years before the initial session. Thus, we expect continued exposure to trainees in the interim to have been less important than access to quantitative performance data in raising the PS at the second standard-setting session.
Second, our findings demonstrate that the PSs from both sessions were below the residents’ average basic TEE performance over the course of a dedicated basic TEE training curriculum, even though the global PS after the follow-up standard-setting session increased to more than 70%. The average resident basic TEE global performance exceeded all of the expert faculty basic TEE performance standards determined in both Angoff sessions. However, one resident who performed 54 basic TEE exams failed to meet the global left ventricular function PS, and another resident who performed 28 basic TEE exams failed to meet the circumferential pericardial effusion and global left ventricular function PSs. If these data had been generated from a written test rather than from measurement of clinical skill performance, educational policymakers might be inclined to conclude that the test is insufficiently stringent. In this case, however, the content of the test (incorporated into our basic TEE training report) has been prescribed by the ASE and the SCA.1 Therefore, we conclude that this group of residents may be at least minimally competent in basic TEE according to our expert consensus using the Angoff PS setting method. In fact, given the mapping matrix that we used for the performance assessment, all of the residents would have passed even if the global PS had been 80% or higher.
Third, our findings demonstrate the feasibility of collecting itemized basic TEE reports, automatically grading them, and automatically displaying learner dashboards with minimal administrative oversight by the rotation director. This granularity of data collection, with immediate access for ongoing review, allowed the faculty to evaluate summary statistics from the performance of 12 residents in 371 basic TEEs for the follow-up Angoff session. During this session, the greatest adjustments to the itemized follow-up PS were for the six midpapillary ventricular segments, as well as systolic anterior motion of the mitral valve, tricuspid regurgitation, pulmonic stenosis, pulmonic regurgitation, chamber impingement, and atrial shunt. These adjustments, if used for actual clinical competency assessment, would have changed a summative assessment from passing to failing for a small percentage of residents on some items. However, all residents still met the global PS even with the score adjustments. Therefore, we conclude that the follow-up Angoff PS setting session did not change the identification of the minimally competent basic TEE practitioner at the global level, but it did change the identification of competency on specific components of the basic TEE.
Finally, our findings also demonstrate heterogeneity in the expectations of performance on certain components, even among experts. For instance, the subject matter experts set higher itemized initial PSs (greater than 1 SD above the global PS) for identification of aortic valve stenosis, mitral valve regurgitation, pericardial effusion, and aortic dissection. These potentially life-threatening pathologies have definitive surgical therapies, and in some cases only minimally efficacious medical therapies, and must be addressed with haste. Our subject matter experts’ determination of these PSs is likely consistent with national practice guidelines, which strongly endorse the use of TEE to assess the cause of hypotension or when hypotension is anticipated.17 Competence in these basic TEE assessments among a greater number of anesthesiologists may improve the perioperative care of patients when hemodynamic instability is encountered or anticipated.18
One word of caution is offered in the interpretation of our results. The purpose of this study was to perform rigorous standard setting of PSs for basic TEE performance, as described in the educational research literature and as is commonly done for national board exams. Accordingly, the statistical results that we report should be viewed largely from a descriptive perspective. That is, although we detected a change in the PS determined by our group of experts in session 1 versus session 2, and although such results are important, no specific intervention was performed that led to this change. We attempt to identify the reasons for this change, but the study was not prospectively powered to detect statistically significant differences. We believe that quantifying our results at each step of the process is valuable, as these results could serve as a baseline for future testing of the generalizability of our findings with more experts and more trainees, but the numerical results must be interpreted with care to prevent incorrect inferences.
The current study has several limitations. First, although the expert group size is appropriate for many standard-setting procedures, the study included a limited number of subject matter experts (n = 7) who may not fully represent the opinions of expert echocardiographers at large. Second, the study included a limited number of anesthesiology resident trainees (n = 12). Third, although board certification in basic TEE requires passing a rigorously constructed and proofed written exam, personally performing at least 50 basic TEE exams, and reviewing at least 100 additional exams, our results apply only to the personally performed exams; the same process could be applied to the interpreted exams. Because trainees were scored by comparing their interpretation with that of an advanced TEE–certified echocardiographer, some of the discordances may be due to borderline cases or inaccuracies in the attending anesthesiologists’ reports. Fourth, the lack of comparative assessment of hypovolemia, venous air embolism, and pulmonary embolism may be a source of bias for which we did not account. Finally, trainees may have received information about the patient’s cardiac function from other sources (e.g., the operating room scheduling board, which may signify valve disease; information stated during the presurgery time-out; the medical record; and other healthcare professionals), which may have altered their interpretation of the basic TEE findings. It is difficult, if not impossible, in the routine daily workflow to proctor the resident continually to ensure that there is no such alternative exposure. However, this also provides an opportunity to teach residents to confirm pathology rather than assume its existence.
In conclusion, to our knowledge, this is the first demonstration that the collection of real-world anesthesiology resident basic TEE clinical performance data is technically feasible and that automated grading for competency assessment is possible with existing tools. Using the well-established Angoff PS setting method, we created credible PSs for the itemized basic TEE assessments, as well as a global PS for the basic TEE exam. These PSs will require external validation through the collection of basic TEE performance data and expert consensus from multiple institutions. The starting point will be the creation of a national scoring rubric, so that all anesthesiology trainees undergoing basic TEE training are scored with the same system and competency accompanies certification. We believe that our current work can form the basis for such an evaluation system. This future research will enhance our knowledge and understanding of the acquisition of basic TEE competency and perhaps of competency in other anesthesia procedures.
Research Support
Supported by a Research in Education grant from the Foundation for Anesthesia Education and Research (FAER, Schaumburg, Illinois; to Dr. Bick) and by funding from FAER and from Anesthesia Quality Institute’s (Schaumburg, Illinois) Health Service Research Mentored Research Training Grant (to Dr. Wanderer).
Competing Interests
The authors declare no competing interests.