Processed electroencephalographic indices, such as the bispectral index (BIS), are potential adjuncts for assessing anesthetic depth. While BIS® monitors might aid anesthetic management, unprocessed or nonproprietary electroencephalographic data may be a rich source of information for clinicians. We hypothesized that anesthesiologists, after training in electroencephalography interpretation, could estimate the index of a reference BIS as accurately as a second BIS® monitor (twin BIS®) (Covidien Medical, Boulder, CO) when provided with clinical and electroencephalographic data.
Two sets of electrodes connected to two separate BIS® monitors were placed on the foreheads of 10 surgical patients undergoing general anesthesia. Electroencephalographic parameters, vital signs, and end-tidal anesthetic gas concentrations were recorded at prespecified time points, and were provided to two sets of anesthesiologists. Ten anesthesiologists received brief structured training in electroencephalograph interpretation and 10 were untrained. Although electroencephalographic waveforms and open-source processed electroencephalograph metrics were provided from the reference BIS®, both groups were blinded to BIS values and were asked to estimate BIS.
The trained anesthesiologists averaged as close to or closer to the reference BIS® compared with the twin BIS® monitor for 34% of their BIS estimates versus 26% for the untrained anesthesiologists. Using linear mixed effects model analysis, there was a statistically significant difference between the trained and untrained anesthesiologists (P = 0.02), but no difference between the twin BIS® monitor and trained anesthesiologists (P = 0.9).
With limited electroencephalography training and access to clinical data, anesthesiologists can estimate the BIS almost as well as a second BIS® monitor. These results reinforce the potential utility of training anesthesia practitioners in unprocessed electroencephalogram interpretation.
What We Already Know about This Topic
Calculated electroencephalographic indices in combination with the raw electroencephalographic waveforms may be helpful for assessment of depth of sedation/hypnosis during general anesthesia.
What This Article Tells Us That Is New
After brief structured education, with access to data from a frontal electroencephalographic recording and coupled with relevant clinical data, anesthesiologists can estimate processed bispectral index fairly accurately.
The goals of general anesthesia include ensuring that patients are physiologically stable and oblivious to the noxious stimuli of surgical intervention. Unfortunately, because we remain unable to detect unconsciousness reliably, we cannot guarantee that patients will not be awake and aware during surgery. However, important advances have been made in our ability to ensure that adequate anesthesia is administered, including routine measurement of exhaled anesthetic concentrations. Monitoring brain activity directly, first suggested in the 1930s, has garnered tremendous interest in the past decade. A variety of candidate depth-of-anesthesia monitors are now available for clinical use. Most of these monitors are based on brain electrical activity or electroencephalography.
Despite the fact that the brain is the target organ of general anesthesia, we do not have standard intraoperative brain monitors, as we do for other vital organs. If a patient receives insufficient anesthesia, he or she may experience unintended awareness during surgery, a complication that can have serious consequences, including posttraumatic stress disorder.1 In an attempt to avoid patient awareness, anesthesia practitioners may give high doses of anesthesia, which can lead to increased drug costs, increased time spent in the operating room, increased time spent in the recovery area, and possibly even increased adverse effects, such as nausea and vomiting.
Some practitioners believe that using a brain monitor during general anesthesia may help to optimize anesthetic administration. One particular processed electroencephalography monitor, the bispectral index monitor (BIS® monitor; Covidien Medical, Boulder, CO), has gained widespread acceptance. Part of the motivation for developing a processed electroencephalographic monitor is the assumption that anesthesia practitioners are unable to interpret unprocessed electroencephalographic patterns in real time.2 However, it has been shown that after a brief, structured education session, anesthesia practitioners can reliably recognize patterns in the electroencephalogram trace that indicate states such as wakefulness, light anesthesia, deep anesthesia, and brain quiescence.2 Taken in a clinical context, these electroencephalogram patterns may help practitioners to appreciate whether patients are likely to be unconscious and appropriately anesthetized. There are several potential advantages of high-resolution electroencephalographic assessment over the processed monitors, which include rapid response time, decreased costs, ability to appreciate artifacts, lack of reliance on proprietary algorithms, and the ability to distinguish among specific brain states (e.g., seizures, isoelectricity).3–7
We hypothesized that anesthesiologists, appropriately trained in electroencephalography interpretation, could estimate the index of a reference BIS® as accurately as a second BIS® monitor (twin BIS®) when provided with relevant clinical and electroencephalographic data. To test this hypothesis, we designed an inverse Turing test to compare anesthesiologists to the twin BIS® monitor in ability to predict the BIS values of human patients undergoing general anesthesia.
Materials and Methods
This was a prespecified substudy of the BAG RECALL clinical trial (NCT00281489)8 at Washington University (St. Louis, Missouri). The substudy aimed to assess practitioner ability to interpret the unprocessed electroencephalogram trace during anesthesia. In brief, the BAG RECALL trial is a 6,000-patient multicenter trial that was designed to compare a BIS-based protocol with a protocol based on end-tidal anesthetic concentration monitoring for efficacy in preventing unintended intraoperative awareness with subsequent explicit recall in high-risk patients having surgery under general anesthesia. Patients provided written informed consent to participate in the BAG RECALL trial according to institutional standards.
The Human Research Protection Office at Washington University approved this substudy. Data from 10 patients who participated in the BAG RECALL clinical trial were included in this substudy. Patients were selected sequentially for this substudy during July 2009 after they consented to participate in the BAG RECALL trial and it was revealed that they had been randomized to the BIS protocol of the trial. This substudy did not impact clinical care and did not interfere with the conduct of the BAG RECALL trial because the blind of the study was not broken and no postoperative outcomes data were assessed. The substudy was not conducted in real time and patient data were deidentified.
Each of the 10 patients had two sets of electrodes connected to independent BIS® monitors placed on both sides of the forehead while undergoing general anesthesia. Identical settings were used for both monitors, including the smoothing rate, which was set at 15 s. Anesthesia practitioners responsible for patient care had access to information provided by only one of the BIS® monitors (reference BIS®). Three-second electroencephalography epochs were presented to anesthesiologists. The duration of the electroencephalography epoch required for calculation of the BIS has not been disclosed. It has been assumed to be between 20 and 40 s; however, it may be longer than this because a 63-s epoch is used to calculate the burst-suppression ratio. An epoch of a minute is consistent with research showing time delays of the BIS in responding to a state change (e.g., unresponsive to responsive).3 Electroencephalography epochs that were shown to anesthesiologists would, therefore, have hypothetically represented the last 5–15% of the electroencephalography epoch from which the BIS algorithm was calculated. Practitioners caring for patients were blinded to all data on the other BIS® monitor (twin BIS®) and were not involved in the current study. BIS numbers, electroencephalogram traces, electroencephalogram parameters from both monitors, vital signs, and anesthetic gas measurements were recorded at prespecified time points during the anesthetic period. These time points included: preinduction wakefulness, induction, intubation, incision, maintenance, closure, extubation, and postanesthetic wakefulness. In total, there were 90 time points.
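The 5–15% range quoted above can be checked with simple arithmetic. The calculation below is only a sketch: it assumes that the undisclosed BIS calculation window lies somewhere between 20 s and roughly 60 s (the "epoch of a minute" mentioned in the text), and the 3-s displayed epoch is taken from the description above.

```latex
\frac{3\ \text{s}}{60\ \text{s}} = 5\%,
\qquad
\frac{3\ \text{s}}{20\ \text{s}} = 15\%
```

If the true window were closer to the 63-s epoch used for the burst-suppression ratio, the displayed fraction would be slightly below 5%.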
We devised the inverse Turing test to test the hypothesis that an anesthesiologist can obtain similar information using a single frontal electroencephalographic channel and clinical context to that he or she can obtain using a BIS® monitor. The Turing test was proposed by Alan Turing, Ph.D., O.B.E., F.R.S. (1912–1954), as a philosophical experiment to determine whether a machine displayed evidence of real intelligence or even consciousness. Conceptually, a human would be placed in a room and would converse (e.g., keyboard and screen interface) with a machine (e.g., robot, computer) in another room and a second human in yet another room. If, after questioning, the human could not distinguish between machine and human, the machine would have passed the Turing test and would have displayed intelligence.
For the purposes of this substudy, we devised an inverse Turing test, which was inspired by the experiment proposed by Turing. The concept behind the inverse Turing test is as follows. A reference BIS® monitor would be in one metaphorical room. The twin BIS® would be in a second metaphorical room. An anesthesiologist would be in yet another metaphorical room and would attempt to estimate the BIS number based on clinical context, electroencephalographic traces, and nonproprietary electroencephalographic parameters (fig. 1). If the anesthesiologist could estimate the reference BIS® value as well as the twin BIS® monitor, the anesthesiologist could be said to have passed the inverse Turing test. An important assumption on which this study depended was that there would be good, but not perfect, agreement between two concurrent BIS® monitors. Based on a previous study where we showed a degree of intrapatient variability between two concurrent BIS® monitor readings, we felt that this assumption was reasonable.9 An important caveat is that there may be interhemispheric differences in BIS readings attributable to factors such as unilateral brain pathology or decreased regional cerebral blood flow.10,11 To avoid such recognized potential confounders, patients with known brain pathology, seizure disorders, or cerebrovascular disease were not enrolled in this substudy (and were generally excluded from the BAG RECALL clinical trial).
Two groups of attending anesthesiologists (10 anesthesiologists per group) enrolled sequentially and consented to participate in this substudy. In one group, the 10 anesthesiologists received structured training on electroencephalographic interpretation. Training was based on that recommended by Barnard et al.2 and others and consisted of a 45-min, live presentation that included example electroencephalographic waveforms associated with different depths of anesthesia. We included characteristic changes that occur in the electroencephalogram with increasing concentrations of primarily γ-aminobutyric acid agonist anesthetics, including (1) a decrease in high frequency, low amplitude waves, (2) a concomitant increase in low frequency, high amplitude waves, (3) burst suppression, and (4) isoelectricity (fig. 2).4,12 We also instructed anesthesiologists regarding the basic electrophysiologic parameters analyzed by BIS.13,14 Nonproprietary parameters displayed by the BIS® monitor were described, including electromyography (power in decibels), spectral edge frequency 95 (the frequency, within 0.5–30 Hz, below which 95% of the total electroencephalography power lies), signal quality index (signal quality of the electroencephalography signal between 0–100%), suppression ratio (percentage over a 63-s epoch that the electroencephalography is in a suppressed state), and total power (total power of the summated electroencephalography component waves in decibels).13,14 Based on the work of Morimoto et al.,14 we explained that, at surgical levels of anesthesia, spectral edge frequency and BIS are well correlated. We further suggested that spectral edge frequency could often be a useful surrogate for BIS (fig. 2).
At lighter levels of anesthesia, higher β ratio (with increased high-frequency β waves) correlates with BIS (BIS 60–100); at very deep anesthesia, the extent of burst suppression (suppression ratio) correlates with BIS (BIS 0–30).13,14 During wakefulness in patients who are not pharmacologically paralyzed, electromyographic values are typically in the 40s and 50s, whereas, during surgical anesthesia, they are typically in the 20s and 30s. Measured artifacts (e.g., blinking and eye movement) were discussed as well as shape and variability of dose-response curves for processed electroencephalographic indices. None of the 20 anesthesiologists had previously received formal training in electroencephalographic interpretation and none of them had routinely used unprocessed electroencephalography as part of their standard practice.
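To make the definition of spectral edge frequency 95 used in the training concrete, the following is a minimal sketch of how such a value could be read off a power spectrum. It is not the BIS® monitor's implementation, which is not described here; the function name, the list-based spectrum input, and the inclusive handling of the 0.5–30 Hz band edges are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def spectral_edge_frequency(
    freqs_hz: Sequence[float],
    power: Sequence[float],
    fraction: float = 0.95,
    band: Tuple[float, float] = (0.5, 30.0),
) -> float:
    """Return the lowest frequency in `band` at which the cumulative
    power reaches `fraction` of the total power within that band.

    Hypothetical helper: input is a precomputed spectrum (one power
    value per frequency bin), not a raw electroencephalogram trace.
    """
    low, high = band
    # Keep only bins inside the band, ordered by frequency.
    in_band = sorted((f, p) for f, p in zip(freqs_hz, power) if low <= f <= high)
    if not in_band:
        raise ValueError("no frequency bins fall inside the band")
    total = sum(p for _, p in in_band)
    if total <= 0:
        raise ValueError("total power in the band must be positive")

    cumulative = 0.0
    for f, p in in_band:
        cumulative += p
        if cumulative >= fraction * total:
            return f
    # Only reachable through floating-point rounding; fall back to the top bin.
    return in_band[-1][0]
```

With a flat spectrum over 1–30 Hz in 1-Hz bins, this sketch returns 29 Hz; as power shifts toward slow waves, as described for deepening anesthesia, the returned frequency falls.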
All 20 anesthesiologists were presented with relevant patient histories, clinical information, anesthetic data, hemodynamic parameters, drug doses, anesthetic gas concentrations, and electroencephalographic traces and associated nonproprietary parameters (spectral edge frequency, electromyography, suppression ratio, signal quality index, and total power). Based on this information, practitioners were asked to estimate what they thought the BIS value was for each patient at each of the various prespecified time points. Figure 3 represents an example screen presented to anesthesiologists when estimating BIS values.
Descriptive statistics are presented as text, tables, and graphs. The primary hypothesis of this substudy was that structured electroencephalography training would result in improved estimation of the processed electroencephalograph index. We performed a power analysis using a simulation approach based on a mixed effect model. From pilot data, we assumed that trained anesthesiologists' estimates would be, on average, 5 absolute BIS units closer to that of the reference BIS® monitor than the untrained anesthesiologists' estimates (i.e., treatment effect size = 5). Other factors considered in power analysis were BIS measurements on 10 patients, patient random effect on the intercept having a variance of 20 (equivalently, a random-intercept SD of approximately 4.5) and a measurement error variance of 100. These values correspond to a correlation of 0.2 between repeated measures. The result of the power analysis is that, with 10 replicates (that is, for each reference BIS® value, there are 10 repeated estimates from 10 trained and 10 untrained anesthesiologists), the study has 88% power to detect a difference (the specified treatment effect size) between the trained and untrained anesthesiologists at a significance level of 0.05.
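The stated correlation between repeated measures can be related to the two variance components. The worked line below is a sketch that assumes the correlation is the intraclass correlation implied by a random-intercept model; the symbols $\sigma_\alpha^2$ (patient random-intercept variance) and $\sigma_\epsilon^2$ (measurement error variance) are introduced here for illustration and do not appear in the original text.

```latex
\rho
= \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\epsilon^2}
= \frac{20}{20 + 100}
\approx 0.17
```

Under that assumption, the value is approximately 0.2 when rounded to one decimal place, consistent with the figure quoted above.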
We fitted a linear mixed effect model to our results. The following effects were considered:
1. Fixed Effect of the Device.
That is, who provided the measurement. This effect has four levels: reference BIS®, twin BIS®, trained anesthesiologist, and untrained anesthesiologist.
2. Random Effect for Experts in the Trained Group and the Untrained Group
3. Random Effect Due to Repeated Measurements for Each Patient.
In mathematical terms, the model was conceptualized as follows. Let $i$ be the index for patient; $j$, the index for device (reference BIS®, twin BIS®, trained anesthesiologist, or untrained anesthesiologist); and $y_{ijk}$ be the $k$th score given by the $j$th device for the $i$th patient. The outcome measure of the model was $y_{ijk}$.
Next, let $\alpha_i$ be the patient-specific (random) effect; $\beta_j$, the device (fixed) effect; and $\gamma_{(i)j}$, the nested device (random) effect within patient. Our model is then written as $y_{ijk} = \alpha_i + \beta_j + \gamma_{(i)j} + \epsilon_{ijk}$, where the last term ($\epsilon_{ijk}$) denotes the random measurement error. By introducing $\alpha_i$ and $\gamma_{(i)j}$ as random effects, we assume that the average response level and the device effect vary randomly across patients.
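The structure of this model can be illustrated by generating data from it. The sketch below is not the analysis code (the model was fit with PROC MIXED); it only shows how each score is the sum of a patient effect, a device effect, a nested device-within-patient effect, and measurement error. The function name, the device labels, and all numeric values passed in by a caller are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical labels for the four levels of the device effect.
DEVICES = ["reference", "twin", "trained", "untrained"]


def simulate_scores(
    n_patients: int,
    n_repeats: int,
    beta: Dict[str, float],
    sd_patient: float,
    sd_nested: float,
    sd_error: float,
    seed: int = 0,
) -> List[Tuple[int, str, int, float]]:
    """Draw scores y_ijk = alpha_i + beta_j + gamma_(i)j + eps_ijk.

    Returns rows of (patient index i, device j, repeat k, score).
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_patients):
        alpha_i = rng.gauss(0.0, sd_patient)        # patient random effect
        for device in DEVICES:
            gamma_ij = rng.gauss(0.0, sd_nested)    # device effect nested in patient
            for k in range(n_repeats):
                eps_ijk = rng.gauss(0.0, sd_error)  # measurement error
                score = alpha_i + beta[device] + gamma_ij + eps_ijk
                rows.append((i, device, k, score))
    return rows
```

Setting all three standard deviations to zero reduces every score to the fixed device effect, which is a quick way to confirm that the random terms only add variability around $\beta_j$.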
The linear mixed-effect model was fit using PROC MIXED in SAS (version 9.2; SAS Institute, Inc., Cary, NC). One potential issue in the dataset relates to missing observations. However, when using PROC MIXED, only the missing observations were discarded, and the rest of the dataset was retained. A post hoc Tukey test was performed for correction of multiple comparisons. A P value of less than 0.05 was considered statistically significant. All hypothesis testing was two-tailed. Statistical analyses were performed with SAS or Analyse-it® (Analyse-it Software, Ltd., Leeds, United Kingdom) statistical software.
Results
Characteristics of the patients who were enrolled in this substudy and details pertaining to anesthesia are shown in table 1. There were 90 values for the reference BIS® and the twin BIS®. There were 1,794 values for the 10 trained and 10 untrained anesthesiologists, with six missing BIS estimates. Discordance rates of the twin BIS®, trained anesthesiologists, and untrained anesthesiologists when the reference BIS® was less than 40, 40–60, or more than 60 are shown in table 2. Notably, when the reference BIS® displayed values above 60, about 20% of anesthesiologists' estimates were below 60 and 9% of the twin BIS®'s estimates were below 60.
The effect of training was associated with a significant difference in performance in estimating the value of the reference BIS® (uncorrected P = 0.004, corrected P = 0.019). Concordance rates in estimating “depth of anesthesia,” defined by three prespecified “depths” (BIS >60, BIS 40–60, BIS <40), with the reference BIS® were 86% for the twin BIS®, 72% for the 10 trained anesthesiologists, and 62% for the 10 untrained anesthesiologists. Figure 4 presents box-and-whisker plots showing the median, interquartile ranges, and 90% ranges for the differences among the twin BIS®, trained anesthesiologists, and untrained anesthesiologists compared with the reference BIS®. The median of the twin BIS® estimates was no different from the reference BIS®. The median of trained anesthesiologist BIS estimates deviated from the reference BIS® with a range of 1–4, depending on the observer. Median untrained anesthesiologist BIS estimates deviated from the reference BIS® with a range of 2–11, depending on the observer. On average, the 10 trained anesthesiologists were as close to or closer to the reference BIS® compared with the twin BIS® for 34% of their estimates. On average, the 10 untrained anesthesiologists were as close to or closer to the reference BIS® compared with the twin BIS® for 26% of their estimates. Figure 5 graphically depicts all 90 BIS values across the 10 patients as displayed by the reference and twin BIS® monitors. In addition, median estimated BIS values are shown for trained and untrained anesthesiologists.
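The two summary measures reported here, concordance by depth band and the proportion of estimates "as close to or closer to" the reference than the twin monitor, can be expressed as short functions. This is a sketch under assumptions rather than the study's analysis code: it uses the three bands given with the discordance rates (below 40, 40–60, above 60), the band labels and function names are hypothetical, and values of exactly 40 or 60 are assigned to the middle band.

```python
from typing import Sequence


def depth_category(bis: float) -> str:
    """Map a BIS value to one of three bands (labels are illustrative)."""
    if bis < 40:
        return "below_40"
    if bis <= 60:
        return "40_to_60"
    return "above_60"


def concordance_rate(reference: Sequence[float], estimates: Sequence[float]) -> float:
    """Fraction of estimates falling in the same band as the reference."""
    pairs = list(zip(reference, estimates))
    if not pairs:
        raise ValueError("need at least one reference/estimate pair")
    agree = sum(depth_category(r) == depth_category(e) for r, e in pairs)
    return agree / len(pairs)


def as_close_or_closer_rate(
    reference: Sequence[float],
    twin: Sequence[float],
    estimates: Sequence[float],
) -> float:
    """Fraction of estimates whose absolute error against the reference
    is no larger than the twin monitor's absolute error."""
    triples = list(zip(reference, twin, estimates))
    if not triples:
        raise ValueError("need at least one time point")
    hits = sum(abs(e - r) <= abs(t - r) for r, t, e in triples)
    return hits / len(triples)
```

For example, with reference values of 35, 50, 70, and 45 and estimates of 38, 62, 65, and 40, three of the four estimates fall in the same band as the reference, so the concordance rate in this sketch is 0.75.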
In aggregate, there was no statistically significant difference between the trained anesthesiologists' and the twin BIS®'s performance in predicting the reference BIS® value (uncorrected P = 0.43, corrected P = 0.9). There was a significant difference in the uncorrected comparison of the untrained anesthesiologists and twin BIS® (P = 0.04) that was not significant after Tukey correction (P = 0.16).
Discussion
These data demonstrate that basic training is probably associated with improvement in electroencephalographic interpretation and that a trained anesthesiologist can predict BIS values as well as or better than a second BIS® monitor approximately one third of the time. Furthermore, the finding in this pilot study that there was no significant difference between a trained anesthesiologist and a second BIS® monitor in predicting a reference BIS® index generates the intriguing hypothesis that anesthesiologists are capable of assessing anesthetic depth based on clinical data and basic electroencephalographic parameters as well as a processed electroencephalographic monitor. This hypothesis warrants further investigation. The relatively high discordance rates for both groups of anesthesiologists and the twin BIS® in relation to the reference BIS® for BIS values above 60 are potentially concerning. However, BIS values above 60 have been found to have a poor specificity for return of wakeful responsiveness.15
There has been a renewed focus on the electroencephalographic assessment of anesthetic effects.4,6 It has previously been argued, and is now often assumed, that the use of unprocessed electroencephalography is impractical in the intraoperative setting. The presuppositions of such arguments may no longer be applicable. First, electroencephalographic devices are no longer cumbersome and can be easily accommodated in clinically used modules. Second, digitization of electroencephalographic signals allows for basic processing and more ready interpretation. Finally, our data suggest that trained anesthesiologists can recognize and interpret patterns in the electroencephalogram and can reach clinical conclusions that are generally similar to those they may have reached with the use of a processed algorithm. As with electrocardiography, structured education and practical experience in electroencephalography would probably instill even greater proficiency with its use and interpretation. We see no fundamental difference in the potential of an anesthesiologist to recognize the slowing frequency of an electroencephalographic waveform versus the slowing of a heart rate, a K complex versus a premature ventricular contraction, burst suppression versus a run of ventricular tachycardia, or a sleep spindle versus torsades de pointes.
It is of interest to note that the Accreditation Council for Graduate Medical Education (Chicago, IL) lists as a requirement for anesthesiology training that “the resident must either personally participate in cases in which electroencephalograph or processed electroencephalographic monitoring is actively used as part of the procedure or have adequate didactic instruction to ensure familiarity with electroencephalographic use and interpretation.”†† The guideline goes on to say that “Bispectral index use and other similar interpolated modalities are not sufficient to satisfy this requirement.”†† As it stands, there are probably few training programs in the United States that implement training in electroencephalography as a formal part of their didactic program, despite its perceived importance. Recent studies have tried to implement formal learning modules in collaboration with neurologists to improve familiarity and success with electroencephalography use and interpretation.16,17 Our data suggest that improved skills relevant to clinical practice may also be achieved in the field of anesthesiology.
One of the major challenges in relation to studies dealing with the electrophysiologic monitoring of anesthetic depth is the lack of a “gold standard.” In using a reference BIS® monitor, we were able to judge improvement in performance as objectively as possible. It could be argued that what has been demonstrated with this study is that, with training, anesthesiologists can improve their ability at approximating an index whose value or meaning has not been sufficiently established. However, the BIS® monitor is one of several candidate depth-of-anesthesia monitors that has been approved by the US Food and Drug Administration. In several studies,15,18 it has been shown to have a reasonable ability to discriminate between responsiveness and unresponsiveness. No method has been, nor indeed can presently be, validated to assess depth of anesthesia beyond loss of responsiveness. Although the algorithm used to calculate the BIS number is proprietary, the electroencephalographic components of the algorithm have been described.13,14 The relative BetaRatio subparameter is the log ratio of power in two empirically derived frequency bands, 30–47 and 11–20 Hz.13 The SynchFastSlow subparameter is the contribution from bispectral analysis, and reflects phase coupling between different frequencies. SynchFastSlow is also defined as a log ratio: the log of the ratio of the sum of all bispectrum peaks between 0.5–47 Hz over the sum of the bispectrum in the area 40–47 Hz.13 Burst-suppression ratio is also incorporated in the proprietary BIS® algorithm.13,14 These features were discussed in the training sessions provided to the trained group of anesthesiologists. The difficulty with establishing the reference BIS® as the gold standard in this study is evidenced by the fact that not all deviations of the anesthesiologist from the reference BIS® were actual errors.
In one example, a patient had aroused, opened her eyes and was moving her arms, yet both the BIS and twin BIS® still read 40 (fig. 6). It is noteworthy that, for this time point, all trained anesthesiologists estimated a BIS value above 78, and all untrained anesthesiologists estimated a BIS value above 70. This particular discrepancy between the BIS® monitors and anesthesiologists may have, in part, been attributable to the time delay that occurs with the BIS and other processed electroencephalograph monitors when there is an abrupt change in state, such as an arousal.3 In this case, the clinical context and the unprocessed electroencephalographic trace changed more rapidly than the depth-of-anesthesia index.
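The BetaRatio subparameter described above, the log ratio of power in the 30–47 Hz and 11–20 Hz bands, can be sketched from a precomputed power spectrum. This is only an illustration of the published definition and not the proprietary BIS® algorithm: the use of the natural logarithm, the inclusive band edges, and the function names are assumptions.

```python
import math
from typing import Sequence


def band_power(
    freqs_hz: Sequence[float],
    power: Sequence[float],
    low_hz: float,
    high_hz: float,
) -> float:
    """Sum the power of all bins whose frequency lies in [low_hz, high_hz]."""
    return sum(p for f, p in zip(freqs_hz, power) if low_hz <= f <= high_hz)


def beta_ratio(freqs_hz: Sequence[float], power: Sequence[float]) -> float:
    """Log of (power in 30-47 Hz) over (power in 11-20 Hz).

    Assumption: natural log; a different base only rescales the value.
    """
    high_band = band_power(freqs_hz, power, 30.0, 47.0)
    low_band = band_power(freqs_hz, power, 11.0, 20.0)
    if high_band <= 0 or low_band <= 0:
        raise ValueError("both bands must contain positive power")
    return math.log(high_band / low_band)
```

In this sketch the value rises as power shifts toward the 30–47 Hz band and falls as power shifts toward the 11–20 Hz band; equal power in the two bands gives a value of zero.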
This study has several noteworthy limitations that should be emphasized. First, practitioners were not randomly allocated to the electroencephalography training group and no pretesting was done. Therefore, we cannot be certain that the two groups of anesthesiologists were well matched and that differences in performance were attributable to electroencephalography training. However, such differences would potentially be accounted for in the mixed-effect model. Second, the anesthesiologists in this study were not assessing the electroencephalogram trace during clinical administration of anesthesia. We should not infer that the results obtained in the comfort of the classroom are necessarily transferable to the pressurized clinical setting. Third, a major potential advantage of a processed index over an unprocessed electroencephalography trace is that an alert may be linked to a threshold value of an index (e.g., BIS >60). The results of this study should therefore not be interpreted to suggest that trained anesthesiologists are equivalent or similar in terms of efficacy to a processed electroencephalography index. Fourth, all of the practitioners had previously used BIS® monitors. We cannot exclude that previous experience in the use of processed electroencephalography might have helped practitioners estimate BIS on the basis of clinical data and the unprocessed electroencephalograph, which is generally displayed on the BIS® monitor and may have led to “informal training” in the past. Finally, parameters such as blood pressure, heart rate, and end-tidal anesthetic concentration could have influenced practitioner estimates of BIS.
In conclusion, we demonstrate that relatively brief training in electroencephalography improves the performance of anesthesiologists in assessing anesthetic depth, with reference to a target BIS value. These data suggest that anesthesiology practitioners would probably benefit from standardized, formal education in electroencephalography interpretation. Finally, if validated by future studies, our data also suggest that nonproprietary electroencephalographic measures in conjunction with clinical context may be as informative as commercially available depth-of-anesthesia monitors.
The authors dedicate this study to Alan Turing, Ph.D., O.B.E., cryptographer, mathematician, logician, philosopher, and founder of computer science. He contributed much to science and to humanity. The authors thank the members of the BAG RECALL research team and the anesthesiologists who agreed to participate in this substudy.