When assessing the accuracy and precision of a new technique for cardiac output measurement, the commonly quoted criterion for acceptability of agreement with a reference standard is that the percentage error (95% limits of agreement/mean cardiac output) should be 30% or less. We reviewed published data on four different minimally invasive methods adapted for use during surgery and critical care (pulse contour techniques, esophageal Doppler, partial carbon dioxide rebreathing, and transthoracic bioimpedance) to assess their bias, precision, and percentage error in agreement with thermodilution. An English language literature search identified papers published since 2000 that examined the agreement in adult patients between bolus thermodilution and each method. For each method, a meta-analysis was done using studies in which the first measurement point for each patient could be identified, to obtain a pooled mean bias, precision, and percentage error weighted according to the number of measurements in each study. Forty-seven studies were identified as suitable for inclusion. Results, given as number of studies (N), number of measurements (n), and mean weighted bias [precision, percentage error], were: pulse contour N = 24, n = 714: -0.00 l/min [1.22 l/min, 41.3%]; esophageal Doppler N = 2, n = 57: -0.77 l/min [1.07 l/min, 42.1%]; partial carbon dioxide rebreathing N = 8, n = 167: -0.05 l/min [1.12 l/min, 44.5%]; transthoracic bioimpedance N = 13, n = 435: -0.10 l/min [1.14 l/min, 42.9%]. None of the four methods has achieved agreement with bolus thermodilution that meets the expected 30% limits. The relevance of these arbitrary limits in clinical practice should be reassessed.
There is increasing interest in better hemodynamic management, incorporating cardiac output measurement, to achieve improvements in patient outcomes during major surgery.1–3 A number of methods and technologies are now available for minimally invasive or noninvasive cardiac output monitoring in the perioperative period. These include pulse contour and esophageal Doppler devices, the partial carbon dioxide rebreathing (Pco2RB) method, and transthoracic electrical bioimpedance (TEB).3 However, these methods have not achieved widespread use in routine practice.4 The reasons for this include cost (of both the devices and their disposable components), invasiveness, and concerns about their accuracy, precision, and reproducibility.
Numerous publications5–87 have examined the accuracy and precision of the various methods and devices currently available by comparison with simultaneous paired measurements made using a commonly accepted clinical standard technique. This is usually a more invasive technique, such as right heart or transpulmonary thermodilution. Most such publications over the last decade have employed bias and precision statistics, as described by Bland and Altman,88 providing the mean difference (bias) and SD of the difference between paired measurements, from which limits of agreement (bias ± 1.96 SD) are obtained. These limits of agreement are often expressed as a proportion of the mean cardiac output (percentage error).
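The bias and precision statistics just described can be sketched in a few lines of Python. This is a minimal illustration with hypothetical paired measurements, not data from any study in the review. It assumes that the mean cardiac output used for the percentage error is the mean of each pair's average of the two methods (some studies may use the reference-method mean instead), and it ignores any correction for repeated measurements on the same subject.

```python
from statistics import mean, stdev


def bland_altman(test, reference):
    """Bias, precision, 95% limits of agreement and percentage error.

    Differences are taken as test - reference, matching the
    'method - thermodilution' sign convention used in this review.
    """
    if len(test) != len(reference) or len(test) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [t - r for t, r in zip(test, reference)]
    bias = mean(diffs)          # mean difference
    precision = stdev(diffs)    # sample SD of the differences
    limits = (bias - 1.96 * precision, bias + 1.96 * precision)
    # Assumption: mean cardiac output = mean of each pair's average of the two methods
    mean_co = mean((t + r) / 2 for t, r in zip(test, reference))
    pct_error = 100 * 1.96 * precision / mean_co
    return {"bias": bias, "precision": precision, "limits": limits,
            "mean_co": mean_co, "pct_error": pct_error}


# Hypothetical paired cardiac outputs (l/min), for illustration only
test_co = [4.8, 5.6, 3.9, 6.5, 5.0]
thermodilution_co = [5.0, 5.2, 4.4, 6.0, 5.1]
result = bland_altman(test_co, thermodilution_co)
print(f"bias {result['bias']:.2f} l/min, % error {result['pct_error']:.1f}%")
```

Note that the bias does not enter the percentage error: only the spread of the differences (1.96 SD) is related to the mean cardiac output.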
The acceptable limit of agreement in these comparison studies has been unclear. In a review paper published in 1999, Critchley and Critchley89 suggested that acceptable agreement should be a percentage error of 30% or less, which has become a widely quoted criterion.5–15,17–19,25–30,46–49,63–66 Numerous studies have been published in the field over the last 10 yr, including newer methods that were not reviewed by Critchley and Critchley. It is unclear whether currently available methods are consistently achieving this level of agreement. More recent reviews have focused on a single method,90 and/or have excluded relevant patient groups from the analysis.91 In some reviews, data were pooled from studies in which repeated measurements were made on each patient, which makes the reliability of their conclusions uncertain.89,91
We conducted a 10-yr review of studies examining the agreement with bolus thermodilution of four currently available methods for minimally invasive cardiac output monitoring that are adapted to perioperative and critical care use (pulse contour, esophageal Doppler, Pco2RB, and TEB). To obtain an overall estimate of their accuracy and precision, all studies reporting data from a single measurement on each patient were included in a pooled weighted meta-analysis.
Materials and Methods
A PubMed and Medline search was conducted with search headings such as “cardiac output, pulmonary blood flow, thermodilution, pulse contour, PiCCO, LidCO, PulseCO, FloTrac, Vigileo, esophageal Doppler, carbon dioxide rebreathing, NICO, and thoracic electrical bioimpedance.” The search and subsequent bibliographic review were restricted to studies in adult humans and to published papers (not correspondence or case reports) in English language peer-reviewed journals in which results were expressed using bias and precision statistics (mean difference and either SD of agreement, 95% limits of agreement, or percentage error). Only studies comparing against simultaneous measurements of cardiac output or cardiac index by bolus right heart or transpulmonary thermodilution were included. Studies comparing PiCCO (Pulsion Medical Systems, Munich, Germany) with transpulmonary thermodilution were excluded, because the method requires transpulmonary thermodilution for initial calibration, and this was considered to bias the comparison.
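Where not reported directly, the percentage error (% error) for a study was calculated from the SD of agreement and the mean cardiac output:

$$\%\ \text{error} = \frac{1.96 \times \text{SD of agreement}}{\text{mean cardiac output}} \times 100 \qquad (1)$$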
Where mean cardiac output was not provided in tables or text, it was estimated from graphs. The methodology employed was in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses, formerly QUOROM) Statement92 issued by the CONSORT group.93,94
A total of 92 publications were found, containing 96 trials (4 publications made simultaneous comparisons of two methods) comparing one of the four methods against bolus thermodilution with results expressed using bias and precision. These comprised 55 trials for pulse contour, 9 for esophageal Doppler, 15 for Pco2RB, and 17 for TEB. Significant variations in methodology and statistical treatment were found among these. In the 9 publications where cardiac index was reported, this was converted to cardiac output using body surface area; if the latter was not supplied, an assumed body surface area of 2.0 m² was used (the median value among the 22 publications where body surface area was reported). Many of these trials conducted several studies of a method on each subject (across all publications, these totaled 146 studies for pulse contour, 21 for esophageal Doppler, 34 for Pco2RB, and 24 for TEB). Some of these publications reported these studies separately, but many presented only a single pool of data from all subjects at multiple time points, and many of these did not state that correction was made for multiple measurements on subjects when calculating overall bias and precision of agreement, as described by statistical authorities.95–97
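The cardiac index conversion can be sketched as follows. This is a minimal illustration under a hypothetical simplification: a study's reported bias, SD of agreement, and mean value are each multiplied by one body surface area for the whole study, defaulting to the assumed 2.0 m². The function and key names are illustrative only.

```python
DEFAULT_BSA_M2 = 2.0  # assumed BSA (m2) when a publication did not report one


def index_to_output(study, bsa_m2=None):
    """Convert a study summary from cardiac index (l/min/m2) to cardiac output (l/min).

    Hypothetical simplification: every summary value (bias, SD of agreement,
    mean) is scaled by one body surface area for the whole study.
    """
    bsa = DEFAULT_BSA_M2 if bsa_m2 is None else bsa_m2
    return {key: value * bsa for key, value in study.items()}


def percentage_error(study):
    """1.96 x SD of agreement / mean cardiac output, as a percentage."""
    return 100 * 1.96 * study["sd"] / study["mean"]


# Hypothetical study reported in l/min/m2, with no body surface area given
ci_summary = {"bias": -0.05, "sd": 0.6, "mean": 2.8}
co_summary = index_to_output(ci_summary)
print(co_summary)  # bias, SD and mean now in l/min
print(percentage_error(ci_summary), percentage_error(co_summary))
```

Because the SD of agreement and the mean are scaled by the same factor under this simplification, the percentage error is unaffected by the choice of body surface area; only the bias and precision expressed in l/min depend on it.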
In 47 of these studies, data from at least one single independent measurement on each subject could be distinguished, making them suitable for inclusion in a pooled, weighted meta-analysis. The first such measurement from each subject in each of these studies was included in this meta-analysis. The process is summarized in figure 1.
For each method, the reported bias (method – thermodilution), mean cardiac output, variance of agreement (SD of agreement squared), and correlation coefficient were weighted according to the number of subjects in each study, and a pooled weighted value for each was derived according to

$$\bar{x} = \frac{\sum_{i=1}^{N} n_i x_i}{\sum_{i=1}^{N} n_i}$$

where $n_i$ and $x_i$ are, respectively, the number of measurements and the variable to be pooled (bias, mean cardiac output, variance of agreement, or correlation coefficient) in study $i$ among the $N$ studies for that method.
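The pooling described above is a sample-size-weighted mean, applied to variances rather than SDs. A minimal sketch follows, assuming each study is summarized as a dictionary with hypothetical keys `n`, `bias`, `mean_co`, and `sd`. The two studies in the usage lines are invented for illustration and are not studies from table 1.

```python
from math import sqrt


def weighted_mean(values, weights):
    """Sum of n_i * x_i divided by the sum of n_i."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def pool_studies(studies):
    """Pool single-measurement studies for one method.

    Each study is a dict with n (measurements), bias (l/min),
    mean_co (l/min) and sd (SD of agreement, l/min).
    """
    n = [s["n"] for s in studies]
    bias = weighted_mean([s["bias"] for s in studies], n)
    mean_co = weighted_mean([s["mean_co"] for s in studies], n)
    # Variances, not SDs, are averaged; precision is the root of the pooled variance
    variance = weighted_mean([s["sd"] ** 2 for s in studies], n)
    precision = sqrt(variance)
    pct_error = 100 * 1.96 * precision / mean_co
    return {"n": sum(n), "bias": bias, "precision": precision, "pct_error": pct_error}


# Two hypothetical studies, for illustration only
studies = [
    {"n": 20, "bias": 0.1, "mean_co": 5.0, "sd": 1.0},
    {"n": 30, "bias": -0.2, "mean_co": 4.0, "sd": 1.2},
]
pooled = pool_studies(studies)
print(pooled)
```

The same `weighted_mean` applied to the correlation coefficients of the studies that report one gives the sample-size-weighted mean correlation used in bare-bones Hunter and Schmidt meta-analysis.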
The pooled weighted precision (one SD) of agreement was calculated as the square root of the pooled weighted variance, and the pooled weighted percentage error was then calculated according to Eq 1. Confidence limits for the bias and percentage error were calculated as described by Bland and Altman.88 The pooled weighted correlation coefficient was calculated as described by Hunter and Schmidt.98
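Bland and Altman give approximate standard errors for the bias (SD/√n) and for each limit of agreement (√(3 SD²/n)). The sketch below uses them to form 95% confidence limits. Two points are assumptions of this illustration rather than details stated in the text: the critical value defaults to the large-sample 1.96, and the percentage error interval is obtained by dividing the half-width of the limit's confidence interval by the mean cardiac output.

```python
from math import sqrt


def agreement_confidence_limits(bias, sd, n, mean_co, t_crit=1.96):
    """Approximate 95% confidence limits for bias and percentage error.

    Uses the Bland-Altman approximations SE(bias) = sd / sqrt(n) and
    SE(limit of agreement) ~ sqrt(3 * sd**2 / n). t_crit = 1.96 is a
    large-sample stand-in for the t quantile with n - 1 degrees of freedom.
    Scaling the limit's half-width by mean_co is an assumption of this sketch.
    """
    se_bias = sd / sqrt(n)
    se_limit = sqrt(3 * sd ** 2 / n)
    bias_ci = (bias - t_crit * se_bias, bias + t_crit * se_bias)
    pct_error = 100 * 1.96 * sd / mean_co
    half_width = 100 * t_crit * se_limit / mean_co
    return {"bias_ci": bias_ci, "pct_error": pct_error,
            "pct_error_ci": (pct_error - half_width, pct_error + half_width)}


# Hypothetical pooled values, for illustration only
print(agreement_confidence_limits(bias=0.0, sd=1.0, n=100, mean_co=5.0))
```

For small studies, `t_crit` could be replaced by the t quantile with n - 1 degrees of freedom, for example `scipy.stats.t.ppf(0.975, n - 1)`.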
To help assess the generalizability of the meta-analysis, the distribution of percentage error among the single measurement studies included in the pooled weighted meta-analysis was compared with that of all sets of data for that method listed in table 1, using a two-sample Kolmogorov-Smirnov test. This was performed using OriginPro 8.1 statistical software (Origin Lab, Northampton, MA). The database was constructed and all pooled calculations performed using Microsoft Excel 2008 (Microsoft Corporation, Redmond, WA).
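The two-sample Kolmogorov-Smirnov statistic D is the largest vertical distance between the empirical cumulative distribution functions of the two samples. The sketch below computes D only, in plain Python. P values require the Kolmogorov-Smirnov distribution, which a package such as SciPy provides through `scipy.stats.ks_2samp(data1, data2)`. The percentage error values in the usage lines are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|.

    F_a and F_b are the empirical cumulative distribution functions; both are
    step functions that only jump at observed values, so only those are checked.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)  # fraction of sample a that is <= x
        f_b = bisect_right(b, x) / len(b)
        d = max(d, abs(f_a - f_b))
    return d


# Hypothetical percentage errors (%), for illustration only
included_studies = [35.0, 41.0, 44.0, 52.0, 61.0]
all_data_sets = [30.0, 38.0, 42.0, 45.0, 50.0, 58.0, 70.0]
print(ks_statistic(included_studies, all_data_sets))
```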
Table 1 lists the studies included in the review for each of the methods, along with the location of the data collection for each study (operating theater, intensive care unit) and the clinical situation where relevant. Where multiple studies at different time points were reported, they are listed separately. The number of data points n for each study, bias, and precision (defined as one SD of the difference between paired measurements by the method and thermodilution) are listed.
In the pooled weighted calculation of bias, precision, and percentage error, 24 studies were found to provide suitable data for the pulse contour method, 8 studies for Pco2RB, and 13 studies for TEB. Only two studies met the criteria for inclusion among those examining esophageal Doppler. These data are listed in bold type in table 1. Results for mean weighted pooled bias, precision, and percentage error are shown in table 2. Bias was negligible for all methods except esophageal Doppler. Percentage error was lowest for pulse contour methods (41.3%) and highest for Pco2RB (44.5%), but these differences did not reach statistical significance.
Of these 47 studies, slightly over half (27) provided data suitable for a pooled weighted calculation of correlation: 12 studies for the pulse contour method, 5 for Pco2RB, 8 for TEB, and both studies for esophageal Doppler. Results are shown in table 3. The pooled weighted correlation coefficient was lowest for Pco2RB (0.57) and highest for TEB (0.79).
The distributions of percentage error for the studies included in the pooled weighted meta-analysis and for all data sets in all the studies listed in table 1 are plotted in figure 2. Kolmogorov-Smirnov testing for each method revealed no significant differences between the distributions (Kolmogorov-Smirnov statistic D; pulse contour: D = 0.116, P = 0.91; esophageal Doppler: D = 0.429, P = 0.81; Pco2RB: D = 0.191, P = 0.94; TEB: D = 0.128, P = 0.99).
In a pooled weighted meta-analysis of 47 studies comparing agreement of four methods for minimally invasive cardiac output measurement with thermodilution, we found that none of the four methods met the criteria for acceptability of agreement suggested by Critchley and Critchley,89 which is a percentage error of 30% or less.
There are some limitations to our meta-analysis that should be considered. Among the 47 studies that met the criteria for the pooled weighted meta-analysis, 34 (72%) were done in cardiac surgery patients. During development, many devices are tested in patients undergoing cardiac surgery, as this is a readily accessible patient subgroup in whom monitoring with pulmonary artery catheters is routine practice in many centers. Subsequent independent testing in the same patient subgroup does not provide information about the performance of the device in wider clinical practice. The potential for this to restrict the generalizability of the analysis was a concern. Figure 2 and Kolmogorov-Smirnov testing revealed no significant differences between the distribution of percentage error among the single measurement studies included in the pooled weighted meta-analysis and the distribution of all sets of data for that method listed in table 1. This suggests that the studies included in the pooled weighted meta-analysis provide a representative sample of the studies in the field and that the pooled weighted percentage error for each method is a valid indicator of its precision across the full range of clinical situations in which it has been studied to date. The asymmetric nature of most of these distributions makes it clear that a simple nonparametric estimate (e.g., a median) of overall percentage error would substantially underestimate the pooled weighted percentage error and give an unduly favorable estimate of precision for some of these methods.
Only two esophageal Doppler studies, incorporating 57 measurements, were eligible for inclusion. This explains the wide confidence intervals for the percentage error and bias, and limited conclusions can be drawn from the pooled weighted data in table 2, although figure 2B suggests that these two studies are consistent with the broader body of published work on this method. Schober et al. recently reviewed studies on the accuracy and precision of esophageal Doppler measurement of cardiac output. Applying a nonparametric approach to pooling of their data, they found a median underestimate of 0.37 l/min, and an upper quartile for limits of agreement of 5.0 l/min, relative to a variety of other methods (predominantly thermodilution).90 Both their review and our analysis indicate a negative bias for esophageal Doppler measurement, suggesting that the assumed proportion of cardiac output going to the upper body, which the descending aortic measurement does not capture, may need to be increased.
A further concern was the 10-yr time span of this review of a rapidly developing field. Improvements in available technologies may mean that our findings do not reflect the current performance of these methods. We therefore compared data from studies published over the last 5 yr with the findings in table 2. The pooled weighted percentage error was 46.4% for pulse contour (16 studies), 42.0% for Pco2RB (2 studies), and 44.7% for TEB (6 studies), and was unchanged for esophageal Doppler. Although numbers in this subanalysis are small, there is no evidence that precision of agreement with thermodilution has improved over the interval covered by our review. However, there is an ongoing need for repeated review of the performance of all these technologies, to determine whether incremental improvements in precision of agreement are being achieved. Development of newer and more precise “gold standards” for comparison should prompt further validation studies and provide more reliable data for future comparisons.
A recent addition to the range of devices available is the Vigileo FloTrac (Edwards Lifesciences, Irvine, CA) pulse contour device. The focus of this review was on the performance of four generic methods in agreement with a common reference standard. We deliberately did not stratify our analysis to examine the performance of individual devices, for simplicity, to avoid a commercial or proprietary emphasis, and to avoid weakening the statistical power of the analysis. However, our data can be compared with a recent review and meta-analysis of studies on the accuracy and precision of the FloTrac by Mayer et al.91 These authors found a percentage error of 44% for earlier versions of the device and 30% for later versions (v1.07+), but their review excluded studies involving patients with hemodynamic instability or vasodilatory states, thus restricting their analysis to cardiac surgery alone. Subanalysis of our data for studies on the FloTrac found a percentage error of 47.3% for earlier versions and 44.7% for v1.07+, but the latter included two studies in septic or critically ill patients,99,100 where high cardiac outputs and hemodynamic instability present greater challenges to the accuracy and precision of a measurement device. These results for the FloTrac therefore still compared well with the other methods surveyed in the current review. The FloTrac system has the advantage of not requiring the calibration maneuver needed by other commonly used pulse contour devices: PiCCO (Pulsion Medical Systems), which is calibrated by transpulmonary thermodilution, and PulseCO (LiDCO Ltd, Cambridge, United Kingdom), which uses an injected lithium bolus for indicator dilution cardiac output measurement. However, our results do not take into account recent case reports questioning the ability of the FloTrac to accurately track cardiac output during dramatic intraoperative changes in hemodynamics.101,102
In 1999, Critchley and Critchley reviewed 25 studies comparing TEB and esophageal Doppler with thermodilution.89 In an unweighted pooling of the data from these studies, they found a mean percentage error of 37% for TEB and 65% for esophageal Doppler. They went on to suggest a narrower limit of 30% as acceptable, which they derived from the theoretical scatter expected in agreement between two methods, each with a percentage error of ± 20% in relation to the true value. In this case, the expected percentage error of agreement between the two methods is √(20² + 20²) ≈ 28.3%, which they rounded up to 30% for simplicity. Their argument assumed that the precision of thermodilution as the reference method was no worse than ± 20% in relation to the real cardiac output. They justified this with reference to a review by Stetz et al., which examined the accuracy and reproducibility of measurement of cardiac output by thermodilution, and a study by Mackenzie et al., which compared three different devices for thermodilution measurement.103,104
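The 28.3% figure follows from combining two errors in quadrature. A minimal sketch, assuming (as the argument does implicitly) that the errors of the test method and of the reference method relative to the true cardiac output are independent:

```python
from math import hypot


def expected_agreement_error(test_error_pct, reference_error_pct):
    """Expected percentage error of agreement between two methods.

    Each argument is that method's percentage error relative to the true
    cardiac output; the errors are assumed independent, so they add in
    quadrature: sqrt(test**2 + reference**2).
    """
    return hypot(test_error_pct, reference_error_pct)


print(round(expected_agreement_error(20, 20), 1))  # 28.3
```

The same function shows how quickly the expected agreement widens if the reference method is less precise than assumed: a test method within ± 20% compared against a reference within ± 30% would be expected to agree only within about ± 36%.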
However, there are significant reasons to question these assumptions in broader clinical practice. The studies included in the review by Stetz et al.103 examined the reproducibility of repeat measurement of cardiac output by thermodilution and were conducted in the cardiac catheterization laboratory or coronary/intensive care unit. They pointed out that measurements were invariably made during intervals of cardiovascular stability, so as to minimize the confounding effect of real variations in cardiac output on assessment of the reproducibility of measurement. The study by MacKenzie et al.104 was carried out in vitro on a circulation simulator and was not designed to test the accuracy and precision of thermodilution under clinical conditions. In contrast, the majority of the studies in our review were conducted intraoperatively or postoperatively, often in hemodynamically unstable patients, and deliberately sought to test the accuracy and precision of the various methods under sometimes difficult clinical conditions.
Recent studies are more revealing of the accuracy and precision of thermodilution in less tightly controlled perioperative conditions and during hemodynamic instability. Botero et al. compared bolus thermodilution in patients undergoing coronary artery surgery against an invasive in vivo gold standard technique in the form of an ultrasonic transit time flow probe positioned on the ascending aorta. Percentage error was 41.7% before cardiopulmonary bypass and 46.1% after cardiopulmonary bypass.105 Bajorat et al. compared bolus thermodilution with a similar flow probe in a pig model in which hemodynamic instability was induced pharmacologically, and found a percentage error of 48.6% overall.106 A number of the minimally invasive methods that we have reviewed here were also tested in parallel in these studies. Notably, thermodilution did not perform significantly better than any of them.
This raises questions about the appropriateness of imposing arbitrary limits on the acceptability of accuracy and precision of cardiac output measurement. Feldman, in a recent editorial, proposed a more dynamic approach to assessing acceptability of agreement, based on receiver operating characteristic curve theory, and called Critchley and Critchley's 30% limits “a simplification that makes assumptions about the accuracy of thermodilution and does not consider the impact on decision-making.”4 Indeed, few practicing clinicians would reject thermodilution via the pulmonary artery catheter as a valuable monitoring tool in appropriate patients, such as in cardiac surgery, despite the evidence cited above of poorer precision than previously assumed. Nevertheless, of the 51 papers listed in table 1 that were published within the last 5 yr, 63% quote Critchley and Critchley's criterion for acceptability in assessing the technique tested in their study.
The efficacy of a clinical monitor involves many factors other than its absolute accuracy, including safety, convenience and adaptability, and cost. Each method reviewed has its practical limitations and advantages. A calibration maneuver is required for some pulse contour techniques but, in common with TEB, they can potentially be used in the awake patient. The Pco2RB method is entirely noninvasive in the intubated patient, but its use is restricted to this group. Pulse contour and Doppler devices can provide additional indices of volume status based on the shape of the measured waveform. Many of these devices require expensive single-use components (transducers, probes, or valves). The value of the information provided by these methods in influencing management and improving patient outcomes is currently debated,1–3 and this is an evolving field. Clinicians may in fact be willing to accept lower accuracy in return for monitoring that is less invasive than traditional methods such as thermodilution via a pulmonary artery catheter, placement of which occasionally causes serious injury to the patient and which has been associated with poorer outcomes in some studies.107
Although often seen as a critical variable in studies in the field, the percentage error of agreement is only one marker of acceptability of a method, and it incorporates multiple components for both the method and the reference method: systematic nonlinearity of a method, interpatient variability, and intrapatient variability. The last is related to the task of tracking changes in cardiac output. In major surgery, reliable real-time tracking of the direction of changes in cardiac output is arguably more important than the ability of the monitor to deliver a highly accurate single measurement under stable conditions.108,109
In our meta-analysis, the four methods achieved very similar limits of agreement. This is notable, as the various methods are based on quite different physical and physiologic principles. It suggests that there is a fundamental limit to the precision of agreement with a reference standard such as thermodilution that can be achieved in clinical practice, independent of the particular method being tested. This level of precision of agreement remains well outside the 30% limits across a range of patient groups and clinical situations. Based on our empirical findings, a percentage error in agreement with thermodilution of ± 45% represents a more realistic expectation of achievable precision in clinical practice. Using the same mathematical theory as applied by Critchley and Critchley, this is consistent with percentage errors of approximately ± 30% for both thermodilution and the test method in their agreement with the real cardiac output.
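The arithmetic behind the last sentence can be written out under the same assumptions used by Critchley and Critchley: independent errors, equal in size for thermodilution (TD) and the test method, combined in quadrature. Here PE denotes percentage error relative to the true cardiac output.

$$\text{PE}_{\text{agreement}} = \sqrt{\text{PE}_{\text{TD}}^{2} + \text{PE}_{\text{test}}^{2}} = \sqrt{2}\,\text{PE}_{\text{each}} \quad\Rightarrow\quad \text{PE}_{\text{each}} = \frac{45\%}{\sqrt{2}} \approx 31.8\%$$

Conversely, two methods each within ± 30% of the true value would be expected to agree within $\sqrt{30^2 + 30^2} \approx 42.4\%$, which lies within the range of pooled percentage errors (41.3 to 44.5%) found in this meta-analysis.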