Disagreement among many underpowered studies has led to an equivocal understanding of the efficacy of the 5-HT3 antagonist ondansetron in preventing the consequences of sympathectomy after subarachnoid anesthesia. The authors assessed the efficacy of ondansetron with respect to the overall quality and statistical power of the meta-analyses.
The authors used both standard meta-analysis and a newer method, trial sequential analysis (TSA), which estimates adjusted CIs based on how much information has been accrued. They also used random-effects meta-analysis techniques, small trial bias assessment, selection models, sensitivity analyses, and the Grading of Recommendations on Assessment, Development, and Evaluation system. The results from these techniques were compared, and the importance of considering these factors was discussed.
Fourteen randomized placebo-controlled trials (1,045 subjects) were identified and analyzed. By using conventional meta-analyses, the authors determined that ondansetron was associated with reduction in the incidence of hypotension (relative risk = 0.62 [95% CI, 0.46 to 0.83], P = 0.001; TSA-adjusted CI, 0.34 to 1.12; I2 = 60%, P = 0.002) and bradycardia (relative risk = 0.44 [95% CI, 0.26 to 0.73], P = 0.001; TSA-adjusted CI, 0.05 to 3.85; I2 = 0%, P = 0.84). However, the authors found indications of bias among these trials. TSAs demonstrated that the meta-analysis lacked adequate information size and did not achieve statistical significance when adjusted for sparse data and repetitive testing. The Grading of Recommendations on Assessment, Development, and Evaluation system showed that the results had low to very low quality of evidence.
The analyses fail to confirm evidence that ondansetron reduces the incidence of hypotension and bradycardia after subarachnoid anesthesia because of the risk of bias and information sizes smaller than required. Because results from meta-analyses are given significant weight, it is important to carefully evaluate the quality of the evidence that goes into them.
This study assessed the efficacy of ondansetron using standard meta-analysis and more recently developed statistical techniques including small trial bias assessments, selection models, and trial sequential analyses. This study fails to confirm evidence that ondansetron reduces the incidence of hypotension and bradycardia after subarachnoid anesthesia.
Supplemental Digital Content is available in the text.
SUBARACHNOID anesthesia is used for many abdominal and lower extremity surgeries. However, side effects from subarachnoid anesthesia carry significant mortality and morbidity, if untreated. Hypotension, particularly when accompanied by bradycardia, is the most serious manifestation of sympathectomy that very rarely may progress to cardiac arrest, if not promptly and properly treated. The incidence of hypotension and bradycardia has been reported to be 33 and 13%, respectively, in nonobstetric patients during spinal anesthesia.1,2 The incidence of hypotension is as high as 50 to 60% in obstetric patients.1,2
Hypotension after subarachnoid anesthesia results mainly from a decrease in systemic vascular resistance secondary to a blockade of sympathetic fibers.3 The Bezold–Jarisch reflex has been proposed as a mechanism for the accompanying bradycardia in this setting.4 This reflex is mediated by serotonin receptors (5-HT3 subtype) located on the vagus nerve and within the walls of the cardiac ventricles. The 5-HT3 receptors are activated in response to systemic hypotension,5 causing an increase in efferent vagal signaling, bradycardia, reduced cardiac output, and further exacerbation of hypotension.6
Ondansetron, a potent 5-HT3 receptor antagonist commonly used as an antiemetic drug,7,8 is potentially useful to attenuate this response. In the last few years, a number of studies have evaluated the effect of ondansetron to prevent hemodynamic changes resulting from subarachnoid anesthesia. The results of these studies are markedly variable, and all studies appear to be underpowered.
We evaluate the existing data on the efficacy of ondansetron to prevent hypotension and bradycardia due to subarachnoid anesthesia using standard meta-analysis and more recently developed statistical techniques including small trial bias assessments, selection models, and trial sequential analyses (TSAs).
Materials and Methods
A systematic search of the literature was undertaken in May 2015 and updated in September 2015. Databases included were MEDLINE through PubMed, the Cochrane Library, and Web of Science. The searches were limited to English language articles, but no date or patient age restrictions were imposed. Searches combined terms for ondansetron, anesthesia and analgesia, and the outcomes of interest (hypotension and low blood pressure).* In addition, we searched www.clinicaltrials.gov for ongoing studies. Reference lists of included studies were evaluated by the investigators to identify additional relevant studies. Endnote X7 was used to combine and remove duplicate citations. We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines for research reporting.9
Definition of Relevant Outcome
Primary outcomes: Incidence of hypotension and bradycardia after subarachnoid anesthesia.
Secondary outcomes: Vasopressor (phenylephrine and ephedrine) consumption.
Two authors (A.S.T. and R.S.T.) screened the literature and selected the relevant articles. Inclusion and exclusion criteria were established a priori. Inclusion criteria were clinical trials that studied the effect of ondansetron on hemodynamic changes after subarachnoid anesthesia. Exclusion criteria were studies that were not randomized, retrospective studies, and case reports.
Data Extraction and Synthesis
Two authors (A.S.T. and R.S.T.) independently extracted the relevant data from articles that met the selection criteria, and their results were compared to ensure accuracy. Any differences were evaluated with repeated review, and disagreement was resolved by consensus. Data collected included authors, year of publication, type of surgery, subarachnoid drug, definition of hypotension, hemodynamic monitoring method, preventative measures, incidence of hypotension and bradycardia, and amount of vasopressor (phenylephrine and ephedrine) consumed.
In studies that only reported medians with interquartile range or range, we assumed that the data distribution was normal, and the mean was close to the median. We interpreted the value of median as a mean and calculated the SD as interquartile range/1.35.10 For studies that did not report SDs, we imputed the SD using the average SD from the remaining studies with no missing SDs.11 We repeated our analysis keeping only those studies that reported on all necessary data to compute effect sizes as a sensitivity analysis.
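The two imputation rules above can be sketched in a few lines. This is an illustration only, not the authors' code; the function names are hypothetical, and the sketch assumes approximately normal data, as the text does.

```python
def mean_sd_from_median_iqr(median, q1, q3):
    """Approximate (mean, SD) from a median and interquartile range.

    Assumes roughly normal data, so the mean is taken to equal the median
    and the SD is taken as IQR / 1.35 (the divisor quoted in the text).
    """
    iqr = q3 - q1
    return median, iqr / 1.35


def impute_missing_sds(sds):
    """Replace missing SDs (None) with the average of the reported SDs."""
    reported = [s for s in sds if s is not None]
    if not reported:
        raise ValueError("no reported SDs available to average")
    average = sum(reported) / len(reported)
    return [average if s is None else s for s in sds]


# Hypothetical study reporting a median of 10 with an IQR of 7.3 to 10.0.
mean, sd = mean_sd_from_median_iqr(10.0, 7.3, 10.0)
# Hypothetical set of studies, one of which did not report an SD.
filled = impute_missing_sds([2.0, None, 4.0])
```

Because both rules rest on distributional assumptions, a sensitivity analysis that drops the imputed studies, as described above, is the natural check.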
Risk of Bias Assessment
We assessed the risk of bias using the Cochrane Collaboration’s risk of bias assessment tool.12 Two independent authors (A.S.T. and A.A.B.A.) assessed each trial, and any differences that arose were resolved by consensus. Factors that were assessed are as follows: (1) random sequence generation (selection bias), (2) allocation concealment (selection bias), (3) blinding of participants and personnel (performance bias), (4) blinding of outcome assessment (detection bias), (5) incomplete outcome data (attrition bias), (6) selective reporting (reporting bias), and (7) other bias.12 More details are available in Supplemental Digital Content 1, http://links.lww.com/ALN/B254.
A Priori Hypothesis for Sources of Variability in Effect Sizes
We considered that the following factors (moderators) could potentially affect the efficacy of ondansetron in attenuating the hemodynamic changes after subarachnoid anesthesia: (1) ondansetron dose, (2) subarachnoid bupivacaine dose, (3) subarachnoid opioid adjuvant, (4) utilization of colloid or crystalloid, (5) preload or co-load, and (6) patient weight. We tested whether any of these factors reduced (explained) the heterogeneity identified in the results.
We decided a priori to use random-effects models to estimate the ondansetron effect because random-effects models handle heterogeneity more robustly.13 We anticipated marked clinical heterogeneity on initial review because of extensive variability in spinal drug dose and definitions for the primary outcome variables. For dichotomous outcomes (e.g., incidences of hypotension and bradycardia), the relative effect sizes were calculated as relative risk (RR) with 95% CI. For continuous outcomes, we pooled estimates reported as mean difference with 95% CI. For both types of outcomes, we used inverse variance random-effects meta-analysis as the primary analysis because we expected studies to be methodologically and clinically heterogeneous and expected that heterogeneity to be reflected in observed effect sizes that vary across studies.14
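For readers unfamiliar with inverse variance random-effects pooling, the following is a minimal sketch for relative risks. It is not the RevMan implementation used in this study; it assumes the DerSimonian-Laird estimator of the between-trial variance, and the trial counts are hypothetical.

```python
import math


def log_rr_and_variance(events_t, n_t, events_c, n_c):
    """Log relative risk and its large-sample variance (needs non-zero events)."""
    log_rr = math.log((events_t / n_t) / (events_c / n_c))
    variance = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return log_rr, variance


def pool_random_effects(effects, variances):
    """Inverse-variance random-effects pooling with a DerSimonian-Laird tau^2."""
    weights = [1 / v for v in variances]
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    scale = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / scale) if scale > 0 else 0.0
    # Random-effects weights add tau^2 to each within-trial variance.
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    return pooled, se, tau2


# Hypothetical trials: (events, n) on ondansetron, then (events, n) on placebo.
trials = [(10, 50, 20, 50), (8, 40, 12, 40), (15, 60, 18, 60)]
pairs = [log_rr_and_variance(*t) for t in trials]
pooled, se, tau2 = pool_random_effects([p[0] for p in pairs], [p[1] for p in pairs])
rr = math.exp(pooled)
ci = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
print(f"RR = {rr:.2f} (95% CI, {ci[0]:.2f} to {ci[1]:.2f}), tau^2 = {tau2:.3f}")
```

When the estimated between-trial variance is zero, the random-effects weights reduce to the fixed-effect weights and the two models give the same pooled estimate.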
We assessed heterogeneity mainly using the I2 statistic, which is the proportion of total variability in the observed effect estimates that can be attributed to true between-study variability (heterogeneity) rather than chance.15 The larger the I2, the greater the heterogeneity. We considered that values from 0 to 40% might not be important, from 30 to 60% may represent moderate heterogeneity, from 50 to 90% may represent substantial heterogeneity, and from 75 to 100% considerable heterogeneity, as per Cochrane classification.11 We also computed the chi-square statistic for heterogeneity. Finally, for dichotomous outcomes, we assessed the magnitude of the estimated heterogeneity by comparing it with the corresponding empirical distribution derived by Turner et al.16 Heterogeneity variance values smaller than 0.1 were considered low, 0.1 to 0.5 were reasonable, 0.5 to 1.0 were fairly high, and above 1.0 represented fairly extreme heterogeneity.
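As a hedged illustration of how I2 relates to the chi-square (Cochran's Q) statistic, the sketch below uses the usual formula I2 = 100 × (Q − df)/Q, truncated at zero, and applies the overlapping Cochrane bands listed above. The Q value in the example is hypothetical; it is not reported in the text.

```python
def cochran_q(effects, variances):
    """Cochran's Q: weighted squared deviations from the fixed-effect mean."""
    weights = [1 / v for v in variances]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))


def i_squared(q, n_studies):
    """I^2 in percent: 100 * (Q - df) / Q, truncated at zero."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


def cochrane_bands(i2):
    """Return every (overlapping) Cochrane band that contains the I^2 value."""
    bands = [
        (0, 40, "might not be important"),
        (30, 60, "moderate"),
        (50, 90, "substantial"),
        (75, 100, "considerable"),
    ]
    return [label for low, high, label in bands if low <= i2 <= high]


# Hypothetical example: Q = 32.5 across 14 trials (13 degrees of freedom).
example_i2 = i_squared(32.5, 14)
example_bands = cochrane_bands(60.0)
```

Because the bands overlap, a value such as 60% falls into two categories at once, which is why the interpretation also depends on the size and direction of the effects.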
We meta-analyzed all types of surgery together and then did subgroup analysis separating cesarean delivery and general surgery, as these two categories are known to have different incidence of hypotension and bradycardia.1
We used meta-regression analysis to explore whether the aforementioned (see A Priori Hypothesis for Sources of Variability in Effect Sizes) factors affect the relative effectiveness of ondansetron and whether these variables account for part of the heterogeneity observed. Results from meta-regression should be interpreted with caution because they are based on observational evidence, and we cannot randomize studies to have specific characteristics. It has been suggested that 5 to 10 studies are needed per predictor to assure valid results.11 The adjusted R2 shows the relative change in heterogeneity. We expected the inclusion of covariates to explain some of the between-study heterogeneity. A negative value suggests that covariates predict less heterogeneity than what would be expected by chance. This is common when there are a small number of trials and is an indication that results from the meta-regression should be treated with caution and not overinterpreted.17 Raw data and STATA commands used are presented in Supplemental Digital Content 1, http://links.lww.com/ALN/B254, and Supplemental Digital Content 2, http://links.lww.com/ALN/B255, respectively.
We explored for small-study effects both visually by drawing a funnel plot and statistically by applying Egger’s regression test.18 Both of these methods are erroneously considered to be highly specific means for detecting publication bias.19 We expect the probability of publication for a study to be related to the sample size. An asymmetric funnel plot is expected in the presence of publication bias. Publication bias is only one of the potential reasons for small-study effects. Small-study effects can be observed due to real differences in the relative effectiveness observed in small and large trials. For example, if an intervention works better for severely ill patients who are more difficult to recruit, they may be overrepresented in small trials. In this setting, we expect smaller trials to show larger effects. In this example, asymmetry is due to heterogeneity and not publication bias. To disentangle the effects of heterogeneity due to sample size and publication bias as a cause for small-study effects, we used a contour-enhanced funnel plot20 and a selection model that provides the correlation between the magnitude of effect and the probability of publication.21 A zero correlation implies that there is no publication bias. The selection model was estimated in OpenBUGS using Markov Chain Monte Carlo simulations.22 Publication bias is a missing data problem, and we have used assumptions regarding factors that affect the probability of a missing study (or a completed study remaining unpublished). A sensible and commonly used assumption that we employed was to assign probabilities of publication for the largest and smallest trials of those included in the meta-analysis. A sensitivity analysis was then employed, and we evaluated the robustness of the results according to the different assumptions employed. 
We assumed that the largest included trial has a probability of publication that ranged from 0.7 to 0.8, whereas the corresponding probability for the smallest trial ranged from 0.4 to 0.5. These probabilities have been used to infer a severe selection bias scenario as we are interested in exploring how robust results are to such a hypothesis.21
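The regression behind Egger's test can be sketched as follows. This is an illustration under the usual formulation (standardized effect regressed on precision, with the intercept measuring funnel asymmetry), not the STATA routine used in this study; the input values are hypothetical. The t statistic would be compared with a t distribution on k − 2 degrees of freedom to obtain a P value.

```python
import math


def egger_regression(effects, std_errors):
    """Egger's regression of (effect / SE) on (1 / SE) by ordinary least squares.

    Returns (intercept, se_intercept, t_statistic, degrees_of_freedom).
    An intercept far from zero indicates small-study effects.
    """
    x = [1 / s for s in std_errors]
    y = [e / s for e, s in zip(effects, std_errors)]
    n = len(x)
    if n < 3:
        raise ValueError("need at least three studies")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    residual_var = rss / (n - 2)
    se_intercept = math.sqrt(residual_var * (1 / n + mean_x ** 2 / sxx))
    return intercept, se_intercept, intercept / se_intercept, n - 2


# Hypothetical log relative risks: smaller trials (larger SE) show larger effects.
log_rrs = [-0.29, -0.41, -0.50, -0.59, -0.71]
ses = [0.1, 0.2, 0.3, 0.4, 0.5]
intercept, se_int, t_stat, df = egger_regression(log_rrs, ses)
```

As the text notes, a nonzero intercept signals small-study effects in general; it does not by itself distinguish publication bias from genuine differences between small and large trials.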
A recent study has shown that the estimation of treatment outcomes in meta-analyses differs depending on the strategy used, which can lead to major alteration in conclusions.23 Sensitivity analyses have been recommended to overcome this issue. According to the rationale described in the Data Analysis section, we used a random-effects model for primary analysis. We used a fixed-effects model for sensitivity analysis. The results from the two models were compared. If the results between the two models were similar, we planned to report the results from the random-effects analysis. If the results differed substantially, we planned to evaluate for small-study effects to determine whether differences between smaller and larger trials are genuine. We also considered the quality of the included trials. In the face of highly variable quality, we planned to repeat the analysis keeping only those studies that were assessed to have low risk of bias.23 Finally, we planned to repeat the analysis excluding those studies for which SDs were imputed by the largest SD from the observed studies as this method can be flawed due to the assumption of the underlying data distribution.
Trial Sequential Analyses.
We used TSA to infer whether the cumulative existing evidence is sufficient to draw firm conclusions or whether additional evidence is needed. TSA considers the addition of each trial in a cumulative meta-analysis as an interim analysis, controlling for type I and type II errors and making a judgment regarding whether a firm conclusion has been drawn. TSA can be employed to assess whether the accrued information size in a meta-analysis falls short of the required information size estimated to address a specific realistic intervention effect size (relative risk reduction [RRR]).24 The method provides an estimation of the required information size, or the accumulated meta-analytic sample size required to achieve prespecified levels of power, and draws boundaries for benefit and harm as well as futility. Thus, it provides an incentive for new high-quality trials if no boundaries have been surpassed and may stop trialists from doing further trials if boundaries for benefit, harm, or futility have been crossed.24 TSA can be used to prevent premature declaration of superiority of an intervention that is misled by a fortuitous low nominal P value. If the meta-analysis cannot reach the required information size, then results should be interpreted according to whether the trial sequential monitoring boundaries for benefit, harm, or futility have been surpassed.25
The assumptions made for TSA are analogous to the assumptions made for group sequential analysis (or interim analysis) in a single trial with group sequential design. The mathematical foundation of TSA is similar to that used in group sequential analysis, with the trial sequential monitoring boundaries calculated by numerical integration based on a continuous α-spending function and the acquired information fraction at the time when the new trial data are added to the cumulative meta-analysis.26,27 In other words, the number of patients required for a meta-analysis to come to a valid conclusion is influenced by the following factors: (1) type I error risk (α), (2) type II error risk (β), (3) control event proportion (the unweighted event rate in the cumulative control groups), (4) the heterogeneity between trials or the diversity, and (5) the anticipated intervention effect (preferably a priori and stated in the protocol).
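To make the idea of an α-spending function concrete, here is a minimal sketch. The text does not name the spending function; the sketch assumes the Lan-DeMets O'Brien-Fleming-type function, a common choice in group sequential designs, and it computes only the cumulative α spent, not the monitoring boundaries themselves (which require numerical integration).

```python
from statistics import NormalDist


def obf_alpha_spent(info_fraction, alpha=0.05):
    """Cumulative two-sided alpha spent at a given information fraction.

    Assumed form (Lan-DeMets O'Brien-Fleming type):
        alpha*(t) = 2 * (1 - Phi(z_{alpha/2} / sqrt(t)))
    where t is the accrued fraction of the required information size.
    """
    if not 0 < info_fraction <= 1:
        raise ValueError("information fraction must be in (0, 1]")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / info_fraction ** 0.5))


# Alpha spent after a quarter, half, and all of the required information.
spent = {t: obf_alpha_spent(t) for t in (0.25, 0.5, 1.0)}
```

Under this assumed function very little of the overall 5% is available early on, which is why a nominal P value below 0.05 from a meta-analysis that has accrued only a small fraction of its required information size may not cross the monitoring boundary.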
TSA-adjusted CIs are derived considering the adjusted levels of statistical significance calculated by sequential monitoring boundaries, thereby taking into account how much information has actually been accrued considering the effect size originally anticipated. For the traditional 95% CIs to be trusted, enough information must have been accrued to address a specific effect size. Ninety-five percent CIs may be deceptive when insufficient information is at hand. The TSA-adjusted CIs correspond to the more restrictive statistical significance level used as a criterion to stop or continue a single trial after an interim analysis.24,25
We used two-sided tests with type I error set at 5% and power set at 90%. Other data (e.g., the proportional outcome in each group and the heterogeneity D2 statistic) were derived from the meta-analyses results as a measure of diversity.28 The required information size for each meta-analysis was adjusted for diversity as the inconsistency factor underestimates the required information.28
The information size in a meta-analysis is the complete number of participants from all the included trials (the “meta-analytic sample size”). The information size is not one sample but an aggregated number of several samples from the included trials, each with their own randomization, and therefore, it perhaps actually should be called something other than “a sample size” emphasizing the importance of heterogeneity.28
Diversity (D2) is a measure of heterogeneity measuring the proportion of between-trial variance to the sum of the between-trial variance and the arithmetic mean of the within-trial variances from the included trials. The diversity essentially measures the degree-of-variance decrease going from a random-effects model to a fixed-effect model.28
Inconsistency factor (I2) is a measure of heterogeneity proposed by Higgins and Thompson29 in 2002, measuring the proportion of between-trial variance to the sum of the between-trial variance and a common within-trial variance. The common within-trial variance is estimated by the median of variances in the included trials.
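Written out, the two verbal definitions above take the same form and differ only in the within-trial variance used in the denominator. The symbols are introduced here for illustration and do not appear in the original text.

```latex
% \tau^2            : between-trial variance
% \bar{\sigma}^2    : arithmetic mean of the within-trial variances (used for D^2)
% \tilde{\sigma}^2  : common within-trial variance (used for I^2)
D^2 = \frac{\tau^2}{\tau^2 + \bar{\sigma}^2},
\qquad
I^2 = \frac{\tau^2}{\tau^2 + \tilde{\sigma}^2}
```

Both quantities lie between 0 and 1, and both equal 0 when the between-trial variance is 0.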
We selected a power of 90% (and not 80%) because of the following considerations: (1) meta-analysis is expected to have more power and precision than individual trials. (2) Well-conducted meta-analyses from systematic reviews are ranked very high in the evidence hierarchy; therefore, if a meta-analysis of trials of a specific intervention fails to show an anticipated effect (crossing the TSA futility boundaries), it should only do so if it has a high power to detect a realistic intervention effect. (3) Large pragmatic trials now have been conducted estimating a priori sample size based on a power of 90%.30,31 There is in fact no good reason why results from both trials and meta-analyses should reject differences that are real (type II error) in one of five trials or meta-analyses. Rejecting real differences in 1 of 10 trials is not overly conservative.
We planned to do TSAs only for the primary outcomes (i.e., hypotension and bradycardia). Analyses were performed as per manual guidelines.32 To handle zero-event trials, we used the empirical continuity adjustment method.33 We added an adjustment factor of 0.5 to the number of events in the ondansetron group and in the placebo group in case of zero events in one group.32
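A continuity adjustment for a trial with zero events in one arm can be sketched as follows. This shows the simple constant version (0.5 added to every cell of the 2 × 2 table of the affected trial), which is an assumption made for illustration; the exact adjustment applied by the TSA software may differ, and trials with zero events in both arms are outside the scope of the sketch.

```python
import math


def rr_with_continuity(events_t, n_t, events_c, n_c, k=0.5):
    """Relative risk and SE of log RR, adjusting single-zero trials.

    If exactly one arm has zero events, k is added to the events and
    non-events of both arms (so each arm total grows by 2 * k).
    """
    if events_t == 0 and events_c == 0:
        raise ValueError("double-zero trial: not handled by this sketch")
    if events_t == 0 or events_c == 0:
        events_t, events_c = events_t + k, events_c + k
        n_t, n_c = n_t + 2 * k, n_c + 2 * k
    rr = (events_t / n_t) / (events_c / n_c)
    se_log_rr = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return rr, se_log_rr


# Hypothetical trial: 0/20 bradycardia events on ondansetron, 4/20 on placebo.
rr_zero, se_zero = rr_with_continuity(0, 20, 4, 20)
# Hypothetical trial with events in both arms: no adjustment is applied.
rr_plain, se_plain = rr_with_continuity(5, 50, 10, 50)
```

Without an adjustment the first trial would yield a relative risk of exactly 0 and an undefined log relative risk, so it could not contribute to an inverse variance analysis.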
We examined our meta-analyses of incidence of hypotension and bradycardia (binary outcomes) applying TSA in three different scenarios24:
(1) The diversity-adjusted required information size based on an (a priori) anticipated intervention effect of 20% RRR of the outcome of hypotension and bradycardia using the diversity (D2) and the control event proportion (incidence) estimated among all the trials.
(2) The diversity-adjusted required information size based on an anticipated intervention effect corresponding to the point estimate of the RRR estimated in the traditional meta-analyses using the D2 and the control event proportion estimated among all the trials.
(3) As sensitivity analysis, we calculated the diversity-adjusted required information size based on an anticipated intervention effect corresponding to the point estimate of the RRR estimated in the traditional meta-analyses and using a diversity of 20% if the estimated diversity among all trials was 0. The control event proportion was estimated among all trials.
Grading of Recommendations Assessment, Development, and Evaluation System.
Grading of Recommendations on Assessment, Development, and Evaluation (GRADE) is a system for rating quality of evidence and strength of recommendations that takes into account the following factors: study limitations, inconsistency in results among included studies, indirectness of evidence, imprecision, and reporting bias.
Study design (e.g., randomized trial);
Risk of bias: Not serious, serious, and very serious, based on the Cochrane Collaboration’s risk of bias assessment;
Inconsistency: Widely differing estimates of the treatment effect across individual studies, graded as not serious, serious, and very serious;
Indirectness: Indirect comparisons and differences in populations, interventions, and outcomes of interest between the studies, graded as not serious, serious, and very serious;
Imprecision: When studies include relatively few patients and few events occur (in other words, estimates of the effect usually have wide CIs that include both important benefits and no important effects), graded as not serious, serious, and very serious; and
Other considerations: Publication bias (as discussed in the Risk of Bias Assessment section), large effect (defined as a relative risk [RR] of more than 2.0 or less than 0.5 “based on consistent evidence from at least two studies, with no plausible confounders” and a very large effect as a RR of more than 5.0 or less than 0.2 “based on direct evidence with no major threats to validity”), plausible confounding (all plausible confounding would reduce the demonstrated effect or increase it if no effect was observed), and presence of a dose-response gradient.
Based on these factors, the system assigns each outcome meta-analysis one of the following four quality grades36:
High quality: Further research is very unlikely to change the confidence in the estimate of effect.
Moderate quality: Further research is likely to have an important impact on the confidence in the estimate of effect and may change the estimate.
Low quality: Further research is very likely to have an important impact on the confidence in the estimate of effect and is likely to change the estimate.
Very low quality: Any estimate of effect is very uncertain.
We used (1) Review Manager (RevMan) [Computer program], version 5.3 (The Nordic Cochrane Centre, The Cochrane Collaboration, Denmark, 2014) for the meta-analyses; (2) STATA version 13.0 (STATA Corp., USA) for the meta-regression and publication bias analyses; (3) the GDT software (McMaster University and Evidence Prime Inc., Canada) for developing the GRADE assessment; (4) OpenBUGS release 2.3.2 (Cambridge Institute of Public Health, United Kingdom) for the selection model; and (5) the TSA software version 0.9 beta (Copenhagen Trial Unit, Denmark, 2011) for TSAs.
Studies Selection and Characteristics
A total of 189 citations were identified in our initial search: 48 citations in PubMed, 132 in Cochrane Library, and 9 in Web of Science. A Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow chart is presented in figure 1. Fourteen studies were included in our analysis. All studies were published between 2008 and 2015. Attempts to contact authors to clarify the results and acquire unpublished data were uniformly unsuccessful except for our own trial. As such, only information available from the publications was used. Table 1 summarizes the demographic and clinical characteristics of the included studies.
Risk of Bias Assessment
All trials included were double blinded. Only one trial scored “high risk” for one of the assessment parameters. Cochrane risk of bias analysis is illustrated in figure 2.
Traditional (Conventional) Meta-analyses Results
Fourteen randomized placebo-controlled trials (1,045 patients: 602 received ondansetron and 443 received placebo) were analyzed. Nine studies reported on cesarean delivery (711 patients: 435 received ondansetron and 276 received placebo), and five reported on different types of general and orthopedic surgery (334 patients: 167 received ondansetron and 167 received placebo).
Effect of Ondansetron on the Incidence of Hypotension According to Meta-analysis.
According to meta-analysis using traditional methods, ondansetron showed a statistically significant preventive effect on the incidence of hypotension (RR = 0.62 [95% CI, 0.46 to 0.83]; P = 0.001); however, there was substantial heterogeneity (I2 = 60%; P = 0.002). We conducted a subgroup analysis by separating trials of obstetrical surgery from trials of other types of surgery. No statistically significant differences were identified between the two groups. Trials of cesarean delivery demonstrated a reduced risk of hypotension among those pretreated with ondansetron (RR = 0.62 [95% CI, 0.44 to 0.87]; P = 0.006); however, there remained substantial heterogeneity (I2 = 69%; P = 0.001). A similar magnitude of effect (0.58; 95% CI, 0.29 to 1.15) was found when we pooled data only from studies not including cesarean delivery (i.e., general surgery), but the result was not statistically significant. Because only five small studies were included, there were not enough subjects to provide the power to find a true difference between ondansetron and placebo (95% CI, 0.36 to 1.40; fig. 3A). The test for subgroup differences showed no difference between the two groups (P = 0.86), which was anticipated because the CIs for the two subgroups overlap extensively. Number needed to treat (the average number of patients who need to be treated with ondansetron to prevent one bad outcome that would have occurred with placebo) is 5.85 (95% CI, 3.95 to 11.11) in all types of surgery and 4.55 (95% CI, 2.94 to 11.11) in cesarean delivery.
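The text does not state how the number needed to treat was derived. One common approach, sketched here as an assumption, converts a relative risk into a number needed to treat by applying it to an assumed control event rate. The control rate of 43.9% is the control event proportion reported for hypotension in the trial sequential analysis; the function name is hypothetical.

```python
def nnt_from_rr(control_event_rate, relative_risk):
    """Number needed to treat = 1 / absolute risk reduction.

    The absolute risk reduction is taken as CER * (1 - RR), which assumes
    the relative risk applies uniformly at the stated control event rate.
    """
    arr = control_event_rate * (1 - relative_risk)
    if arr <= 0:
        raise ValueError("no risk reduction: NNT is undefined")
    return 1 / arr


# Pooled RR for hypotension (0.62) applied to a control event rate of 43.9%.
nnt_point = nnt_from_rr(0.439, 0.62)
# The same conversion applied to the limits of the 95% CI of the RR.
nnt_limits = (nnt_from_rr(0.439, 0.46), nnt_from_rr(0.439, 0.83))
```

This conversion gives a value of about 6, in the same range as the reported 5.85 but not identical to it, which would be expected if the reported figure was obtained by a different route (for example, from a pooled risk difference).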
Effect of Ondansetron on the Incidence of Bradycardia According to Meta-analysis.
Using traditional meta-analysis, ondansetron showed a statistically significant preventive effect on the incidence of bradycardia (RR = 0.44 [95% CI, 0.26 to 0.73]; P = 0.001), with no statistical heterogeneity (P = 0.87). It should be noted that there are many trials with zero events in at least one of the arms, and this may have invalidated the results. Pooling results for cesarean delivery gave RR = 0.40 (95% CI, 0.22 to 0.72; P = 0.002), without statistically significant heterogeneity. Only three of the general surgery studies reported the incidence of bradycardia in patients treated with ondansetron and placebo, giving RR = 0.56 (95% CI, 0.20 to 1.58). However, the sample size was inadequate to detect any difference between the subgroups (fig. 3B). Number needed to treat is 23.3 (95% CI, 12.7 to 166.7) in all types of surgery and 18.9 (95% CI, 10.3 to 111.1) in cesarean delivery.
A visual inspection of the funnel plot showed asymmetry (fig. 4, A and B). This finding was further evaluated by conducting Egger’s test for small-study effects (regression line showing the association between SE and effect size embedded in the funnel plot), which gave a P value of 0.022 when all studies were considered, clearly suggesting the presence of small-study effects, and a P value of 0.064 when only cesarean delivery studies were included. The marginally nonsignificant value for the cesarean studies may be caused by lack of power. Small-study effects may be caused by publication bias or true differences between small and large studies. By drawing a contour-enhanced funnel plot, we see that there are small nonpositive studies missing (from the white area), suggesting that the omission of small negative studies may be the cause for the asymmetry in the funnel plot (fig. 4, C and D). Applying a selection model showed an association, although marginally not significant, between probability of publication and magnitude of effect, which is the definition of publication bias, with a correlation of −0.69 (95% CI, −0.99 to 0.01) when all studies were considered, whereas when only cesarean delivery studies were evaluated, the corresponding number was −0.53 (95% CI, −0.98 to 0.41). There is probably a publication bias operating on the meta-analysis (the interval for the correlation only marginally includes 0 when all studies are considered), but once again lack of power does not allow us to draw definite conclusions (especially in the cesarean subgroup). The pooled risk ratio for hypotension adjusted for publication bias is 0.70 (95% CI, 0.49 to 0.94) when all studies were considered, whereas when only cesarean delivery studies were evaluated, the pooled risk ratio is 0.66 (95% CI, 0.41 to 0.97).
For bradycardia, the correlation between magnitude of effect and propensity for publication was −0.31 (95% CI, −0.95 to 0.73) when all studies were considered and −0.44 (95% CI, −0.97 to 0.70) for cesarean studies only. The interval is very wide, reflecting the lack of evidence, and is compatible with a null correlation. Events are rare, and larger studies are needed to draw firm conclusions adjusted for publication bias; the effect estimate is 0.53 (95% CI, 0.19 to 1.88) for all studies and 0.51 (95% CI, 0.16 to 1.19) for cesarean studies only.
Heterogeneity variance was estimated, using the method of restricted maximum likelihood, to be 0.132, which lies near the median of Turner et al.’s16 empirical distribution for a pharmacologic intervention versus placebo (lognormal [−2.13, 1.58^2]), suggesting a reasonable heterogeneity variance. We considered meta-regression analyses to explore whether differences in true effect sizes are explained by ondansetron dose, bupivacaine dose, use of colloid, preload versus co-load, or mean patient weight.
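The comparison with the empirical distribution can be checked with a short calculation. The sketch below assumes that the quoted parameters (−2.13 and 1.58) are the mean and SD of the logarithm of the heterogeneity variance, which is the usual parameterization of a lognormal distribution.

```python
import math
from statistics import NormalDist

# Parameters of the quoted lognormal distribution (log scale).
MU, SIGMA = -2.13, 1.58


def lognormal_median(mu):
    """The median of a lognormal distribution is exp(mu)."""
    return math.exp(mu)


def lognormal_percentile(value, mu, sigma):
    """Cumulative probability of `value` under lognormal(mu, sigma^2)."""
    return NormalDist(mu, sigma).cdf(math.log(value))


median_tau2 = lognormal_median(MU)
percentile = lognormal_percentile(0.132, MU, SIGMA)
print(f"median = {median_tau2:.3f}, percentile of 0.132 = {percentile:.2f}")
```

Under this parameterization the median of the distribution is about 0.12, so an estimate of 0.132 sits only slightly above the median.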
Ideally, we should select one to two characteristics driven by clinical considerations. The most important characteristic is ondansetron dose, and we provide results from that analysis. We also regressed the logarithm of the observed risk ratio on each of the other covariates. The results are presented in table 2 (appendix 1). Effect size is correlated with bupivacaine dose and use of colloid. We did not run a multivariable regression model including all the covariates because of missing values and the small number of studies in which each covariate was reported. It should be noted that in applying multiple univariate meta-regressions, we ran a high risk of spurious findings due to an inflated type I error rate, and we cannot exclude the possibility of confounders. We did not run a meta-regression on the use of subarachnoid opioid because trials that had used colloid had also used opioid and results would be similar to the meta-regression involving the use of colloid.
We performed sensitivity analyses for the primary outcomes and phenylephrine consumption by evaluating the difference in the outcome direction, magnitude, and significance when we used fixed-effect and random-effects models for the meta-analysis or if we removed the studies that scored “unclear” in more than three parameters of the Cochrane risk of bias assessment. We did not find a difference in the direction of the drug effect or statistical significance of any outcome variable except for the incidence of hypotension in cesarean delivery. The direction of effect did not change for this outcome; however, the meta-analysis became marginally nonsignificant (P = 0.06) when we removed studies that scored “unclear” in more than three parameters (data not shown).
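The contrast between fixed-effect and random-effects pooling can be sketched as follows. This example uses the DerSimonian-Laird moment estimator rather than restricted maximum likelihood, purely because it has a closed form; the diversity is computed as D2 = (vR − vF)/vR, where vF and vR are the variances of the fixed-effect and random-effects pooled estimates, which is our understanding of the quantity used in TSA. The function name `pool` and the input values are hypothetical.

```python
import math

def pool(log_rr, variances):
    """Fixed-effect and DerSimonian-Laird random-effects pooling of log risk ratios."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, log_rr)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rr))  # Cochran's Q
    df = len(log_rr) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                  # between-trial variance
    w_star = [1.0 / (v + tau2) for v in variances]
    random_effects = sum(wi * yi for wi, yi in zip(w_star, log_rr)) / sum(w_star)
    var_fixed = 1.0 / sum(w)
    var_random = 1.0 / sum(w_star)
    return {
        "fixed_rr": math.exp(fixed),
        "random_rr": math.exp(random_effects),
        "fixed_se": math.sqrt(var_fixed),
        "random_se": math.sqrt(var_random),
        "tau2": tau2,
        "i2": max(0.0, (q - df) / q) if q > 0 else 0.0,   # inconsistency
        "d2": (var_random - var_fixed) / var_random,      # diversity
    }

# Hypothetical log risk ratios with equal within-trial variances.
result = pool([-1.0, 0.0, -0.5, 0.3], [0.02, 0.02, 0.02, 0.02])
for name, value in result.items():
    print(f"{name} = {value:.3f}")
```

When the between-trial variance is estimated as zero, the two models coincide; otherwise the random-effects SE is larger, which is why a result can lose conventional significance when the model is changed.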
Trial Sequential Analysis
We conducted TSA for the incidence of both hypotension and bradycardia. The tests were conducted with the following conditions: boundary type was two sided, type I error of 5%, type II error of 10% (90% power), the incidence in the control group and the diversity were determined from the conventional meta-analysis, the effect measure was relative risk, the model was random effects.
Incidence of Hypotension.
In All Trials.
In the first scenario, we used an a priori anticipated intervention effect of 20% RRR, diversity of 68% in the included trials, and a control event proportion of 43.9% derived from the accumulated control groups of the included trials. By using these assumptions, the required information size is 4,053 and none of the sequential monitoring boundaries have been surpassed even though the conventional boundaries have been surpassed. The conventional 95% CI = 0.46 to 0.83; however, the TSA-adjusted 95% CI = 0.34 to 1.12, making the meta-analysis inconclusive as to whether the intervention is statistically significant considering a realistic intervention effect, sparse data, and multiple testing. This analysis demonstrates that the required information size (4,053) is considerably larger than the number of subjects included in the analysis (1,045), suggesting that more information is required to come to a conclusion (appendix 2A). The relatively large effect indicated in the conventional meta-analysis may be due to bias as trials with high or uncertain risk of bias are included, and there is evidence of missing data due to publication bias as discussed earlier (Publication Bias in Results section).
In the second scenario, we used the point estimate of the intervention effect, a 38.8% RRR that was derived from all the included trials, a diversity of 68% in the included trials, and a control event proportion of 43.9% derived from the cumulated control groups of the included trials. The required information size is 1,031, and the trial sequential monitoring boundaries for benefit have been surpassed. The conventional 95% CI = 0.46 to 0.82 and the TSA-adjusted CI was the same as the conventional 95% CI as the required information size has been reached (appendix 2B). The assumption of the RRR is the only value changed between the two scenarios. A 38.8% RRR was derived from the point estimate indicated in the meta-analysis of the included trials. However, this estimate is data driven and does not have the inferential force of an a priori estimate.
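The dependence of the required information size on the anticipated RRR in the two scenarios above can be approximated with the standard sample size formula for two proportions, inflated by 1/(1 − diversity). The sketch below assumes this is the form of the calculation; the function name is ours, and small differences from the reported values are to be expected because the inputs are rounded.

```python
from statistics import NormalDist

def required_information_size(control_event_proportion, rrr, diversity=0.0,
                              alpha=0.05, beta=0.10):
    """Diversity-adjusted required information size for a binary outcome."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(1 - beta)        # type II error (power = 1 - beta)
    p_control = control_event_proportion
    p_experimental = p_control * (1 - rrr)
    delta = p_control - p_experimental
    p_bar = (p_control + p_experimental) / 2
    unadjusted = 4 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / delta ** 2
    return unadjusted / (1 - diversity)

# Hypotension, all trials: control event proportion 43.9%, diversity 68%.
scenario_1 = required_information_size(0.439, 0.20, diversity=0.68)
scenario_2 = required_information_size(0.439, 0.388, diversity=0.68)
print(f"scenario 1 (20% RRR): about {scenario_1:.0f} participants")
print(f"scenario 2 (38.8% RRR): about {scenario_2:.0f} participants")
```

Under these assumptions the two scenarios give roughly 4,070 and 1,035 participants, close to the reported 4,053 and 1,031. Because the required size scales with the inverse square of the absolute risk difference, roughly doubling the anticipated RRR reduces it about fourfold, which is why the data-driven scenario appears to reach the required information size while the a priori scenario does not.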
Considering Cesarean Trials Only.
In the first scenario, we used an a priori anticipated intervention effect of 20% RRR, diversity of 74% in the included trials, and a control event proportion of 56.0% derived from the accumulated control groups of the included trials. By using these assumptions, the required information size is 3,187, and none of the sequential monitoring boundaries have been surpassed even though the conventional boundaries have been surpassed. The conventional 95% CI = 0.45 to 0.87; however, the TSA-adjusted 95% CI = 0.32 to 1.22, making the meta-analysis inconclusive as to whether the intervention is statistically significant considering a realistic intervention effect, sparse data, and multiple testing. This analysis demonstrates that the required information size (3,187) is considerably larger than the number of subjects included in the analysis (711), suggesting that more information is required to come to a conclusion (appendix 2C). The relatively large effect indicated in the conventional meta-analysis may be due to bias as trials with high risk of bias are included, and there is evidence of missing data due to publication bias as discussed earlier (Publication Bias in Results section).
In the second scenario, we used the point estimate of the intervention effect, a 38.8% RRR that was derived from all the included trials, a diversity of 74% in the included trials, and a control event proportion of 56.0% derived from the cumulated control groups of the included trials. The required information size is 840, and the trial sequential monitoring boundaries for benefit have been surpassed. The conventional 95% CI = 0.45 to 0.87, and the TSA-adjusted CI is similar to the conventional CI as the boundary is surpassed (appendix 2D). The assumption of the RRR is the only value changed between the two scenarios. A 38.8% RRR was derived from the point estimate indicated in the meta-analysis of the included trials. However, this estimate is data driven and does not have the inferential force of an a priori estimate.
Incidence of Bradycardia.
In All Trials.
In the first scenario, we used an a priori anticipated intervention effect of 20% RRR, a diversity of 0% in the included trials, and a control event proportion derived from the cumulated control groups of the included trials of 11.4%. These assumptions yield a large required information size of 7,096 participants. None of the sequential monitoring boundaries have been surpassed even though the conventional boundaries (P < 0.002) have been surpassed. The conventional 95% CI = 0.25 to 0.74; however, the TSA-adjusted 95% CI = 0.05 to 3.85 with only the conventional boundary surpassed (fig. 5A).
In the second scenario, we used the point estimate of the intervention effect of 60% RRR, a diversity of 0% in the included trials, and a control event proportion of 11.4% derived from the cumulated control groups of the included trials. The required information size is 630, which is considerably lower under these data-driven assumptions. The required information size to detect or reject an RRR of 60% has been reached, and the TSA-adjusted CI is similar to the conventional CI (fig. 5B).
In the third scenario, we used the point estimate of the intervention effect, 57% RRR, an anticipated diversity of 20% accounting for a possible increase in heterogeneity when further trials are included, and a control event proportion of 11.4% derived from the cumulated control groups of the included trials. The required information size is 931, and the required information size to detect or reject an RRR of 57% has not been reached. The conventional 95% CI = 0.25 to 0.74, and the TSA-adjusted CI is identical as the trial sequential monitoring boundary has been surpassed. Even though the required information size is adjusted for expected increase in heterogeneity, this analysis is still mainly driven by a point estimate from the meta-analysis of trials, which incur a risk of bias. As such, the effect indicated may be a mere bias effect instead of genuine effect of the intervention (fig. 5C).
Considering Cesarean Trials Only.
In the first scenario, we used an a priori anticipated intervention effect of 20% RRR, a diversity of 0% in the included trials, and a control event proportion derived from the cumulated control groups of the included trials of 11.4%. These assumptions yield a large required information size of 7,445 participants. None of the sequential monitoring boundaries have been surpassed even though the conventional boundaries (P < 0.002) have been surpassed. The conventional 95% CI = 0.21 to 0.74; however, the TSA-adjusted 95% CI = 0.03 to 4.99 with only the conventional boundary surpassed (fig. 5D).
In the second scenario, we used the point estimate of the intervention effect of 57% RRR, a diversity of 0% in the included trials, and a control event proportion of 11.4% derived from the cumulated control groups of the included trials. The required information size is 745, which is considerably lower under these data-driven assumptions. The required information size to detect or reject an RRR of 57% has been reached, and the TSA-adjusted CI is similar to the conventional CI (fig. 5E).
In the third scenario, we used the point estimate of the intervention effect, 57% RRR, an anticipated diversity of 20% accounting for an increase in heterogeneity when further trials are included, and a control event proportion of 11.4% derived from the cumulated control groups of the included trials. The required information size is 931, and the required information size to detect or reject an RRR of 57% has not been reached; the conventional 95% CI = 0.21 to 0.74, and the TSA-adjusted 95% CI = 0.20 to 0.79, which is quite similar. Even though the required information size is adjusted for expected increase in heterogeneity, it is still mainly driven by a point estimate from the meta-analysis of trials that incur a risk of bias. As such, the effect indicated may be a mere bias effect instead of genuine effect of the intervention (fig. 5F).
Table 3 summarizes the GRADE assessment findings and recommendations.
By using conventional meta-analysis, we found that ondansetron given before subarachnoid anesthesia attenuated the incidence of hypotension and bradycardia when all types of surgery were considered together or when cesarean delivery was considered independently with a highly statistically significant result. We also found that phenylephrine consumption was reduced in cesarean delivery using conventional meta-analysis. However, using TSA the validity of these conclusions is brought into question. When we assessed the incidence of hypotension and bradycardia findings in all types of surgeries, we did not find sufficient evidence to support these findings. The reason for the difference is that the analysis was underpowered to come to this conclusion and did not reach the TSA-adjusted statistical significance due to sparse data and repetitive testing in the cumulative meta-analysis. Furthermore, when we assessed the quality of evidence, using the GRADE system, we found it to be low to very low.
Meta-analysis is a very important method used to consolidate evidence and is ranked at the top of the quality of evidence hierarchy.49 It is commonly used to collate outcomes from multiple small trials. However, traditional meta-analyses may not be reliable and may show discrepancies with findings of subsequent large randomized controlled trials.50 For instance, when a meta-analysis includes a small number of trials, comprising a small number of patients, random errors can occur, leading to spurious conclusions (i.e., false-positive or false-negative findings) just as in underpowered individual trials.51 To address this concern, techniques to test the reliability and quality of meta-analysis, including TSA, have been developed.
Seventy-seven percent of the meta-analyses in the Cochrane Library are underpowered to detect even a 30% RRR, and nearly all meta-analyses are underpowered to detect a 10% RRR.52,53 Therefore, most meta-analyses can be considered merely interim analyses on the way to reach the required information size in a random-effects model. Consideration of the RRR that the information contained in a meta-analysis is capable of identifying is important at the outset of the analysis.51,54 Finally, at least one simulation study strongly indicates that a large fraction of meta-analyses overestimate the intervention effect by 20 or 30% when there is truly null effect if the required information size has not been reached.55 Because significant weight is placed on the outcome of meta-analyses, it is not clear that they should be analyzed in a more relaxed manner than that required for high-quality clinical trials within the regulations of the Food and Drug Administration and European Medicines Agency.
Meta-regression is a statistical method that can be used to explain heterogeneity and identify the factors that improve (or reduce) the efficacy of an intervention.56 However, it is important to keep in mind that this approach identifies associations rather than causality.57 Our meta-regression analyses identified two significant associations, namely the dose of intrathecal bupivacaine and colloid preload. These modifiers were associated with different degrees of effectiveness of ondansetron on the incidence of hypotension. The finding that a higher dose of bupivacaine was associated with a lower effect of ondansetron may be related to variability in the degree of sympathetic blockade between techniques or a ceiling effect for ondansetron that is overcome with high-dose bupivacaine.58 The negative effect of colloid preload may be explained by previous findings that preload with colloid (but not crystalloid) reduces the incidence of hypotension.59 However, this is considered controversial.
Three different TSA scenarios yielded different results. This can be understood as follows:
The first important consideration is the risk of bias. TSA can only account for additional random error resulting from sparse data and repetitive testing when the included trials have an overall low risk of bias. If trials are included with a high bias risk, a breakthrough of the trial sequential monitoring boundary may be a result of a bias effect. Among the trials considered in this analysis, all but one had uncertain or high risk of bias.
The second important assumption is the size of the anticipated intervention effect. A realistic anticipated intervention effect proposed a priori at the protocol level is regarded as the most valid analysis as it minimizes the risk of the analysis being data driven. However, the anticipation of the intervention effect may be unrealistically low or high after data collection (just as a Bayesian prior may be pessimistic or optimistic). The collated data may eventually tell us that our anticipation was in fact pessimistic or optimistic or there may be no or variable proposed intervention effect sizes among the included trials. This was the case in our analysis. For this reason, we used sensitivity analysis. Herein, we compared the results using several effect sizes.
An analysis using an anticipated intervention effect derived from the point estimate of the RRR was evaluated. In this case, if TSA does not demonstrate breakthrough of the sequential monitoring boundaries, it would be very clear that sufficient evidence is lacking that the intervention has this effect. However, if there is a breakthrough of the sequential monitoring boundaries using a data-driven estimate of effect size, it may be that the intervention effect actually is as large as the point estimate indicates; however, the derived value may be fortuitous, and the analysis does not have the same weight as an analysis using an a priori anticipated intervention effect. For example, the conventional 95% CI = 0.25 to 0.74; however, the TSA-adjusted 95% CI = 0.05 to 3.85 when the incidence of bradycardia was assessed in all trials with an a priori anticipated intervention effect of 20% RRR.
An analysis using the point estimate from the conventional meta-analysis and the actual diversity measured in the meta-analysis and with the incidence of bradycardia assessed in all trials was conducted. For example, TSA-adjusted CI (95% CI = 0.24 to 0.74) is similar to the conventional CI when there is a breakthrough of the trial sequential monitoring boundary.
An analysis using the point estimate from the conventional meta-analysis and an anticipated diversity of 20% was conducted, as the estimated diversity in the actual meta-analysis is 0. This is because the heterogeneity (D2 or I2) does not become stable before at least 20 trials have been included in the meta-analysis.60 By using the same bradycardia in all trials example, we found that the conventional and the TSA-adjusted CI (95% CI = 0.24 to 0.74) are identical as the required information size has been reached. But as we discussed earlier, this analysis is still mainly driven by a point estimate from the meta-analysis of trials, which incur a risk of bias.
If all of the aforementioned scenario analyses indicate a statistically significant effect (breakthrough of one of the sequential monitoring boundaries for benefit or harm), we may also recommend a TSA using an anticipated intervention effect derived from the limit of the 95% CI closest to the RRR of zero. If this TSA demonstrates a breakthrough of the sequential monitoring boundaries, it “clears the table” as nearly all possible anticipated intervention effects (given the data) would yield the conclusion that the intervention works.61 This is clearly not the case for any of the ondansetron meta-analyses, as an anticipated intervention effect of 17% RRR for hypotension and a 27% RRR for bradycardia do not provide TSAs with breakthrough of the trial sequential monitoring boundaries. However, even though the required information size is adjusted for expected increase in heterogeneity, this analysis is still mainly driven by a point estimate from the meta-analysis of trials, which incur a risk of bias.
This is especially noteworthy when a meta-analysis includes trials with overall high risk of bias as the effect, mirrored by the z-curve breakthrough, may be due to a bias effect and not a genuine intervention effect.
The use of TSA has the advantage of controlling for the risk of type I and II errors when data are sparse (that is, less than the required information size to demonstrate an anticipated realistic and relevant effect size) and when cumulative meta-analysis is repeatedly updated after each new trial. Two available methods to control for the risk of type I error are as follows: (1) adjusting the threshold for statistical significance to account for the increased risk of random error (e.g., trial sequential monitoring [O'Brien-Fleming] boundaries) and (2) penalizing the test statistics in congruence with the strength of the available evidence. To control for type II error before a meta-analysis surpasses its required information size, thresholds are set up for when the tested drug/intervention can be deemed nonsuperior (and/or noninferior) to the control group (i.e., futility boundaries).32 One of the limitations of TSA is that the results can be markedly affected by heterogeneity when the meta-analysis is updated with new trials. Heterogeneity only becomes stable in a cumulative meta-analysis when 20 to 25 trials have been included. When fewer than 20 trials are included in a meta-analysis, diversity can be a major factor. As such, it is important to consider the possibility of heterogeneity increasing when further trials are included. This is especially important if initially the statistical heterogeneity is absent as was the case in our bradycardia analyses.
TSA may have varying assumptions regarding how many times the cumulative meta-analysis has been tested. This addresses the common statistical problem of multiple testing, that is, adding new data points and repeating the analysis until the desired conclusion is reached. The most conservative assumption is that a cumulative meta-analysis has been carried out every time the result of a new trial has been added to the meta-analysis. Another possible assumption is that trials are reported in groups, and the TSA program can accommodate this situation and “exclude” some of the “interim analyses” due to this assumption. The calculated boundaries based on the continuous α-spending (the Lan-DeMets α-spending function)62 in this framework are first and foremost dependent on the accrued information fraction and less dependent on the assumption of how many individual tests have been performed.26
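The dependence of α-spending on the accrued information fraction can be illustrated with the O'Brien-Fleming-type Lan-DeMets spending function, α(t) = 2 − 2Φ(z/√t), where t is the information fraction and z is the two-sided critical value. The sketch below assumes this form and is not the TSA program's implementation; the information fraction of 1,045/4,053 is taken from the first hypotension scenario.

```python
import math
from statistics import NormalDist

def obf_alpha_spent(information_fraction, alpha=0.05):
    """Cumulative two-sided type I error spent at a given information fraction."""
    if not 0 < information_fraction <= 1:
        raise ValueError("information fraction must be in (0, 1]")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / math.sqrt(information_fraction)))

# Hypotension, all trials, a priori scenario: 1,045 of 4,053 required participants.
fraction = 1045 / 4053
print(f"information fraction = {fraction:.2f}")
print(f"alpha spent so far   = {obf_alpha_spent(fraction):.5f}")
print(f"alpha spent at 100%  = {obf_alpha_spent(1.0):.3f}")
```

With about one quarter of the required information accrued, only a very small share of the overall 5% has been spent under this function, so a cumulative z-value that crosses 1.96 remains far from the sequential monitoring boundary.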
One of the main problems in cumulative meta-analysis is heterogeneity between trials that creates the necessity to apply various priors for this factor. TSA is a frequentist approach. We have only identified one article that takes a semi-Bayesian approach63; however, in this report, the method lacked priors for the anticipated intervention effects. We do not regard TSA as the only possible way to incorporate a relationship between an anticipated intervention effect and the actual accrued information. However, its resemblance to interim analyses makes it familiar to trialists.
To our knowledge, a fully Bayesian meta-analysis concept incorporating assumptions for prior distributions, the intervention effect, and heterogeneity is not readily available. In a Bayesian analysis, combinations of optimistic, realistic, and pessimistic priors for both the intervention effect and the heterogeneity would possibly create nine scenarios that may be even more complex to interpret than a TSA with three scenarios (one primary and two sensitivity analyses).
One major limitation of this meta-analysis is the inconsistency in definition of hypotension and bradycardia used in the included studies. This factor seems to have a major influence in study results. For instance, Terkawi et al.47 found that the incidence of hypotension dramatically changes using different clinically reasonable definitions for hypotension. Significant heterogeneity was derived from variability in this definition. The definition of bradycardia was also markedly variable between studies, although our conventional meta-analysis did not show heterogeneity in this outcome. Baseline blood pressure was measured once in all studies, except Wang et al.43 and Wang et al.,42 and based on that single reading, the authors defined their hypotension threshold. Furthermore, many studies measured the blood pressure at 5-min intervals, which may be inadequate to capture transient hypotension during treatment.
Publication bias (mainly small trial bias) likely compromised the results of the meta-analysis. Both visual and statistical methods detected small-study effects, and subsequent visual and statistical methods showed that the exaggerated effects in smaller trials are probably due to publication bias.
The majority of the included trials had uncertain allocation concealment. Allocation concealment is the bias domain most consistently shown to be associated with bias effect.64
TSA has been criticized as being a conservative approach as it uses an a priori intervention effect65 and the total variance for calculating the required information size in the random-effects meta-analysis.66,67 Using an a priori intervention effect does not consider the intervention effect estimated from the data already accrued. Applying such an approach may lead to larger required information sizes.65 Considering the total variance in the random-effects model due to random variation can be considered a “worst-case scenario.”68 However, we rarely know whether variation is caused by systematic differences or by random variation,67,69 and it, therefore, seems mandatory to perform analyses considering the possibility that all the variance encountered in the random-effects meta-analysis could be due to chance findings.70,71
Well-designed and well-powered trials, with overall low risk of bias and large sample sizes addressing realistic intervention effects, are necessary for the evaluation of these pharmacologic effects. The definition of the primary outcome and its trigger for treatment is critical. Specifically for the study of hemodynamic effects after subarachnoid anesthesia, we recommend measurement of stable baseline hemodynamics before performing anesthesia to more accurately define the target blood pressure treatment point for each patient. We also recommend that researchers report vasopressor consumption and frequent measures of hemodynamic outcome (by taking repeated vital sign measurements at short intervals, not more than 3 min, before and after each dose of vasopressor is given). Our meta-regression results did suggest that the dose of intrathecal bupivacaine and the use of colloid preload are associated with different degrees of effectiveness of ondansetron. There may be an interaction between these two factors. Although most of the studies that showed benefit used 10 mg bupivacaine without adjuvants, this dose is unlikely to be sufficient for longer duration of surgery. Therefore, to be able to generalize these findings, future studies may need to group their patients based on different intrathecal drug doses.
In conclusion, we used more recent statistical techniques for meta-analysis to assess the validity of results found with conventional meta-analysis. Clarity about the validity of results from meta-analysis is vital as they are given large weight in the pyramid of evidence. This is particularly important in the field of anesthesiology, as trials addressing important questions are often small and individually not powered to yield conclusive results unless the anticipated effect size is very large. Meta-analysis is commonly performed on these trials, and the results are given significant importance. These methods address the often incorrect hidden assumption that the information size is sufficient to report the 95% CI or to test at a 5% statistical significance level.
Applying these considerations to our meta-analysis, there was insufficient information to conclude an RRR of 20%, even when trials with evidently high risk of bias were retained. Among trials with overall low risk of bias, there was insufficient information to conclude the higher RRR of 32% that was indicated by the point estimate in the conventional meta-analysis. Although conventional meta-analysis showed a statistically significant effect of ondansetron in mitigating the incidence of hypotension and bradycardia and reducing phenylephrine consumption, we found that these conclusions were based on trials that were underpowered and had low to very low quality due to risk of bias, imprecision, and indirectness. Essentially, the quality of the meta-analysis is greatly affected by the quality of the trial data included. Evaluation of the quality of the original data is a critical step before the results of meta-analysis can be incorporated into clinical practice.
The use of ondansetron may be effective and simple to implement; however, with the currently available evidence, it is hard to make any recommendations based on efficacy in this setting. Therefore, better-designed, well-powered prospective studies are necessary to make recommendations on the use of ondansetron to prevent hypotension and bradycardia resulting from subarachnoid anesthesia.
Support was provided solely from institutional and/or departmental sources.
Dr. Wetterslev is a member of the Copenhagen Trial Units task force for developing theory and software for trial sequential analysis. Dr. Mavridis received research funding from the European Research Council (IMMA 260559). The other authors declare no competing interests.
PubMed search strategy: ((((((ondansetron[mesh]) OR zofran))) AND (((Anesthesia) OR analgesics) OR analgesia))) AND (((((("Hypotension"[Mesh]) OR "Blood Pressure"[Mesh]) OR "Arterial Pressure"[Mesh])) OR low blood pressure)). Search strategies for Cochrane and Web of Science available on request.
Bubble plot. (A) From all studies, and (B) from cesarean delivery studies only. Effect of ondansetron dose on the incidence of hypotension. The red line represents the linear regression, and the blue lines represent its 95% CI. confl and confu are the lower and upper bounds of the 95% CI, respectively. logRR = logarithm of the risk ratio.
Trial Sequential Analysis for incidence of hypotension, using required information size. (A) Scenario 1 in all studies, (B) scenario 2 in all studies, (C) scenario 1 in cesarean delivery studies, and (D) scenario 2 in cesarean delivery studies. The red inward sloping lines to the left of the graph make up the trial sequential monitoring (O’Brien-Fleming) boundaries. The red outward sloping lines to the right of the graph represent the futility region. The z-statistic (summary estimate over its SE) is computed when a trial enters the analysis. The blue line is the cumulative Z-curve, and each black square in the line represents one study. The number (in black) that is parallel to the end of the Z-curve represents the total number of patients in the corresponding meta-analysis. The intersection of the red lines at Z-score +1.96 and −1.96 represent the “conventional” significant P value (0.05). CEP = control event proportion; RIS = required information size; RRR = relative risk reduction.