With interest, we read the article by Colquhoun et al. on the association between a tidal volume regimen during one-lung ventilation and postoperative pulmonary complications.1  In the article’s title and Discussion section, the authors claim that the tidal volume regimen was not associated with the studied outcome, and in the Results section, they explicitly report a “lack of association.”

While it is not our intention to criticize or debase this otherwise excellent study, we would like to address a fundamental statistical misconception that we regularly observe in medical literature: the misinterpretation of nonsignificant hypothesis test results as evidence for the lack of a difference, effect, or association.

A nonsignificant result of a superiority test merely means that there is insufficient evidence to reject the null hypothesis and claim a difference, effect, or association. Importantly, however, it does not exclude clinically important differences, and thus does not imply that treatments (or whatever exposure is being studied) have equivalent effects on (or associations with) some outcome.2

Rather than focusing only on statistical significance, we would like to highlight the importance of considering the CI of the effect size estimate when interpreting study results.3 Inferences from a study should apply to the population from which the data were sampled, not to the sample itself, and the CI provides a range of plausible estimates of what the “true” effect (or association) could be in that population. More formally, the CI contains the true population parameter in a fixed percentage of cases with repeated sampling, and this fixed percentage, often arbitrarily set at 95%, is termed the confidence level. This means that if a study were to be repeated indefinitely under the same conditions, each time with a new sample from the same population, and if the 95% CI for the treatment effect were computed each time, 95% of the varying CIs would contain the unknown true population parameter. Contrary to common belief, this does not mean that there is a 95% probability that any particular CI contains the true population parameter. For example, in the study by Colquhoun et al., the confounder-adjusted odds ratio was 0.86 with a 95% CI of 0.56 to 1.32. Let us assume that the true odds ratio that the authors aim to estimate (which, of course, is actually unknown) is, say, 0.9. Now, it does not make any sense to say that there is a 95% probability that 0.9 falls within the range between 0.56 and 1.32. The CI either contains the unknown true population value or does not, and the probability of containing this value is therefore either 1 or 0. In practice, it is generally unknown whether or not a particular CI contains the population parameter of interest. However, the vast majority of 95% CIs do contain the parameter, and thus, there is good reason to believe that a particular 95% CI estimated in a study “likely” (even though we cannot assign a specific probability) includes the population parameter.
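The repeated-sampling interpretation can be illustrated with a small simulation. The following is only a minimal sketch under simplifying assumptions that are ours and not taken from the study: the outcome is normally distributed with a known standard deviation, the parameter of interest is a mean rather than an odds ratio, and the function name `ci_coverage` and all of its default values are hypothetical.

```python
import math
import random


def ci_coverage(true_mean=0.0, sigma=1.0, n=50, repetitions=10_000, seed=1):
    """Fraction of simulated 95% CIs that contain the true mean.

    Assumes normally distributed data with a KNOWN standard deviation,
    so a z-based interval applies (an illustrative simplification).
    """
    rng = random.Random(seed)
    z_crit = 1.959964  # 97.5th percentile of the standard normal distribution
    half_width = z_crit * sigma / math.sqrt(n)

    hits = 0
    for _ in range(repetitions):
        # Draw a fresh sample from the same population and compute its mean.
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        # Each individual interval either contains the true mean or it does not.
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / repetitions


coverage = ci_coverage()
print(f"Proportion of 95% CIs containing the true mean: {coverage:.3f}")
```

Across many repetitions, the proportion should be close to 0.95, whereas each single interval in the loop simply does or does not cover the true value.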

While statistical significance and CIs are often thought of as two distinct entities, they are actually closely related: when the 95% CI contains the null-hypothesis value of no effect or no association (for example, 1 for an odds ratio or 0 for a mean difference), the data are compatible with the lack of an effect, and a corresponding hypothesis test at a 100% − 95% = 5% (0.05) significance level would be “nonsignificant” (i.e., would yield P > 0.05).4 However, being compatible with the lack of an effect can also mean being compatible with clinically important effects, as the entire CI needs to be considered. Returning to the study by Colquhoun et al., the CI suggests that the odds of postoperative pulmonary complications could plausibly be anywhere from 44% lower to 32% higher in one treatment group compared with the other. This is a difference that most clinicians would probably consider clinically relevant, and thus, the study does not demonstrate the lack of a clinically important association in either direction. The same is true of many other articles reporting nonsignificant study results. We believe it is important that authors and readers are aware that absence of evidence must not be confused with evidence of absence. A nonsignificant difference between treatment groups should be interpreted and reported in terms of insufficient evidence to reject the null hypothesis; it demonstrates neither the lack of a difference nor the equivalence of treatments.
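The link between the CI and the hypothesis test, and the conversion of the CI limits into percentage changes in odds, can be sketched numerically. The calculation below assumes that the reported interval is a Wald-type CI that is symmetric on the log odds ratio scale; this is our assumption, the resulting P value is only an approximation and is not the value reported by the authors, and the helper names `wald_summary` and `percent_change_in_odds` are hypothetical.

```python
import math


def wald_summary(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Back-calculate the standard error, z statistic, and two-sided P value.

    Assumes a Wald-type 95% CI, symmetric around log(odds_ratio)
    (an approximation made for illustration only).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


def percent_change_in_odds(odds_ratio):
    """Express an odds ratio as a percentage change in odds."""
    return (odds_ratio - 1.0) * 100.0


# Values reported by Colquhoun et al.: odds ratio 0.86, 95% CI 0.56 to 1.32.
se, z, p = wald_summary(0.86, 0.56, 1.32)
lower_pct = percent_change_in_odds(0.56)
upper_pct = percent_change_in_odds(1.32)

print(f"Approximate two-sided P value: {p:.2f}")
print(f"Plausible change in odds: {lower_pct:.0f}% to {upper_pct:+.0f}%")
```

Because the interval includes 1, the approximate P value exceeds 0.05, yet the same interval spans changes in odds that are large in both directions.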

The authors declare no competing interests.

1. Colquhoun DA, Leis AM, Shanks AM, Mathis MR, Naik BI, Durieux ME, Kheterpal S, Pace NL, Popescu WM, Schonberger RB, Kozower BD, Walters DM, Blasberg JD, Chang AC, Aziz MF, Harukuni I, Tieu BH, Blank RS: A lower tidal volume regimen during one-lung ventilation for lung resection surgery is not associated with reduced postoperative pulmonary complications. Anesthesiology. 2021; 134:562–76
2. Schober P, Vetter TR: Noninferiority and equivalence trials in medical research. Anesth Analg. 2020; 131:208–9
3. Schober P, Bossers SM, Schwarte LA: Statistical significance versus clinical importance of observed effect sizes: What do P values and confidence intervals really represent? Anesth Analg. 2018; 126:1068–72
4. Schober P, Vetter TR: Confidence intervals in clinical research. Anesth Analg. 2020; 130:1303