To the Editor:
Joosten et al. are to be congratulated on their deployment of technically complex closed-loop systems to support patients during anesthesia and surgery.1 The possibility of experiencing impaired neurocognitive function in association with a surgical episode is a concern to patients and those who care for them. It makes sense to establish whether changes in clinical technologies might diminish or abolish these unwelcome syndromes.
Nevertheless, we have concerns about the Primary Outcome Measure and its analysis.
On ClinicalTrials.gov (https://clinicaltrials.gov/ct2/show/NCT03148730), the Primary Outcome Measure is “Incidence of postoperative cognitive dysfunction.” This requires a definition of postoperative cognitive dysfunction. The authors chose the Montreal Cognitive Assessment score (maximum 30), so we need to consider what constitutes a meaningful change. A reduction of a single point is very unlikely to be clinically significant and certainly does not represent a decline of more than 1 SD below published normative data.2 A decline of two points? Five points? A fall from more than 26 to less than 26? Once a definition is chosen, each patient either does or does not have postoperative cognitive dysfunction, and the incidence is the proportion of patients with the condition in each of the two treatment groups; the incidence can then be compared between the control and closed-loop groups.
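To illustrate why the definition must be fixed before groups are compared, the following Python sketch dichotomizes each patient's change in Montreal Cognitive Assessment score under three of the candidate definitions raised above and computes the incidence in each group. The scores are invented for illustration and are not trial data; the helper names (drop_rule, cutoff_rule, incidence) are hypothetical.

```python
from collections.abc import Callable

Pair = tuple[int, int]  # (baseline, follow-up) MoCA scores, 0-30

def drop_rule(min_drop: int) -> Callable[[int, int], bool]:
    """Hypothetical definition: POCD if the score fell by at least `min_drop` points."""
    return lambda baseline, follow_up: baseline - follow_up >= min_drop

def cutoff_rule(cutoff: int = 26) -> Callable[[int, int], bool]:
    """Hypothetical definition: POCD if the score moved from above `cutoff` to below it."""
    return lambda baseline, follow_up: baseline > cutoff and follow_up < cutoff

def incidence(pairs: list[Pair], rule: Callable[[int, int], bool]) -> float:
    """Proportion of patients in one group who meet the chosen definition."""
    return sum(rule(b, f) for b, f in pairs) / len(pairs)

# Illustrative made-up scores, NOT trial data.
control = [(27, 26), (28, 24), (25, 25), (29, 27)]
closed_loop = [(27, 27), (26, 25), (28, 28), (30, 29)]

for name, rule in [("drop >= 1", drop_rule(1)),
                   ("drop >= 2", drop_rule(2)),
                   ("above 26 -> below 26", cutoff_rule(26))]:
    print(name, incidence(control, rule), incidence(closed_loop, rule))
```

With these made-up scores the control-group incidence is 0.75, 0.5, or 0.25 depending solely on which definition is applied, so the comparison between groups is only meaningful once the definition is fixed in advance.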
In the article, however, the primary outcome was the change in the cognition score. This is an important alteration because incidence, group differences, and individual change are not the same thing. The authors have concluded that there is a difference in cognitive outcome based on a screening test (the Montreal Cognitive Assessment) and have ignored the results of the more robust cognitive assessment tools, which showed no difference between groups.
Not all of the primary outcome data are shown. The Montreal Cognitive Assessment score was treated as normally distributed (parametric) in the power calculation but is (partially) reported with nonparametric statistics in the results. Baseline scores are set out in table 1. We looked for, but could not find, any summary of the data after baseline; the values at 1 week and 90 days are not reported. Instead, we are given what is probably the median (it is not defined) and the interquartile range of the change from baseline. It would be helpful to see the raw data, perhaps presented as a scattergram with lines showing individual trajectories. In addition, summary statistics (i.e., the median and interquartile range of the scores at 1 week and 90 days for each treatment group) would also help.
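A minimal sketch of the requested summary, using Python's standard statistics module and invented scores for a single group (in practice it would be repeated for each treatment group and time point). Quartile conventions differ between software packages; the choice of the "inclusive" method of statistics.quantiles here is an assumption, and the function name median_iqr is hypothetical.

```python
import statistics

def median_iqr(scores: list[float]) -> tuple[float, float, float]:
    """Return (median, first quartile, third quartile) of a list of scores."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return statistics.median(scores), q1, q3

# Invented MoCA scores for one treatment group (not trial data).
baseline = [20, 25, 30]
week_1 = [30, 20, 25]
# Per-patient change from baseline (follow-up minus baseline).
change = [after - before for before, after in zip(baseline, week_1)]

for label, scores in [("baseline", baseline), ("1 week", week_1), ("change", change)]:
    print(label, median_iqr(scores))
```

The toy data also show why a summary of change alone is not a substitute for the scores at each time point: here the median change is -5 points, while the medians at baseline and 1 week are identical.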
Regarding the analysis and statistical significance, we note that the 95% CIs of the differences include zero at both 1 week and 90 days. How can these results be statistically significant?
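The question rests on the duality between a two-sided test at the 5% level and the corresponding 95% CI: when both are derived from the same model, the interval excludes zero exactly when p < 0.05. The sketch below assumes a normal (Wald) approximation purely for illustration; the differences and standard errors are hypothetical, not the trial's values, and wald_ci_and_p is a hypothetical helper.

```python
from statistics import NormalDist

def wald_ci_and_p(diff: float, se: float, level: float = 0.95) -> tuple[float, float, float]:
    """Symmetric normal-approximation CI for a difference, plus its two-sided p value."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    lower, upper = diff - z_crit * se, diff + z_crit * se
    p_value = 2 * (1 - std_normal.cdf(abs(diff) / se))
    return lower, upper, p_value

# Hypothetical values, not taken from the trial.
for diff, se in [(1.0, 0.6), (1.5, 0.6)]:
    lower, upper, p = wald_ci_and_p(diff, se)
    includes_zero = lower <= 0 <= upper
    print(f"diff={diff}: 95% CI {lower:.2f} to {upper:.2f}, p={p:.3f}, includes zero: {includes_zero}")
```

An interval and a p value computed by different methods need not agree exactly, so it would help to know which analysis produced each.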
In addition, the post hoc sensitivity analysis showed no difference between the treatment groups when the absolute scores were analyzed. Why was the change in score from baseline chosen for the primary analysis? Was any correction made for multiple testing, such as a Bonferroni correction?
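For reference, a Bonferroni correction multiplies each p value by the number of tests m (equivalently, it compares each raw p value with α/m). A minimal sketch with hypothetical raw p values for two comparisons, for example at 1 week and 90 days; the function name bonferroni_adjust is hypothetical.

```python
def bonferroni_adjust(p_values: list[float]) -> list[float]:
    """Multiply each raw p value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw p values for two comparisons (not trial results).
raw = [0.03, 0.04]
adjusted = bonferroni_adjust(raw)
alpha = 0.05
# Significant after correction only if the adjusted p value is <= alpha.
print([round(p, 3) for p in adjusted], [p <= alpha for p in adjusted])
```

With two tests, a raw p value of 0.03 becomes 0.06 and would no longer be significant at the 5% level.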
The abstract will be widely read, yet its conclusion is not based on the prespecified primary outcome measure. In addition, none of the 18 secondary outcome measures registered on ClinicalTrials.gov were reported (understandable given space constraints). However, the authors include three additional measures, two of which are similar to, but not the same as, those prespecified.
Overall, we are concerned that Joosten et al. have overstated their findings. An alternative interpretation is that the high-tech technique made no difference to outcome.
The authors declare no competing interests.