We thank Drs. Brodner and Van Aken for their interest in our study of the Prospective Evaluation of a RIsk Score for postoperative pulmonary COmPlications in Europe (PERISCOPE)1 and for giving us the opportunity to extend the discussion of perioperative risk assessment through this correspondence. They have questioned the evidence for the generalizability of the score developed in the Assess Respiratory Risk in Surgical Patients in Catalonia (ARISCAT) study2 based on two issues. One is the different mortality rates in the PERISCOPE cohort and the larger cohort of the European Surgical Outcomes Study (EUSOS).3 The second issue is the external validation process used to explore the utility of the ARISCAT score in wider European settings, by applying it to the PERISCOPE sample and subsamples. We will comment on these two issues separately.
First, we point out that the question of postoperative mortality as reported after EUSOS3 has been discussed in correspondence between Drs. Brodner and Van Aken4 and the EUSOS authors.5 Thus, the need for caution before assuming that 4% is the true incidence of postoperative mortality in Europe has already been covered, and it has been pointed out that the heterogeneity of countries and hospitals and the differences in the sample sizes contributed by each of the EUSOS centers account for the mortality observed and the dispersion of rates.
Neither the PERISCOPE1 nor EUSOS3 mortality rates of 0.9% and 4%, respectively, should be interpreted as representative of any particular population because both cohorts were convenience samples rather than random population-based ones. This aspect of design, however, does not mean that either study is biased or inaccurate. We note here that other large series6,7 have recorded postoperative mortality rates very close to ours. In any case, we emphasize that neither mortality (as an outcome) nor other variables that were not predictors in the ARISCAT score2 can be discussed as central concerns in the context of the PERISCOPE validation study. We only contribute these observations to reflect on the profiles of the two European samples. Even if the EUSOS mortality rate, found in a larger population, could somehow be considered a definitive standard, or reference figure, it would still be entirely valid to perform an external validation of a predictive model for complications in a population with a different mortality profile from the EUSOS cohort’s. External validation is a dynamic process in which an understanding of performance in different settings progressively increases confidence in a score’s generalizability or clinical reliability. If we find a score is unhelpful, we will know we need to learn more.
Our last comment regarding the issue of comparison to EUSOS3 is that, other than a similar reliance on convenience cohorts in that study and ours, we cannot agree that the two designs were similar. However, as mentioned above, we do not wish to go into extensive detail in comparing the studies, so suffice it to say that the primary outcome of our study was the presentation of a postoperative pulmonary complication, not mortality.
Next, Drs. Brodner and Van Aken make certain affirmations about internal and external validity that do merit discussion here because we would not wish readers to be misled. It is wrong to argue that a finding of excellent internal validity, such as the ARISCAT2 score showed, represents a limitation to external validity. It is precisely when a predictive model has shown internal validity that its generalizability is worth exploring externally.8 It is true that the discrimination and calibration of models are usually optimistic in their development sample, but this is the very reason why they should then undergo external validation, which might support or rule out transportability. Following recommendations from specialists in the field,8–10 we used rigorous collection and analysis methods in the PERISCOPE study, whose design was praised in an editorial in this journal.11 We take this opportunity to express our thanks for that praise, but to have done otherwise than control the design carefully would certainly have led to confusing results.
The concluding hypothesis of Drs. Brodner and Van Aken, that increased attention to measuring oxygen saturation may have helped to reduce mortality in the PERISCOPE1 cohort, is attractive, but we cannot, of course, confirm it based on our data. We think it might be a strategy worth studying in an appropriate clinical trial, however.
Finally, we want to emphasize that, in our opinion, the greatest strength of our study lies in the replication itself, which is an essential and often overlooked procedure to verify the validity of a predictive model. We agree with Eisenach and Houle11 that reproducibility, replication, and generalizability are like the rungs on a ladder. However, perhaps because of their complexity, understanding and implementing the results of risk modeling often seem to be a rocky road.
The authors declare no competing interests.