To the Editor:
We read with interest the article by Sonny et al.1 comparing the ability of two frailty measures to predict hospital length of stay after noncardiac surgery. Assessments of frailty have been relevant in surgical outcomes research since the milestone work by Makary et al.2 With more than 60 frailty instruments currently available and no consensus on how to integrate frailty measures into perioperative management, we agree with the authors that assessing the comparative predictive accuracy of different frailty instruments is timely and clinically relevant.
It is not surprising, however, that the two measures selected (the phenotypic Hopkins Frailty Score and a modified deficit accumulation score [i.e., frailty index]) in the study of Sonny et al.1 demonstrated large error in the prediction of prolonged hospitalization across a heterogeneous group of patients undergoing noncardiac surgery. Many studies evaluating perioperative prediction models have shown substantial challenges in accurately estimating prolonged hospitalization. Indeed, of routinely collected outcomes, hospital length of stay arguably has the highest variation and the most substantial contribution from indirect patient predictors (e.g., social status, home supports, availability of allied health services). Neither of the frailty instruments tested by Sonny et al.1 captures these factors, and both are therefore unlikely to accurately predict, or explain a substantial degree of variance in, prolongation of stay.
Despite the clear importance of the authors’ efforts to compare frailty instruments, the choice of these two frailty tools, each of which lacks multidimensionality, likely contributes to the low predictive accuracy reported. As the authors rightly point out, frailty instruments differ in whether they estimate a frailty score using a phenotypic model (requiring prospective ascertainment and direct patient evaluation) or an accumulation of deficits model of frailty (amenable to medical record evaluation, which can be conducted retrospectively or in real time). The phenotype model, developed by Fried et al.3 and adapted in various operationalizations, including the Hopkins Frailty Score (used by Sonny et al.1), provides an objective assessment of biologic manifestations of frailty. In the phenotype model, changes and dysregulation present at the cellular and subcellular level are expressed through means that are primarily physical in nature (i.e., decreased energy, strength, gait speed, body mass, and activity levels). The phenotype model therefore does not directly include cognitive or social deficits, both of which experts agree are essential components of the frailty syndrome4 and both of which are likely to substantively influence a patient’s length of hospital stay.
The deficit accumulation model, originally developed by Mitnitski et al.,5 has also been adapted in many forms yet comes with clear guidelines for robust derivation of a frailty index. This guidance includes the need to measure 30 or more deficits that exist across multiple domains (e.g., cognitive, medical, psychosocial, physiologic).6 The Rockwood frailty index is robust, likely reflecting redundancy and strong interrelationships between the different elements that make up the model. Unfortunately, the modified frailty index (initially derived for use with the National Surgical Quality Improvement Program administrative database) contains only 11 deficits, 10 of which are specifically medical diagnoses. Although the reduced number of variables eases measurement and facilitates broad implementation, the variables that the modified frailty index omits relative to a more traditional Rockwood frailty index likely contribute to the risk of perioperative adverse outcomes, particularly in surgical populations with relatively low levels of frailty (~20% in the study by Sonny et al.1). This makes the modified frailty index more closely aligned to a condensed comorbidity index (e.g., Charlson index) than a true multidimensional measure of frailty.
Additionally, frailty status assigned by both tools was applied in a dichotomous manner (with a cutoff of either 3 or higher [in the text] or 4 or higher [in Table 3] for the Hopkins Frailty Score and 4 or higher for the modified frailty index). While we agree that a dichotomized frailty assignment is often used for risk stratification, if the objective of a study is to determine the predictive accuracy of a given scale, it is well demonstrated that dichotomization of a predictor variable can lead to decreased predictive performance compared to continuous representations such as regression splines or polynomials.7
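The loss of discrimination from dichotomization can be illustrated with a minimal sketch. The counts below are hypothetical, chosen only so that the event rate rises with the score; they are not data from Sonny et al., and the function names are illustrative. The sketch compares the c-statistic (area under the receiver operating characteristic curve) of a full 0-to-5 ordinal score with that of the same score split at 3 or higher. It does not model splines or polynomials.

```python
# HYPOTHETICAL counts per frailty score 0..5 (assumption, not study data).
patients = [300, 250, 200, 120, 80, 50]   # patients at each score
events = [15, 20, 24, 24, 24, 20]         # prolonged stays at each score


def auc(event_counts, nonevent_counts):
    """C-statistic for an ordered categorical predictor.

    Counts are listed from lowest to highest score. A pair (event,
    non-event) is concordant when the event has the higher score;
    tied pairs count as 0.5.
    """
    concordant = 0.0
    nonevents_below = 0
    for e, n in zip(event_counts, nonevent_counts):
        concordant += e * (nonevents_below + 0.5 * n)
        nonevents_below += n
    return concordant / (sum(event_counts) * sum(nonevent_counts))


def dichotomize(counts, cutoff):
    """Collapse per-score counts into [below cutoff, at or above cutoff]."""
    return [sum(counts[:cutoff]), sum(counts[cutoff:])]


nonevents = [p - e for p, e in zip(patients, events)]

auc_full = auc(events, nonevents)
auc_binary = auc(dichotomize(events, 3), dichotomize(nonevents, 3))

print(f"full ordinal score: {auc_full:.3f}")
print(f"dichotomized at >=3: {auc_binary:.3f}")
```

Under these assumed counts, the dichotomized score has the lower c-statistic, because patients scoring 0 to 2 (and likewise those scoring 3 to 5) can no longer be ranked against one another.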
Last, and perhaps most salient to the study by Sonny et al.,1 type of surgery is likely the most important predictor of complications and length of stay. While the study’s primary outcome, prolonged length of stay, was determined based on the difference between actual length of stay and hospital- and surgery-specific expected length of stay (which provides some degree of procedural adjustment), this approach may also downwardly bias measures of accuracy. The study was performed in the period from 2015 to 2016, whereas expected length of stay data were based on historical trends from 2010 to 2015. Poor temporal transportability of prediction models is well documented. It would be interesting in future studies to better understand whether actual versus predicted total length of stay, adjusted for procedure as a covariate in a temporally contemporaneous cohort, might yield better performance when assessing the accuracy of prediction tools. This is especially relevant if, as in the current study, procedures with typically shorter lengths of stay, such as orthopedic surgeries, are more than twice as common in the group with frailty (introducing confounding bias).
We commend the authors for highlighting the need to identify a best method for perioperative frailty estimation so that frailty measurement can be implemented more effectively in clinical practice. More importantly, however, we believe that the failure of the two frailty instruments studied to predict prolonged hospital stays, readmissions, and 30-day complications should be interpreted with caution. We appreciate this recent feature article for highlighting the need to advance research in anesthesiology on the risk prediction ability of frailty by type of surgery. The gold standard for risk prediction in anesthesiology is yet to be determined, and tools will likely need to be more granular and stratified by type of surgery.
The authors declare no competing interests.