A barrier to routine preoperative frailty assessment is the large number of frailty instruments described. Previous systematic reviews estimate the association of frailty with outcomes, but none have evaluated outcomes at the individual instrument level or specific to clinical assessment of frailty, which must combine accuracy with feasibility to support clinical practice.
The authors conducted a preregistered systematic review (CRD42019107551) of studies prospectively applying a frailty instrument in a clinical setting before surgery. Medline, Excerpta Medica Database, Cochrane Library, and the Cumulative Index to Nursing and Allied Health Literature databases were searched using a peer-reviewed strategy. All stages of the review were completed in duplicate. The primary outcome was mortality and secondary outcomes reflected routinely collected and patient-centered measures; feasibility measures were also collected. Effect estimates were pooled using random-effects models or narratively synthesized. Risk of bias was assessed.
Seventy studies were included; 45 contributed to meta-analyses. Frailty was defined using 35 different instruments; five were meta-analyzed, with the Fried Phenotype having the largest number of studies. The Clinical Frailty Scale was most strongly associated with mortality and nonfavorable discharge (odds ratio, 4.89; 95% CI, 1.83 to 13.05 and odds ratio, 6.31; 95% CI, 4.00 to 9.94, respectively); the Edmonton Frail Scale was most strongly associated with complications (odds ratio, 2.93; 95% CI, 1.52 to 5.65); and the Frailty Phenotype was most strongly associated with delirium (odds ratio, 3.79; 95% CI, 1.75 to 8.22). The Clinical Frailty Scale had the highest reported measures of feasibility.
Clinicians should consider accuracy and feasibility when choosing a frailty instrument. Strong evidence in both domains supports the Clinical Frailty Scale, while the Fried Phenotype may require a trade-off of accuracy with lower feasibility.
Preoperative frailty has been associated with adverse postoperative outcomes
It remains unclear which frailty scale is the best predictor of adverse postoperative outcomes
This meta-analysis of 45 articles identified that specific frailty scales may be better predictors for some adverse outcomes when compared to others
The Clinical Frailty Scale was most strongly associated with mortality and discharge not to home
The Edmonton Frail Scale was a better predictor of complications
The Frailty Phenotype was most strongly associated with postoperative delirium
Frailty is a state of increased vulnerability to adverse health outcomes that results from accumulation of age- and disease-related deficits.1,2 Since 2009, there has been a rapid accumulation of evidence demonstrating that the presence of frailty before surgery is associated with a more than two-fold increase in the odds of dying or experiencing a complication after surgery, along with increased risk of delirium, development of new disability, and increased resource use.3–6
In recognition of the important role that frailty plays in predicting adverse outcomes in older surgical patients, numerous guidelines recommend that frailty be assessed routinely before surgery. These statements come from multidisciplinary and international societies, including the American College of Surgeons (Chicago, Illinois) and American Geriatrics Society’s (New York, New York) Optimal Preoperative Assessment of the Geriatric Surgical Patient guidelines,7 the Association of Anesthetists of Great Britain and Ireland’s (London, United Kingdom) Perioperative Care of the Elderly Guidelines,8 and the Society for Perioperative Assessment and Quality Improvement’s (Glenview, Illinois) Perioperative Management of Frailty guidelines.9 However, to date, evidence suggests that frailty assessments are not part of routine preoperative practice in most settings.10 Multiple barriers to routine preoperative frailty assessment likely exist. One clear barrier is the large number of heterogeneous frailty instruments described in the literature, reflecting a lack of consensus among experts in frailty assessment.11 While multiple systematic reviews have estimated the strength of association between frailty and a variety of adverse outcomes,3–5 the common approach to analysis has been to combine all frailty instruments together to provide a single pooled estimate of association. This approach precludes the opportunity to compare different frailty instruments in terms of their ability to predict important patient- and system-level outcomes. Feasibility is likely another barrier; clinicians are unlikely to adopt an instrument that requires substantial time or resources to operationalize in practice.12 However, reviews have not considered or synthesized data regarding the feasibility of different frailty instruments in clinical practice.
Ultimately, clinicians will need to combine information about accuracy and feasibility to guide decisions about what frailty instrument to use in their clinical setting to ensure that best practices are being applied to the care of this high-risk group of older surgical patients.
To provide robust comparisons between frailty instruments, we performed a systematic review and meta-analysis of studies that prospectively assessed frailty status in preoperative clinical practice. Our objectives were to assess and compare the ability of well-studied frailty instruments to predict important postoperative outcomes, while also synthesizing available data about the feasibility of these instruments.
Materials and Methods
This systematic review and meta-analysis was conducted following best practice recommendations, including the Meta-analysis of Observational Studies in Epidemiology guidelines13 and the Cochrane Collaboration handbook.14–17 Before conducting the review, we registered a study protocol with the International Prospective Register of Systematic Reviews (CRD42019107551). The results are reported in keeping with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement.18
Data Sources and Searches
A comprehensive search strategy was developed in consultation with an information specialist, informed by previous systematic reviews related to frailty and perioperative outcomes.15,16,19,20 The strategy was then reviewed against the Peer Review of Electronic Search Strategies checklist by a second independent information specialist.21 A copy of the search strategy is included in Supplemental Digital Content, table 1 (http://links.lww.com/ALN/C333). The search strategy was applied to Medline, Excerpta Medica Database, the Cochrane Library, and the Cumulative Index to Nursing and Allied Health Literature databases, with each searched from inception to November 18, 2018. The reference lists of related systematic reviews as well as included articles were searched by hand to identify other studies that may have been missed by the initial search. No language restrictions were applied.
Eligible studies were included if they: (1) studied a population of surgical patients greater than or equal to 18 yr; (2) included an explicitly described frailty instrument applied prospectively in a clinical encounter before surgery; and (3) reported relevant outcomes and the association of frailty with outcomes.
Study outcomes were informed by a core outcome set for older people,22 as well as routinely reported perioperative outcomes. Mortality (in-hospital or within 30 days) was our primary outcome; complications, discharge disposition, delirium, length of stay, and measures of function or disability were secondary outcomes. Our primary focus was on effect sizes as these were the measures of predictive ability routinely reported across studies. We also collected other formal measures of predictive accuracy reported (e.g., discrimination, calibration, sensitivity, specificity, predictive values, likelihood ratios, explained variance, improvement in model fit). To address our second objective, we collected relevant feasibility outcomes as defined by Bowen et al. (acceptability, implementation, and practicality).23
Studies were excluded if: (1) they included mixed populations with less than 50% of surgical patients; (2) the frailty instrument was solely applied to electronic data (e.g., electronic health records, administrative data, registries); (3) frailty status was based on comprehensive geriatric assessment only (as this approach is specific to geriatric medicine physicians and not widely available before surgery24); or (4) frailty status was based on single laboratory or imaging results (e.g., sarcopenia and hypoalbuminemia, tests that represent separate, although related, conditions that are not equivalent to the multidimensional nature of frailty11,25). No other restrictions were placed on frailty instrument definitions. We considered minimally invasive cardiac valve procedures as surgical; however, we did not consider coronary artery interventions (angiograms, angioplasty, stenting) as surgical procedures (as anesthesiologists are not routinely involved). Conference abstracts or sources of grey literature were not included as methodologic descriptions would be inadequate to assess study quality and risk of bias. Case studies and case series were also excluded, as these studies lacked comparison of people with frailty to people without.
Data Extraction and Quality Assessment
Duplicate assessment of titles and abstracts was performed by independent reviewers. Studies classified as “yes” or “unsure” were advanced to full text review; agreement between both reviewers was required for exclusion. Full text review was also completed in duplicate by independent reviewers. Any uncertainties or conflicts were resolved by consensus in discussion with lead authors (S.A., D.M.). Data extraction was then performed using a form specifically designed for this study; this included quantitative and qualitative feasibility data. The form was piloted by two senior investigators before full implementation, and the first eight studies extracted by each investigator were reviewed with a senior author before proceeding with full data extraction. Data were extracted by two reviewers and independently checked for accuracy by the first author (S.A.). Study authors were contacted as required to request missing or incomplete data, or to clarify methods or findings. All stages of the review were completed using DistillerSR (Evidence Partners, Canada).26
Risk of bias was analyzed using the Quality in Prognostic Studies tool.27 Risk of bias was assessed independently for each study by two team members, with at least one review by a lead author. Uncertainties and disagreements were resolved by consensus in discussion with lead authors.
Data Synthesis and Analysis
Study results were pooled according to the specific type of frailty instrument used; modified versions of instruments were classified with the original version (e.g., Fatigue, Resistance, Ambulation, Illnesses, Loss of weight [FRAIL] Scale included with Frailty Phenotype group). Studies assessing physical measures of frailty, such as gait speed, handgrip strength and Short Physical Performance Battery were pooled together. Studies that reported data for more than one frailty instrument contributed data to each applicable class of frailty instrument in the meta-analyses (e.g., if a study reported on the Frailty Phenotype and Clinical Frailty Scale, the study would contribute data to both meta-analyses).
Recognizing that many studies would use differing cut-offs and categorizations of frailty instrument scores to define frailty exposure, we pre-specified that we would pool the non-frail or lowest frailty score category as the reference group, and for the comparator group with frailty: (1) the group specified with frailty for studies using a binary exposure; or (2) the group specified with moderate frailty for studies with a multi-category frailty exposure.
Data analyses were completed using Comprehensive Meta-Analysis (Biostat, USA).28 We prespecified the use of random-effects models using DerSimonian and Laird inverse variance weighted meta-analyses. These models were used to generate pooled odds ratios for binary outcomes and standardized mean differences for continuous outcomes based on unadjusted effect sizes, event rates, or measures of central tendency and variance from each included study. Unadjusted data were used as clinical frailty assessment is typically employed as a risk stratification tool, as opposed to as part of a multivariable risk model (which are not routinely operationalized in preoperative clinical practice29). A random-effects approach was chosen to allow for expected heterogeneity across studies; epidemiologic and content knowledge would suggest that data collected from different surgical specialties and procedures would not meet the assumptions of fixed-effects meta-analysis.
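The pooling approach described above can be illustrated with a minimal sketch. This is not the Comprehensive Meta-Analysis implementation used in the review; it is a plain-Python rendering of the standard DerSimonian and Laird estimator under stated assumptions. The function names, the 0.5 continuity correction for zero cells, and the 2x2 counts are all hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def log_odds_ratio(a, b, c, d):
    """Unadjusted log odds ratio and its variance from a 2x2 table.

    a, b: events / non-events in the group with frailty
    c, d: events / non-events in the reference (non-frail) group
    Assumption: 0.5 is added to every cell when any cell is zero.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def dersimonian_laird(effects, variances, level=0.95):
    """Pool study-level effects (e.g., log odds ratios) with random effects."""
    k = len(effects)
    w = [1 / v for v in variances]                    # inverse variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # between-study variance
    w_star = [1 / (v + tau2) for v in variances]      # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # I2 as a percentage
    return {
        "pooled": pooled,
        "se": se,
        "ci": (pooled - z * se, pooled + z * se),
        "tau2": tau2,
        "q": q,
        "i2": i2,
    }


# Hypothetical 2x2 counts for three studies (not data from the review).
tables = [(12, 88, 5, 195), (8, 42, 6, 144), (20, 180, 9, 291)]
effects, variances = zip(*(log_odds_ratio(*t) for t in tables))
result = dersimonian_laird(effects, variances)
low, high = (math.exp(x) for x in result["ci"])
print(f"pooled OR = {math.exp(result['pooled']):.2f} "
      f"(95% CI, {low:.2f} to {high:.2f}); I2 = {result['i2']:.1f}")
```

Pooling is done on the log odds ratio scale and exponentiated only for reporting, which is why pooled confidence intervals are symmetric around the point estimate on the log scale rather than on the odds ratio scale.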
Meta-analysis was performed by frailty instrument when more than two studies with appropriate outcome data were available. Where inadequate data were available to support a meta-analysis (including formal measures of predictive accuracy), results were narratively synthesized. Where medians and interquartile ranges were reported for continuous outcomes, means and standard deviations were calculated using the methods of Wan et al.30 Heterogeneity was assessed using the I2 statistic (although no analytic decisions were made based on measures of heterogeneity); where the I2 statistic exceeded 75% for primary outcome analyses we assessed for sources of heterogeneity. A two-tailed, 5% significance level was used for all analyses.
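As an illustration of the median-to-mean conversion, the sketch below applies the quartile-based estimators published by Wan et al. for the case where a study reports the first quartile, median, third quartile, and sample size. The function name and the example length-of-stay values are hypothetical, and the review may have applied other scenarios from the same paper (e.g., when minimum and maximum values were reported).

```python
from statistics import NormalDist


def mean_sd_from_quartiles(q1, median, q3, n):
    """Estimate mean and SD from quartiles and sample size.

    Mean: average of the three quartiles.
    SD:   interquartile range divided by twice the normal quantile at
          (0.75n - 0.125) / (n + 0.25), which approaches IQR / 1.35
          as n becomes large.
    """
    mean = (q1 + median + q3) / 3
    z = NormalDist().inv_cdf((0.75 * n - 0.125) / (n + 0.25))
    sd = (q3 - q1) / (2 * z)
    return mean, sd


# Hypothetical length of stay: median 6 days (interquartile range, 4 to 9), n = 100.
mean, sd = mean_sd_from_quartiles(4, 6, 9, 100)
print(f"estimated mean = {mean:.2f} days, estimated SD = {sd:.2f} days")
```

The estimators assume an approximately normal underlying distribution, so converted values for strongly skewed outcomes such as length of stay should be read with that caveat in mind.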
Two meta-regression analyses were carried out for mortality. The first evaluated whether there was evidence of effect modification by frailty instrument when all frailty instruments presenting data from more than two studies were combined. The second was an exploratory analysis to specifically determine if there was a difference in the association of the Clinical Frailty Scale versus the Fried Phenotype in predicting mortality (these were the two most studied instruments for this outcome).
Feasibility data were synthesized using directed content analysis.31 We used Bowen et al.’s feasibility framework to identify key coding categories.23 Categories included aspects of acceptability (i.e., satisfaction, intention to continue use, perceived appropriateness), implementation (i.e., degree of successful execution, resources needed to implement, factors affecting implementation), and practicality (i.e., ease of use, efficiency/speed, costs, positive/negative effects on users or targets). Coding of extracted data was performed by the first and senior authors (S.A., D.M.). Along with coding within categories, we determined whether the available data were positively, negatively, or neutrally supportive of an instrument’s feasibility, as well as whether the supporting information was based on objective (i.e., purposely and/or quantitatively measured) or subjective (i.e., described without supporting assessment or measurement) data.
We identified 985 titles and abstracts; after removing three duplicates, we reviewed 982 (fig. 1). We assessed 338 full-text articles and included 70 studies. Together, the included studies involved 42,954 participants and were published between 2009 and 2018. Regions of origin included North America, Europe, Australia, Asia, and South America. Full details of included studies are provided in table 1.
Frailty Instruments and Classifications
Frailty was defined using 35 different instruments. The most prevalent was the Fried Phenotype or related modifications (32 studies),4,6,32–61 followed by the Clinical Frailty Scale (12 studies),6,55,56,59,62–68 a physical measure of frailty (gait speed, timed get up and go, handgrip strength, short physical performance battery; 12 studies),48,52,56,58,59,69–74 the Frailty Index (nine studies),32,64,75–81 the Edmonton Frail Scale (seven studies),53,67,82–86 or a measure of function or disability (Katz Instrumental Activities of Daily Living, Activities of Daily Living, Eastern Cooperative Oncology Group Performance Status, self-reported mobility assessment; four studies).48,71,73,87 Other instruments were reported in two or fewer studies.40,59,69,71,73,78,88–98
Dichotomization of a frailty instrument was the most common approach to assessing frailty (29 studies [41%]), while 23 studies (33%) categorized frailty into three levels and nine studies (13%) categorized it into four or more levels. A continuous measure of frailty was used in four studies (6%). Four studies reported data for patients in the severely frail category.40,62,65,85
Surgical and Patient Populations
Mixed surgical procedures were the most commonly studied populations (20 studies [29%]), followed by general and cardiac surgery (17 studies each [24%]), orthopedics (seven studies [10%]), urology (four studies [6%]) and vascular (three studies [4%]); single studies from otolaryngology, gynecology, and thoracics were also included. Average study population age ranged from 50 to 89 yr, and the proportion of female patients ranged from 25 to 100%.
Thirty-two studies (n = 34,949) reported outcome data for in-hospital or 30-day mortality (table 2).35,37,39,43,48–50,54,56,59,61–63,65–68,70,72,73,76,78,79,85,88,91–93,96,97,99 Seven studies reported outcomes for more than one frailty instrument.48,56,59,67,73,78,91
The Fried Phenotype was examined in the largest number of studies (10, n = 2,022).35,37,39,43,48–50,56,61,70 Across studies, frailty based on the Fried Phenotype was associated with mortality (odds ratio, 3.95; 95% CI, 2.00 to 7.81; P < 0.0001; I2= 0; fig. 2A). The next most studied instrument was the Clinical Frailty Scale (six studies; n = 7,793),59,62,63,65,66,68 which was also associated with mortality (odds ratio, 4.89; 95% CI, 1.83 to 13.05; P = 0.002; I2= 75.3; fig. 2B), with the largest effect size of any instrument. Heterogeneity appeared to be attributable to the three studies of the Clinical Frailty Scale in cardiac surgery59,66,68 (I2 in cardiac studies, 89%; noncardiac, 0%; pooled odds ratio cardiac, 4.59; 95% CI, 1.19 to 17.69; pooled odds ratio noncardiac, 4.64; 95% CI, 1.30 to 16.60). Physical measures of frailty (gait speed, short physical performance battery) were evaluated in three studies, but involved the largest total number of patients (n = 15,429)48,59,70,72 and reported the smallest pooled effect size (odds ratio, 3.21; 95% CI, 2.37 to 4.36; P < 0.0001; I2 = 0; fig. 2C).
There were insufficient data for meta-analysis of other frailty instruments; however, all but one study78 that investigated the relationship between a frailty instrument and mortality found a directionally consistent association where mortality was more common in people with frailty. These results are reported in table 2.
The meta-regression for mortality across frailty instruments demonstrated no evidence of a significant effect modification by instrument (P = 0.545). There was no evidence that the Clinical Frailty Scale had a stronger association with mortality than the Fried Phenotype (meta-regression odds ratio, 1.16; 95% CI, 0.35 to 3.82; P = 0.807).
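Reported odds ratios and confidence intervals can be cross-checked against their P values, because a Wald-type 95% CI is symmetric around the point estimate on the log scale. The sketch below is an illustrative check rather than part of the review's analysis; the function names are hypothetical, and it assumes normal-approximation intervals. It uses the meta-regression odds ratio reported above (1.16; 95% CI, 0.35 to 3.82; P = 0.807).

```python
import math
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Standard error of a log odds ratio recovered from its CI limits."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (math.log(upper) - math.log(lower)) / (2 * z)


def two_sided_p(odds_ratio, lower, upper, level=0.95):
    """Two-sided Wald P value implied by an odds ratio and its CI."""
    z = math.log(odds_ratio) / se_from_ci(lower, upper, level)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def implied_point_estimate(lower, upper):
    """Geometric mean of the CI limits, i.e., the log-scale midpoint."""
    return math.sqrt(lower * upper)


# Values reported for the Clinical Frailty Scale versus Fried Phenotype comparison.
p = two_sided_p(1.16, 0.35, 3.82)
centre = implied_point_estimate(0.35, 3.82)
print(f"implied P = {p:.3f}; implied point estimate = {centre:.2f}")
```

When the geometric mean of the limits differs noticeably from the reported odds ratio, the interval has likely been mistranscribed, which makes this a quick consistency check when comparing estimates across instruments.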
Five studies reported other measures of predictive accuracy for mortality (Supplemental Digital Content, table 2, http://links.lww.com/ALN/C333). As individual predictors, the G8 Screening tool, Edmonton Frail Scale, and Risk Analysis Index were weakly to moderately discriminative (area under the receiver operating characteristic curve [AUC] approximated 0.7 for each instrument).96,97 When added to the Society of Thoracic Surgeons (Chicago, Illinois) and EuroSCORE II cardiac surgery risk models, frailty instruments demonstrated improvements in model discrimination.59,68
Fifty studies reported outcome data for postoperative complications (total n = 31,408; Supplemental Digital Content, table 3, http://links.lww.com/ALN/C333).30–34,36–39,41–43,45–48,51,53–56,58,60,63,64,66,69,71,73–76,80–82,84,86,88,89,91,94–101 Eight studies reported postoperative complication data for more than one frailty instrument.32,40,48,53,56,58,67,78 Thirty-eight effect sizes were included in a meta-analysis. The definition of a postoperative complication varied between studies. Some studies used the Clavien-Dindo Classification of Surgical Complications104 or the National Surgical Quality Improvement Program definition.105 Others included complications that were relevant to the surgical procedure (e.g., sternal wound infection or delayed graft function for kidney transplant).
Again, the Fried Phenotype was the most studied instrument (22 studies; n = 4,250)30–34,36–43,45,47,48,51,53,54,56,58,85 and was associated with complications (odds ratio, 2.47; 95% CI, 2.00 to 3.04; P < 0.0001; I2 = 9.4; Supplemental Digital Content, fig. 1A, http://links.lww.com/ALN/C333). The Edmonton Frail Scale (five studies; n = 510)53,82–84,86 and Frailty Index (five studies; n = 1,072)64,76–78,106 were the next most studied, with both the Edmonton Frail Scale (odds ratio, 2.93; 95% CI, 1.52 to 5.65; P = 0.001; I2 = 54.0; Supplemental Digital Content, fig. 1B, http://links.lww.com/ALN/C333) and Frailty Index (odds ratio, 2.29; 95% CI, 1.52 to 3.46; P < 0.0001; I2 = 61.1; Supplemental Digital Content, fig. 1C, http://links.lww.com/ALN/C333) significantly predicting complications. Three studies evaluated the Clinical Frailty Scale’s association with complications (n = 519)62,63,66 and found a directional, but nonsignificant, association with complications (odds ratio, 1.68; 95% CI, 0.95 to 2.95; P = 0.073; I2 = 73.2; Supplemental Digital Content, fig. 1D, http://links.lww.com/ALN/C333). Physical measures of frailty, namely gait speed, were also evaluated in five studies (n = 15,750)62,63,66 and were associated with complications (odds ratio, 1.98; 95% CI, 1.47 to 2.68; P < 0.0001; I2 = 35.8; Supplemental Digital Content, fig. 1E, http://links.lww.com/ALN/C333). All studies evaluating other frailty instruments and complications also found that people with frailty had higher odds of complications.
Eighteen studies reported other measures of predictive accuracy for complication outcomes (Supplemental Digital Content, table 2, http://links.lww.com/ALN/C333). Six studies provided data on the Fried Phenotype, which had weak to moderate discrimination (AUC, 0.60 to 0.76);36,39,40,46,49 three studies of the Edmonton Frail Scale demonstrated weak discrimination (AUC, 0.65 to 0.69).53,67,84 The Frailty Index demonstrated moderate to strong discrimination (AUC, 0.71 to 0.82).88,98,107 All frailty instruments tested improved discrimination and/or explained variance when added to existing multivariable models.
Twenty-five studies reported discharge disposition data (Supplemental Digital Content, table 4, http://links.lww.com/ALN/C333), defined as new admission to nursing home, transitional care, or rehabilitation facility (total n = 6,558);6,32,33,35,37,39,40,46,48,55,57,58,62,63,70,76,77,83–85,92,94,95,101,102 six studies reported outcome data for more than one instrument.6,32,40,48,55,58
Eleven studies included the Fried Phenotype4,6,32,33,35,37,39,46,48,53,55,57–59 (n = 3,202; odds ratio, 5.18; 95% CI, 3.34 to 8.03; P < 0.0001; I2 = 56.4; Supplemental Digital Content, fig. 2A, http://links.lww.com/ALN/C333), five studies included the Clinical Frailty Scale6,55,62,63,101 (n = 1,186; odds ratio, 6.31; 95% CI, 4.00 to 9.94; P < 0.0001; I2 = 0; Supplemental Digital Content, fig. 2B, http://links.lww.com/ALN/C333), four studies included the Frailty Index32,76,77,90 (n = 1,141; odds ratio, 2.29; 95% CI, 1.52 to 3.46; P = 0.006; I2 = 78.9; Supplemental Digital Content, fig. 2C, http://links.lww.com/ALN/C333), and three studies included physical measures of frailty48,58,72 (n = 15,429; odds ratio, 3.94; 95% CI, 2.49 to 6.24; P < 0.001; I2 = 0; Supplemental Digital Content, fig. 2D, http://links.lww.com/ALN/C333). Again, all other studies also found increased odds of nonfavorable discharge in people with frailty.
Five studies provided other measures of predictive performance for adverse discharge outcomes (Supplemental Digital Content, table 2, http://links.lww.com/ALN/C333).6,40,46,87,94 The Fried Phenotype demonstrated moderate to strong discrimination (AUC, 0.78, 0.83) and increased discrimination when added to the American Society of Anesthesiologists, Lee, and Eagle scores. The Clinical Frailty Scale was more sensitive (80 vs. 67%), but less specific (61 vs. 66%) than the Fried Phenotype when identifying older people who were not discharged home after surgery.
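One way to read the sensitivity and specificity trade-off above is through likelihood ratios, which combine both quantities into a single measure per instrument. The sketch below is an illustrative calculation only: the likelihood ratios are not reported in the review, the function name is hypothetical, and the inputs are the rounded percentages quoted in the text.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)   # how much a positive screen raises the odds
    lr_neg = (1 - sensitivity) / specificity   # how much a negative screen lowers the odds
    return lr_pos, lr_neg


# Rounded sensitivity and specificity for nonhome discharge, as quoted in the text.
reported = {
    "Clinical Frailty Scale": (0.80, 0.61),
    "Fried Phenotype": (0.67, 0.66),
}
for name, (sens, spec) in reported.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{name}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

Because the inputs are rounded and come from a single comparison, any difference between instruments in these derived values should be treated as descriptive rather than as evidence of superior accuracy.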
Twelve studies reported outcome data for postoperative delirium (total n = 2,537; Supplemental Digital Content, table 5, http://links.lww.com/ALN/C333)35,40,41,44,53,56,61,69,82,84,89,92 and four of these reported outcome data for more than one frailty instrument.40,53,56,69 Eight effect measures were meta-analyzed, five for the Fried Phenotype (n = 594)35,41,44,56,61 and three (n = 307)53,82,83 for the Edmonton Frail Scale. Both the Fried Phenotype (odds ratio, 3.79; 95% CI, 1.75 to 8.22; P = 0.001; I2 = 27.6; Supplemental Digital Content, fig. 3A, http://links.lww.com/ALN/C333) and Edmonton Frail Scale (odds ratio, 2.11; 95% CI, 1.06 to 4.21; P = 0.034; I2 = 0; Supplemental Digital Content, fig. 3B, http://links.lww.com/ALN/C333) were associated with delirium. Three studies provided other measures of predictive accuracy (Supplemental Digital Content, table 2, http://links.lww.com/ALN/C333).40,69,89 The Fried Phenotype, Groningen Frailty Indicator, and Sinai Abbreviated Geriatric Evaluation were all moderately discriminative (AUC, 0.74, 0.77, and 0.70, respectively) when predicting delirium.
Adequate data were available for meta-analysis of length of stay for the Frailty Index, Clinical Frailty Scale, and Fried Phenotype, which were all associated with increased length of stay (standardized mean difference, 0.83, 0.54, and 0.38, respectively; Supplemental Digital Content, fig. 4A to 4C, and table 6, http://links.lww.com/ALN/C333). Ten studies reported postoperative functional outcomes (five Fried Phenotype, three Edmonton Frail Scale, and two Clinical Frailty Scale) with heterogeneity in outcome definitions and analyses precluding pooling. Frailty was typically associated with worse functional outcomes (Supplemental Digital Content, table 7, http://links.lww.com/ALN/C333).
Thirty-two studies reported aspects of feasibility.6,39,46,55,56,60,63–65,67,68,70,76–78,82–84,86,92,93 Findings are summarized in table 3 (full details in Supplemental Digital Content, table 8, http://links.lww.com/ALN/C333); only 19 of 50 available data points were based on objective data, compared to 31 of 50 that were subjectively supported. Overall, the Clinical Frailty Scale, Edmonton Frail Scale, Frailty Index, and Fried Phenotype had the largest amounts of available data. All available data positively supported the Clinical Frailty Scale, while the Edmonton Frail Scale and Frailty Index also had predominantly positive feasibility ratings, compared to the Fried Phenotype, where the majority of data did not support feasibility. Only one study directly compared objectively measured feasibility ratings between frailty instruments, reporting that the Clinical Frailty Scale was easier to use (P < 0.0001), had fewer logistical and environmental barriers (P < 0.0001), and was faster to administer (P < 0.0001) than the Fried Phenotype.6 Time was the most frequent feasibility measure reported,4,6,46,56,60,64,67,77,78,82,84,86,92 ranging from 44 s for the Clinical Frailty Scale to 5 to 20 min for the Fried Phenotype. The need for additional equipment was identified as a significant barrier to the use of the Fried Phenotype39 and some versions of the Frailty Index,77 while physical measurements were identified as barriers for assessment of emergency surgery patients.39,55,56,76,83 Missing assessment data were noted for the Frailty Index and Fried Phenotype,6,77 while difficulty with patient interpretation of questions was reported for the Edmonton Frail Scale.67
Risk of Bias
Risk of bias results according to the Quality in Prognostic Studies tool are reported in Supplemental Digital Content, table 9 (http://links.lww.com/ALN/C333). There was 96% between-rater agreement across all items and no disagreements were greater than +/- 1 level. The main contributors to high risk of bias were issues of confounding (typically because of combining surgical procedures or urgency categories), and unclear risk of bias was most commonly due to a lack of reporting of study attrition.
In this systematic review and meta-analysis of 70 studies that reported the association of frailty, prospectively assessed in preoperative clinical settings, with postoperative outcomes, we found that the Clinical Frailty Scale had the largest effect size when predicting postoperative mortality (our primary outcome), with a greater than 4.5-fold increase in the odds of death. Our findings also provide novel pooled effect estimates for mortality, complications, nonfavorable discharge, delirium, and impaired postoperative function, as well as measures of predictive accuracy, specific to five unique frailty instruments, which can directly inform preoperative frailty assessment. Our review has further identified that future studies should address not only effect sizes, but also considerations of feasibility and predictive accuracy, which were rarely reported. Available evidence currently supports the feasibility of the Clinical Frailty Scale over other frailty instruments for preoperative use.
Routine preoperative assessment of frailty in older adults has been recommended by best practice guidelines since 2012, when a joint guideline from the American College of Surgeons and American Geriatrics Society included frailty assessment as a key component of the optimal assessment of the older surgical patient. Similar recommendations have subsequently emerged from international and multidisciplinary organizations.9,108 However, low rates of guideline uptake may be related to a lack of evidence comparing different frailty instruments in perioperative medicine,12 including systematic reviews that have examined the relationship between preoperative frailty and postoperative outcomes by pooling all frailty exposures into a single effect.4,16,17,19,20,109,110 In contrast, our study was designed to allow comparison between instruments by separately pooling data from adequately studied frailty tools (i.e., more than two studies of a given instrument) and summarizing measures of predictive accuracy and feasibility. This allows readers to directly assess the strength of association and accuracy for each instrument across outcomes of key importance, while also informing decisions about actual use of different frailty instruments in routine clinical practice.
Based on our results, it appears that frailty measured using the Clinical Frailty Scale has the strongest association with postoperative mortality, although the significance of this difference compared to other instruments is uncertain. Therefore, from a strength of association perspective the Fried Phenotype could also receive strong consideration from clinicians. The Fried Phenotype had the largest amount of available data, was strongly associated with mortality, and demonstrated low heterogeneity between studies. However, time requirements and other logistical considerations suggest that the Fried Phenotype is less feasible than the Clinical Frailty Scale. Minimal data comparing the predictive accuracy of frailty instruments for mortality were available, although available data did not clearly identify a single instrument as highly discriminative, leaving this as an area for future research.
While we specified mortality as our study’s primary outcome, it is well recognized that outcomes beyond mortality are of key importance to older people and the healthcare system. When identifying older individuals at high risk of complications, which may occur in a majority of older people with frailty,5 the Edmonton Frail Scale and the Fried Phenotype were associated with approximately 2.9-fold and 2.5-fold increases, respectively, in the odds of experiencing a complication; however, the Fried Phenotype had more available data and a lower degree of heterogeneity. The Frailty Index or Fried Phenotype appear to have the strongest discrimination when predicting complications. Older people strongly value maintenance of independence and prioritize getting home.22,111 Therefore, discussing risk of nonhome discharge is highly relevant before surgery; here, the strongest evidence supports the Clinical Frailty Scale, as it had the largest association with nonhome discharge and lower heterogeneity compared to the Fried Phenotype and Frailty Index. Delirium is also a priority outcome for older surgical patients; however, it was the least well-studied of our outcomes eligible for pooling. Here, the Fried Phenotype was more strongly associated than the Edmonton Frail Scale (odds ratio, 3.8 vs. 2.1). As highlighted by recent publications, however, new findings could still alter our conclusions.112 Therefore, frailty instrument selection decisions could be aided through accumulation of additional data for the Clinical Frailty Scale in terms of complications and for all instruments in terms of delirium and length of stay prediction. Overall, when considering strength of association and predictive accuracy in choosing a frailty instrument, clinicians and institutions may need to identify the outcomes (and related processes) that are of highest priority to them.
Future research evaluating the predictive accuracy of frailty instruments should also consider patient-centered and -reported outcomes. Despite identifying 70 studies of preoperative clinical frailty assessment, only 10 reported relevant function, quality of life or disability outcomes. Such data are key considerations for older people with frailty considering surgery and will be needed to support evidence-based interventions, such as shared decision making, that might help older people with frailty ensure that their medical decisions are congruent with their values and preferences.
Ultimately, adequate predictive accuracy is a foundational consideration when choosing a risk stratification tool. However, given the lack of indisputable evidence to support the accuracy of one instrument over another and a lack of consensus on frailty definitions in general,11 feasibility should be strongly considered. Many measures of feasibility exist, but in the preoperative setting we would suggest that acceptability, practicality and integration may be most relevant.23 Based on limited, but consistent data, the Clinical Frailty Scale appears to be the most feasible instrument among those routinely studied. In the area of practicality (e.g., need for resources and time) the Clinical Frailty Scale appears to be meaningfully faster than the Fried Phenotype and requires no extra tools, physical measurements, or subdomain scoring. Currently, researchers can contribute by formally comparing objective measures of feasibility between leading instruments, while implementation programs could scientifically evaluate the implementation process to inform future efforts. Comparisons of clinically applied instruments with instruments that can be applied via electronic data are also warranted, as electronic frailty indices could be a feasible way to assess frailty in certain settings and jurisdictions.113,114
Strengths and Limitations
This study should be appraised in terms of its strengths and limitations. Protocol preregistration and adherence to best-practice recommendations for systematic reviews support a robust review at low risk of bias. Furthermore, application of a broad, peer-reviewed search strategy to medical and allied health databases ensured that the available literature was thoroughly included. We also considered a variety of outcomes known to be important to patients, clinicians, and the healthcare system, while considering feasibility to ensure that our findings can directly inform clinical practice. However, a lack of formal assessment of feasibility in many studies decreases the certainty of our findings. Furthermore, only the Fried Phenotype had at least five studies in each meta-analysis, which suggests that further studies for other frailty instruments could lead to changes for some estimates. Our decision to focus on unadjusted effect measures likely reflected typical use of frailty instruments in clinical practice and maximized available data; however, lack of confounder adjustment further limits the robustness of causal inference using observational data. Other measures of predictive performance (e.g., discrimination and calibration, among others) were inadequately reported to synthesize; hence, our analysis is largely based on strength of association, not formal prediction metrics.115 Reliability is also an important consideration when evaluating risk assessment tools, but was not considered in our review and was rarely reported in included studies; poor reliability can lead to attenuation of effect sizes116 and may be of particular importance for instruments that require subjective evaluation. Inadequate data were available to test the impact of potential effect modifiers, such as surgery type, on our pooled measures of association; lack of head-to-head comparisons precluded techniques such as network meta-analysis.
Clinicians may also consider the underlying framework supporting different frailty instruments (e.g., physical manifestations of cellular dysfunction with the Fried Phenotype vs. accumulating multidimensional deficits with the Frailty Index117) when choosing a tool; we did not directly assess these underlying conceptual frameworks. Finally, we excluded comprehensive geriatric assessment (as this is a technique limited to geriatricians); however, this technique is considered a gold standard approach to frailty assessment in some settings.
Clinicians should consider accuracy and feasibility when choosing a frailty instrument. Strong evidence in both domains supports the Clinical Frailty Scale, which had the largest pooled effect size for predicting mortality and non-home discharge after surgery and appears to be the fastest and most practical instrument that has been widely studied. Strong associations with complications and delirium, and the largest amount of available data, also support the Fried Phenotype, but its use appears to require a trade-off between high accuracy and lower feasibility.
The authors wish to acknowledge Ms. Sascha Davis, Learning Services, The Ottawa Hospital, for her expertise in designing and executing the systematic review search strategy.
Dr. McIsaac receives salary support from The Ottawa Hospital Department of Anesthesiology and Pain Medicine (Ottawa, Ontario, Canada) and is supported by the Canadian Anesthesiologists’ Society (Toronto, Ontario, Canada) Career Scientist Award, but reports no financial relationships with any organizations that might have an interest in the submitted work in the previous 3 yr and no other relationships or activities that could appear to have influenced the submitted work. The authors acknowledge The Ottawa Hospital Department of Anesthesiology and Pain Medicine for supporting use of Distiller SR software.
The authors declare no competing interests.