Although there are thousands of published recommendations in anesthesiology clinical practice guidelines, the extent to which these are supported by high levels of evidence is not known. This study hypothesized that most recommendations in clinical practice guidelines are supported by a low level of evidence.
A registered (Prospero CRD42020202932) systematic review was conducted of anesthesia evidence-based recommendations from the major North American and European anesthesiology societies between January 2010 and September 2020 in PubMed and EMBASE. The level of evidence A, B, or C and the strength of recommendation (strong or weak) for each recommendation was mapped using the American College of Cardiology/American Heart Association classification system or the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) system. The outcome of interest was the proportion of recommendations supported by levels of evidence A, B, and C. Changes in the level of evidence over time were examined. Risk of bias was assessed using Appraisal of Guidelines for Research and Evaluation (AGREE) II.
In total, 60 guidelines comprising 2,280 recommendations were reviewed. Level of evidence A supported 16% (363 of 2,280) of total recommendations and 19% (288 of 1,506) of strong recommendations. Level of evidence C supported 51% (1,160 of 2,280) of all recommendations and 50% (756 of 1,506) of strong recommendations. Of all the guidelines, 73% (44 of 60) had a low risk of bias. The proportion of recommendations supported by level of evidence A versus level of evidence C (relative risk ratio, 0.93; 95% CI, 0.18 to 4.74; P = 0.933) or level of evidence B versus level of evidence C (relative risk ratio, 1.63; 95% CI, 0.72 to 3.72; P = 0.243) did not increase in guidelines that were revised. Year of publication was also not associated with increases in the proportion of recommendations supported by level of evidence A (relative risk ratio, 1.07; 95% CI, 0.93 to 1.23; P = 0.340) or level of evidence B (relative risk ratio, 1.05; 95% CI, 0.96 to 1.15; P = 0.283) compared to level of evidence C.
Half of the recommendations in anesthesiology clinical practice guidelines are based on a low level of evidence, and this did not change over time. These findings highlight the need for additional efforts to increase the quality of evidence used to guide decision-making in anesthesiology.
Anesthesia clinical practice guidelines make evidence-based recommendations intended to optimize patient outcomes. The extent to which these recommendations are supported by high-quality evidence is not known.
In a systematic review of 2,280 recommendations in 60 guidelines published by major North American and European societies, half of the recommendations were supported by a low level of evidence.
The proportion of recommendations supported by a high level of evidence did not increase between 2010 and 2020.
Perioperative mortality is the third leading cause of death in the United States after heart disease and cancer.1 Over 60 years ago, Beecher reported that anesthesia caused 1 death per 1,560 operations.2 Analyses based on contemporary data report that anesthesia-related mortality has dropped by nearly 99% to 8.2 deaths per million surgical discharges.3 However, this contemporary analysis underestimates the impact of anesthetic care on outcomes because it only attributes deaths to anesthesia if they were caused by overdoses or adverse effects of anesthetics, malignant hyperthermia, or failed or difficult intubations.3 This analysis ignores the role that anesthesiologists play in optimizing patient physiology to prevent complications such as myocardial infarctions, kidney injury, and strokes.3
Reducing preventable deaths and complications after surgery requires a better understanding of the gaps in the evidence base currently used by anesthesiologists to make clinical decisions. For nearly three decades, anesthesiology societies have published clinical practice guidelines on the perioperative management of patients undergoing surgery and other procedures. Anesthesiologists rely on these recommendations to guide decision-making because clinical practice guidelines represent the “epitome” of evidence-based medicine. These recommendations are based on the best available evidence and serve as the framework for best practices in perioperative care. However, clinical practice guidelines are only valid if the scientific basis for these guidelines is valid. In their landmark study published in 2009, Tricoci et al.4 reported that only 11% of the American College of Cardiology/American Heart Association guidelines were based on the highest level of evidence, whereas nearly half were based only on expert opinion or case studies. This reliance on expert opinion is problematic because expert opinion, by definition, has not been scientifically validated. Ten years later, the extent to which cardiovascular guidelines rely on expert opinions has not changed significantly.5 Similar findings have been reported for other medical and surgical subspecialties.6–8 To date, the quality of the evidence supporting clinical practice guidelines in anesthesiology has not been reported.
We report the results of our systematic review of anesthesiology evidence-based clinical practice guidelines published by the major North American and European societies and anesthesiology subspecialty societies. Our primary objective is to evaluate the quality of the evidence underlying anesthesiology clinical practice guidelines. Our second objective is to examine the change in the quality of the evidence supporting these clinical practice guidelines over time. Our goal is to better understand the evidence base for anesthesia practice and help inform discussions on future steps needed to improve the quality of evidence underlying the perioperative care of surgical patients.
Materials and Methods
Protocol and Registration
We conducted our systematic review using the Cochrane method. We expanded our analysis to include guidelines published outside of the United States based on comments that we received during the editorial process. Our revised protocol was published in Prospero (CRD42020202932, June 9, 2020), an international registry of systematic reviews, after the initial peer review.9 Our report adheres to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement.10
Eligibility Criteria
We reviewed perioperative clinical practice guidelines developed by the major anesthesiology societies in North America and Europe between January 1, 2010, and September 9, 2020. All documents that had a clear statement of being a clinical practice guideline and that graded the levels of evidence supporting their recommendations were included. We excluded guidelines related to intensive care and chronic pain. We excluded previous versions of published guidelines in our main analyses. We also excluded practice advisories because they represent a level of recommendation lower than that offered by clinical practice guidelines.11
Search Strategy
A librarian (L.H.) built a specific and sensitive search strategy, including the names of the major North American and European anesthesiology societies and the names of the leading subspecialty societies, followed by the names of the anesthesiology journals with the 10 highest impact factors (Scimago),12 and finally connected with terms related to clinical practice guidelines and synonyms: ((‘American Society of Anesthesiologists’ OR ‘American Society of Regional Anesthesia and Pain Medicine’ OR ‘Society for Obstetric Anesthesia and Perinatology’ OR ‘Society of Cardiovascular Anesthesiologists’ OR ‘Society for Ambulatory Anesthesia’ OR ‘Society of Anesthesia and Sleep Medicine’ OR ‘Society of Critical Care Anesthesiologists’ OR ‘Society for Pediatric Anesthesia’ OR ‘Trauma Anesthesiology Society’ OR ‘Society for Neuroscience in Anesthesiology and Critical Care’ OR ‘Society for Airway Management’ OR ‘Society of Academic Associations of Anesthesiology and Perioperative Medicine’ OR ‘Society for the Advancement of Transplant Anesthesia’ OR ‘American Society for Enhanced Recovery’ OR ‘American Pain Society’ OR ‘European Society of Anaesthesiology’ OR ‘European Society of Regional Anaesthesia and Pain Therapy’ OR ‘European Society for Paediatric Anaesthesiology’ OR ‘European Association of Cardiothoracic Anesthesiology’ OR ‘Neuroanaesthesia and Critical Care Society’ OR ‘Obstetric Anaesthetists Association’ OR ‘Difficult Airway Society’ OR ‘ERAS Society’ OR ‘Association of Anaesthetists’ OR ‘Royal College of Anaesthetists’ OR ‘Canadian Anesthesiologists Society’ OR ‘Regional Anesthesia and Pain Medicine’:jt OR ‘Anesthesia and Analgesia’:jt OR ‘Anesthesiology’:jt OR ‘British Journal of Anaesthesia’:jt OR ‘Anaesthesia’:jt OR ‘European Journal of Anaesthesiology’:jt OR ‘Canadian Journal of Anesthesia’:jt OR ‘Paediatric Anaesthesia’:jt OR ‘Acta Anaesthesiologica Scandinavica’:jt OR ‘Anaesthesia Critical Care and Pain Medicine’:jt)) AND (‘practice guideline’ OR ‘guideline*’ OR ‘evidence based’ OR ‘task force’)
We used a time filter between January 1, 2010, and September 9, 2020. The decision to include or exclude each society for the search strategy was determined by three anesthesiologists (L.G.G., J.A.W., and M.R.W.).
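The Boolean structure of the search string (society names OR journal titles, joined by AND to the guideline-related terms) can be illustrated with a short Python sketch. This is not the librarian's actual tooling: the helper names `quote` and `build_query` are hypothetical, the lists are abbreviated for illustration, and the `:jt` suffix is simply copied from the search string, where it tags journal titles.

```python
def quote(term: str) -> str:
    """Wrap a term in the typographic single quotes used in the search string."""
    return f"‘{term}’"


def build_query(societies, journals, guideline_terms):
    """Combine (societies OR journals) AND (guideline terms) into one string."""
    names = [quote(s) for s in societies]
    names += [quote(j) + ":jt" for j in journals]  # journal-title field tag
    terms = [quote(t) for t in guideline_terms]
    return f"(({' OR '.join(names)})) AND ({' OR '.join(terms)})"


# Abbreviated lists for illustration only; the full lists appear in the text.
societies = ["American Society of Anesthesiologists",
             "European Society of Anaesthesiology"]
journals = ["Anesthesiology", "British Journal of Anaesthesia"]
guideline_terms = ["practice guideline", "guideline*", "evidence based", "task force"]

query = build_query(societies, journals, guideline_terms)
print(query)
```

Keeping the three term lists separate makes it straightforward to add or remove a society without touching the rest of the query.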
Information Sources
We searched PubMed and EMBASE from January 1, 2010, to September 9, 2020, for clinical practice guidelines developed by the major anesthesiology societies in North America and Europe. No restriction on language was used. We also searched the web pages of these societies.
Study Selection
Two investigators independently screened the titles and abstracts of all references from the search results using the systematic review software Abstrackr.13 The full texts of the relevant citations were reviewed and further screened for eligibility. Finally, based on the recommendations of the Cochrane Handbook for Systematic Reviews14,15 and the PRISMA statement checklist,10 disagreements about the references for data extraction were resolved by consensus. The analytic sample consisted of 60 guidelines with 2,280 recommendations.
Data Collection Process
Two investigators independently collected data from the included guidelines. The following items were retrieved: guideline title, sponsor (e.g., American Society of Anesthesiologists), year of publication, update status, method used to grade evidence, funding source, population or focus of guideline, and the anesthesia subspecialty (if applicable). The extracted results were compared for concordance between reviewers, and disagreements were resolved by consensus. If a guideline was intended for a multidisciplinary audience (i.e., 2010 guideline for diagnosis and management of patients with thoracic aortic disease16 and 2011 guideline for coronary artery bypass graft surgery17), we only considered the recommendations directed toward anesthesiologists.
Extraction of Level of Evidence
The reviewed guidelines used different methodologies for evaluating the level of evidence. One third of the recommendations (796) were graded using the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) system. According to the GRADE system, level of evidence A is defined as “consistent evidence from well-performed randomized controlled trials or overwhelming evidence of some other form”; level of evidence B is defined as “evidence from randomized controlled trials with important limitations or very strong evidence of some other form”; and level of evidence C is defined as “evidence from observational studies, unsystematic clinical experience, or from randomized controlled trials with serious flaws”18 (table 1). We categorized the other recommendations (1,484) using the American College of Cardiology/American Heart Association classification system: level of evidence A includes data from multiple randomized controlled trials or meta-analyses, level of evidence B represents data from a single randomized controlled study or observational studies, and level of evidence C is limited to data from case reports and expert opinion4 (table 1). For those guidelines that did not explicitly classify the level of evidence using the American College of Cardiology/American Heart Association or GRADE classification system, two investigators independently classified the recommendations using the grading system (American College of Cardiology/American Heart Association or GRADE) that most closely approximated the grading system used in the guideline (table 1). Agreement between the evaluators was achieved by consensus as per the Cochrane Handbook for Systematic Reviews.14
Extraction of Strength of Recommendation
The 796 recommendations graded using the GRADE system were classified within the body of the documents as either strong recommendations (benefits clearly outweigh risks and burdens or vice versa) or weak recommendations (benefits closely balanced with risks and burdens).19 All other recommendations (1,484) were classified as strong or weak recommendations based on the American College of Cardiology/American Heart Association classification system (table 1) by three investigators (A.L., D.A.R., J.E.B.-C.), who independently reviewed the wording and categorized them as strong recommendations: class I (benefit clearly outweighs risk) or class III (no benefit, not helpful, harmful); or weak recommendations: class II (benefit closely balanced with risks).20 Figure A1 shows the phrases used to map recommendations to a strength of recommendation under either the GRADE or the American College of Cardiology/American Heart Association classification system. For example, class I recommendations are those for which there is evidence and general agreement that the treatment is useful or effective. These are presented with terms such as “should,” “is recommended,” “is indicated,” and “is useful/effective/beneficial.” Agreement between the evaluators was achieved by consensus as per the Cochrane Handbook for Systematic Reviews.14
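The class-to-strength mapping described above (class I and class III are strong; class II is weak) can be written as a small lookup. The following Python sketch is illustrative only: the names `ACC_AHA_CLASS_TO_STRENGTH` and `strength_from_class` are hypothetical, and the example labels are invented rather than taken from the study data.

```python
from collections import Counter

# Class-to-strength mapping as described in the text:
# class I and class III are strong; class II is weak.
ACC_AHA_CLASS_TO_STRENGTH = {"I": "strong", "II": "weak", "III": "strong"}


def strength_from_class(acc_aha_class: str) -> str:
    """Return 'strong' or 'weak' for a class label (I, II, or III)."""
    try:
        return ACC_AHA_CLASS_TO_STRENGTH[acc_aha_class.strip().upper()]
    except KeyError:
        raise ValueError(f"Unknown class: {acc_aha_class!r}")


# Hypothetical class labels for five recommendations (not study data).
example_classes = ["I", "II", "III", "I", "II"]
tally = Counter(strength_from_class(c) for c in example_classes)
print(tally)
```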
Risk of Bias in Individual Studies
All documents included were assessed independently by three reviewers using the Appraisal of Guidelines for Research and Evaluation (AGREE) II instrument.21 AGREE II is a framework for assessing the quality of guidelines, which it defines as “the confidence that the potential biases of guideline development have been addressed adequately.”21,22 Upon completing the 23 items of the AGREE II instrument, the reviewers made a judgment about the quality of the guideline considering the criteria in the assessment process. A threshold of 70% in the overall assessment was used to identify the highest-quality guidelines with the lowest risk of bias. This threshold was decided by consensus among the authors.21,23
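As an illustration of how item ratings translate into a percentage that can be compared against the 70% threshold, the sketch below applies the scaled-score formula from the AGREE II user's manual (obtained score minus minimum possible score, divided by the range of possible scores). Whether the authors computed their overall assessment in exactly this way is an assumption; the function names and the ratings are hypothetical.

```python
def scaled_score(ratings_by_appraiser):
    """Scaled score (%) from 1-7 item ratings, one list of ratings per appraiser."""
    n_appraisers = len(ratings_by_appraiser)
    n_items = len(ratings_by_appraiser[0])
    if any(len(r) != n_items for r in ratings_by_appraiser):
        raise ValueError("every appraiser must rate the same items")
    if any(not 1 <= x <= 7 for r in ratings_by_appraiser for x in r):
        raise ValueError("AGREE II items are rated from 1 to 7")
    obtained = sum(sum(r) for r in ratings_by_appraiser)
    minimum = 1 * n_items * n_appraisers  # every item rated 1
    maximum = 7 * n_items * n_appraisers  # every item rated 7
    return 100.0 * (obtained - minimum) / (maximum - minimum)


def low_risk_of_bias(score_percent, threshold=70.0):
    """Apply the 70% cutoff described in the text (>= threshold is low risk)."""
    return score_percent >= threshold


# Hypothetical ratings from three appraisers on a three-item domain.
ratings = [[6, 7, 5], [6, 6, 6], [7, 5, 6]]
score = scaled_score(ratings)
print(round(score, 1), low_risk_of_bias(score))
```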
Analysis
Descriptive Analysis
We first report the proportion of recommendations supported by levels of evidence A, B, and C. We then report the proportion of recommendations supported by levels of evidence A, B, and C stratified by the strength of the recommendation (strong versus weak), by classification system (GRADE, American College of Cardiology/American Heart Association), and by specialty (general, cardiovascular, obstetric, pediatric, acute pain, regional, and neuroanesthesia). For simplicity of presentation, the term “general” is used to define nonspecialty care. We used multinomial logistic regression modeling, only including intercept terms, to compare the proportion of recommendations supported by level of evidence A versus level of evidence C and the proportion supported by level of evidence B versus level of evidence C.
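For readers less familiar with intercept-only multinomial models, the short derivation below sketches why the exponentiated intercepts reduce to ratios of observed counts. It assumes level of evidence C is the reference category; the symbols (the intercepts α_A and α_B and the counts n_A, n_B, and n_C) are notation introduced here for illustration.

```latex
\begin{align}
P(Y = k) &= \frac{e^{\alpha_k}}{1 + e^{\alpha_A} + e^{\alpha_B}}, \quad k \in \{A, B\}, \\
P(Y = C) &= \frac{1}{1 + e^{\alpha_A} + e^{\alpha_B}}, \\
e^{\hat{\alpha}_k} &= \frac{\hat{P}(Y = k)}{\hat{P}(Y = C)} = \frac{n_k}{n_C}.
\end{align}
```

In other words, with no covariates the fitted model simply compares how often level of evidence A (or B) occurs relative to level of evidence C.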
Statistical Analysis
Bivariate multinomial logistic regression was used to separately examine the association between the quality of evidence supporting clinical practice guidelines and (1) subspecialty, (2) strength of recommendation (strong versus weak), (3) region (the United States, Europe, or multinational), (4) methodology used for grading the quality of the evidence (American College of Cardiology/American Heart Association or GRADE), and (5) risk of bias (defined as an overall score of less than 70% or greater than or equal to 70% [where a higher score indicates a lower risk of bias] on AGREE II). The dependent variable was specified as a categorical indicator: level of evidence A, B, or C.
We then examined whether the quality of evidence supporting clinical practice guidelines changed over time using multinomial logistic regression. The analytic sample included all general guidelines that were revised (519 previous recommendations and 590 revised recommendations). We excluded subspecialty guidelines because very few subspecialty guidelines were updated. The dependent variable was specified as a categorical indicator variable: level of evidence A, B, or C. The key independent variable indicated whether a recommendation was included in the original guideline or the revised guideline. We estimated an unadjusted model in the main analysis. We then performed a sensitivity analysis in which we estimated a nonparsimonious multivariable model adjusting for subspecialty, strength of recommendation (strong versus weak), region (the United States, Europe, or multinational), and the methodology used for grading the quality of the evidence (American College of Cardiology/American Heart Association or GRADE). We did not adjust for AGREE II because it did not have a clinically meaningful effect size in the descriptive bivariate analyses. Next, we performed a secondary analysis based on the complete set of recommendations including previous versions of revised guidelines (2,280 recommendations from current guidelines and 580 recommendations from previous guidelines that had been revised). The key independent variable was the year in which a guideline was published, specified as a continuous variable. As above, we also performed a sensitivity analysis which adjusted for subspecialty, strength of recommendation (strong versus weak), region (the United States, Europe, or multinational), and the methodology used for grading the quality of the evidence (American College of Cardiology/American Heart Association or GRADE).
The use of multinomial logistic regression was not prespecified in our published protocol. We chose this approach instead of logistic regression to avoid the loss of information that would occur if we collapsed the three levels of evidence (levels of evidence A, B, and C) into two categories (level of evidence A and B versus level of evidence C). We selected multinomial logistic regression instead of ordered logistic regression because the parallel regression assumption in ordered logistic regression is rarely met.24
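The difference between the two candidate models can be made explicit. In the sketch below, x is a generic covariate, the levels are ordered C < B < A, and the symbols (cut points τ_j, slopes β and β_k, intercepts α_k) are notation assumed here rather than taken from the study protocol.

```latex
\begin{align}
\text{Ordered logit:} \quad & \log\frac{P(Y \le j)}{P(Y > j)} = \tau_j - \beta x, \quad j \in \{C, B\}, \\
\text{Multinomial logit:} \quad & \log\frac{P(Y = k)}{P(Y = C)} = \alpha_k + \beta_k x, \quad k \in \{A, B\}.
\end{align}
```

The ordered model forces a single slope β across both cut points (the parallel regression assumption), whereas the multinomial model estimates a separate slope β_k for each comparison with level of evidence C. The exponentiated slope, exp(β_k), is the relative risk ratio.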
All analyses were performed using STATA 16.1 (StataCorp, USA). Because recommendations within the same guideline may not be independent, we used cluster robust variance estimators using the guideline as the unit of clustering.25 Findings are reported as relative risk ratios. Two-sided P values of less than 0.05 are reported as statistically significant.
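A reader can check the internal consistency of a reported relative risk ratio, its 95% CI, and its P value. The Python sketch below assumes a Wald interval computed on the log scale with a normal reference distribution, which may differ in detail from the software's cluster-robust output. The function name `wald_check` is hypothetical, and the input values are those reported in the Results for strong versus weak recommendations (level of evidence A versus C).

```python
import math


def wald_check(rrr, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-scale standard error, z statistic, and two-sided P value
    implied by a reported ratio and its 95% CI (assumes a Wald interval)."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(rrr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# Reported values for strong versus weak recommendations,
# level of evidence A versus C: 2.05 (95% CI, 0.93 to 4.55; P = 0.077).
se, z, p = wald_check(2.05, 0.93, 4.55)
print(f"se={se:.3f}, z={z:.2f}, p={p:.3f}")
```

Because the published estimates are rounded to two decimal places, the recovered P value agrees with the reported one only approximately.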
No statistical power calculation was conducted before the study. The sample size was based on the available data.
Results
Study Selection and Characteristics
We found 7,808 citations, of which we reviewed 271 in full text, and included 70 documents (60 guidelines with 2,280 recommendations) for data extraction (fig. A2; table 2). Overall, 29 guidelines were developed in the United States, 15 guidelines in Europe, and 16 in both. Sixteen of the guidelines were developed by or in collaboration with the American Society of Anesthesiologists (Schaumburg, Illinois) and ten of the guidelines were developed by or in collaboration with the European Society of Anaesthesiology (Brussels, Belgium). Of the 2,280 recommendations, 60% were addressed toward general anesthesiology practice, 22% (511) to cardiovascular anesthesia, 6% (140) to regional anesthesia and acute pain, 5% (123) to obstetric anesthesia, 4% (93) to pediatric anesthesia, and 2% (51) to neuroanesthesia.
Level of Evidence Supporting Recommendations
We mapped the level of evidence in individual guidelines to that used by the American College of Cardiology/American Heart Association and GRADE systems (see table 1 for definitions). Level of evidence A supported 16% (363 of 2,280) of recommendations, level of evidence B supported 33% (757 of 2,280), and level of evidence C supported 51% (1,160 of 2,280). When assessing only strong recommendations, 19% (288 of 1,506) were supported by level of evidence A, 31% (462 of 1,506) by level of evidence B, and 50% (756 of 1,506) by level of evidence C (fig. 1). After stratifying this analysis by the classifying system (GRADE versus American College of Cardiology/American Heart Association), we found that the distribution of levels of evidence was qualitatively similar to the above (fig. 1).
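The percentages in this paragraph follow directly from the reported counts. As a simple arithmetic check, the Python sketch below recomputes them; the variable and function names are illustrative, and the counts are those stated in the text.

```python
# Counts reported in the text, keyed by level of evidence.
all_recs = {"A": 363, "B": 757, "C": 1160}
strong_recs = {"A": 288, "B": 462, "C": 756}


def percentages(counts):
    """Return each level's share of the total, rounded to whole percent."""
    total = sum(counts.values())
    return {level: round(100 * n / total) for level, n in counts.items()}


print(sum(all_recs.values()), percentages(all_recs))
print(sum(strong_recs.values()), percentages(strong_recs))
```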
Risk of Bias within Clinical Practice Guidelines
The scores of the AGREE II domains for each of the clinical practice guidelines are shown in table 2. Forty-four of the clinical practice guidelines (73%) met or exceeded the threshold score of 70% (table 3). Recommendations with a low risk of bias (AGREE II score greater than or equal to 70%) were not more likely to be supported by level of evidence A versus level of evidence C compared to recommendations with a higher risk of bias (relative risk ratio, 0.91; 95% CI, 0.32 to 2.57; P = 0.857; fig. 3a). Recommendations with a low risk of bias were also not more likely to be supported by level of evidence B versus level of evidence C compared to recommendations with a higher risk of bias (relative risk ratio, 1.05; 95% CI, 0.53 to 2.06; P = 0.897; fig. 3b).
Level of Evidence Supporting Recommendations Stratified by Subspecialty
Figure 2 depicts the distribution of levels of evidence across the different subspecialties stratified by the level of evidence classification system (GRADE versus American College of Cardiology/American Heart Association). Neuroanesthesia (relative risk ratio, 0.06; 95% CI, 0.02 to 0.21; P < 0.001) and regional (relative risk ratio, 0.37; 95% CI, 0.20 to 0.68; P = 0.001) were less likely to be associated with level of evidence A versus level of evidence C compared to general (fig. 3, a and b). Recommendations in clinical practice guidelines for cardiovascular anesthesia were more likely to be associated with level of evidence B versus level of evidence C (relative risk ratio, 1.87; 95% CI, 1.02 to 3.43; P = 0.043) compared to general (fig. 3, a and b). Acute pain (relative risk ratio, 0.32; 95% CI, 0.11 to 0.97; P = 0.044), obstetrics (relative risk ratio, 0.29; 95% CI, 0.11 to 0.82; P = 0.019), and regional (relative risk ratio, 0.33; 95% CI, 0.22 to 0.49; P < 0.001) were less likely to be associated with level of evidence B versus level of evidence C compared to general (fig. 3, a and b).
Strength of Recommendation
Compared to weak recommendations, strong recommendations were not significantly more likely to be associated with level of evidence A versus level of evidence C (relative risk ratio, 2.05; 95% CI, 0.93 to 4.55; P = 0.077), or level of evidence B versus level of evidence C (relative risk ratio, 0.84; 95% CI, 0.54 to 1.29; P = 0.419).
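For a single binary predictor, the unadjusted relative risk ratio from a multinomial model can be computed directly from a cross-tabulation. The Python sketch below does this with the counts reported earlier. The weak-recommendation counts are derived by subtraction, which assumes every recommendation was classified as either strong or weak, and the function name `rrr` is illustrative. The resulting point estimates (about 2.05 and 0.84) agree with those reported in this paragraph.

```python
def rrr(exposed, reference, level, base="C"):
    """Unadjusted relative risk ratio for `level` versus `base`,
    comparing the exposed group with the reference group."""
    return (exposed[level] / exposed[base]) / (reference[level] / reference[base])


# Counts of strong recommendations as reported in the text.
strong = {"A": 288, "B": 462, "C": 756}
# Weak counts derived as (all recommendations) - (strong recommendations);
# this assumes every recommendation is either strong or weak.
weak = {"A": 363 - 288, "B": 757 - 462, "C": 1160 - 756}

print(round(rrr(strong, weak, "A"), 2))  # level of evidence A versus C
print(round(rrr(strong, weak, "B"), 2))  # level of evidence B versus C
```

Clustering by guideline changes the standard errors, and therefore the confidence intervals and P values, but not these point estimates.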
Regional Differences
There were 29 U.S. guidelines, 15 European guidelines (25 documents), and 16 multinational Enhanced Recovery after Surgery guidelines (the United States and Europe; fig. A1). Recommendations that were jointly developed in the United States and Europe were more likely to be supported by (1) level of evidence A versus level of evidence C (relative risk ratio, 4.63; 95% CI, 2.09 to 10.3; P < 0.001) and (2) level of evidence B versus level of evidence C (relative risk ratio, 3.06; 95% CI, 1.57 to 5.96; P = 0.001) compared to U.S. guidelines.
Methodology Used to Grade Level of Evidence: American College of Cardiology/American Heart Association versus GRADE
Using GRADE to classify level of evidence was not significantly associated with level of evidence A versus level of evidence C (relative risk ratio, 0.98; 95% CI, 0.41 to 2.36; P = 0.961) or level of evidence B versus level of evidence C (relative risk ratio, 1.45; 95% CI, 0.79 to 2.65; P = 0.231) compared to the American College of Cardiology/American Heart Association methodology.
Temporal Trends
Recommendations in revised guidelines were not more likely to be supported by level of evidence A versus level of evidence C (relative risk ratio, 0.93; 95% CI, 0.18 to 4.74; P = 0.933) compared to recommendations in the original guidelines. Recommendations in revised guidelines were also not more likely to be associated with level of evidence B versus level of evidence C (relative risk ratio, 1.63; 95% CI, 0.72 to 3.72; P = 0.243). In the sensitivity analysis in which we adjusted for recommendation strength, region, and methodology, recommendations in the revised guidelines were also not more likely to be supported by level of evidence A versus level of evidence C (relative risk ratio, 1.08; 95% CI, 0.24 to 4.88; P = 0.921) or level of evidence B versus level of evidence C (relative risk ratio, 2.08; 95% CI, 0.92 to 4.69; P = 0.077) compared to recommendations in the original guidelines (fig. 4). In the secondary analysis based on the complete set of recommendations (including previous versions of revised guidelines), the publication year was not associated with the level of evidence supporting the recommendations for either level of evidence A versus level of evidence C (relative risk ratio, 1.07; 95% CI, 0.93 to 1.23; P = 0.340) or level of evidence B versus level of evidence C (relative risk ratio, 1.05; 95% CI, 0.96 to 1.15; P = 0.283). The results of the sensitivity analysis in which we adjusted for recommendation strength, region, and methodology are shown in figure A3 (a and b).
Discussion
In this systematic review of clinical practice guidelines developed by anesthesiology societies from the United States and Europe, only 16% of all recommendations were supported by a high level of evidence (level of evidence A). In total, 51% of recommendations were supported by a low level of evidence (level of evidence C). More strikingly, 50% of all strong recommendations were also only supported by a low level of evidence. The proportion of recommendations supported by level of evidence A or B did not increase over time compared to level of evidence C. Finally, recommendations in multinational guidelines were four times more likely to be supported by level of evidence A than recommendations in U.S. guidelines.
Previous studies have also evaluated the level of evidence supporting recommendations in clinical practice guidelines published by other medical organizations such as the American Heart Association, the American College of Cardiology, the European Society of Cardiology (Sophia Antipolis, France), the Society for Critical Care Medicine (Mount Prospect, Illinois), and the American College of Obstetricians and Gynecologists (Washington, D.C.).5–8,95,96 In common with anesthesiology, most of the recommendations from these medical specialties were also based on a low level of evidence instead of high-quality evidence. With the exception of the Infectious Diseases Society of America (Arlington, Virginia), the reliance on expert opinion did not change over time.95
The large proportion of recommendations in anesthesia clinical practice guidelines based on low-quality evidence is a cause for concern. In the past, large clinical trials in perioperative medicine were uncommon compared to other fields such as cardiology.97 However, the number of high-quality large clinical trials in perioperative medicine has increased markedly over the past 10 years. In particular, these clinical trials have focused on the use of aspirin, clonidine, and β-blockers in patients undergoing noncardiac surgery98–100; the safety of nitrous oxide101; the avoidance of general anesthesia in patients undergoing cancer surgery102; the safety of lower versus higher depth of anesthesia103; the use of the Bispectral Index to reduce awareness104; the cardioprotective effects of volatile anesthetics105; and transfusion triggers.106 Despite this, there remain many important foundational questions that have yet to be answered. For example, although observational studies demonstrate a strong association between hypotension and end-organ damage,100,107 we still lack a high level of evidence to support the specific mean arterial pressure target recently proposed in the Perioperative Quality Initiative consensus statement on intraoperative blood pressure.108
Our work and that of others demonstrate the extent to which clinical practice guidelines are based primarily on a low level of evidence. However, despite the recent increase in high-profile randomized clinical trials in perioperative medicine, randomized controlled trials will never replace lower levels of evidence because of cost considerations and time constraints.109 Randomized controlled trials are expensive, usually take several years to complete, and may lack external validity when study populations do not represent the population at large. Although drawing causal inferences from observational studies is generally discouraged because nonrandomized studies may not control for unknown prognostic factors,110 there is frequently a good correlation between randomized and observational studies.111,112
In the absence of randomized clinical trials, many clinical questions may be addressed using well performed observational studies. Confounding bias, which is the main limitation of observational studies, can be reduced by using comprehensive databases that include most prognostic factors and (in some cases) through the use of statistical techniques such as propensity scoring, instrumental variable analysis, and inverse probability weighting. Well performed observational studies with very large effect sizes or large effect sizes can serve as level of evidence A or B, respectively, as defined by the GRADE methodology.113 Our finding that over half of recommendations in clinical practice guidelines are based only on a low level of evidence should lead us to increase our efforts to conduct both robust randomized and observational studies. However, we should also recognize that some anesthesia best practices, such as pulse oximetry and capnography, are not supported by high levels of evidence but are nonetheless considered to be the foundation of anesthesia care. Finally, it is important to recognize that expert opinion can help guide clinical practice until the time when higher quality evidence becomes available.
Our study has several important limitations. First, our findings on the level of evidence supporting recommendations in anesthesiology clinical practice guidelines developed by major anesthesiology societies in North America and Europe cannot be generalized to include all of the evidence base for anesthesiology and perioperative medicine. Second, anesthesiology clinical practice guidelines lacked a single uniform grading system for assigning levels of evidence and the strength of their recommendations. The American College of Cardiology/American Heart Association and GRADE systems use different criteria for the levels of evidence. For example, the American College of Cardiology/American Heart Association classifies recommendations as level of evidence C if they are based on expert opinion or case studies. GRADE, on the other hand, classifies evidence from observational studies or randomized controlled trials with serious flaws as level of evidence C. However, despite using two different classification systems, we still found that most guidelines were based on level of evidence C irrespective of which classification system was used. Third, for those guidelines that used grading systems that were similar but not identical to either the American College of Cardiology/American Heart Association or GRADE systems, we mapped their grading system to either American College of Cardiology/American Heart Association or GRADE to provide a standardized framework for categorizing the strengths of the recommendations and the levels of evidence. The risk of introducing bias in the mapping process was minimized by using multiple evaluators. Fourth, the American College of Cardiology/American Heart Association definitions for levels of evidence have changed slightly over time. We used the American College of Cardiology/American Heart Association level of evidence definitions presented in the seminal article by Tricoci et al.4 because these definitions most closely approximated the approach used in guidelines that used a grading methodology similar to the American College of Cardiology/American Heart Association classification system. Finally, we excluded clinical practice guidelines that did not explicitly grade the levels of evidence to minimize the risk of misclassification of the levels of evidence. We also excluded consensus statements based on expert opinion only. Excluding the consensus statements may have led us to underestimate the proportion of recommendations based on level of evidence C.
Conclusions
In summary, fewer than one fifth of recommendations in anesthesiology clinical practice guidelines are supported by level of evidence A, and half of the recommendations are supported by level of evidence C. The quality of the evidence in anesthesiology clinical practice guidelines has not improved in the last 10 years. Given that perioperative mortality is a leading cause of death, our findings highlight the need to increase the number of well-performed randomized and observational studies in perioperative medicine to lessen the reliance on low levels of evidence in anesthesia and perioperative medicine. To accomplish this, we need to increase National Institutes of Health investment in perioperative medicine and create a comprehensive research agenda to bring together anesthesiologists, surgeons, public health experts, and patients to improve perioperative outcomes.
Acknowledgments
The authors appreciate the contributions of Cosmo Fowler, M.D. (Rochester, New York), in the guidelines’ appraisal with the AGREE II tool; Daniela Martinez, B.S. (Rochester, New York), for her assistance with Microsoft Excel data calculations; and Courtney Vidovich, B.S. (Rochester, New York), for her contributions in the screening of searched references.
Research Support
Supported by the Department of Anesthesiology and Perioperative Medicine at the University of Rochester School of Medicine and Dentistry (Rochester, New York).
Competing Interests
The authors declare no competing interests.