In 1954, the year I was born, Dylan Thomas wrote, “Time held me green and dying, but I sang in my chains like the sea.” In these lines, he expresses his disdain for aging, illness, infirmity, and eventual death. How differently the great poet must have felt 2 yr before his premature death when he penned his most famous lines:
Do not go gentle into that good night,
Old age should burn and rave at close of day;
Rage, rage against the dying of the light.
Consider a young mother with cancer. Consider a child with a lethal congenital condition. Rage seems the only appropriate response to the dying of the light. But of course, none of us wants to go into that good night any earlier than strictly necessary, and preferably only after long and fulfilling lives. Neither do our patients. After all, the one thing patients ask of us, above all else, is to keep them alive. It is thus reasonable to ask how well we do. The answer depends on which perioperative period we consider.
Few patients die during surgery, and to our credit, intraoperative mortality is now at least 10-fold lower than it was three decades ago during my residency—despite surgical patients now being much older and sicker.1 In fact, intraoperative mortality is now so low that it is hard to measure.2 The marked reduction in intraoperative mortality did not happen by magic; it happened because of a concerted effort to improve drugs, monitors, and training. No other specialty has come remotely close to reducing mortality by an order of magnitude, and we deserve credit for this impressive improvement.
Many anesthesiologists and surgeons incorrectly believe that a patient safely delivered to the postanesthesia care unit has survived the most dangerous part of hospitalization. In fact, 30-day postoperative mortality is 1,000 times greater than preventable intraoperative mortality. If the 30 days after surgery were considered a distinct disease, it would be the third leading cause of death (fig. 1).3 The numbers are sobering: about 2% of U.S. surgical inpatients die within 30 days.4 Worldwide, at least five million patients die each year within a month of surgery. Furthermore, about half of 30-day mortality occurs during the initial hospitalization—and therefore while patients remain under full medical care and in our highest-level facilities. Because patients die after surgery rather than intraoperatively, postoperative mortality must be considered the major perioperative problem (table 1).
During the first postoperative year, about 5% of surgical patients die. Among those more than 65 yr of age—about one third of our patients—1-yr mortality is 10%.5 How many anesthesiologists appreciate that one in 10 elderly surgical patients is dead within the year? Most postoperative mortality naturally results from severe underlying pathology and necessarily invasive operations. As might thus be expected, postoperative deaths are nonrandom: sicker patients are far more likely to die. In fact, death can be predicted remarkably accurately just from administrative data, specifically a patient’s accumulated diagnostic and procedural codes.6,7
The question, then, is the extent to which anesthesiologists contribute to mortality by what we currently do and, perhaps more importantly, whether we can prevent serious complications and mortality by doing things differently. A good place to start is with the causes of death. Thirty-day all-cause mortality is largely cardiovascular—mostly myocardial infarctions.8 The incidence of postoperative myocardial infarctions is far higher than generally appreciated. About 8% of surgical inpatients more than 45 yr of age have an infarction, usually within the initial 3 postoperative days.4 This is orders of magnitude greater than the risk in comparable patients who do not have surgery.
The infarction incidence is higher than generally appreciated because 80% of postoperative myocardial injury is clinically silent; that is, detectable only by troponin monitoring. It is tempting to assume that clinically apparent events are the more serious ones, and that others are just “troponitis.” But that would be wrong: mortality is nearly identical for symptomatic and asymptomatic postoperative infarctions. Furthermore, the mortality is a staggering 10%. One in 10 patients with symptomatic or asymptomatic postoperative troponin elevation thus dies within the month (table 2).9
During the initial postoperative year, the causes of death shift. About half of 1-yr mortality is due to cancer. Of course, this does not imply that surgery or anesthesia causes cancer. These are patients who come to us with malignancies and then die from disease progression. But it certainly raises the question of whether anything we do might reduce the risk of cancer recurrence. And while it is perhaps unlikely, there are reasons to believe that regional analgesia might help.10 At least two major trials of regional analgesia and cancer recurrence are in progress. It is also possible that perioperative administration of cyclooxygenase-2 inhibitors will reduce cancer recurrence.11–13 These theories remain entirely speculative, but they are examples of research that would “make a difference.”
My father used to say that most any problem could be solved by “throwing money at it.” I am afraid perioperative mortality will not be so easily solved—although money would surely help! The problems are multifactorial, which is another way of saying that there is plenty of blame to go around. Basic scientists, translational investigators, and clinicians have all contributed. In the following three sections, I will identify issues each group might consider.
Beautiful science with no conceivable direct benefit to humans may be well worth doing. Much theoretical physics, for example, is not obviously useful but is nonetheless magnificent and broadens our understanding of the universe. There is similarly a role for fundamental mechanistic and physiologic studies. But if investigators claim that research will be useful, then it probably should be. Often it is not.
Practically every biomedical basic science grant application, and most research reports, starts with assertions that the proposed studies or presented results will markedly enhance clinical care. Few actually do. Consequently, the ratio of clinically useful advances to basic science articles is tiny. Or to put this another way, humans have proven to be a poor model for rats. Basic scientists need to help the rest of us identify studies and results that are actually applicable to patients. That is, guide us to the results that really matter and should progress to testing in animals and then humans.
We have seen many clearly delineated mechanisms that just did not translate from test tubes and animals to humans. Consider vitamins and dietary supplements; there are good mechanisms explaining why many will enhance health, yet virtually none has proven beneficial in broad populations. Vitamin E, especially, has been disappointing, as large randomized trials show no benefit from supplements despite compelling reasons to anticipate benefit.14,15 The same is true of vitamin C,14,16,17 olive oil,18 margarine, red wine, and nearly every other dietary intervention. In fact, it is hard to think of another area where such a mountain of scientific (and nonscientific) articles has produced but a thimbleful of compelling human outcome data.
Closer to home, there is no question that nitrous oxide interferes with vitamin B12 and folate metabolism, thus increasing plasma homocysteine, impairing endothelial function, and impairing protein synthesis.19 Yet, two large randomized trials have convincingly shown that nitrous oxide causes no harm more serious than nausea and vomiting20,21—and less of that than volatile anesthetics.22 Why the disconnect? Why could our basic science colleagues not help us understand that the molecular effects of nitrous oxide on protein production were unlikely to be clinically important, thus obviating the need to randomize more than 9,000 patients to establish the safety of nitrous oxide (fig. 2)?
Therapeutic hypothermia is another example. It has been known since the early 1970s that a few degrees centigrade of hypothermia ameliorates ischemia and reperfusion injury on a cellular level.23 Furthermore, therapeutic hypothermia reduces ischemic injury in virtually every model in every animal species.24 Yet, the results in humans have been dismal. Large trials failed to demonstrate benefits from hypothermia for brain trauma,25 aneurysm surgery,26 and acute myocardial infarction.27 (Curiously, a major trial of hypothermia for stroke, an obvious application of therapeutic hypothermia, has yet to be completed.) A bright spot was out-of-hospital cardiac arrest, based on two modest-sized studies.28,29 However, a subsequent study with more than twice as many patients as the original two combined showed no benefit.30 And if anything, therapeutic hypothermia for in-hospital cardiac arrest appears to worsen outcome.31
Even cardiopulmonary bypass, which was routinely done at 28°C for its putative brain protection, is now often conducted with patients kept normothermic with equally good results—which is consistent with many randomized trials showing no benefit from hypothermic cardiopulmonary bypass.32 At this point, neonatal asphyxia, which is reasonably well documented,33,34 and organ donation (based on a single major trial)35 remain the only indications for deliberate hypothermia. In fairness, though, I note that hypothermia studies are challenging and that study design and execution (particularly the delay between insult and implementation of hypothermia) may be the major problem rather than the theory or mechanism. Half-a-dozen major trials are in progress, and therapeutic hypothermia may yet be proven beneficial in some circumstances.
A deep understanding of genetics was among the scientific triumphs of the last half-century. Powerful techniques such as genome-wide arrays were to unlock the genetic basis for much disease, opening an era of individualized medicine. While there have been undoubted advances, genetics has yet to fulfill its initial promise. Genetic analysis remains critical for diseases caused by single mutations, many of which have been understood for decades. But the more common diseases such as hypertension and cardiovascular conditions, the ones that actually kill lots of people, are controlled by dozens or hundreds of genes and have largely resisted analysis despite enormous effort.
Genetic analysis is nonetheless well on its way to replacing caffeine–halothane contracture testing for malignant hyperthermia.36 Presumably, genetic analysis will eventually be the standard diagnostic approach to this uniquely anesthetic disease—and probably to many others as well. I have no doubt that genomics will eventually contribute enormously to diagnosis and treatment throughout medicine, but I am equally struck that progress has been much slower than predicted.
For example, it is worth considering that the National Institutes of Health (Bethesda, Maryland) spent $15 billion of its $26 billion 2016 budget (58%) on research with key words that included “gene,” “stem cell,” and “regenerative medicine.” Perhaps as a consequence, more than 29,000 articles with those key words were published in 2014. And what do we have to show for it? Sixty years after identification of the single-gene mutation for sickle cell, not a single targeted therapy has been developed.37 And sickle cell is a “simple” genetic problem. We are nowhere near solving the far more common, lethal, and complicated problems such as cardiovascular disease and cancer.
Even the most fundamental basic science is worthwhile and at least enhances understanding of physiology. Furthermore, research can be “beautiful” without being obviously useful—like astronomy. I recognize that in early stages, it can be difficult or impossible to estimate which novel techniques and approaches may prove useful. But the goal I set for scientists doing basic anesthesia research is to guide clinical investigators toward results most likely to enhance care. Or to put this another way, clinical trials are difficult, expensive, and time consuming; we will never be able to do many of them. It is thus important that we test theories that are both important and likely to be true. Basic scientists can help by guiding clinical investigators toward the theories most worth testing. New structures, such as broad-based consensus panels with various types of basic scientists and trialists, might prove helpful.
Translational and Clinical Research
There remains widespread misunderstanding about what “statistically significant” means. P = 0.05 does not mean that there is a 95% chance that a replication study will show similar results. Instead, P = 0.05 corresponds to only a 50% chance that a comparable study will have P ≤ 0.05.38,39 The P value needs to be 0.005 for this replication probability to reach the conventional power criterion of 80% and 0.0003 to reach 95%.40 Figure 3 explains the implications of P = 0.05 on replicability, and why a value of 0.0003 is needed to provide 95% power for replication. Typically, it requires about 3.5 times as many patients to power a study for 95% replication than for 50%. A corollary is that most clinical studies are quite underpowered for replication.
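The replication figures above can be reproduced with a short calculation. The sketch below is a hedged illustration, not necessarily the method used in the cited work: it assumes that the true effect equals the effect observed in the original study, that the replication has the same sample size, and that significance is two-sided at 0.05. The function names are hypothetical. Under those assumptions, the replication's z statistic is normally distributed around the original z, and required sample size scales with the square of the sum of the z values for α and for power.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_from_two_sided_p(p: float) -> float:
    """|z| statistic corresponding to a two-sided P value."""
    return STD_NORMAL.inv_cdf(1 - p / 2)


def replication_probability(p_observed: float, alpha: float = 0.05) -> float:
    """Chance that a same-sized replication reaches two-sided P <= alpha,
    assuming the true effect equals the originally observed effect."""
    z_obs = z_from_two_sided_p(p_observed)
    z_crit = z_from_two_sided_p(alpha)
    # The replication's z is ~ Normal(z_obs, 1); only the tail in the same
    # direction as the original effect is counted (the opposite tail is negligible).
    return STD_NORMAL.cdf(z_obs - z_crit)


def sample_size_ratio(power_low: float, power_high: float, alpha: float = 0.05) -> float:
    """How many times more patients power_high needs than power_low,
    using n proportional to (z_crit + z_power) ** 2 at a fixed effect size."""
    z_crit = z_from_two_sided_p(alpha)
    z_low = STD_NORMAL.inv_cdf(power_low)
    z_high = STD_NORMAL.inv_cdf(power_high)
    return ((z_crit + z_high) / (z_crit + z_low)) ** 2


for p in (0.05, 0.005, 0.0003):
    print(f"P = {p}: replication probability {replication_probability(p):.2f}")
print(f"95% vs 50% replication power: {sample_size_ratio(0.50, 0.95):.1f} times as many patients")
```

Under these assumptions the loop prints replication probabilities of about 0.50, 0.80, and 0.95, and the sample-size ratio comes out near 3.4, in line with the “about 3.5 times” quoted above.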
It is an unfortunate quirk of history that 0.05 was designated a “significant” P value. A more appropriate value would have been 0.005, or better 0.001.41 Had one of these values been designated the criterion for significance, the medical literature would contain far fewer false-positive studies—and the ones reported to be positive would be far more likely to be replicable.
A further difficulty is that “replicate” in this context applies just to the conclusion that the populations differ, not to the magnitude of the treatment effect, which is what clinicians really need to know. For example, a statistically significant result might have CIs around the relative treatment effect ranging, say, from 1.03 to 6.0. The difficulty is that a treatment effect of a few percent may not be clinically important, especially if the novel treatment is more expensive and has yet-to-be-characterized potential side effects. Conversely, a large treatment effect may be implausible and would suggest that the results are simply wrong. Large numbers of subjects are needed for robust results, especially when the outcomes of interest are relatively rare dichotomous events, such as myocardial infarction, respiratory arrest, or death (fig. 4).
Unfortunately, it takes many more patients to establish tight CIs around a treatment effect than to simply conclude that an observed difference between populations is unlikely to be due to chance. A consequence of relying on P values as our primary strength-of-evidence indicator is that many statistically significant results have wide CIs that provide little guidance to clinicians. A further problem is that identical P values may result from studies with wildly different reliability.
For example, consider two trials of perioperative β blockers for prevention of myocardial infarctions (table 3). The first enrolls 200 patients and identifies one infarction in the treatment group and nine in the placebo group for a relative risk of 0.11 and P value of 0.02. The second enrolls 4,000 patients and identifies 200 infarctions in the treatment group and 250 in the placebo group for a relative risk of 0.80 and P value of 0.02.42 Which of these studies with identical P values do you believe?
The second study is far more believable for two reasons. One is that its treatment effect is plausible. That heart rate control reduces infarction risk by 20% is perfectly reasonable; that it would reduce the incidence of a complicated multifactorial outcome by nearly a factor of 10 is not. The other is that the smaller study is fragile. Fragility refers to small studies that are statistically significant but depend critically on just a few outcomes that could easily have differed.43 For example, consider the consequences of adding just two events to the treatment group in each study, which could easily happen by chance. The P value for the smaller trial would increase to 0.13, but the P value in the larger trial would remain unchanged at 0.02. The importance of fragility is demonstrated by the frequent series of progressively larger studies that “correct” initially overoptimistic results.
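The two hypothetical β-blocker trials can be checked numerically. This is a sketch under stated assumptions: equal arm sizes (100 and 2,000 patients per group), a log-scale (Katz) 95% CI for the relative risk, and Fisher's exact test swapped in as the significance test, since the text does not say which test produced its P values. The function names are illustrative.

```python
import math


def relative_risk_ci(events_tx, n_tx, events_ctl, n_ctl, z=1.96):
    """Relative risk with an approximate 95% CI computed on the log scale (Katz)."""
    rr = (events_tx / n_tx) / (events_ctl / n_ctl)
    se_log_rr = math.sqrt(1 / events_tx - 1 / n_tx + 1 / events_ctl - 1 / n_ctl)
    return (rr,
            math.exp(math.log(rr) - z * se_log_rr),
            math.exp(math.log(rr) + z * se_log_rr))


def fisher_exact_two_sided(events_tx, n_tx, events_ctl, n_ctl):
    """Two-sided Fisher exact P: total probability of all 2x2 tables with the
    observed margins that are no more likely than the observed table."""
    total = events_tx + events_ctl
    denom = math.comb(n_tx + n_ctl, total)
    k_min, k_max = max(0, total - n_ctl), min(total, n_tx)
    probs = [math.comb(n_tx, k) * math.comb(n_ctl, total - k) / denom
             for k in range(k_min, k_max + 1)]
    p_obs = probs[events_tx - k_min]
    # Small relative tolerance so that tables tied with the observed one count.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


# Hypothetical arm sizes: equal randomization is assumed.
trials = {"small": (1, 100, 9, 100), "large": (200, 2000, 250, 2000)}
results = {}
for name, (a, n_a, c, n_c) in trials.items():
    rr, lo, hi = relative_risk_ci(a, n_a, c, n_c)
    p_before = fisher_exact_two_sided(a, n_a, c, n_c)
    # Fragility check: two extra infarctions in the treatment group.
    p_after = fisher_exact_two_sided(a + 2, n_a, c, n_c)
    results[name] = (rr, lo, hi, p_before, p_after)
    print(f"{name}: RR {rr:.2f} (95% CI {lo:.2f} to {hi:.2f}), "
          f"P {p_before:.3f}, P with 2 added events {p_after:.3f}")
```

With these assumptions, the small trial's CI runs from roughly 0.01 to 0.86, while the large trial's runs from roughly 0.67 to 0.95. Adding two treatment-group events pushes the small trial's P from about 0.02 to about 0.13, whereas the large trial stays well below 0.05. The large trial's exact P values differ somewhat from the quoted 0.02 because the choice of test here is an assumption.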
Most everyone is aware that random chance can falsify research results. We thus look to statistical analysis for an estimate of the extent to which apparently robust signals might result from random error (bad luck). The trouble is that there are three other major sources of error that are harder to detect and usually impossible to quantify: selection bias, confounding, and measurement bias.44 Strong study design is the best protection against all three sources of error, with randomization generally protecting against selection bias and confounding and blinding protecting against most types of measurement bias. But even the best-designed randomized and blinded trials are subject to certain types of nonrandom error such as attrition bias.45
Large randomized, blinded trials are generally considered the highest level of clinical evidence. But they are expensive and usually take a long time to conduct. There will never be enough randomized trials to address even a small fraction of the clinically important questions. (Novel designs such as alternating intervention46 and decision-supported randomization47 will help, but are only suitable for certain types of interventions.) Fortunately, trials can now be supplemented by analysis of large informative registries fed from electronic health records.48
Registry analyses provide an opportunity to address some questions more quickly and at far lower cost than trials; furthermore, some questions, such as those related to unmodifiable factors (e.g., obesity, sex, and age), can only be addressed by epidemiologic analyses (table 4).38 But the trade-off is that registry analyses present a far greater risk of bias and confounding without the protections of randomization and blinding. The difficulty is that few anesthesiologists—or even investigators—appreciate how subtly error can creep into noninterventional studies. Let me give you an example from a recent review.44
Consider a study by Schull and Cobb49 in which the investigators asked an important question: Is arthritis hereditary? The experiment consisted of asking otherwise-similar people, with and without arthritis, whether their parents had arthritis. Their results are shown in table 5. The results were clear: people with arthritis were far more likely to report that one or both parents also had arthritis. The difference was highly statistically significant, with P = 0.003.
There was just one problem. The subjects with arthritis and the subjects without arthritis were siblings; they had exactly the same parents! So what happened here? Were some of the subjects lying? Unlikely. Most likely, people with rheumatoid arthritis thought much more about arthritis than those who did not. And they were far more likely to have discussed the issue with their parents and thus know (and remember) whether their parents had arthritis. This is an example of family information bias, a type of recall (measurement) bias. The difficulty is that there are many other types of bias, some of which are equally subtle, and it is usually difficult to estimate to what extent bias has degraded observational analyses. For additional discussion of sources of error and clinical research methodology, see recent reviews.44,45,48
Small fragile trials and confounded registry analyses do not advance our specialty. Some even guide us in the wrong direction, producing potential or actual harm. What we need is fewer and better studies. The goal I set for clinical investigators is thus to continue the recent trend toward large well-powered studies that provide actionable answers to important questions.42,50
Very large clinical trials—the ones providing the best guidance to clinicians—can require years of effort by hundreds of investigators. It is unreasonable to expect investigators to sustain such effort if they will not be rewarded with academic credit. If anesthesia is to have the number of large trials the specialty needs and deserves, department chairs and university promotion committees will have to recognize participation in large trials and consequent “corporate” authorship as a real academic activity.
Clinicians, you are not off the hook. Almost everyone talks about practicing evidence-based medicine. And I fully understand the challenge because when you look for evidence, there turns out to be remarkably little. But it is also undoubtedly true that many clinicians do not implement well-established practices, instead basing practice on best clinical judgment—also known as “making it up.” Or even worse, some clinicians continue to practice much as they were taught in residency decades ago, ignoring rigorously proven advances.
For example, troponin screening for myocardial injury is inexpensive and the number needed to test is less than 15,9,51 which is tiny compared with that of many routine tests. Furthermore, troponin monitoring identifies a condition that has a stunning 10% 30-day mortality. And unlike many test results, positive troponins are actionable. Patients who experience a postoperative infarction should have a cardiology consult, have their hypertension and heart rate controlled, be put on aspirin and angiotensin-converting enzyme inhibitors, and be considered for statin treatment. Infarctions are also major life events and can be used as teachable moments52,53 to encourage patients to exercise, eat a healthful diet, and stop smoking. All these opportunities are lost in unscreened patients.54
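The arithmetic behind a number needed to test is the reciprocal of the detection rate. The sketch below assumes, for illustration only, that the roughly 8% postoperative infarction incidence cited earlier is the rate that troponin screening detects and that detection is perfect; the cited estimate of fewer than 15 may rest on different data. The 10% 30-day mortality is the figure given above, and the function name is hypothetical.

```python
def number_needed_to_test(detection_rate: float) -> float:
    """Patients screened per case detected; assumes every case is found."""
    if not 0 < detection_rate <= 1:
        raise ValueError("detection_rate must be in (0, 1]")
    return 1 / detection_rate


# Illustrative inputs (assumptions, taken from figures quoted in the text).
infarction_rate = 0.08   # ~8% of surgical inpatients over 45 yr
mortality_30d = 0.10     # ~10% 30-day mortality after postoperative infarction
screened = 1000

nnt = number_needed_to_test(infarction_rate)
detected = screened * infarction_rate
deaths_among_detected = detected * mortality_30d
print(f"number needed to test: {nnt:.1f}")
print(f"per {screened} screened: {detected:.0f} infarctions, "
      f"about {deaths_among_detected:.0f} expected 30-day deaths")
```

With these inputs the sketch gives a number needed to test of 12.5, under 15, and about 8 expected deaths among the 80 infarctions found per 1,000 patients screened.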
Troponin, then, is a valuable screening test that is rarely ordered. N-terminal pro-brain natriuretic peptide falls into the same category.55,56 But what about all the useless tests we order? How about all those coagulation tests in patients with no relevant history? What about all the electrocardiograms in asymptomatic patients, and stress tests that do not even slightly alter management?8 Also consider perioperative normothermia, for which there is copious evidence.57 In the United States, most surgical patients are actively warmed, which is effective.58 But effective warming is much less common in other countries—and almost nonexistent in some.
For half-a-century, anesthesiologists have sniggered when internists advised us to “avoid hypoxia and hypotension.” We dismissed the advice about avoiding hypoxemia because we knew it was rare intraoperatively. But we also now know that postoperative hypoxemia is common, severe, and prolonged—and that nurses taking conventional vital signs at 4-h intervals miss 90% of it.59
We have not done better with hypotension: in recent years, it has become clear that intraoperative hypotension is far more common than was generally appreciated and that even mild intraoperative hypotension is strongly associated with myocardial injury and death.60,61 Actually, it is even worse: for decades, we fairly uncritically induced deliberate intraoperative hypotension, sometimes essentially for surgical convenience—harming who-knows-how-many patients in the process. And as with hypoxemia, it seems likely that postoperative hypotension is even more harmful. I am afraid that the internists were right, and in arrogantly dismissing their advice to “avoid hypoxia and hypotension,” we missed important opportunities to enhance care.
Nausea and vomiting prophylaxis is another area in which we generally do poorly. Too many clinicians give every patient a single antiemetic, completely ignoring highly compelling evidence and guidelines indicating that some patients should get none—and that others should get two or more.62 That common and effective22 antiemetics are inexpensive is no excuse, since all drugs can cause complications.
It is also worth noting practices that have been abandoned without compelling evidence. Nitrous oxide is an example I mentioned previously. Of course, nitrous oxide is hardly essential, and it is perfectly easy to provide general anesthesia without the drug. But that hardly excuses allegedly scientific decisions to eliminate the drug completely in some institutions or to build new hospitals without nitrous oxide piping.
Most importantly, clinicians need to move beyond the operating room because postoperative mortality is orders of magnitude greater than intraoperative mortality. If anesthesiologists do not participate meaningfully in postoperative care, it seems unlikely that we will have any substantive impact. For example, there are already fellowships in perioperative care—offered in internal medicine departments! Hospitalists are also increasingly involved. Should we not be training anesthesiologists in the complete medical care of postoperative patients, rather than just pain management? Postoperative care could even become a board-recognized field, joining the existing subspecialties of intraoperative anesthesia, critical care, and pain management.
Similarly, it seems likely that continuous ward monitoring will soon be the standard of care, since vital signs at 4- to 6-h intervals clearly miss many (and probably most) rescue opportunities. We should take the lead in making continuous postoperative monitoring standard and effective. And by effective monitoring, I do not mean simply purchasing devices and installing yet another computer screen in nursing stations to generate a near-constant series of false alarms that are ignored. Instead, I mean establishing integrated systems whereby real-time patient information, with appropriate context, streams to someone who thoughtfully evaluates data and trends and intervenes as necessary to prevent harm. Who better than an anesthesiologist?
Basic scientists, translational investigators, and clinicians all bear some responsibility for postoperative mortality, the major perioperative problem—and all can help ameliorate risk. Basic scientists, I ask you to consider which types of studies are actually likely to ultimately enhance patient care. Please also recognize that clinical research is an expensive and highly limited resource. You can help our specialty by prioritizing findings that are most likely to prove clinically important and therefore worthy of clinical investigation.
Clinical investigators need to stop churning out small fragile trials with results that are about as likely to be wrong as right—and that do not provide useful bounds on treatment effects. For registry-based studies, the problem is bias and confounding rather than fragility; neither can be “solved,” but careful question selection, study design, and analysis reduce the risk of error. What we mostly need, though, is more large randomized trials that provide robust guidance to clinicians.
And finally, clinicians need to stay abreast of and implement relevant findings. It is tragic when new knowledge gained at enormous expense and effort fails to enhance care because findings are implemented slowly. But most importantly, clinicians need to “own” postoperative care rather than just managing pain. Providing good analgesia is our responsibility and a laudable goal. But pain is not the primary cause of most postoperative deaths; we need to consider and minimize all causes. Doing so will require that anesthesiologists substantially increase their involvement in postoperative care—that is, really become perioperative physicians.
Our specialty is at a crossroads. One path is to embrace postoperative mortality and for basic scientists, translational investigators, and clinicians to make a sustained and concerted effort to reduce deaths after surgery—just the way our specialty solved intraoperative mortality. The other path is to declare anesthesia responsibility as ending when patients leave the postanesthesia care unit. I note, though, that the latter approach is exactly the same as defining anesthesia as irrelevant to the major perioperative problem, which is postoperative mortality.
If anesthesia is to continue making a meaningful contribution to perioperative care, we can no longer define success by getting patients to the postanesthesia care unit alive. We have largely solved intraoperative mortality, to our credit. But the operating room is no longer where patients die; instead, they die in the days and weeks after surgery. We thus need to be involved when patients actually get into trouble. Specifically, anesthesiologists need to contribute after patients leave the recovery unit. We need to actually become perioperative physicians, not just talk about it as we mostly have for the last decade or longer. This might be a good time for our specialty to remember the immortal words of Rabbi Hillel: “If I am not for myself, who will be for me? But if I am only for myself, who am I? And if not now, when?” When is now.
I recognize that perioperative medicine is a new environment for most of us. It will require new general medical and administrative skills, as well as new practice patterns. Yes, it will be hard; yes, it will require a prolonged commitment from each of us; and yes, our specialty will have to reinvent itself. When the going gets tough—as it will—I hope you will remember the words of Dory Previn:
i can’t go on…
can’t go on;
i can’t go on;
i’ll get up
and go on.
And in doing so, we will enhance care of our patients and reinvent our specialty for the next generation. And like the great in every era, we will leave “footprints on the sands of time.”*
Support was provided solely from institutional and/or departmental sources.
The author declares no competing interests.
*From A Psalm of Life, Henry Wadsworth Longfellow, 1838.