OCTOBER 16, 1846, marked a dramatic day in the history of humankind, with the first public demonstration of anesthesia (fig. 1A).1 If the reduction of human suffering is medicine's primary goal, it could be argued that anesthesiology has contributed more to humankind than any other field of medicine. In its first issue in 2000, the editors of The New England Journal of Medicine published an editorial on a millennium in medicine, in which they presented the 11 most important advances in medicine in the past 1,000 years.2 Anesthesiology was, of course, on the list. However, because the advances were presented in chronological order, anesthesiology did not appear first, which in my opinion is its rightful position of importance. Although none of us can take credit for this advance, we can clearly be proud of our heritage and what we do every day in reducing pain and suffering for millions of people.
During the past 164 yr, the field of anesthesiology has rapidly progressed, with many developments that have improved the quality and safety of anesthesia care and enabled tremendous advances in the surgical disciplines. During this lecture, I will focus on two “points of inflection” in the field with which I am familiar: the development of noninvasive monitoring of oxygenation (and monitoring standards, in general) and the development of perioperative anesthesia information management systems (AIMS). I believe many of the older members of this audience will agree that there was a significant change in the practice of anesthesiology between 1980 and 1990. I hope to convince you that we are in the midst of another change that will dramatically affect the way we practice in the next decade, as we first implement information systems as a routine and then use the data derived from those systems to make another dramatic change in the way medicine is practiced, not only in our field but in other disciplines.
As I review the progress of anesthesiology for greater than the past 150 yr, I see striking similarities in the progress of the aviation industry. This may make even more sense for me. Having a father who was a test pilot and having soloed my first plane at the age of 17 years, I felt a remarkable déjà vu “soloing” my first anesthetic. What do we actually do to patients? (1) We suspend consciousness. (2) We counterbalance painful stimuli. (3) We maintain normal physiology during a planned trauma. (4) We frequently produce nausea and vomiting. We are a lot like pilots. Both of us have a fun job taking people places. We place people in a dangerous situation. We try to make people feel at ease and allay their fears. We really do not provide complete information for consent. If we were to inform the patient of what we actually plan to do to him or her, we would have to tell the patient the following. We will give you a drug that will cause you to stop breathing, and your oxygen concentration will start to decrease. In case you attempt to breathe, we prevent even the slightest possibility of that occurring by giving you a second drug that paralyzes your muscles. Then, during the next critical few minutes, we manage to control your airway and place you “safely” on a ventilator. This is similar to what a pilot would have to say to passengers before starting down the runway. Before the plane takes off, the pilots do not state that the liftoff speed is 180 mph and that if that speed is not reached half-way down the runway the plane will end up in a pile of flames and most likely all will perish. The pilot should also state that during the first few minutes of maximum thrust, if for some reason the plane should lose power, again it will fall to the earth in a pile of flames and most likely all will perish. This type of informed consent, as with detailed information regarding anesthesia, would not help the passengers (or patients) undergo their “flight” at ease. 
Therefore, neither pilots nor anesthesiologists give their passengers or patients accurate details of what is about to happen. A formal informed consent does not seem necessary because everyone has a general understanding that being up in the air is not safe and could potentially be lethal. The same could be said regarding anesthesia but in a vaguer way (i.e., most patients know that anesthesia is an abnormal state with inherent danger, but the alternative seems much worse).
Next, both anesthesiologists and pilots have a flight plan A and plans B and C, should the unforeseen occur. Again, most of the risk is during the takeoff and the landing (i.e., the induction and emergence); during the flight, both of us look at electronic devices to see where we are. Finally, we both tend to make some people experience nausea and vomiting.
As with pilots, anesthesiologists have short and long flights, with long ones requiring more planning and preparation. We may be flying young “healthy” planes, or we may be flying elderly planes with more “comorbidities.” We may need to fly under extreme conditions. The advances in anesthesiology in improving safety have given us the ability to care for more elderly patients undergoing more complex surgical procedures.
On December 17, 1903, the Wright brothers (Orville and Wilbur Wright, Dayton, Ohio) made history by flying the first heavier-than-air aircraft (fig. 1B). Nearly 5 yr later, on September 17, 1908, LT Thomas Selfridge climbed aboard an early Wright brothers aircraft with Orville Wright for a test flight as part of an evaluation for a military contract. Several minutes into the flight, a propeller broke and fell to the ground, and the plane soon followed, seriously injuring Orville Wright and making the 26-yr-old lieutenant the first aviation fatality (fig. 2A).3 Although this disaster devastated the Wright brothers, there was a thorough investigation by the military that absolved Orville Wright of any blame, noting the crash was the result of a mechanical failure. Ultimately, Orville Wright was awarded the first military contract for $30,000 to further develop a military aircraft.
Similarly, 2 yr after the demonstration of ether anesthesia in Boston, Massachusetts, a 15-yr-old girl named Hannah Greener (who died on January 28, 1848) underwent chloroform anesthesia for the removal of a toenail. According to Thomas Nathaniel Meggison, M.D., who administered the chloroform, the girl did not take the anesthetic well. She died despite all resuscitative efforts, including “dashed water in her face,” “gave her some brandy,” and “opened veins in her arm and jugular” (fig. 2B).4 Unfortunately, as anesthesia became more popular and expanded throughout the world, there continued to be significant mortalities. The first intravenous anesthetic, thiopental, was associated with problems when it was first made available in a 5% concentration.5 The first major study6 of anesthetic mortality noted that anesthesia was associated with a mortality of 1 in 1,560 individuals and was estimated to cause more deaths than polio during the height of the epidemic. Ironically, 50 yr later, in an Institute of Medicine report,7 anesthesiology was highlighted as a leader in patient safety and recognized for notably reducing errors by using a “combination of technological advances and standardized equipment.” This reduced anesthesia-associated mortality to approximately 1 in 200,000 individuals.
How did this dramatic improvement in anesthesia safety occur? It probably started with the efforts of Harvey Cushing, M.D. (a neurosurgeon born in Cleveland, Ohio, in 1869), considered by many as the father of neurosurgery, with his use of an anesthetic record to document pulse and respiration and, later, blood pressure.8 For the first time, this allowed tracking of the physiologic course of anesthetic care. During the next 80 yr, anesthesia machines were developed and incorporated vaporizers, gas flow meters, ventilators, and carbon dioxide absorbers. However, the monitoring remained relatively unchanged, with the manual cuff, pulse rate, and auscultation of respirations (fig. 3A). This is what I would refer to as the visual flight rules era of anesthesia, equivalent to that of a 1930s aircraft; pilots were required to have exquisite “clinical skills” to assess the status of the aircraft and navigate simultaneously (fig. 3B). It was not until the early 1980s that the course of anesthesia changed with the nearly simultaneous availability of three monitoring devices: a noninvasive automatic blood pressure cuff, a capnometer, and a pulse oximeter.
In this section, I will briefly review the fascinating history of the development of pulse oximetry; more than any other device, the pulse oximeter signifies the point of inflection in the history of anesthesia. It is a monitor that was developed and promoted by anesthesiologists but has been adopted by all those in acute-care medicine. As with many innovations, the development of pulse oximetry involved a host of individuals. Most would agree that the first functional oximeter was developed by Glenn Alan Millikan, Ph.D. (a physiologist working for the Johnson Research Foundation; University of Pennsylvania, Philadelphia, Pennsylvania) during World War II as part of a series of physiologic experiments to determine when aviators (another aviation connection) would require supplemental oxygen.9 Millikan was the son of Robert A. Millikan, Ph.D. (1868–1953), the Nobel Prize–winning physicist and cofounder of the California Institute of Technology, Pasadena.10 Per previous data, the color of living tissue could change with the desaturation of hemoglobin and the color change was measured by light absorption or reflection. Millikan demonstrated that this change could be detected by shining light through the earlobe and measuring the change in the transmitted light intensity. Two modifications of the device were required to detect a signal related to arterial hemoglobin. Because light is absorbed by the blood and tissue in the ear, he had to zero the device by squashing the ear to eliminate blood and zero the light transmission to that of bloodless tissue. After the device was zeroed, he then released the pressure to allow blood to return to the ear, but this blood was a combination of arterial, venous, and capillary blood. To obtain a signal that was primarily arterial, he heated the device to 42°C to make the ear hyperemic and thereby arterialized the blood sensed by the oximeter.
This device was successfully used in experiments during the next decade and was cited as a clinical monitoring device in Anesthesiology in 1951 by Stephen et al.11 Although the oximeter was useful in detecting desaturation that was undetectable clinically, this early device was difficult to maintain. If left in the same site, it would cause a burn. In 1974, a Japanese electrical engineer, Takuo Aoyagi, Ph.D. (Faculty of Engineering, Niigata University, Niigata Prefecture, Japan), made an insightful observation.12 He was working on a technique to noninvasively estimate cardiac output by using a Millikan-type oximeter and intravenous dye. He detected this dye by placing a Millikan ear oximeter on his subjects and then attempted to measure a dye dilution curve as the intravenously injected dye perfused the ear, hoping that the ear blood flow could be related to the total cardiac output. During these experiments, he noted oscillations in the red and infrared signals of the ear oximeter. He came up with the ingenious idea that if he assumed that the pulsatile signal must be arterial blood, he could then derive a signal related to arterial hemoglobin saturation without first calibrating by compressing the ear and then heating the ear. Although he did not publish this as a pulse oximeter, it soon became known as the pulse oximeter because it analyzed the pulsatile light absorption signal in red and infrared light. This idea was soon adopted by Scott Wilber, B.E. (an engineer and founder of Biox Technology, Boulder, CO), and modified by using light-emitting diodes as light sources and photo diodes as light detectors, which allowed for a lightweight clip-on ear or finger probe.13 The modern pulse oximeter was developed by an anesthesiologist, Bill New, M.D., Ph.D. (Engineer and Clinical Assistant Professor of Anesthesiology, Stanford University, Palo Alto, California).13 He saw the tremendous application of the device in anesthesia and ingeniously decided to make the pulse beep tone change with saturation. With its easy-to-use sensor that needed no calibration to provide beat-to-beat arterial saturation and pulse, this generation of pulse oximeter was greeted with nearly instantaneous acceptance. The first publication documenting the accuracy of the pulse oximeter appeared in Anesthesiology in 1984 by Yelderman and New.14 It was only 2 yr later that the American Society of Anesthesiologists (ASA) published standards for monitoring that recommended pulse oximetry.* It is impressive that it was only 2 yr from the introduction of the device to its consideration as a standard of care by the ASA.
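The pulsatile-signal idea described above can be illustrated with a minimal sketch. It normalizes the pulsatile (AC) part of each wavelength by its steady (DC) part and forms the red-to-infrared ratio. The linear calibration used here (110 − 25R) is a commonly quoted textbook approximation and is an assumption for illustration only; commercial devices use empirically derived calibration curves. All function names and the synthetic traces are hypothetical.

```python
import math


def ac_dc(signal):
    """Return (pulsatile amplitude, mean level) of a transmitted-light trace."""
    dc = sum(signal) / len(signal)
    ac = max(signal) - min(signal)
    return ac, dc


def ratio_of_ratios(red, infrared):
    """Red-to-infrared ratio of the normalized pulsatile components."""
    red_ac, red_dc = ac_dc(red)
    ir_ac, ir_dc = ac_dc(infrared)
    return (red_ac / red_dc) / (ir_ac / ir_dc)


def estimate_spo2(red, infrared):
    """ASSUMPTION: illustrative linear calibration, not a device algorithm."""
    r = ratio_of_ratios(red, infrared)
    return max(0.0, min(100.0, 110.0 - 25.0 * r))


def synthetic_trace(dc, ac, beats=5, samples_per_beat=50):
    """Hypothetical test signal: a steady level plus a sinusoidal pulse."""
    n = beats * samples_per_beat
    return [dc + (ac / 2) * math.sin(2 * math.pi * i / samples_per_beat)
            for i in range(n)]


# Synthetic example: the red pulsation is half the size of the infrared one.
red = synthetic_trace(dc=1.0, ac=0.01)
infrared = synthetic_trace(dc=1.0, ac=0.02)
print(round(estimate_spo2(red, infrared), 1))
```

Because only the pulsatile part of the signal enters the ratio, the constant absorption by tissue and venous blood cancels out, which is why no compression or heating step is needed.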
I refer to the combination of pulse oximetry and capnography as the “dynamic duo” for acute-care monitoring. Pulse oximetry ensures beat-to-beat oxygen saturation and pulse while capnography ensures breath-to-breath ventilation and pulmonary blood flow (cardiac output). It is difficult to imagine a life-threatening situation in which these two devices remain in the normal range. I believe that most would agree that the significant reduction in anesthetic-related mortality in the 1990s was because of the routine adoption of oximetry and capnography. Once the value of pulse oximetry was noted by anesthesiologists, it progressively spread to all acute-care areas of medicine, including intensive care units, step-down units, and emergency departments.
Before we leave the historic portion of this lecture, I would like to bring up other analogies between aviation and anesthesiology. Just as W. T. G. Morton, M.D. (1819–1868), a dentist, tried to obtain commercial value from his discovery by attempting to market ether as a new substance (i.e., “letheon”), the Wright brothers also filed a series of patents on their flying machine in the hopes of obtaining commercial success.15,16 Both attempts were to no avail. Morton's attempt to disguise ether as another substance was discovered, disgracing him; although he is well recognized for his public demonstration of anesthesia, he never profited from it and died destitute. The Wright brothers also failed to profit substantially from their efforts. After years of litigation with Glenn H. Curtiss (aviator and founder of the US Aircraft Industry, 1878–1930), the federal government forced a resolution and suspended their patents. This was because World War I was approaching and the litigation was preventing development of aircraft for the war effort.17 Again, the Wright brothers are noted in history for developing the first heavier-than-air flying machine but had to settle for a relatively small monetary reward. Other similarities between the two fields are the preflight aircraft walk-around inspection and the pre-takeoff checklist. These are analogous to the anesthesia machine checkout. As we heard at last year's Rovenstine lecture, given by Peter Pronovost, M.D., Ph.D. (Professor, School of Medicine, Department of Anesthesiology, Critical Care Medicine, and Surgery, Johns Hopkins University, Baltimore, Maryland), preprocedure checklists and recently adopted time-outs, now routine before surgical procedures, can be valuable.18,19 Each of these processes was adopted in the aviation industry well before its “discovery” in medicine. The analogies go on and on.
Filing a flight plan before the flight has been routine, just as the anesthesia workup with an anesthetic plan is a required part of our practice. Aircraft between the 1930s and 1970s progressively developed in complexity and dramatically increased the number of gauges and monitoring devices to alert the pilot about the status of the aircraft. During anesthesia, starting in the 1980s, the number of monitoring devices progressively increased to allow for close monitoring of patient status. Flight time restrictions on pilots were adopted in 1978, and anesthesia work hour restrictions for trainees were adopted in the United States in 2001.†20 The Federal Aviation Agency was established in 1950 to improve safety and apply uniform standards to the aviation industry; the ASA established the Anesthesia Patient Safety Foundation to improve safety in 1986.‡21 The National Transportation Safety Board started an aviation accident database in 1962, and the ASA developed a closed-claim investigation task force in 1991.§∥ Last but not least, flight simulators for training and certification and anesthesia simulators for training and ultimately certification were developed. Because our fields seem to be so similar, it might be useful to see what the aviation industry has done in the past 25 yr to predict what we most likely will be doing in the next 25 yr.
Anesthesiology Moves into the Digital Age
If you look at the progression of aircraft instrumentation, aircraft became more complex (i.e., more gauges and dials appeared using simple high- and low-threshold alarms with little integration or intelligence) (fig. 4, A and B). Unfortunately, as the number of gauges increased, the ability of a human to monitor those gauges actually decreased.22 Human factors research has demonstrated that when multiple high- and low-alert limits are used from multiple gauges, the normal response is to either silence the alarm limits or place them at such wide thresholds that they become nonfunctional.22 If something does happen suddenly and multiple monitors are out of range at the same time, the alarms become distracting and prevent the pilot from focusing on the most important problems first. Thus, the designers of cockpits integrated alarms and prioritized those alarms; today, aircraft have three screens (i.e., navigation [radar], primary flight display, and multifunction display) (fig. 5, A and B).# With the advent of the global positioning system, the navigator has been replaced by a navigation screen that tracks the whereabouts of the aircraft along its planned course. The primary flight display linked to the multifunction display produces an integrated system tracking the status of the aircraft and alerting the pilot to issues of concern, whether related to the mechanical function of the aircraft or its flight status.
A current anesthesia machine, with its monitors and a paper record, looks much like an aircraft shortly after World War II: many dials with high/low alarms, navigation by pen and paper (the anesthetic record), and no integration of information that could be considered a primary flight display equivalent. With the advent of AIMS, providing electronic “navigation,” integrated monitoring systems, and electronic anesthesia machines, we have the opportunity to mimic the aircraft industry by integrating these information sources. This will allow us to manage patients more specifically by using data from current physiologic monitors and the anesthesia machine and from the patient's medical history and laboratory data to develop the “multifunction display” and “primary flight display”; in medicine, this is called decision support. There are many examples of automatic decision support being used, such as reminders for antibiotic timing (either pop-up displays on an information system or alphanumeric pages), alerts for abnormal laboratory values, and alerts for the potential of awareness during anesthesia.23–25 The integration of these multiple sources of data provides us with the opportunity to move into a new era of perioperative care. We may have the opportunity to reduce our anesthetic-related mortality to lower than 1 in 200,000 individuals, but we may also have the opportunity to reduce the postoperative complications (e.g., myocardial infarction, renal failure, and stroke) by optimizing and individualizing our perioperative care based on patient- and procedure-specific personalized care plans.
AIMS and Outcomes Research
An AIMS is composed of several components: an electronic anesthesia history and physical (H&P) examination, an intraoperative record, and procedure and postoperative documentation. In addition, these systems usually have interfaces with the hospital's electronic medical record; with the admission/discharge, operating room scheduling, and laboratory systems; and, to be more functional, with the e-mail and paging systems. When they first appeared, AIMS were called anesthesia record keepers because all they did was replicate the handwritten intraoperative record. Unfortunately, the companies that developed these systems did not survive because there was little value in replacing an inexpensive process (paper and pen) with an expensive electronic system, other than what was considered to be more accurate documentation.26,27 For this reason, the electronic medical record, in most institutions, has progressed while most anesthesia departments have stayed with paper. As a society, we have not mandated these systems because they have been expensive and difficult to justify. Although there have been various attempts to develop a return on investment for an AIMS, I believe the most compelling reason to implement an AIMS is that the entire medical record will become electronic and that we (the field known for its advances in technology) should not be left out of the electronic age. Of the components of an AIMS, I believe the most important part is the electronic anesthesia workup (H&P). I hope to demonstrate why this part of an AIMS may allow us, as a specialty, to make a significant contribution to medicine.
An anesthesia H&P, or a preoperative evaluation, is unique among H&Ps in traditional medicine. In the paper world, an anesthesia evaluation is usually one sheet of paper with a variety of boxes that are checked to enable someone to quickly review the key organ systems and assess their status. Our evaluations are quick and focused. A traditional H&P, as we learned in medical school, has the following components: chief complaint, history of present illness, medical history, review of systems, impression, and plan. If you look up H&Ps in the electronic medical record of most institutions, you will find that the H&Ps follow this general format and are dictated and transcribed, therefore providing a readable “story” describing patients' problems and concluding with an impression and plan. The benefit of transcribing this into an electronic data repository is that the H&P can be viewed by multiple people in multiple places at any time. Unfortunately, it is still a text story. It does not allow us to perform outcomes research, which would require specific fields to be completed or picked, as opposed to transcribing text. Attempts have been made to make smart word searches to extract specific comorbidities or conditions from text H&Ps, but these are fraught with problems, as you can imagine (fig. 6). Although the patient in figure 6 has a written history of coronary artery disease, the text could just as easily state “patient does not have a history of coronary artery disease” or “father has history of coronary artery disease.” The number of word combinations to try to determine whether this patient has coronary artery disease is endless. Because of the limited value, from a clinical research perspective, of these text electronic records, most large clinical research databases require trained researchers to read the text and extract the pertinent history and comorbidities and enter them into a relational database, which can then be queried.
An example of a system like this in our field is the Multicenter Study of Perioperative Ischemia Research; and in surgery, it is the National Surgical Quality Improvement Program (NSQIP).**28 These clinical research databases with patient information in queryable data fields have been extremely useful in outcomes research, examples of which I will describe later.
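The word-search problem described above can be shown with a minimal sketch. The three sentences are the ones quoted in the text; the record layout, field names, and patient identifiers are hypothetical and are not taken from any particular AIMS or research database.

```python
from dataclasses import dataclass

# The three free-text variants discussed above.
NOTES = [
    "Patient has a history of coronary artery disease.",
    "Patient does not have a history of coronary artery disease.",
    "Father has history of coronary artery disease.",
]


def naive_text_search(note, term="coronary artery disease"):
    """Flag a note if the term appears anywhere in the text."""
    return term in note.lower()


@dataclass
class PreopRecord:
    """Hypothetical structured H&P entry with a picked (not typed) field."""
    patient_id: int
    coronary_artery_disease: bool


# The same three patients entered through a pick list instead of dictation.
RECORDS = [
    PreopRecord(patient_id=1, coronary_artery_disease=True),
    PreopRecord(patient_id=2, coronary_artery_disease=False),
    PreopRecord(patient_id=3, coronary_artery_disease=False),
]

flagged_by_text = sum(naive_text_search(note) for note in NOTES)
flagged_by_field = sum(r.coronary_artery_disease for r in RECORDS)

print(flagged_by_text, flagged_by_field)
```

The text search flags all three notes because it cannot see negation or family history, whereas the structured field identifies only the one patient who actually has the condition.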
Now, I will contrast an anesthesiology H&P with a traditional medical H&P. For example, a patient is being scheduled for a cholecystectomy. The surgical H&P describes this patient as having the chief complaint of postprandial periepigastric pain. The history of present illness describes this “40-yr-old woman was previously in good health until several months ago, when she noted increasing symptoms of colicky pain after eating fatty foods.” The history of present illness describes in more detail what aggravated the symptoms, what alleviated the symptoms, and what the patient had done about those symptoms. This is followed by a medical history in which the patient states that she takes “birth control pills and has occasional back pain for which she takes Aleve.” She also has a history of postpartum depression after her first child, 15 yr ago. She has taken some over-the-counter medications to help curb her appetite. After this descriptive story of chief complaint, history of present illness, and medical history, the surgical workup briefly lists a review of systems. In contrast, we, as anesthesiologists, really do not care about much of this story.
Cholecystitis has been diagnosed, and the treatment plan (cholecystectomy) has been chosen. Therefore, from an anesthesiology viewpoint, we have little interest in how the patient got to the operating room; we only want to know what surgical procedure is planned. On the other hand, we have a serious interest in the review of systems (i.e., the patient's comorbidities, how they affect the patient's physical status, and how they will affect our plan). Our chief complaint and history of present illness consist of the following: “has gallbladder, doesn't want it.” The end. We actually do not ask a patient his or her medical history because it is too time-consuming to have the patient describe his or her version of the medical history; it is much more efficient for us to immediately go to a review of systems. In actuality, an anesthesia H&P is a detailed review of systems in which the pertinent history, the pertinent signs and symptoms, and management are documented (fig. 7). What we really do is review the patient's physical status, organ system by organ system, for risk stratification. In fact, we are the original risk stratifiers (i.e., ASA physical status is the oldest and most recognized risk stratifier).29 By the way, who do you think developed ASA physical status? E.A. Rovenstine, M.D. (1895–1960, Emery A. Rovenstine, Professor and Chair, Bellevue Hospital and New York University School of Medicine, New York) with two other colleagues.29 Risk stratification is our primary concern as we try to assess the patient regarding suitability to undergo the planned procedure, tests that may be needed to further evaluate organ systems at risk, and how to plan the anesthetic to minimize damage to any organ system, allowing the patient to undergo the procedure as safely as possible.
Interestingly, risk stratification (risk adjustment) is a key element of all clinical outcomes research.28,30 It is meaningless to compare outcomes of various groups unless they have been risk stratified. This was the primary conclusion when the Veterans Administration started their surgical outcomes project more than 20 yr ago.30 Clearly, patients with more comorbidity would have a worse outcome when undergoing the same procedure. Unfortunately, the clinical databases being developed by other specialties, NSQIP, the Society of Thoracic Surgeons, and others (of which there are many) all require manual entry of data extracted from the medical record or an interview with the patient by a trained researcher.28,30 This process would be prohibitively expensive to apply to all patients. In addition, because these data fields are designed by research groups in advance and implemented through many institutions, the research data fields are relatively static. They do not change with time (i.e., they do not add fields regularly because it would be too disruptive to many data collectors at many institutions). On the other hand, anesthesiologists provide this risk stratification with the anesthesia H&P on every patient. If this is done in an AIMS with data fields, we are populating our “clinical research database” as part of our standard care. Thus, it is actually “free”; we do this with our preoperative H&P, intraoperative documentation, and postoperative documentation.
As an example, I will describe the first simple study we conducted using our AIMS at the University of Michigan, Ann Arbor. In 2003, while attempting to mask ventilate a patient after the induction agent was given, I asked Richard Han, M.D. (a first-year anesthesia resident at the University of Michigan, 2003) how difficult it was for him to ventilate the patient. As he attempted to explain this to me, we realized it would be easier if we graded his ability to mask ventilate as follows: easy, medium, hard, or impossible. After the case, we reviewed the literature and found an excellent article that had been published on the topic by Langeron et al.31 Langeron et al. had an observer classify 1,502 cases as easy, difficult, or impossible (with specific descriptor definitions of each); the incidence of difficult was 5%, and only 1 of the 1,502 cases was classified as impossible. The independent predictors of difficult mask ventilation were as follows: a beard, body mass index more than 26 kg/m², lack of teeth, age older than 55 years, and history of snoring. We took the idea of Langeron et al. and decided to make the following scale: 0, mask ventilation was not attempted; 1, ventilated easily by mask; 2, ventilated by mask but required an oral airway, relaxant, or another adjuvant; 3, difficult mask ventilation (described as inadequate, unstable, or requiring two providers); and 4, unable to mask ventilate. We modified the definitions of Langeron et al. to make them easier to apply clinically. These choices were placed into the intraoperative record as a drop-down pick list; then, we waited. In 3 weeks, we had accumulated 1,405 cases. We found 1.6% of the cases were difficult by our definition and 1 of 1,405 was impossible, similar to the 1 of 1,502 noted by Langeron et al. We published this as a “Letter to the Editor”32 as a scale we thought was more easily used clinically than first proposed by Langeron et al. Now for the impressive part.
We waited 1 yr and did nothing but wait for the data to accumulate. At the end of 1 yr, we queried the database for 41,969 cases. This query resulted in 1.5% of the cases being difficult and 0.17% of the cases being impossible. We were able to determine independent predictors of difficult mask ventilation, impossible mask ventilation, and the combination of difficult mask ventilation and difficult intubation.33 In addition, of the 28 patients who could not be ventilated, 27 were easily intubated and only 1 required a surgical airway.33,34
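The "pick list, then query" approach described above can be sketched with an in-memory relational table. The five grade definitions come from the scale in the text; the table name, column names, demonstration counts, and the choice to exclude grade 0 (not attempted) from the denominator are all assumptions made for illustration, not details of the actual AIMS or study.

```python
import sqlite3

# Grade definitions taken from the scale described in the text.
MASK_VENTILATION_SCALE = {
    0: "mask ventilation not attempted",
    1: "ventilated easily by mask",
    2: "ventilated by mask with oral airway, relaxant, or other adjuvant",
    3: "difficult (inadequate, unstable, or requiring two providers)",
    4: "unable to mask ventilate",
}


def build_demo_db(grades):
    """Create a hypothetical intraoperative table holding one grade per case."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE intraop_record ("
        " case_id INTEGER PRIMARY KEY,"
        " mv_grade INTEGER NOT NULL CHECK (mv_grade BETWEEN 0 AND 4))"
    )
    conn.executemany(
        "INSERT INTO intraop_record (mv_grade) VALUES (?)",
        [(g,) for g in grades],
    )
    return conn


def incidence(conn, grade):
    """Fraction of attempted mask ventilations recorded with the given grade."""
    attempted, hits = conn.execute(
        "SELECT COUNT(*), SUM(mv_grade = ?) FROM intraop_record"
        " WHERE mv_grade > 0",
        (grade,),
    ).fetchone()
    return (hits or 0) / attempted if attempted else 0.0


# Made-up demonstration counts, not study data.
demo_grades = [0] * 10 + [1] * 150 + [2] * 46 + [3] * 3 + [4] * 1
conn = build_demo_db(demo_grades)
for grade in (3, 4):
    print(MASK_VENTILATION_SCALE[grade], incidence(conn, grade))
```

Because the grade is stored as a constrained field rather than free text, the incidence of each grade is a one-line query no matter how many cases have accumulated.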
This article demonstrates the extraordinary power of using clinical outcomes data from a relational database derived from routine clinical care. To grasp the impact of this type of clinical research, it is useful to compare it with our accepted standard for clinical research (i.e., the prospective, randomized, double-blind, placebo-controlled trial). Randomized controlled trials are considered the strongest evidence because they involve detailed protocols that focus on a specific treatment being studied; in addition, because of their controlled randomized nature, they eliminate other causality and thereby prove a cause-and-effect relationship. In addition, to limit the risk of a type II error, the studies are sized by a power analysis. Unfortunately, there are significant limitations to randomized controlled trials. (1) Infrequent events require many patients, but the cost per patient of running a randomized controlled trial necessitates a relatively small study, again sized (powered) for the effect of the treatment being studied. (2) These trials cannot be powered for the unknown side effects of the treatment. (3) Once a controlled trial has proved the effectiveness of a treatment in a few patients being treated under a detailed protocol (which is not necessarily routine clinical practice), the treatment is generally extrapolated to the population at large. This can lead to substantial problems (rofecoxib [Vioxx] and aprotinin are two examples).35–37 On the other hand, large clinical databases can be derived from a variety of sources, including AIMS. They have the power of size and the advantage of reflecting clinical practice with all its variation. They do require different statistical tools, such as propensity score matching, that I will briefly review later. They also require objective data gathering for outcomes. In general, outcomes are the weakest part of an AIMS clinical database.
Because we see patients at 24 h and many of the outcomes are not evident at that time, many of the important adverse outcomes of interest are not entered into our clinical systems. Therefore, we must rely on other objective outcomes, such as death, cost, troponins, and creatinine values.
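The earlier point that infrequent events require many patients can be made concrete with a rough power calculation. The following is a minimal sketch using the standard normal-approximation formula for comparing two proportions; the function name, the two-sided alpha of 0.05, the power of 0.80, and the hypothetical doubling of a 0.17% event rate are illustrative assumptions, not figures from any of the studies cited.

```python
from math import ceil
from statistics import NormalDist

def patients_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate patients needed in each arm to detect p_control vs p_treated.

    Normal-approximation formula for two independent proportions,
    two-sided test. Illustrative only; a real trial would use a
    statistician's full power analysis.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # guards against type I error
    z_beta = NormalDist().inv_cdf(power)           # guards against type II error
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)

# Hypothetical rare event: 0.17% at baseline, doubled by a treatment.
rare = patients_per_arm(0.0017, 0.0034)
# Hypothetical common event: 10% at baseline, doubled by a treatment.
common = patients_per_arm(0.10, 0.20)
print(f"rare event: {rare} per arm; common event: {common} per arm")
```

Under these assumptions, the rare event needs on the order of ten thousand or more patients per arm, whereas the common event needs roughly two hundred, which is why a trial powered for the treatment effect is unlikely to detect an uncommon side effect.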
The lack of a study protocol leads to another opportunity for outcomes research. Because the care in large clinical databases is routine clinical care and not protocol driven, we can study variables that could not be studied in a prospective manner (e.g., the effect of significant hypotension [blood pressure lower than 70 mmHg] on outcomes in elderly persons). It would be unethical to request Institutional Review Board approval for a study evaluating low blood pressure in the elderly population to assess its effect on outcome; however, in a large clinical database, hypotension is bound to occur and can be studied. Similarly, the wide variation in provider care (lack of protocol) is an asset, not a limitation, of this type of research because the protocol of interest can be “extracted.”
Different Type of Research, Different Type of Statistics: Propensity Score Matching
For those statisticians reading this article, please skip to the next section; for clinicians like me, the following discussion may be informative. We are all familiar with the types of statistical techniques used in prospective, randomized, controlled trials: as previously stated, power analysis to determine sample size and P < 0.05 to assess the significance of the results. When trying to determine the difference between one method of treatment and another from a retrospective clinical database, how do you determine the control group? The answer is by using a statistical technique called propensity score matching. As an example, I will review the article by Karkouti et al.,37 published in 2006, regarding the adverse consequences of aprotinin use in cardiac surgical patients. This article, along with one by Dennis Mangano, M.D., Ph.D. (Professor, University of California, San Francisco; and Founder and Director, The Ischemia Research and Education Foundation, San Bruno, California), was responsible for the use of aprotinin in cardiac surgery being reevaluated. The subsequent large, multicenter, prospective study then resulted in aprotinin being pulled from the market after 18 yr of what was thought to be safe use.36–38
First, Karkouti et al.37 had a large database of patients who underwent cardiac bypass surgery and received either aprotinin or tranexamic acid to reduce perioperative blood loss. All the patient characteristics (i.e., age, weight, and comorbidities) were significantly different (P < 0.05) between the 586 patients who received aprotinin and the 10,284 patients who received tranexamic acid; clearly, there was a selection bias regarding drug treatment, and the outcomes cannot be compared directly. Second, the next step was to develop an equation that calculates the probability that an individual patient would be treated with aprotinin. The data from all the patients are used to derive this predictive equation. The equation is derived using the technique referred to as logistic regression, which determines the probability that any individual patient with the listed characteristics will be treated with aprotinin. This predictive model (equation) is tested for its predictive “quality” using a receiver operating characteristic curve. If the area under the receiver operating characteristic curve is 0.75 or greater, it is considered a good predictive model. By using this predictive model (equation), you can determine a probability (propensity score) for each patient (i.e., each of the patients in both treatment groups will receive a propensity score, a probability that he or she would have been treated with aprotinin). Third, patients who were treated with aprotinin are matched with those who were not treated with aprotinin (those treated with tranexamic acid, the control group) who had the same propensity scores (i.e., propensity score–matched patients). In the study by Karkouti et al., 449 patients treated with aprotinin and 449 patients treated with tranexamic acid were matched 1:1 with the same propensity (probability) scores. When these two groups are compared for patient variables (e.g., age, weight, and comorbidities), the P values are all >0.05, meaning that these two groups of patients are not significantly different for preoperative characteristics. These propensity score–matched groups are analogous to the data presented as the first table in a prospective randomized study (the table that ensures that the randomization was effective in having similar types of patients in each group [treatment and control]). Once there are two patient groups with the same characteristics, we can look at any statistical difference in outcome. In this study, there was a statistically significant increase in renal dysfunction and renal failure in the patients treated with aprotinin compared with those treated with tranexamic acid. Armed with this technique, retrospective clinical databases can be used to compare treatments and outcomes at low cost compared with prospective studies; with such large numbers, they are able to find rare side effects that could not be found in small prospective studies. In an editorial by Gus Vlahakes, M.D. (Chief of Cardiac Surgery, Massachusetts General Hospital, Boston), which accompanied Mangano's article in The New England Journal of Medicine, he stated that conducting large postrelease clinical database reviews such as these is a new responsibility to ensure that treatments we find effective in smaller prospective studies do not have unrecognized adverse consequences when applied to the general population in clinical practice.39
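For clinicians who prefer to see the three steps laid out concretely, the following is a minimal sketch of propensity score matching on a synthetic cohort. It is not the method or the data of Karkouti et al.; the cohort, the two covariates (a standardized age and a comorbidity flag), the gradient-descent fit, the greedy nearest-neighbor matching, the caliper of 0.02, and all function names are assumptions chosen only to keep the example self-contained.

```python
import math
import random
from statistics import mean

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def propensity(w, xi):
    """Predicted probability of receiving the treatment (bias is w[0])."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Step 2: logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = propensity(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def auc(scores, labels):
    """Area under the ROC curve: chance a treated patient outscores a control."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def match_one_to_one(scores, labels, caliper=0.02):
    """Step 3: greedy 1:1 matching; each control is used at most once."""
    controls = {i for i, l in enumerate(labels) if l == 0}
    pairs = []
    for t in (i for i, l in enumerate(labels) if l == 1):
        if not controls:
            break
        c = min(controls, key=lambda i: abs(scores[i] - scores[t]))
        if abs(scores[c] - scores[t]) <= caliper:
            pairs.append((t, c))
            controls.remove(c)
    return pairs

# Step 1: a synthetic cohort with selection bias (older, sicker patients
# are more likely to be treated). All numbers here are made up.
rng = random.Random(0)
X, treated = [], []
for _ in range(600):
    age = rng.gauss(0, 1)             # standardized age
    sick = int(rng.random() < 0.3)    # comorbidity flag
    X.append([age, sick])
    treated.append(int(rng.random() < sigmoid(-2.0 + 1.2 * age + 1.0 * sick)))

weights = fit_logistic(X, treated)
scores = [propensity(weights, xi) for xi in X]
pairs = match_one_to_one(scores, treated)

age_gap_all = (mean(x[0] for x, t in zip(X, treated) if t)
               - mean(x[0] for x, t in zip(X, treated) if not t))
age_gap_matched = mean(X[t][0] - X[c][0] for t, c in pairs)
print(f"AUC of propensity model: {auc(scores, treated):.2f}")
print(f"matched pairs: {len(pairs)} of {sum(treated)} treated patients")
print(f"age gap before matching: {age_gap_all:.2f}; after: {age_gap_matched:.2f}")
```

The printed age gap before and after matching plays the role of the "first table" comparison described above. In practice, a statistician would also consider matching without a fixed visiting order, standardized differences rather than P values, and sensitivity analyses; none of that is shown here.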
Unfortunately, despite the tremendous value of large database retrospective research and propensity score matching, these types of studies are still not a replacement for the randomized controlled trial; only prospective randomized trials can truly determine the cause and effect of a treatment. Only large database retrospective reviews can determine side effects and associations. Therefore, each of these types of studies functions synergistically to advance medical knowledge. Retrospective studies can identify rare events and help us design appropriate prospective studies, which can then determine causation. Once these studies are complete and a new therapy or treatment is recommended, it would need to be followed up, ultimately with another retrospective review after that therapy has been applied to the population at large.
Despite the power of retrospective clinical databases, there are significant limitations. First, the variables that are important to the question being asked must be in the database. For example, if someone was reviewing preoperative airway evaluations to predict the grade of view on intubation, but the preoperative evaluation did not include Mallampati scores, the result may not be an effective predictive model (which should be identified by a poor area under the receiver operating characteristic curve). Second, these are clinical data extracted from clinical care, entered by practicing clinicians. Therefore, the data will not be as “clean and accurate” as data collected by a professional researcher focused on a randomized controlled trial. Consequently, it is important that researchers using large databases understand the origin of the data and their limitations and develop techniques to clean the data before using them in an analysis.
These retrospective database queries are not the old paper chart reviews sometimes referred to as “garbage in/garbage out” research. Instead, in the generation of recycling, it is not garbage in but trash in; after appropriate cleaning and analysis, it can give us excellent recycled clinical outcomes research results.
Multicenter Perioperative Outcomes Group and the Anesthesia Quality Institute
During the past decade, it has become clear to academic institutions that have implemented AIMS that the clinical databases we are developing are powerful research tools. Pooling data from these AIMS clinical databases from multiple institutions around the country would provide more data, a broader patient context, and more varied clinical practices. The strength of outcomes data lies in the variation in care rather than strict adherence to protocols (i.e., just the opposite of prospective studies), which makes a nationwide database an even more powerful research tool. In the summer of 2008, nine institutions met in Ann Arbor to form the Multicenter Perioperative Outcomes Group (MPOG), with the vision of creating a data-sharing organization to enable faculty from multiple institutions to query one large data repository consisting of AIMS data from across the nation. One of the premises of this organization was that it would be inexpensive and that the membership “fee” would be the contribution of 10,000 cases entered into the database. Faculty from all the involved institutions would have the opportunity to request data from the database to answer clinical questions.†† The goal of this new type of “open access” research database is to stimulate ideas from the broadest spectrum of clinical researchers/clinicians and thereby accelerate the creation of new knowledge. The MPOG has 35 member institutions and is in the process of writing the necessary software to allow data from the seven most commonly used AIMS vendors to be downloaded into the common MPOG research database. Hopefully, by the end of this year, the first MPOG research studies will be under way, initiating a new era of clinical research.
Coincidentally, in the summer of 2008, the ASA board approved the creation of the Anesthesia Quality Institute (AQI).‡‡ This institute was founded to develop a quality reporting database that could be used by all ASA members. Clinical data from a variety of sources (i.e., individual practices, hospitals, anesthesia billing systems, and AIMS) could be retrieved and could flow into a National Anesthesia Clinical Outcomes Registry. From there, data would flow into the AQI, where analysis could be conducted and outcome quality reports could be generated. This would enable our society to participate in the national quality efforts being proposed/mandated in a relatively cost-effective manner. Within 6 months of its founding, the AQI recruited Richard Dutton, M.D. (Professor, Department of Anesthesiology, University of Maryland School of Medicine, Baltimore). The following year, Rick and I, along with Sachin Kheterpal, M.D. (Assistant Professor, Department of Anesthesiology, University of Michigan), the research director of MPOG, met to outline a working relationship between MPOG and the AQI. Although these are separate organizations, the AQI is a member of MPOG and will be able to request data for quality research studies, like any other MPOG member. We envision each of these organizations growing in a synergistic and collaborative way to enhance the development of new knowledge from an academic perspective and advance quality from a patient care perspective. The MPOG will be the high-resolution data repository, including all basic AIMS data, whereas the AQI will focus on the production of quality reports for the ASA membership.
Although the NSQIP, developed by the American College of Surgeons, is a tremendous database with respect to preoperative and postoperative outcomes, it uses an expensive method of collecting data and it only collects a 20% sample of general and vascular surgery patients.28 At the University of Michigan, it costs $250,000/yr to collect this 20% sample of these two surgical services (Michigan enters approximately 400 of the 65,000 cases conducted each year). Therefore, the NSQIP model could not be adopted by the AQI and expanded for all of the surgical services and cases. The AQI and MPOG will use the extracts of clinical data from patient care to include all patients who receive anesthesia care. This is more cost-effective and will allow for a constantly growing database (in size and breadth). In addition, one of the core values of the NSQIP is preoperative risk stratification. Again, because of the way anesthesiologists think when they conduct a preoperative evaluation and the way the AIMS preoperative anesthesia H&Ps are developed, a core value of the MPOG database will be the ability to risk-stratify patients and thereby conduct outcome propensity score–matched studies on a grand scale. Because of our unique way of approaching patients as risk stratifiers, I believe we have the opportunity as a specialty to make an important contribution to the future of clinical research. The data we collect will be entered into a relational database because we, as a specialty, think in objective terms, categorizing patients into grading systems and using checklists. We also are somewhat unique in medicine in that we see everyone (both sexes, young and old, healthy and sick). Therefore, we should be able to make a contribution to medicine in the broadest terms.
One of the greatest accomplishments of medical research from the past decade was sequencing the human genome.40 I believe that anesthesiology, during the next decade, may be recognized for developing the human phenotype as we populate the nationwide perioperative database. This clinical database will be populated with subjects (operative patients) who will be characterized with an array of individual conditions, including age, sex, comorbidities, and medications; and who will all undergo a significant stress test to all of their vital organs (anesthesia and a surgical procedure). This stress test will be like no other. It will stress the cardiovascular, pulmonary, central nervous, coagulation, and immune systems; in addition, it will include pharmacodynamic responses to multiple intravenous medications. No Institutional Review Board would approve such a stress test; the beauty of this stress test is that the data are collected at no additional cost. Imagine a large enough database in which we will be able to characterize, at a granular level, multiple groups of patients and characteristics and develop a profile (phenotype) that will enable us to construct specific perioperative care plans. What we refer to as decision support may soon be referred to as “designer clinical management,” analogous to the designer drugs that are envisioned based on an individual patient's genotype. Again, like many other good ideas, collecting surgical and anesthetic information together for the purposes of outcomes research is not new. In 1934 (before computers), an article was published in Anesthesia & Analgesia on combining anesthetic and surgical records for statistical purposes, by guess who? Yes, again it is E. A. Rovenstine.41
Clinical Decision Support Goes to Fly-by-wire
In 1995, the Airbus 320 was launched as the first fly-by-wire commercial aircraft.42 This had long been a requirement for military fighter aircraft to allow them to fly at extreme speeds with high maneuverability and still stay airborne.§§ Flying-by-wire refers to the design of the controls of the aircraft; the pilot controls the yoke to guide the plane, but that yoke is not connected to any of the ailerons or tabs by mechanics or hydraulics. It is connected to a computer, and the pilot uses the yoke to tell the computer which way he or she would like to fly; the computer then processes that information and makes a series of complex changes in the ailerons, rudder, and tabs to make the plane do what the pilot wants it to do. Without such assistance, it had become impossible to fly top-performance fighter aircraft safely under the conditions the aircraft was expected to fly. Fly-by-wire has become common on large aircraft (commercial and military). This “driving-by-wire” has crept into the auto industry, first with antilock brakes, followed by automatic parking, automatic braking, and crash avoidance systems.43 As we use these large clinical databases to develop designer anesthetic plans, it will become too difficult for a practitioner to track those plans for individual patients, with specific comorbidities and medications, undergoing procedures. There will need to be a progression from what we have as pop-up alert decision support to a complete, integrated primary flight display and multifunctional display support, as in modern aircraft (fig. 8A).
If you think this analogy to the aircraft industry is going too far, look at the cover of the August 2010 issue of Anesthesia & Analgesia, which shows a display for pharmacokinetic modeling in the cockpit of an F-111 aircraft.44 The cover description says, “Anesthesiologists are pilots … navigating the patient through profound physiologic trespass … coming to a cockpit near you.”44 Early data suggest types of management that a clinician could not feasibly carry out unaided. Two years ago, Kheterpal published an article45 from our institution in which he examined preoperative and intraoperative predictors of postoperative myocardial infarction. One of the findings was that if the median blood pressure measured during 10-min epochs decreased to lower than 60% of the patient's preoperative baseline measurement, the incidence of postoperative myocardial infarction increased.45 Calculating the median blood pressure during a 10-min interval continuously in real time during a case and comparing it with 60% of the preoperative blood pressure would be a challenge for even the brightest anesthesiologist. In 2005, Terri Monk, M.D. (Professor, Anesthesiology, Duke University Medical Center, Durham, North Carolina), presented a study reporting the 1-yr mortality related to the area under the curve of bispectral index measurements lower than 45 (cumulative deep hypnotic time).46 Measuring the real-time area under the curve would also be challenging. Neither of these calculations would be difficult for the decision-support computer in the AIMS. These types of informatics may aid in our anesthetic plan and, more importantly, assist us in selecting the most appropriate postoperative care plan.
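To show how little work these two calculations are for a computer, here is a minimal sketch. It assumes one blood pressure reading and one bispectral index reading per minute, evaluates consecutive (non-overlapping) 10-min epochs, and uses hypothetical function names and made-up readings; it is not the algorithm used in either of the cited studies or in any AIMS product.

```python
from statistics import median

def hypotensive_epochs(pressure_per_min, baseline, epoch_min=10, fraction=0.60):
    """Start minutes of consecutive epochs whose median pressure
    is lower than `fraction` of the preoperative baseline."""
    limit = fraction * baseline
    flagged = []
    for start in range(0, len(pressure_per_min) - epoch_min + 1, epoch_min):
        if median(pressure_per_min[start:start + epoch_min]) < limit:
            flagged.append(start)
    return flagged

def deep_hypnotic_burden(bis_per_min, threshold=45):
    """Minutes spent below the threshold and the area between the
    threshold and the trace during those minutes (index units x min)."""
    below = [b for b in bis_per_min if b < threshold]
    return len(below), sum(threshold - b for b in below)

# Made-up readings, one per minute. Baseline pressure of 90 gives a limit of 54.
pressures = [80, 78, 75, 70, 66, 60, 58, 56, 55, 57,   # median 63: not flagged
             52, 50, 49, 51, 53, 55, 50, 48, 52, 56]   # median 51.5: flagged
print(hypotensive_epochs(pressures, baseline=90))       # -> [10]

bis = [50, 44, 40, 42, 46, 45, 38]
print(deep_hypnotic_burden(bis))                         # -> (4, 16)
```

A real-time version would recompute the same quantities each time a new reading arrives, which is exactly the kind of continuous bookkeeping that is trivial for a decision-support system and impractical for a clinician managing a case.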
During the next decade, as these data provide more specific decision care plans that are automatically derived when the patient's preoperative comorbidities, medication, procedure, and intraoperative response data are entered, some might suggest that this is the ultimate in “cookbook medicine.” I would say, absolutely! I would suggest that all the best chefs use books, all engineers I know use books, and people who are looking out for the welfare of their patients should also use all the decision support available. Figure 8B is an attempt at an “avionic screen” for the management of an intraoperative patient. We designed this with the help of the Engineering School at the University of Michigan and incorporated many of the aspects of general anesthesia care and teaching, calculated on a real-time basis.47 The heart and lungs move in real time from the physiologic monitor; the estimated cardiac filling is determined by continuous input and output calculations, unless there is an invasive monitor. It highlights organ systems at risk, alerts you when physiologic data are in a marginal or dangerous range, and is programmed to detect immediate events (e.g., malignant hyperthermia and tension pneumothorax). When I have demonstrated this to anesthesiologists, they have said “with this type of support, anyone could provide anesthesia,” and I would say, “with this type of support, anyone could provide safer anesthesia.” I have also heard people remark that this type of support is trying to put us out of a job. My response would be that, despite the fact that the Airbus 320 flown in US Airways Flight 1549 in New York had electronic decision-support capabilities, I think all of the passengers would agree they were happy to have Captain Chesley B. “Sully” Sullenberger and copilot Jeffrey Skiles at the front of the plane when the birds hit the fan.∥∥
I would like to acknowledge Jenny Mace (Faculty Staff Associate, Department of Anesthesiology, University of Michigan Health System, Ann Arbor, Michigan) for her extensive assistance in developing this lecture and manuscript, especially for her efforts in finding the historical photos, which I feel added greatly to the presentation.