A surgical scoring system, akin to the obstetrician's Apgar score, has been developed to assess postoperative risk. To date, evaluation of this scoring system has been limited to general and vascular services. The authors attempt to externally validate and expand the Surgical Apgar Score across a wide breadth of surgical subspecialties.
Intraoperative data for 123,864 procedures including all surgical subspecialties were collected and associated with Surgical Apgar Scores (created by the summation of point values associated with the lowest mean arterial pressure, lowest heart rate, and estimated blood loss). Patients' death records were matched to the corresponding score, and logistic regression models were created in which mortality within 7, 30, and 90 days was regressed on the Apgar score.
Lower Surgical Apgar Scores were associated with an increased risk of death. The magnitude of this association varied by subspecialty. Some subspecialties exhibited odds ratios closer to 1, suggesting that the score is less useful for them. For most of the subspecialties the association between the Apgar score and mortality decreased as the time since surgery increased, suggesting that the score's predictive ability diminishes over time. After adjusting for the patient's American Society of Anesthesiologists classification, Apgar scores remained associated with death among most of the subspecialties.
A previously published methodology for calculating risk among general and vascular surgical patients can be applied across many surgical services to provide an objective means of predicting and communicating patient outcomes in surgery as well as planning potential interventions.
What We Already Know about This Topic
The Surgical Apgar Score, a simple 0–10 score based on blood loss, lowest blood pressure, and lowest heart rate during surgery, predicts mortality in general and vascular surgery patients.
What This Article Tells Us That Is New
In a review of more than 120,000 patients, the Surgical Apgar Score correlated with risk of death across many subspecialties, although the strength of the correlation varied.
CLINICIANS need predictive tools to assess perioperative risk. Several algorithms have been used or developed for risk stratification, such as the American Society of Anesthesiologists Physical Status Classification System (ASA classification), the Physiologic and Operative Severity Score for enUmeration of Mortality and morbidity (POSSUM), the Acute Physiology and Chronic Health Evaluation (APACHE), and the Simplified Acute Physiology Score (SAPS). However, each of these systems has limitations and restricted uses. The ASA classification was initially intended as a means to stratify a patient's systemic illness but not postoperative risk, and it has been criticized for being too simple. It does not account for the age of the patient, nature of the surgical procedure, anesthetic technique, competency or training level of the surgical team, or duration of surgery. Although the ASA classification has proved to be a predictive preoperative risk factor in mortality models, its subjective nature and inconsistent scoring between providers make it less than ideal for performing evidence-based postoperative risk calculation.1–4 The POSSUM, APACHE, and SAPS and their later derivations (Portsmouth POSSUM, colorectal POSSUM, APACHE II and III, and SAPS II) are more accurate and objective predictive algorithms, but not all of the variables needed are easily and consistently attainable in an operating room setting, making them more practical in their initially intended role as critical care auditing tools rather than predictive tools.5,6
Recently, Gawande et al. designed a more ideal postoperative scoring system, the Surgical Apgar Score, patterned after the Apgar obstetric scoring system. Similar to the obstetric scoring system, the Surgical Apgar Score uses a three-item [estimated blood loss (EBL), lowest intraoperative heart rate (HR), and lowest intraoperative mean arterial pressure (MAP)] aggregate to risk-stratify patients undergoing surgery in the postoperative setting.7 Gleaned from an initial dataset of 28 variables, these three values were each found to be independent predictors of outcomes. Strengths of the Surgical Apgar Score include the ability to calculate the score quickly and objectively. The provider could then anticipate the need for further or more aggressive interventions or use the data in benchmarking centers by predicted versus observed scores. Ultimately, the score may also prove useful in guiding preventive strategies such as optimizing intraoperative heart rate or blood pressure. To date, Gawande's Surgical Apgar Score has been studied in colorectal surgery, vascular surgery, and certain gynecologic and urologic procedures.7–11
The burgeoning literature on the Surgical Apgar Score also identifies potential weaknesses of the scoring system. For example, calculation of the score relies on EBL, which critics have often tagged as imprecise. However, previous studies have shown that the broad categories used to calculate the Surgical Apgar Score (0–100 ml, 101–600 ml, 601–1,000 ml, >1,000 ml) are easily within observers' range of precision.12,13 Another hypothetical weakness lies in the fact that intraoperative hemodynamics may be affected by anesthetic medications and interventions such as induction and intubation, and therefore alter the computation of the Surgical Apgar Score. For example, a transient episode of hypotension associated with anesthetic induction would be treated the same as prolonged hypotension and result in a lower (worse) Surgical Apgar Score. On the other hand, a transient bradycardic episode would contribute to a higher (better) score. Nevertheless, several studies demonstrate that persistent heart rate elevation and hypotension are strongly associated with poorer outcomes, regardless of their cause.14–16 Finally, other potentially predictive perioperative variables, such as coronary artery disease, intraoperative blood transfusion, ASA class, sex, volume of intravenous fluids administered, patient age, surgical time, functional status, renal function, and chronic steroid use are excluded from the Surgical Apgar Score. The exclusion of these potentially predictive preoperative risk factors could be construed as a weakness of the score. However, as previously mentioned, an important aspect of the usefulness of the Surgical Apgar Score is its simplicity.
Using data collected through our electronic Perioperative Information Management System from January 1, 2006 to December 31, 2009, we attempt to externally validate and expand the Surgical Apgar Score as a means for predicting postoperative mortality across a wide range of surgical subspecialties and to determine whether this score provides additional information beyond that of the patient's ASA status.
Materials and Methods
Perioperative data for 123,864 procedures including all surgical subspecialties (as defined by the primary service of the attending surgeon) were collected with an electronic Perioperative Information Management System (VPIMS; Acuitec LLC, Birmingham, AL) from January 1, 2006 to December 31, 2009. The data are stored in a perioperative data warehouse, using Microsoft SQL server technology (Microsoft Corporation, Redmond, WA). Preoperative and postoperative data were excluded. All data were collected during normal operations and were retrospectively reviewed. As a matter of routine care, mortality data are obtained from the US Social Security Death Index and linked to patients in the database using both the social security number and the date of birth. Both items have to match to be recognized as a correct linkage. In this study there were eight patients who had data indicating a date of death before the surgery date; when these cases were investigated, it was discovered that they were incorrectly identified in the perioperative electronic record. This error rate is approximately 1 in 10,000 cases. As the queries performed for the study resulted in deidentified information only, the study met criteria for nonhuman research and was performed with approval by our institutional review board (Vanderbilt University Human Research Protection Program, Nashville, TN).
From the captured intraoperative data, estimated blood loss, blood pressure, and heart rate were analyzed. If EBL was not recorded or was defined as “minimal,” it was assumed to be less than 100 ml. MAP was derived from the electronically captured invasive or noninvasive blood pressures, with preference for invasive, unless there was a manually entered provider-override value. HR was derived in order of preference from the provider's manually entered HR, the electrocardiogram, or the pulse oximeter, depending on availability. As defined in the original article by Gawande et al.7 on the Surgical Apgar Score, point values associated with the lowest MAP, lowest HR, and EBL were summed to produce the Surgical Apgar Score (table 1).
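The summation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation. The EBL bands follow the four categories mentioned in the introduction. The MAP and HR cut-offs are assumptions taken from the commonly cited version of the Gawande et al. point table, because table 1 is not reproduced here; they should be checked against table 1 before any use. Any special-case rules in the published table are omitted, and the function names are hypothetical.

```python
def ebl_points(ebl_ml):
    """Points for estimated blood loss in ml.

    None stands for an EBL that was not recorded or was charted as
    "minimal"; the methods treat both as less than 100 ml.
    """
    if ebl_ml is None or ebl_ml <= 100:
        return 3
    if ebl_ml <= 600:
        return 2
    if ebl_ml <= 1000:
        return 1
    return 0


def map_points(lowest_map):
    """Points for lowest mean arterial pressure in mmHg (assumed cut-offs)."""
    if lowest_map >= 70:
        return 3
    if lowest_map >= 55:
        return 2
    if lowest_map >= 40:
        return 1
    return 0


def hr_points(lowest_hr):
    """Points for lowest heart rate in beats/min (assumed cut-offs)."""
    if lowest_hr <= 55:
        return 4
    if lowest_hr <= 65:
        return 3
    if lowest_hr <= 75:
        return 2
    if lowest_hr <= 85:
        return 1
    return 0


def surgical_apgar_score(ebl_ml, lowest_map, lowest_hr):
    """Sum the three component point values into a 0-10 score."""
    return ebl_points(ebl_ml) + map_points(lowest_map) + hr_points(lowest_hr)
```

Under these assumed cut-offs, a case with 300 ml of blood loss, a lowest MAP of 60 mmHg, and a lowest HR of 70 beats/min would receive 2 + 2 + 2 = 6 points.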
Cases recorded as ASA 6 were excluded to remove organ donor cases. Patients younger than 18 yr old were also excluded. Death information was verified before September 1, 2009; thus, any patient whose surgery occurred after this date was excluded (9.8%). Fifteen subjects with invalid data such as negative or null days to death were also excluded. All remaining cases recorded in the database were initially included in the data export, but after further review of the specific subspecialties 9 were identified as nonsurgical (e.g., epidurals and bronchoscopies, n = 397), and 7 were associated with low-volume community physician groups that do not regularly operate at our institution but infrequently perform a case for various reasons (n = 1,040). Data from these procedures, along with those that were listed as “unknown” (n = 8,426; unknown cases represent labor and delivery epidurals, offsite procedures in an ambulatory surgery center, bedside cases in the intensive care units, and sedation for gastrointestinal and radiology procedures) were excluded from this analysis (8.8%). See table 2 for a complete breakdown of the omitted categories.
Patient characteristics and surgical summaries (year and operating room duration) were tabulated across the entire dataset. Categoric variables were represented as percentages and counts whereas continuous variables were summarized by the 5th, 25th, 50th (median), 75th, and 95th quantiles. The patient characteristics were also divided by survey year to determine whether the patient population varied by year (data not presented but available from the authors).
Table 3 summarizes the patient characteristics. Within this adult population, the median (interquartile range) age was 51 yr (38–63 yr). Most of these patients were classified as either ASA 2 or 3 (42.6 and 40.3%, respectively). Mortality after 7, 30, and 90 days was 0.6%, 1.5%, and 2.7%, respectively.
To quantify the association between the Surgical Apgar Score and mortality, a series of regression models was developed. For each surgical subspecialty, three logistic regression models were estimated in which 7-, 30-, or 90-day mortality was separately regressed on the Surgical Apgar Score. Seven-day mortality was defined as a binary variable that equaled 1 if the subject died within 7 days after surgery and 0 otherwise. Similar variables were calculated for 30 and 90 days. The Surgical Apgar Score was included in the model as a continuous variable, which assumes that there is a linear relationship between the score and the log odds of death. This assumption was assessed by fitting additional logistic models in which the Surgical Apgar Score was modeled with restricted cubic splines and comparing the models using the likelihood ratio test. The unadjusted odds ratio (OR) and its 95% confidence interval (CI) were then reported for each subspecialty. The OR estimates correspond to the fold-change in the odds of 7-, 30-, or 90-day mortality for each unit increase in the Surgical Apgar Score. As the OR was used to represent the measure of association, values closer to 1 are consistent with weaker associations and values farther away from 1 (either closer to 0 or greater than 1) are consistent with stronger associations. These estimates were then transformed to determine the probability of death at each day of interest and are displayed graphically.
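For reference, the unadjusted model described above can be written compactly. The notation is introduced here for illustration and is not the authors': $s$ is the Surgical Apgar Score, $p_d(s)$ is the probability of death within $d$ days, and $\beta_0$ and $\beta_1$ are the intercept and slope, estimated separately for each subspecialty and each $d$.

```latex
\begin{align}
\operatorname{logit} p_d(s) &= \log\frac{p_d(s)}{1 - p_d(s)} = \beta_0 + \beta_1 s,
  \qquad d \in \{7, 30, 90\} \\
\mathrm{OR} &= e^{\beta_1} \\
p_d(s) &= \frac{1}{1 + e^{-(\beta_0 + \beta_1 s)}}
\end{align}
```

The second line gives the per-unit odds ratio, and the third is the back-transformation from the fitted log odds to a probability of death.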
It was also of interest to determine whether the Surgical Apgar Score provided additional information beyond that of a patient's ASA physical status. To address this, a second set of regression models was created in which ASA status was added as a covariate. ASA status was modeled as a linear term, and interactions with the Surgical Apgar Score were not included because of a lack of power among some of the subspecialties. The adjusted estimates associated with the Surgical Apgar Score were transformed and reported in a manner similar to that previously described. All analyses were performed in R version 2.11 (Vienna, Austria).17
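In the same illustrative notation (again not the authors' own), with $s$ the Surgical Apgar Score, $a$ the ASA physical status entered as a linear term, and $p_d$ the probability of death within $d$ days, the adjusted model without an interaction term takes the form:

```latex
\begin{align}
\operatorname{logit} p_d(s, a) &= \beta_0 + \beta_1 s + \beta_2 a \\
\mathrm{OR}_{\text{adjusted}} &= e^{\beta_1}
\end{align}
```

Here $e^{\beta_1}$ is the fold-change in the odds of death per unit increase in the Surgical Apgar Score with ASA status held fixed.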
Table 4 summarizes the distributions of mortality and Surgical Apgar Scores by surgical subspecialty. For example, there were 1,558 surgeries involving burn patients. Of these patients, 2.7% died by day 7 and 5.8% by day 30. Approximately 2% had a Surgical Apgar Score ≤2, and nearly 10% had a score of ≥9. The most common subspecialties included orthopedic sports/hand (11%), urology (10%), orthopedic trauma (8%), general surgery (8%), and neurosurgery (7%). Mortality rates ranged from 0% (renal, day 7) to 10.3% (burn, day 90) and were the highest among burn, cardiac, emergency, trauma, and vascular patients. These rates were generally low within the first 7 days after surgery and remained low through day 90 in a few of the subspecialties (ophthalmology, oral, and renal). Although underpowered, analyses of these subspecialties were still performed to facilitate interpretation of the association between the Surgical Apgar Score and mortality. The distributions of Surgical Apgar Scores were similar across surgical subspecialties in that most of the scores were between 6 and 9, but the frequencies of low Surgical Apgar Scores (≤2) varied slightly. For example, low Surgical Apgar Scores were especially prominent among patients undergoing either liver transplantation or trauma procedures.
Lower Surgical Apgar Scores were associated with an increased risk of death (table 5). For example, the probability of death by day 90 was 32% (fig. 1B, 95% CI: 23–42%) among cardiac patients with a Surgical Apgar Score of 2 and only 4% (95% CI: 3–5%) among those with a score of 8. The magnitude of this association varied by subspecialty. Only a weak relationship was noted between Surgical Apgar Score and death among burn patients.
For each unit increase in Surgical Apgar Score, the unadjusted odds of death by day 7 decreased by nearly 50% among those undergoing vascular surgery (table 5, 95% CI for the OR: 0.44–0.60) but only decreased by 30% among emergency general surgery patients. Similar patterns were observed when modeling death at days 30 and 90. For most of the subspecialties the association between the Surgical Apgar Score and mortality generally weakened as the time since surgery increased (e.g., neurosurgery: OR of 0.59, 0.71, and 0.81 at days 7, 30, and 90, respectively), suggesting that at some future point postoperatively the predictive ability ceases to be helpful. However, the relationship between the Surgical Apgar Score and mortality was maintained over time in a few of the subgroups (e.g., cardiac, general surgery, and gynecology) and even strengthened over time in others (e.g., urology). This may simply be an artifact of sparse data among certain subspecialties, but it could also indicate that the Surgical Apgar Score is a better predictor for early death in some subspecialties and later death in others. After adjusting for ASA status, similar associations, although some were attenuated, were observed between the Surgical Apgar Score and mortality.
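To make the odds ratio arithmetic concrete, the following Python sketch converts a per-unit OR into a percentage decrease in the odds of death and compounds it over several score points. It is an illustration only: the three neurosurgery ORs are the values quoted above, whereas the function names are hypothetical and the multi-point calculation simply assumes the linear log-odds model used in the analysis.

```python
def percent_decrease_in_odds(odds_ratio):
    """Percentage decrease in the odds of death per unit increase in score."""
    return (1.0 - odds_ratio) * 100.0


def odds_ratio_over_k_units(odds_ratio, k):
    """Odds ratio for a k-point increase, assuming linearity on the log-odds scale."""
    return odds_ratio ** k


# Neurosurgery ORs at days 7, 30, and 90, as quoted in the text.
neurosurgery_or = {7: 0.59, 30: 0.71, 90: 0.81}

for day, odds_ratio in neurosurgery_or.items():
    decrease = percent_decrease_in_odds(odds_ratio)
    three_point = odds_ratio_over_k_units(odds_ratio, 3)
    print(f"day {day}: {decrease:.0f}% lower odds per point, "
          f"OR for a 3-point increase = {three_point:.2f}")
```

An OR moving from 0.59 toward 0.81 corresponds to the per-point reduction in odds shrinking from about 41% to about 19%, which is the weakening over time described above.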
Figure 1 summarizes the relationship between the Surgical Apgar Score and the probability of death (at days 7, 30, and 90) for several subspecialties. See figure, Supplemental Digital Content 1, http://links.lww.com/ALN/A739, for additional subspecialties. The points correspond to the estimated probability of death, whereas the vertical bars correspond to the 95% CI. Points and bars were omitted if there were fewer than 10 observations for a given Apgar value, whereas only the bars were omitted if there were at least 10 observations but no deaths. Among cardiac patients, there were no Surgical Apgar Scores of 0 (hence no points or bars) and only four patients with a Surgical Apgar Score of 2 (but no deaths at day 7, thus no bar).
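The display rule stated above (no point or bar below 10 observations, no bar when there are no deaths) can be expressed as a small helper. This is a hypothetical sketch of that rule as written, not the code used to produce figure 1; the function name and the returned dictionary are assumptions made for illustration.

```python
def plot_elements(n_observations, n_deaths, min_observations=10):
    """Decide which elements to draw for one Apgar value in one panel.

    Returns a dict with two booleans: whether to draw the point
    (estimated probability of death) and whether to draw the bar (95% CI).
    """
    if n_observations < min_observations:
        # Too few observations: omit both the point and the bar.
        return {"point": False, "bar": False}
    if n_deaths == 0:
        # Enough observations but no deaths: draw the point only.
        return {"point": True, "bar": False}
    return {"point": True, "bar": True}
```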
Our analysis of the risk of postsurgical death at 7, 30, and 90 days expands on the original work published by Gawande et al. and others by performing the analysis of the Surgical Apgar Score across all major surgical subspecialties at an academic medical center. As also shown by Regenbogen et al., we have established that the predictive value of the Surgical Apgar Score remains valid at an institution outside of where it was developed and that it can be derived from electronic records.8 In addition, we have demonstrated that the magnitude of the relationship of Surgical Apgar Score to the risk of postsurgical death at 7, 30, and 90 days varies by surgical subspecialty and that it contains information above and beyond that of the ASA metric. This difference between subspecialties is important to note and may occur because comorbidities and potential causes of death vary among the subspecialty populations. For example, major cardiac events in vascular patients may be well predicted by the Surgical Apgar Score as opposed to sepsis in burn patients.
The Surgical Apgar Score remains a simple score, easily calculated immediately after the procedure, for assessing postoperative risk of death. The score may have usefulness in several areas. For example, during the handoff process (the communication between physician services or physician and nursing team members) it can alert the provider taking over care to the overall risk the patient is facing and may indicate the need for additional care measures to minimize risk. The Surgical Apgar Score could be incorporated into electronic documentation packages for real-time calculation either during or at the end of surgery, providing an automated warning to clinicians. This prognostic value may alert the provider that additional diagnostic testing (arterial blood gases, serum lactate, or hematocrit determinations), further resuscitation, one-on-one nursing, or more invasive monitoring is indicated. In fact, several proven risk modification strategies (such as deep breathing and thoracic epidurals for high risk pulmonary patients, or intraoperative β-blockade for vascular procedures in patients with coronary artery disease) exist in the literature, which suggests that early identification of high-risk patients and implementation of risk modification strategies can decrease hospital stay and mortality.18–20 Frequently, decisions to transfer patients to intensive care units are based on clinical impressions rather than hard data. Although improving outcomes through postoperative interventions based on the Surgical Apgar Score is only speculative at this point, the score does provide an objective adjunct to facilitate discussions among the surgeon, anesthesiologist, and intensive care unit physician in determining the need for these further or heightened postoperative care strategies.
Beyond immediate patient care issues, quality improvement initiatives may also be augmented with data from the Surgical Apgar Score. In the event detection process, quality officers may choose to select cases with a low Surgical Apgar Score for additional screening and possible analysis for peer review or other improvement processes. Furthermore, focusing attention on decreasing blood loss, decreasing maximum heart rate, and avoiding hypotension, the three factors that would increase a patient's Surgical Apgar Score, may lead clinicians to create quality and safety systems that are designed to reduce risk by preventing low Surgical Apgar Scores.
Unlike some of the previous studies of the Surgical Apgar Score, our study was limited to all-cause mortality as the primary endpoint. The decision to exclude secondary or surrogate endpoints such as major cardiovascular morbidity was made for several reasons: the difficulty in determining major negative outcomes across a wide variety of surgical services, the subjective and inconsistent reporting of such, and the cost-prohibitive nature of scrutinizing more than 100,000 charts. Furthermore, as shown in the PeriOperative Ischemic Evaluation (POISE) trial, there is literature suggesting that surrogate endpoints may not accurately predict mortality and therefore are not recommended for studies in which the aim is to reduce mortality.21,22,18 As such, our findings that the Surgical Apgar Score predicts risk of postoperative death across a wide variety of surgeries may not be applicable to the prediction of other postoperative complications across the same surgical services.
In addition, as described in previous studies, the Surgical Apgar Score is dependent on the preoperative physical condition of the patient. Patients who are hypotensive and tachycardic preoperatively are likely to have poorer Surgical Apgar Scores, potentially based on their preoperative condition rather than the success or difficulty of surgical intervention. This study's population, for example, comes from a large academic referral institution, which likely treats patients with more significant comorbidities than an average community hospital does. As such, the Surgical Apgar Score has not been validated in and of itself to compare physicians or institutions.
The original model of Gawande et al. was kept simple so that a human could compute the score. Although the simplicity of the original model is reasonable and in fact a major point of the Surgical Apgar Score, the broad adoption of automatic perioperative information systems could facilitate a potentially more complex but improved model. The additional complexity would be acceptable (if needed) because the score could be computed in real time using the computer. Furthermore, although the original Surgical Apgar Score is now validated across a wide range of surgical subspecialties, it may be possible that the algorithm could be modified for better prediction among each subspecialty. Indeed, Gawande et al. developed the score for vascular and general cases, and our data suggest that the Surgical Apgar Score is a better predictor of mortality in those subspecialties than it is in, for example, burn excision surgery.
Future work should be directed toward improving the Surgical Apgar Score and examining the usefulness of the score in guiding intraoperative techniques and postoperative interventions, such as intensive care unit admission or other escalation in diagnosis or therapy. As mentioned previously, a potential weakness of the algorithm used to compute the score is that it treats transient and prolonged heart rate and blood pressure fluctuations alike. The score could potentially be improved by excluding the period surrounding induction, or adding a time factor to the heart rate and blood pressure parts of the algorithm. Furthermore, it would be beneficial to note how the Surgical Apgar Score compares with other current studies relating intraoperative cardiovascular and anesthetic patterns to longer term postoperative outcome. For example, a recent retrospective analysis by researchers at the Cleveland Clinic, Cleveland, Ohio found that patients who simultaneously experience low-normal MAP, bispectral index score, and end-tidal volatile anesthetic concentration have nearly triple the risk for 30-day mortality as those whose numbers are higher. The negative outcome was especially common in patients who spent more than 20 min in a triple-low state, and was ameliorated by vasopressors. Perhaps a combination of the two algorithms (HR, MAP, EBL, end-tidal volatile anesthetic concentration, and bispectral index score) would result in a more powerful predictive model. Both the Surgical Apgar Score and the triple-low state seem to suggest that patients with autonomic dysfunction, i.e., fragile patients, have poorer outcomes.
The two scores seemingly justify and explain the benefit of already-proven therapies such as maintaining a baseline MAP sufficient for end-organ perfusion, avoiding tachycardia with appropriate analgesia and β-blockade, and minimizing the preoperative fasting period with a carbohydrate-rich drink.18,23 Future work with these types of predictive models could provide insight into other beneficial intraoperative techniques and postoperative interventions. In so doing, evidence-based protocols may be developed that could potentially decrease morbidity, mortality, and costs.