Estimating surgical risk is critical for perioperative decision making and risk stratification. Current risk-adjustment measures do not integrate dynamic clinical parameters with baseline patient characteristics, although doing so may allow more accurate prediction of surgical risk. The goal of this study was to determine whether the preoperative Risk Quantification Index (RQI) and Present-On-Admission Risk (POARisk) models would be improved by including the intraoperative Surgical Apgar Score (SAS).
The authors identified adult patients admitted after noncardiac surgery. The RQI and POARisk were calculated using published methodologies, and model performance was compared with and without the SAS. Relative quality was measured using the Akaike and Bayesian information criteria. Calibration was compared by the Brier score. Discrimination was compared by the areas under the receiver operating characteristic curves (AUROCs) using a bootstrapping procedure for bias correction.
The SAS alone was a statistically significant predictor of both 30-day mortality and in-hospital mortality (P < 0.0001). The RQI had excellent discrimination with an AUROC of 0.8433, which increased to 0.8529 with the addition of the SAS. The POARisk had excellent discrimination with an AUROC of 0.8608, which increased to 0.8645 by including the SAS. Overall performance and relative quality likewise improved.
While AUROC values increased, the RQI and POARisk preoperative risk models were not meaningfully improved by adding intraoperative risk using the SAS. In addition to the estimated blood loss, lowest heart rate, and lowest mean arterial pressure, other dynamic clinical parameters from the patient’s intraoperative course may need to be combined with procedural risk estimate models to improve risk stratification.
The Risk Quantification Index and Present-On-Admission Risk Index predict postoperative mortality based on administrative data only
The Surgical Apgar Score estimates risk from estimated blood loss, lowest heart rate, and lowest mean arterial pressure
It remains unknown whether adding intraoperative details (which are harder to obtain) to administrative data improves predictions by either model
Both the Risk Quantification Index and Present-On-Admission Risk Index predicted mortality well
Adding the Surgical Apgar Score did not substantively improve predictions
Estimating surgical risk is critical for both preoperative and postoperative decision making. There is a growing need for more accurate risk stratification with the adoption of new payment methodologies, such as population health management, bundled payments, and value-based purchasing. Specific indices have been described for surgical risk, which include the Risk Stratification Indices (RSIs),1 the Risk Quantification Indices (RQIs),2 the Present-On-Admission Risk (POARisk) model,3 and models from the American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP).4 Although the RSI and RQI models have been previously validated,5 the POARisk model has not yet been validated, and the ACS NSQIP models have not been sufficiently described to permit external validation (Karl Y. Bilimoria, M.D., M.S., F.A.C.S., Division of Research and Optimal Patient Care, American College of Surgeons, 633 N. St. Clair St., 22nd Floor, Chicago, Illinois 60611, personal e-mail communication, May 2014). In addition, it is unclear whether these procedural risk estimate models that use administrative data could be improved by including dynamic clinical parameters from the patient’s intraoperative course.
The Surgical Apgar Score (SAS) uses estimated blood loss (EBL), lowest heart rate (HR), and lowest mean arterial pressure (MAP) to calculate a value on a 10-point scale that is predictive of surgical outcomes.6 These routinely available intraoperative data can provide an objective means of measuring and communicating patient risk from surgery. The SAS has been validated in multiple patient populations7–10 and is used to predict mortality, morbidity, and intensive care unit admission. The predictive performance of this validated intraoperative risk model has not been characterized in a general surgical population when used in conjunction with preoperative risk indices.
The goal of this study was twofold: to externally validate the POARisk model and to determine whether preoperative risk estimates would be improved by incorporating intraoperative risk, by evaluating the performance of the RQI and POARisk models with and without the SAS.
Materials and Methods
Patient Population and Data Collection
This study received institutional review board approval from the Vanderbilt University Human Research Protection Program in Nashville, Tennessee. For validation of the RQI and POARisk risk-adjustment models, the authors identified adult (18 yr old or older) patients who were admitted after noncardiac surgery from the Vanderbilt Department of Anesthesiology Perioperative Data Warehouse (PDW) between 2008 and 2013. The PDW is a secure, centralized data warehouse that contains patient encounter data from multiple information systems across Vanderbilt University Medical Center. The PDW links patients to the National Death Index, a central index of death record information maintained by the National Center for Health Statistics division of the Centers for Disease Control and Prevention,11 and also contains in-hospital mortality data. After identifying the patient population of interest, we obtained intraoperative vital signs, EBL, mortality endpoints, diagnosis and procedure codes, and present-on-admission indicators from the PDW. We then reviewed the data set and excluded cardiac cases by surgical service; the adult cardiac, cardiac, and electrophysiology services were excluded. We additionally confirmed that pediatric cardiac cases were excluded by the age criterion of 18 yr or older.
Surgical Apgar Score
The SAS is a 10-point score to rate surgical outcomes.6 It is calculated from the EBL, lowest HR, and lowest MAP during an operation. EBL is scored as 0 to 3 points, assigned for values of more than 1,000 ml, 601 to 1,000 ml, 101 to 600 ml, and 100 ml or less, respectively. Lowest MAP is scored from 0 to 3 points, for values less than 40 mmHg, 40 to 54 mmHg, 55 to 69 mmHg, and 70 mmHg or greater, respectively. Lowest HR is scored from 0 to 4 points, for values greater than 85 beats/min, 76 to 85 beats/min, 66 to 75 beats/min, 56 to 65 beats/min, and 55 beats/min or less, respectively. The SAS has been validated in multiple settings.7–10 Data for calculation of the SAS were obtained from the PDW, which contains patient vital data measured on a regular basis and recorded as frequently as every 30 s. We determined the lowest HR and MAP and assigned points for each component of the SAS. The SAS was calculated as the sum of the points for each category over the course of a procedure; for example, a patient with 50-ml blood loss (3 points), a lowest MAP of 80 mmHg (3 points), and a lowest HR of 60 beats/min (3 points) would receive a score of 9. By contrast, a patient with more than 1,000-ml blood loss (0 points), a lowest MAP of 50 mmHg (1 point), and a lowest HR of 80 beats/min (1 point) would receive a score of 2.
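The scoring rubric above can be sketched as a short function. This is a minimal illustration only, not the code used in the study (which ran against the PDW in SAS and R); the function and variable names are our own.

```python
def surgical_apgar_score(ebl_ml, lowest_map_mmhg, lowest_hr_bpm):
    """Compute the 10-point Surgical Apgar Score from intraoperative values."""
    # Estimated blood loss: 0-3 points (>1,000; 601-1,000; 101-600; <=100 ml)
    if ebl_ml > 1000:
        ebl_pts = 0
    elif ebl_ml > 600:
        ebl_pts = 1
    elif ebl_ml > 100:
        ebl_pts = 2
    else:
        ebl_pts = 3
    # Lowest MAP: 0-3 points (<40; 40-54; 55-69; >=70 mmHg)
    if lowest_map_mmhg < 40:
        map_pts = 0
    elif lowest_map_mmhg < 55:
        map_pts = 1
    elif lowest_map_mmhg < 70:
        map_pts = 2
    else:
        map_pts = 3
    # Lowest HR: 0-4 points (>85; 76-85; 66-75; 56-65; <=55 beats/min)
    if lowest_hr_bpm > 85:
        hr_pts = 0
    elif lowest_hr_bpm > 75:
        hr_pts = 1
    elif lowest_hr_bpm > 65:
        hr_pts = 2
    elif lowest_hr_bpm > 55:
        hr_pts = 3
    else:
        hr_pts = 4
    return ebl_pts + map_pts + hr_pts

# The two worked examples from the text:
surgical_apgar_score(50, 80, 60)     # -> 9
surgical_apgar_score(1100, 50, 80)   # -> 2
```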
To reduce the influence of measurement artifact, we applied thresholds corresponding to values that could reasonably be physiologic: HR values outside the range of 15 to 200 beats/min and MAP values outside the range of 25 to 180 mmHg were interpreted as artifact and not used for computation of the SAS.
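The artifact-filtering step amounts to discarding out-of-range readings before taking the minimum, which can be sketched as follows (an illustration with hypothetical variable names; the study's actual processing was performed on the PDW):

```python
HR_RANGE = (15, 200)   # beats/min, plausibility bounds from the text
MAP_RANGE = (25, 180)  # mmHg

def lowest_valid(values, valid_range):
    """Return the minimum reading after discarding out-of-range (artifactual) values."""
    lo, hi = valid_range
    valid = [v for v in values if lo <= v <= hi]
    return min(valid) if valid else None

# Example: a MAP trace containing an artifactual 5-mmHg reading
lowest_map = lowest_valid([78, 62, 5, 55, 90], MAP_RANGE)  # 5 is discarded -> 55
```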
Risk Quantification Index
The RQI is a risk-adjustment model for 30-day mortality and morbidity. The model includes the Current Procedural Terminology (CPT) code of the performed primary procedure, American Society of Anesthesiologists (ASA) physical status classification, and age (for mortality) or hospitalization (inpatient vs. outpatient, for morbidity). To compute the RQI for 30-day mortality, CPT codes corresponding to patients’ primary procedure were assigned weights and combined with ASA physical status classification and age. We further expanded the original RQI methodology by including primary scheduled procedure codes in the analysis; a primary scheduled procedure was identified as the scheduled procedure with the highest relative value unit per case. The SAS was added to create a risk model that combined patient, procedural, and intraoperative physiologic factors. The SAS and a univariable score measuring procedure-associated risk (the Procedural Severity Score [PSS]) were calculated using published methodology.2 The dalton.rqi R package available at the RQI Web site12 was used to calculate the PSS.
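Conceptually, the RQI-style models combine a procedural risk score with patient factors through a logistic model. The sketch below uses entirely hypothetical coefficients (the real weights come from the published methodology and the dalton.rqi package) and is meant only to show the shape of the combination:

```python
import math

def logistic_risk(pss, asa_class, age_yr,
                  b0=-9.0, b_pss=1.0, b_asa=0.8, b_age=0.03):
    """Predicted mortality probability from a PSS + ASA + age logistic model.

    The coefficients are hypothetical placeholders, NOT the published RQI weights.
    """
    lp = b0 + b_pss * pss + b_asa * asa_class + b_age * age_yr
    return 1.0 / (1.0 + math.exp(-lp))

# Risk rises monotonically with each predictor under this specification
low = logistic_risk(1.0, 3, 40)
high = logistic_risk(1.0, 3, 80)
```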
Present-On-Admission Risk
The POARisk is a risk-adjustment model for in-hospital mortality among inpatients undergoing one or more procedures. It was derived and validated using hospital discharge data from the California State Inpatient Database, specifically International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) present-on-admission diagnoses, principal procedures, and secondary procedures occurring before the date of the principal procedure. We calculated the POARisk using the published methodology, combined with patient age and sex.3 The POARisk statistical analysis software macro available at the POARisk Model Web site13 was used to calculate the POARisk.
Statistical Analysis
Ten thousand bootstrapped samples were simulated for each of the models (RQI and POARisk). First, multivariable logistic regression with the PSS, ASA physical status classification, and age was used to assess predictive ability for 30-day mortality. The results obtained using the RQI models were compared with those obtained using a similarly derived model that included the SAS. Multivariable logistic regression with POARisk, age, and sex was used to assess predictive ability for in-hospital mortality. The results obtained using the POARisk models were compared with those obtained using a similarly derived model that included the SAS. Discrimination was compared by the areas under the receiver operating characteristic curves (AUROCs; c-statistics). Calibration was compared by calibration plots and the Brier score. Relative quality was measured using the Akaike information criterion (AIC) and Bayesian information criterion (BIC). Bootstrapping with 10,000 replications was used for bias correction. Statistical programming was implemented in Statistical Analysis Software 9.4 (SAS Institute Inc., USA) and R (version 2.9.2, R Core Team; R Foundation for Statistical Computing, Austria).
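The three comparison metrics can each be computed in a few lines. The sketch below is a self-contained illustration of the definitions (the study itself used SAS 9.4 and R), verified on a small toy data set:

```python
import math

def auroc(labels, probs):
    """Rank-based AUROC: probability that a random positive case outranks
    a random negative case, with ties counted as half."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier(labels, probs):
    """Brier score: mean squared difference between predicted probability
    and observed outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)

def aic(labels, probs, n_params):
    """Akaike information criterion from the binomial log-likelihood:
    AIC = 2k - 2 ln L (lower is better)."""
    ll = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
             for y, p in zip(labels, probs))
    return 2 * n_params - 2 * ll

# Toy example: two events, two non-events
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
# auroc(y, p) -> 0.75; brier(y, p) -> 0.158125
```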
Results
A total of 44,835 noncardiac surgical encounters were identified with the data required for the RQI. Patient characteristics are shown in table 1. The 10 most common CPT codes are shown in table 2. A summary of the prediction models’ performance for 30-day mortality, including Brier score, AIC, BIC, and AUROC, is shown in table 3. The SAS alone was a statistically significant (P < 0.0001) predictor of 30-day mortality. An association between the SAS and the PSS (Pearson correlation ρ = −0.1199) was identified. The AUROC was 0.64 for the SAS alone, 0.8433 for RQI 30-day mortality based on the primary performed procedure, and 0.8422 for RQI 30-day mortality based on the primary scheduled procedure. Inclusion of the SAS improved model discrimination to 0.8529 and 0.8524, respectively. ROC curves comparing the original RQI 30-day mortality model with and without the SAS are shown in figure 1. Calibration plots are shown in figures 2 and 3. The AIC and BIC for RQI 30-day mortality were 5,939 and 6,005, respectively, which changed with inclusion of the SAS to 5,818 and 5,893, respectively. The Brier score for RQI 30-day mortality was 0.02550, which changed with inclusion of the SAS to 0.02482.
A total of 110,273 noncardiac surgical encounters with the data required for the POARisk were identified. Patient characteristics are shown in table 4. The 10 most common ICD-9-CM diagnosis and procedure codes are shown in table 5. A summary of the prediction models’ performance for in-hospital mortality, including Brier score, AIC, BIC, and AUROC, is shown in table 3. The SAS alone was a statistically significant (P < 0.0001) predictor of in-hospital mortality. An association between the SAS and the POARisk (Pearson correlation ρ = −0.1357) was identified. The AUROC for the SAS in this data set was 0.63, with an AUROC of 0.8608 for POARisk in-hospital mortality. Inclusion of the SAS improved model discrimination to 0.8645. ROC curves for these comparisons are shown in figure 4. Calibration plots are shown in figures 5 and 6. The AIC and BIC for POARisk in-hospital mortality were 17,920 and 17,956, respectively, which changed with inclusion of the SAS to 17,592 and 17,637, respectively. The Brier score for POARisk in-hospital mortality was 0.03356, which changed with inclusion of the SAS to 0.03324.
Discussion
We externally validated the POARisk model and found that neither the RQI nor the POARisk preoperative risk estimates were meaningfully improved by incorporating intraoperative data in the form of the SAS. Consistent with previous studies, we found that the SAS alone was a statistically significant (P < 0.0001) predictor of both 30-day and in-hospital mortality. Importantly, inclusion of the SAS did not substantially improve either of the multivariable models, as demonstrated by the AIC and BIC values, discrimination ability, and calibration.
The SAS is a good predictor of surgical outcomes.8 It is based on routinely available intraoperative data and provides a simple, objective means of measuring and communicating patient risk from surgery. The SAS has been validated in multiple patient populations7–10 and is used to predict mortality, morbidity, intensive care unit admission, and hospital readmission. We expected that the SAS would significantly improve patient risk stratification, as it adds information regarding the hemodynamic course of the procedure through HR and blood pressure parameters and the actual invasiveness of the procedure via stratification of EBL. Based on our findings, adding these data did not meaningfully improve predictions compared with the performed procedure, patient age, and ASA physical status combined (RQI). Expanding the RQI by adding the scheduled procedure resulted in a minor performance improvement. Similarly, although the SAS was a statistically significant (P < 0.0001) predictor of in-hospital mortality, it did not meaningfully improve predictions compared with the present-on-admission diagnoses, principal procedures, and prior secondary procedures combined (POARisk). We speculate that procedures with higher rates of in-hospital mortality are also associated with hemodynamic derangement and greater EBL, and thus this intraoperative information is already captured within the preprocedural risk estimates. This is supported by the correlation observed between the SAS and the PSS. It is possible that including vital signs besides lowest MAP, lowest HR, and estimated amount of blood loss, along with laboratory data, could further improve model performance, but this has not yet been demonstrated.
Estimating surgical risk before, during, and after surgery is critical for both preoperative and postoperative decision making. In addition to general indices of hospital mortality, such as the Charlson Comorbidity Index,14 the Deyo method,15 and the Elixhauser method,16 additional specific indices exist for surgical risk. These indices use administrative patient data to predict mortality and identify patients at a higher risk of adverse events. RSIs1 were developed using ICD-9-CM diagnosis and procedure codes for adult hospital inpatients, obtained from the Medicare Provider Analysis and Review (MEDPAR) database for the period of 2001 to 2006. The ACS NSQIP incorporates risk-adjustment indices for mortality.4 Although these models are highly predictive, they require collection of detailed information on a number of patient risk factors including laboratory values. Generalizable and practical risk-adjustment models would ideally use a limited number of risk factors that could be obtained for most patients. In addition, models using ICD-9-CM diagnoses are challenging to use preoperatively as these coded data are typically not available until coding is performed after discharge.
Effective implementation of risk stratification strategies should allow for improved identification of patients who face higher risks of poor outcomes due to preventable or manageable complications. These patients can be provided with evidence-based risk-reduction strategies to improve their likelihood for a good outcome, such as increased postoperative monitoring and transfer to a higher level of care. In addition, this approach also provides information to patients and their families on patients’ relative conditions after surgical procedures. Such risk models can also provide a target for surgical teams and researchers aiming to improve outcomes, such as rapid response teams,17 and a measure for quality monitoring and improvement programs, even in resource-poor settings. Overall, such models are important because of their practical implications for patient safety, level of care, and cost reduction.
There are a number of limitations to our study that must be considered. First, the current study has all the limitations of a retrospective study based on administrative data. As administrative data are frequently used for billing purposes, data sets are relatively uniform; however, diagnostic and procedure codes tend to lack specificity for complex clinical cases.18 Second, the RQIs were derived using data from 2005 to 2008 and the POARisk model using data from 2004 to 2009, whereas the current study covered 2008 to 2013; CPT code changes for the years 2005 to 2013 totaled more than 2,500.19 Furthermore, preexisting conditions can only include conditions that are known at the time of admission and may not include unrecognized patient conditions. In addition, multiple surgeries on the same individuals accounted for less than 6% of encounters and were ignored in the models, and records with missing data were excluded from the analysis. The predictive accuracy values were derived from the same data set that was used to develop the models combining the SAS with the two risk scores; in theory, this carries a risk of “optimism” in the predictive accuracy measures,20 although this risk is low given the large sample size and the small number of parameters that needed to be estimated. Finally, we are limited by the fact that this study was a single-center evaluation.
In summary, we externally validated the POARisk model and evaluated the performance of the RQI and POARisk models with and without the SAS. The RQI and POARisk had excellent discrimination and good overall performance. Neither of these preoperative risk models was meaningfully improved by incorporating the intraoperative course through the addition of the SAS.
This work was funded, in part, by the Department of Anesthesiology, Vanderbilt University, Nashville, Tennessee, and the Foundation for Anesthesia Education and Research and Anesthesia Quality Institute Health Services Research Mentored Research Training Grant (HSR-MRTG), Schaumburg, Illinois (to Dr. Wanderer).
The authors declare no competing interests.