Maternal complications during and after childbirth demonstrate wide variation across hospitals.
National reporting systems do not integrate maternal and newborn outcomes when defining hospital obstetric care quality.
Administrative data can be used to calculate hospital-level risk-adjusted maternal, newborn, and composite maternal–newborn performance.
Maternal and newborn hospital performance were poorly correlated, suggesting that composite performance measures must also report underlying maternal and newborn performance separately.
The number of pregnancy-related deaths and severe maternal complications continues to rise in the United States, and the quality of obstetrical care across U.S. hospitals is uneven. Providing hospitals with performance feedback may help reduce the rates of severe complications in mothers and their newborns. The aim of this study was to develop a risk-adjusted composite measure of severe maternal morbidity and severe newborn morbidity based on administrative and birth certificate data.
This study was conducted using linked administrative data and birth certificate data from California. Hierarchical logistic regression prediction models for severe maternal morbidity and severe newborn morbidity were developed using 2011 data and validated using 2012 data. The composite metric was calculated using the geometric mean of the risk-standardized rates of severe maternal morbidity and severe newborn morbidity.
The study was based on 883,121 obstetric deliveries in 2011 and 2012. The rates of severe maternal morbidity and severe newborn morbidity were 1.53% and 3.67%, respectively. Both the severe maternal morbidity model and the severe newborn morbidity model exhibited acceptable levels of discrimination and calibration. Hospital risk-adjusted rates of severe maternal morbidity were poorly correlated with hospital rates of severe newborn morbidity (intraclass correlation coefficient, 0.016). Hospital rankings based on the composite measure exhibited moderate levels of agreement with hospital rankings based either on the maternal measure or the newborn measure (κ statistic, 0.49 and 0.60, respectively). However, 10% of hospitals classified as average using the composite measure had below-average maternal outcomes, and 20% of hospitals classified as average using the composite measure had below-average newborn outcomes.
Maternal and newborn outcomes should be jointly reported because hospital rates of maternal morbidity and newborn morbidity are poorly correlated. This can be done using a childbirth composite measure alongside separate measures of maternal and newborn outcomes.
In the 15 yr since the publication of the Institute of Medicine report highlighting the need to reduce medical errors and improve patient safety,1 complications after childbirth have become more common, not less common.2,3 The number of pregnancy-related deaths in the United States increased from 7.2 to 17.3 per 100,000 between 1987 and 2013.4 Many pregnancy-related deaths, such as those due to hemorrhage and preeclampsia, are preventable5,6 and the quality of obstetrical care across U.S. hospitals is uneven.7,8 Rising rates of maternal deaths and severe morbidity led the American College of Obstetricians and Gynecologists and the American Society of Anesthesiologists to create the Maternal Quality Improvement Program (Washington, D.C.) outcomes registry to serve as a platform for reporting risk-adjusted outcome metrics and improving the quality of obstetrical care.9
Currently available outcome measures for obstetrical care10–13 are limited because they are not risk-adjusted and do not account for differences in hospital case mix. To differentiate between obstetrical teams that provide high- and low-quality care, we need statistical models that adjust for differences in patient risk and that properly account for statistical noise due to random variation.14 Our goal was to develop a risk-adjusted composite measure of severe maternal morbidity and severe newborn morbidity based on administrative and birth certificate data. Although the Maternal Quality Improvement Program intends to base performance reporting on electronic quality measures, claims-based measures may be useful because the penetration of Maternal Quality Improvement Program–compliant electronic medical records among U.S. hospitals is still very limited. In developing a composite measure of childbirth outcomes, we address the possibility that measures that focus individually on either maternal or newborn outcomes may not adequately assess the quality of childbirth care, which uniquely involves two sets of outcomes: one for the mother and one for her newborn. Relying on administrative instead of clinical data presents an opportunity to develop a measure for national reporting, similar to the Centers for Medicare and Medicaid Hospital Compare, without having to wait for electronic medical records–based clinical data to become widely available. This measure could serve as a team-based shared accountability measure for anesthesiologists, obstetricians, pediatricians, and intensivists.15 It could be used by clinicians to identify areas for improvement and help expectant mothers decide where they will deliver their babies.
Materials and Methods
This study was conducted using linked administrative data and birth certificate data from the California Office of Statewide Health Planning and Development (Sacramento, California). Data linkage was performed by the Office of Statewide Health Planning and Development, and more than 97% of maternal and neonatal hospital discharge records were linked with birth certificate data.16 The linked Office of Statewide Health Planning and Development data contain comprehensive information on maternal demographics and socioeconomic status (race, ethnicity, education), prenatal data (height and weight at delivery, multiple gestation, prior cesarean delivery, gestational age), labor diagnoses (preeclampsia, placental abnormalities, gestational diabetes, fetal presentation), International Classification of Diseases, Ninth Revision–Clinical Modification diagnostic and procedure codes, outcomes (Apgar scores, discharge status, maternal and newborn intensive care unit admission), and hospital identifiers and characteristics. The California Office of Statewide Health Planning and Development and the Institutional Review Board of the University of Rochester’s School of Medicine and Dentistry (Rochester, New York) reviewed and approved this study.
This study was based on 983,019 obstetric deliveries in 2011 and 2012, each including the mother and her newborn. Deliveries with missing maternal or newborn discharge status (10,347), maternal age (419), gestational age (20,720), or height or weight (31,461) were excluded from the analyses. We also excluded maternal–infant dyads with maternal body mass index less than 15 or greater than 80 (84), age less than 12 yr or greater than 55 yr (31), parity greater than 12 (259), gestational age greater than 44 weeks (12,058) or less than 28 weeks (4,977), newborn congenital disorders (5,566), or cases where the maternal and newborn hospital identifiers were not the same (151). For multiple gestations, we included only the firstborn infant (Supplemental Digital Content, figure 1, http://links.lww.com/ALN/B946).
Measure Development and Validation
Our outcomes of interest were severe maternal morbidity and severe newborn morbidity, chosen to focus on the most severe complications of childbirth. We used the Centers for Disease Control and Prevention (Atlanta, Georgia) algorithm based on International Classification of Diseases, Ninth Revision–Clinical Modification diagnostic and procedure codes for severe maternal morbidity.17 We used present-on-admission codes to distinguish conditions present before admission from complications that occurred after hospital admission. We supplemented the Centers for Disease Control and Prevention algorithm using birth certificate data and also included maternal mortality in severe maternal morbidity (Supplemental Digital Content, table 1, http://links.lww.com/ALN/B946). We adapted the International Classification of Diseases, Ninth Revision–Clinical Modification–based algorithm for Unexpected Severe Newborn Complications developed by the California Maternal Quality Care Collaborative and endorsed by the National Quality Forum18 as a measure of severe newborn morbidity. This measure is used by hospitals in California19 and hospitals participating in the National Perinatal Information Center reporting network.20 Because our goal was to create a risk-adjusted outcome measure broadly applicable to all pregnancies, we did not apply the denominator exclusions, such as prematurity and low birth weight, used in the original specification of this measure. We supplemented this algorithm using birth certificate data and also included newborn mortality (Supplemental Digital Content, table 1, http://links.lww.com/ALN/B946). For patients who were transferred out, maternal and newborn outcomes were based on both the index and transfer hospital discharge data and were attributed to the index admission. Admissions were attributed to the hospital with the index admission to avoid disincentivizing higher-resource hospitals from accepting patients with complications from lower-resource hospitals.
This also incentivizes lower-resource hospitals to transfer women with “complex maternal conditions and critically ill pregnant women and fetuses”21 to higher-resource hospitals where patient outcomes may be better.
We first constructed a patient-level nonhierarchical multivariable logistic regression model for severe maternal morbidity using data from 2011. The list of potential covariates was adapted from a comorbidity index developed by Bateman et al.,22 supplemented with birth certificate data. We included additional risk factors thought to be associated with severe maternal morbidity based on our literature review and clinical experience (Supplemental Digital Content, table 2, http://links.lww.com/ALN/B946). To minimize the risk of omitted variable bias, we specified a nonparsimonious model and included some risk factors that were not statistically significant but were assumed to be clinically important (Supplemental Digital Content, table 3, http://links.lww.com/ALN/B946). To optimize model fit, we used fractional polynomials23 to linearize the logit term. We then constructed a separate model for severe newborn morbidity using the same approach. We evaluated model fit using the C statistic, Hosmer–Lemeshow statistic, and calibration curves. Model fit was assessed in (1) the development data set (2011) and (2) the validation data set (2012) using the model coefficients estimated in the 2011 data.
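The fit statistics described above are standard and straightforward to reproduce. The following Python fragment is an illustrative sketch only (the study's analyses were run in Stata, and the function names and synthetic inputs here are our own): `c_statistic` computes the area under the ROC curve via the rank-sum identity, and `hosmer_lemeshow` computes the calibration chi-square over deciles of predicted risk.

```python
import numpy as np

def c_statistic(y, p):
    """C statistic (area under the ROC curve) via the rank-sum identity."""
    y, p = np.asarray(y), np.asarray(p)
    ranks = np.empty(len(p))
    ranks[np.argsort(p)] = np.arange(1, len(p) + 1)
    n1 = y.sum()           # number of events
    n0 = len(y) - n1       # number of non-events
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow chi-square over groups (deciles) of predicted risk."""
    y, p = np.asarray(y), np.asarray(p)
    edges = np.percentile(p, np.linspace(0, 100, groups + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    chi2 = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (p > lo) & (p <= hi)
        n = mask.sum()
        if n == 0:
            continue
        observed, expected = y[mask].sum(), p[mask].sum()
        # denominator is n * pbar * (1 - pbar) with pbar = expected / n
        chi2 += (observed - expected) ** 2 / (expected * (1 - expected / n))
    return chi2
```

One practical caveat the sketch makes concrete: in very large validation samples the Hosmer–Lemeshow chi-square grows with sample size, so a statistically significant value does not by itself indicate poor calibration, which is why calibration curves are inspected as well.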
Next, we specified a random-intercept model for severe maternal morbidity in which hospitals were specified as a random intercept in the development data. We excluded hospitals with 2011 delivery volumes less than 100 (262 cases) and hospitals with more than 5% incidence of missing data (2,395 cases). We adjusted for patient-level risk factors using the same set of risk factors used in the baseline nonhierarchical model by incorporating the linearized logit term estimated using the nonhierarchical model as a single covariate. We then calculated the risk-standardized severe maternal morbidity rate using the approach described by Krumholz et al.24 and used by the Centers for Medicare and Medicaid (Baltimore, Maryland; table 1). This rate accounts for clustering of observations within hospitals and is defined as the ratio of the hospital predicted-to-expected severe maternal morbidity rate multiplied by the population average severe maternal morbidity rate. The expected hospital severe maternal morbidity rate is calculated using the patient-level regression coefficients and assumes that patients are treated at an “average” hospital (table 1). The predicted hospital severe maternal morbidity rate is calculated using both the patient-level regression coefficients and the hospital random effect to include the hospital contribution to patient outcomes. The hospital predicted-to-expected ratio is a measure of hospital performance. By construction, hospitals with a predicted-to-expected ratio significantly greater than 1 are classified as low-performance outliers (the lower bound of the 95% CI for the predicted-to-expected ratio is greater than 1). Similarly, hospitals with a predicted-to-expected ratio significantly less than 1 are classified as high-performance outliers (the upper bound of the 95% CI for the predicted-to-expected ratio is less than 1). Hospitals were categorized as average-performance if the 95% CI for the predicted-to-expected ratio included 1. 
We use the predicted-to-expected ratio instead of the observed-to-expected ratio because the use of hierarchical modeling results in more stable estimates of hospital performance and better predicts future hospital performance.25 We calculated the risk-standardized rate of severe maternal morbidity by multiplying the predicted-to-expected ratio by the overall rate of severe maternal morbidity in the entire patient cohort. We used bootstrapping to estimate 95% CI around the point estimates for the severe maternal morbidity rate. We also used this approach to calculate the hospital severe newborn morbidity risk-standardized rates (Supplemental Digital Content, table 4, http://links.lww.com/ALN/B946).
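The risk-standardization step can be illustrated as follows. This is a sketch under stated assumptions, not the authors' implementation: `lp`, `hosp`, and `u` are hypothetical names for the patient-level linear predictor, the hospital index, and the estimated hospital random intercepts.

```python
import numpy as np

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))

def risk_standardized_rates(lp, hosp, u, pop_rate):
    """
    For each hospital, compute the predicted-to-expected (P/E) ratio and the
    risk-standardized rate (P/E ratio x population event rate).

    lp       : patient-level linear predictor (log-odds) from the risk model
    hosp     : hospital index (0..K-1) for each delivery
    u        : estimated random intercept for each of the K hospitals
    pop_rate : overall event rate in the cohort
    """
    out = {}
    for h in np.unique(hosp):
        mask = hosp == h
        predicted = expit(lp[mask] + u[h]).mean()  # hospital's own intercept
        expected = expit(lp[mask]).mean()          # "average" hospital (u = 0)
        pe_ratio = predicted / expected
        out[int(h)] = (pe_ratio, pe_ratio * pop_rate)
    return out
```

A percentile bootstrap (resampling deliveries within each hospital and recomputing the P/E ratio) is one way to obtain the 95% CIs used to flag low- and high-performance outliers.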
To create a composite measure for severe maternal morbidity and severe newborn morbidity, we did not estimate a single regression model for a composite of maternal and newborn outcomes because we found empirically that the magnitude of the association between patient-level risk factors and maternal outcomes was different from the association between the same risk factors and newborn outcomes. Instead, we combined the risk-standardized rates of severe maternal morbidity and severe newborn morbidity to create a composite risk-standardized rate. We calculated the composite risk-standardized rate as the geometric mean of the severe newborn morbidity and severe maternal morbidity using the same methodology used by the Centers for Medicare and Medicaid to calculate risk-standardized readmission rates.26 The geometric mean is preferred to the usual additive mean because it is scale invariant: a 25% increase in the severe maternal morbidity will result in the same increase in the composite rate as a 25% increase in the severe newborn morbidity, despite the fact that the overall rates for severe newborn morbidity are higher than the overall rate for severe maternal morbidity (additive means are not scale invariant). We used bootstrapping to estimate 95% CI around the point estimates for the composite outcome.
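The geometric-mean construction and its scale invariance are easy to verify numerically. In this sketch the two inputs are the study's overall rates (1.53% and 3.67%); a 25% increase in either component raises the composite by the same factor (the square root of 1.25, about 11.8%).

```python
import numpy as np

def composite_rate(smm_rsr, snm_rsr):
    """Composite risk-standardized rate: geometric mean of the maternal and
    newborn risk-standardized rates."""
    return np.sqrt(smm_rsr * snm_rsr)

# Scale invariance: a 25% increase in either component rate changes the
# composite by the same factor, even though the two rates differ in magnitude.
smm, snm = 0.0153, 0.0367           # overall SMM and SNM rates from the study
bump_maternal = composite_rate(smm * 1.25, snm)
bump_newborn = composite_rate(smm, snm * 1.25)
```

An additive mean would not have this property: adding 25% to the larger newborn rate would move the average more than adding 25% to the smaller maternal rate.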
Assessment of Reliability
The reliability, or signal-to-noise ratio, of a performance metric is reported on a scale of 0 to 1.27 High reliability implies that most of the variation in the risk-standardized rate is due to differences in performance between hospitals as opposed to uncertainty in the estimate of individual hospital performance. Low levels of reliability, on the other hand, imply that the difference in complication rates between hospitals is more likely due to random noise as opposed to true differences in performance.28 The reliability of a performance measure to evaluate a hospital's performance is specified using the following formula28: reliability = (hospital-to-hospital variance) / (hospital-to-hospital variance + hospital-specific error variance).
Reliability increases with higher annual delivery volumes and higher outcome rates.29 When the uncertainty (hospital-specific error) around a hospital's performance is large compared to the overall variation in hospital performance, reliability is low. The reliability statistic is calculated for each hospital and then described using the median of the distribution. Although the National Quality Forum requires reliability evaluation, it does not specify an exact threshold for acceptable statistical reliability.28 Some authors recommend using 0.7 as a threshold for measures that are to be publicly reported or used for pay-for-performance.27 Performance metrics that are not reliable will not accurately distinguish high performers from low performers. However, a reliable performance measure that is poorly risk-adjusted is still not scientifically valid.30
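A minimal sketch of this calculation (the variance numbers here are illustrative, not study estimates): reliability is computed per hospital from the between-hospital variance and that hospital's squared standard error, then summarized by the median.

```python
import numpy as np

def reliability(var_between, se2_hospital):
    """Signal-to-noise reliability: between-hospital variance divided by
    between-hospital variance plus the hospital-specific error variance."""
    return var_between / (var_between + np.asarray(se2_hospital, dtype=float))

# Hypothetical example: three hospitals with increasing estimation error
# (i.e., decreasing delivery volume). Reliability falls as the error grows.
rel = reliability(0.04, [0.01, 0.04, 0.16])
median_rel = np.median(rel)
```

The example makes the volume dependence concrete: the hospital whose error variance is small relative to the between-hospital variance has high reliability, while the small-volume hospital with large error variance falls well below the 0.7 threshold.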
We first determined the degree of agreement when hospital performance was assessed using risk-standardized maternal outcomes versus newborn outcomes. We used the intraclass correlation coefficient31 to examine the level of agreement between hospital risk-adjusted rates of severe maternal morbidity and severe newborn morbidity. We used weighted κ analysis to examine the level of agreement for categorical measures of hospital performance (high-performance, average-performance, and low-performance) based on either severe maternal morbidity or severe newborn morbidity. We also examined the extent to which hospital performance based on the composite outcome agreed with hospital performance based either on severe maternal morbidity or on severe newborn morbidity using the intraclass correlation coefficient and weighted κ analysis. We rated the level of agreement for both the κ statistic and the intraclass correlation coefficient on the following scale: values less than 0 suggest poor agreement; 0.00 to 0.20, slight agreement; 0.21 to 0.40, fair agreement; 0.41 to 0.60, moderate agreement; 0.61 to 0.80, substantial agreement; and 0.81 to 1.00, almost perfect agreement.32
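Both agreement statistics can be sketched compactly. The linear weighting for κ and the one-way ICC shown below are common variants; the study's exact specifications may differ, and all inputs are hypothetical.

```python
import numpy as np

def weighted_kappa(r1, r2, k=3):
    """Linearly weighted kappa for two ratings coded 0..k-1
    (e.g., 0 = low-, 1 = average-, 2 = high-performance)."""
    obs = np.zeros((k, k))
    for a, b in zip(r1, r2):
        obs[a, b] += 1
    obs /= obs.sum()
    expected = np.outer(obs.sum(axis=1), obs.sum(axis=0))  # chance agreement
    w = 1 - np.abs(np.subtract.outer(np.arange(k), np.arange(k))) / (k - 1)
    po, pe = (w * obs).sum(), (w * expected).sum()
    return (po - pe) / (1 - pe)

def icc_oneway(x, y):
    """One-way random-effects ICC for two measurements per hospital
    (e.g., maternal and newborn risk-standardized rates)."""
    data = np.column_stack([x, y]).astype(float)
    n = len(data)
    hosp_means = data.mean(axis=1)
    msb = 2 * ((hosp_means - data.mean()) ** 2).sum() / (n - 1)  # between
    msw = ((data - hosp_means[:, None]) ** 2).sum() / n          # within
    return (msb - msw) / (msb + msw)
```

Linear weighting gives partial credit when two rankings disagree by one category (e.g., average versus high) but none when they disagree by two (low versus high), which matches the ordered low/average/high classification used here.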
We then determined whether hospital performance in 2011 was a good predictor of maternal and newborn outcomes in 2012. Conceptually, performance measures that predict subsequent hospital performance have face validity as quality measures because patients can use them for selecting providers with the expectation that hospital performance remains relatively stable over time.33,34 We used logistic regression analysis in the 2012 data to examine the association between maternal outcomes (severe maternal morbidity) and hospital severe maternal morbidity performance (low-, average-, or high-performance in 2011), controlling for patient risk using the same set of risk factors included in the baseline severe maternal morbidity prediction model. Using the same approach, we also examined the association between maternal outcomes (severe maternal morbidity) and hospital composite performance. We then repeated this analysis using severe newborn morbidity as the outcome of interest to determine whether hospital performance based on the severe newborn morbidity measure in 2011 predicted newborn outcomes in 2012 and whether hospital performance based on the composite outcome in 2011 predicted newborn outcomes in 2012. To further characterize the extent to which hospital performance in 2011 predicted outcomes in 2012, we estimated the average marginal effects for low-, average-, and high-performance hospitals. The average marginal effect for patients treated in a high-performance hospital is operationalized by taking all the patients in the 2012 sample and computing their probability of experiencing the outcome (e.g., severe maternal morbidity) by setting the value of the categorical variable for hospital performance to indicate that they delivered in a high-performance hospital (regardless of where they actually delivered) and leaving all other patient risk factors (e.g., parity) the same.
These patient-level predictions are averaged together to get the average marginal effect for patients treated in high-performance hospitals. The average marginal effect for patients delivering in low- and average-performance hospitals is estimated in a similar fashion. These analyses were based on the original cohort in 2012 after excluding patients in hospitals without a 2011 quality ranking and hospitals with more than 5% missing data. All statistical analyses were performed using STATA SE/MP (version 14.2; STATA Corp., USA).
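The averaging step described above can be sketched as follows. This is a hypothetical illustration, not the study's code: the coefficient values, the synthetic covariates, and the performance-level log-odds offsets are all invented for demonstration.

```python
import numpy as np

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))

def average_marginal_effects(X, beta, perf_offsets):
    """
    X            : patient risk-factor matrix (first column = intercept)
    beta         : fitted coefficients for the patient risk factors
    perf_offsets : log-odds offset for each hospital-performance level
                   relative to average performance (hypothetical values below)

    Every patient is assigned to each performance level in turn, keeping
    their own risk factors; the predicted probabilities are then averaged.
    """
    lp = X @ beta
    return {level: expit(lp + off).mean() for level, off in perf_offsets.items()}

# Synthetic 2012-style cohort: intercept plus two standardized risk factors.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(1000), rng.normal(size=(1000, 2))])
beta = np.array([-4.0, 0.3, -0.2])
ame = average_marginal_effects(X, beta,
                               {"low": 0.67, "average": 0.0, "high": -0.51})
```

Because each patient's own risk factors are held fixed while only the performance category changes, the differences between the three averaged probabilities isolate the hospital effect from case mix.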
Results

The study was based on 883,121 obstetric deliveries in 2011 and 2012. Patient characteristics are described in table 2. The median age was 29 yr. Fifty-eight percent of the patients were white, 5.8% were African American, and 12.6% were Asian. Twenty-five percent were high-school graduates, and 41.7% had either graduated from college or had some college credit. Less than 1% did not receive any prenatal care. Less than 1% had gestational age between 28 and 31 weeks, and 7.7% had gestational age between 32 and 36 weeks. Eighty-four percent had not had a previous cesarean delivery, and 98.5% were singleton deliveries. The incidence of severe maternal morbidity was 1.53%, and the incidence of severe newborn morbidity was 3.67%. Of the components of severe maternal morbidity, blood transfusion (0.97%) and disseminated intravascular coagulation (0.23%) were the most common complications (Supplemental Digital Content, table 5, http://links.lww.com/ALN/B946). Of the components of severe newborn morbidity, sepsis (1.90%), respiratory complications (1.18%), and shock (0.91%) were the most common newborn complications (Supplemental Digital Content, table 6, http://links.lww.com/ALN/B946). Hospital characteristics are shown in table 3. Ten percent of the hospitals were teaching hospitals, 63% were nonprofit, and 61% had neonatal intensive care units.
Hospital Performance on Severe Maternal Morbidity versus Severe Newborn Morbidity
The risk-adjustment models for severe maternal morbidity and severe newborn morbidity are shown in Supplemental Digital Content (tables 5 and 6, http://links.lww.com/ALN/B946). For many of the risk factors, the adjusted odds ratios were very different for the severe maternal morbidity risk-adjustment model versus the severe newborn morbidity risk-adjustment model. The biggest difference was for gestational age: the association between gestational age between 28 and 31 weeks and severe newborn morbidity (adjusted odds ratio, 35.3; 95% CI, 32.7 to 38.1; P < 0.001) was much greater than for severe maternal morbidity (adjusted odds ratio, 2.11; 95% CI, 1.80 to 2.46; P < 0.001). This led to our decision not to model the composite outcome of severe maternal morbidity and severe newborn morbidity as an all-or-none outcome. We instead developed separate performance metrics for severe maternal morbidity and severe newborn morbidity and then used the geometric mean of the risk-adjusted rates for severe maternal morbidity and severe newborn morbidity to evaluate hospital performance using a single composite metric.
The severe maternal morbidity model exhibited acceptable discrimination (C statistic, 0.69) and acceptable calibration (Hosmer–Lemeshow statistic, 92.0; P < 0.001) in the validation data set (Supplemental Digital Content, table 5, http://links.lww.com/ALN/B946). Although the Hosmer–Lemeshow statistic was statistically significant, it is well recognized that this measure of calibration is sensitive to sample size.35,36 Our sample size in the validation data set was over 400,000 observations. Visual inspection of the calibration graph is also consistent with acceptable calibration in the development data set and in the validation data, with the exception of the highest-risk patients, for whom the model overestimated risk (Supplemental Digital Content, figure 2, http://links.lww.com/ALN/B946). The severe newborn morbidity model exhibited acceptable discrimination (C statistic, 0.74) and acceptable calibration (Hosmer–Lemeshow statistic, 11.7; P = 0.31) in the validation data. Visual inspection of the calibration graph also suggests that this model was well calibrated (Supplemental Digital Content, figure 3, http://links.lww.com/ALN/B946).
There was poor agreement when hospital performance was measured using risk-standardized severe maternal morbidity versus risk-standardized severe newborn morbidity (intraclass correlation coefficient, 0.016; 95% CI, −0.042 to 0.081; fig. 1a). κ analysis showed poor agreement on which hospitals were classified as low-, average-, and high-performance hospitals using severe maternal morbidity versus severe newborn morbidity (κ = 0.094; 95% CI, 0.010 to 0.21; Supplemental Digital Content, table 7, http://links.lww.com/ALN/B946). As expected, hospital performance based on the composite metric exhibited better agreement with both severe maternal morbidity and severe newborn morbidity: (1) severe maternal morbidity versus composite metric (intraclass correlation coefficient, 0.31; 95% CI, −0.093 to 0.60; κ = 0.49; 95% CI, 0.33 to 0.52) and (2) severe newborn morbidity versus composite metric (intraclass correlation coefficient, 0.34; 95% CI, −0.034 to 0.59; κ = 0.60; 95% CI, 0.52 to 0.68; fig. 1, b and c).
Predictive Value of Hospital Performance on Future Outcomes
After grouping hospitals into high-, average-, and low-performance groups, we found significant associations between hospital performance based on 2011 data and patient outcomes in 2012. Patients admitted to low-performance hospitals (based on the severe maternal morbidity metric) in 2012 had a two-fold higher risk of severe maternal morbidity (adjusted odds ratio, 1.95; 95% CI, 1.42 to 2.68; P < 0.001), whereas patients admitted to high-performance hospitals had a 40% lower risk of severe maternal morbidity (adjusted odds ratio, 0.60; 95% CI, 0.52 to 0.69; P < 0.001) compared to average-performance hospitals (table 4). Note that we describe our results in terms of relative risks because the relative risk is approximated by the odds ratio when the outcome of interest is less than 10%.37 Adjusted severe maternal morbidity rates were 2.81% in low-performance hospitals, 1.49% in average-performance hospitals, and 0.91% in high-performance hospitals (table 4; fig. 2a).
Newborns delivered in low-performance hospitals (based on the severe newborn morbidity metric) had a 2.4-fold higher risk of severe newborn morbidity (adjusted odds ratio, 2.39; 95% CI, 2.06 to 2.77; P < 0.001), whereas newborns delivered in high-performance hospitals had a 28% lower risk of severe newborn morbidity (adjusted odds ratio, 0.72; 95% CI, 0.63 to 0.81; P < 0.001; table 4). Adjusted rates of severe newborn morbidity were 5.67% in low-performance hospitals, 2.67% in average-performance hospitals, and 1.98% in high-performance hospitals (table 4; fig. 2b). Patients who delivered in high-performance hospitals or low-performance hospitals in 2012, based on the composite metric estimated using 2011 data, also exhibited significantly better or worse outcomes, respectively (table 4; fig. 2).
The severe maternal morbidity, severe newborn morbidity, and composite metrics exhibited high levels of reliability. Median statistical reliability for the severe maternal morbidity metric was 0.81 (interquartile range, 0.72 to 0.88), 0.93 (interquartile range, 0.86 to 0.97) for the severe newborn morbidity metric, and 0.92 (interquartile range, 0.88 to 0.95) for the composite metric. Seventy-nine percent of hospitals had statistically reliable rates of severe maternal morbidity, 94.2% of hospitals had statistically reliable rates of severe newborn morbidity, and 97.5% of the hospitals had statistically reliable rates of the composite outcome using a reliability threshold of 0.7.
Feasibility of Public Reporting
Figure 3a shows caterpillar graphs for each of the three measures: severe maternal morbidity, severe newborn morbidity, and the composite metric for severe maternal morbidity and severe newborn morbidity. The link to this interactive graph is https://www.urmc.rochester.edu/sites/glance/. Each point on this graph represents the risk-adjusted rate for a specific hospital, along with the 95% CI. Hospitals with green shading are high-performance outliers, hospitals with blue shading are average-performance, and hospitals with red shading are low-performance outliers. The user can highlight any hospital on one of the three graphs, and the same hospital will be highlighted in the other two graphs. For example, in figure 3a, we have highlighted hospital ID 141 based on its low risk-adjusted rate for severe maternal morbidity. The graph below shows that this hospital is a low-performance outlier for severe newborn morbidity. Alternatively, in figure 3b, we start with the bottom graph for performance based on the composite for severe maternal morbidity and severe newborn morbidity and have highlighted hospital ID 109, which is a high-performing hospital based on the composite metric; this hospital is also identified in the other two graphs as a high-performance hospital for severe maternal morbidity and severe newborn morbidity. In a public report, it would be possible to add functionality that allows patients to select hospitals within a specific radius of their home: hospitals outside this radius would still appear on the graphic but would be "grayed out."
Discussion

There are currently no publicly available report cards for risk-adjusted childbirth outcomes that women can use to select where they will deliver their babies. Existing web-based public reporting initiatives, such as California Hospital Compare38 and the Leapfrog Hospital Ratings,39 offer information on specific maternal quality indicators, such as cesarean-section rates and episiotomy rates, but do not provide risk-adjusted information on severe maternal and newborn morbidity. Although the Centers for Medicare and Medicaid's Hospital Compare provides medical and surgical patients with information on complications and deaths, patient experience, unplanned readmissions, and value of care at over 4,000 hospitals,40 there is no comparable web site for expectant mothers. The absence of a national obstetrical report card is particularly significant given the large variation in obstetrical outcomes between hospitals7,41 and the fact that Medicaid pays for nearly 50% of deliveries in the United States.42 Most currently available outcome measures used to assess the quality of obstetrical care focus on maternal and newborn outcomes separately and often apply only to uncomplicated patients.
In this article, we describe the development of a risk-adjusted composite measure of childbirth outcomes that incorporates severe maternal and newborn morbidity. Maternal and newborn outcomes should be jointly reported because hospital maternal and newborn outcomes are poorly correlated. This composite childbirth outcome measure may be useful for assessing the quality of obstetrical care of both uncomplicated and complicated patients, as well as providing hospitals with the feedback they need to lower the rate of adverse childbirth outcomes. However, the value of the composite measure is tempered by the finding that 10% of hospitals identified as average using the composite measure were ranked below-average using the maternal measure, and 20% of hospitals identified as average using the composite measure were ranked below-average using the newborn measure. This leads us to recommend that childbirth outcomes be reported using a composite childbirth measure alongside separate measures for maternal and newborn outcomes.
There are five National Quality Forum–endorsed, nationally implemented measures of perinatal care: elective vaginal deliveries between 37 and 39 weeks of gestation, cesarean delivery for a term singleton baby in vertex position, use of antenatal steroids in high-risk pregnancies, healthcare-associated bloodstream infections in newborns, and exclusive breast milk feeding during the newborn's hospitalization.12,13 The National Quality Forum has also endorsed a measure of unexpected complications in term newborns developed by the California Maternal Quality Care Collaborative.12 Although these measures create a foundation for quality measurement in obstetrics,43 none of them capture maternal complications. Furthermore, the newborn complication measure applies only to term newborns, excluding the preterm newborns who are most at risk for complications and may thus benefit the most from high-quality obstetrical, anesthesia, and neonatal care.
In this study, we found that hospital rates of maternal complications were poorly correlated with hospital rates of newborn complications, a finding also recently reported by another group of investigators.44 We found that 13% of hospitals ranked as high-performance outliers based on newborn outcomes were low-performance outliers based on maternal outcomes. Nearly 20% of hospitals ranked as low-performance outliers based on newborn outcomes were high-performance outliers for maternal outcomes. Childbirth presents a unique challenge to the development of outcome measures because it involves two patients in one episode of care. In some cases, clinicians may select treatments that increase the risk to the mother to reduce the risk of complications for the newborn and vice versa. Few other areas of medical or surgical care involve caring for two patients whose outcomes are so interdependent. Other examples of childbirth composite morbidity measures that incorporate both maternal and newborn outcomes include the Adverse Outcome Index,10 the Weighted Adverse Outcome Score,10 and a composite childbirth measure developed by Korst et al.11 None of these measures are risk-adjusted. In the absence of risk adjustment, differences in hospital outcomes may be due to differences in hospital case mix as opposed to differences in hospital quality.14 Although never perfect, risk adjustment is necessary to level the playing field so that hospitals with sicker patients can be fairly compared to hospitals with healthier patients.
Our findings demonstrate the feasibility of measuring hospital performance for obstetrical care using administrative data and birth certificate data. These risk-adjusted measures meet a high standard of statistical reliability that exceeds the reliability of existing performance measures used by the American College of Surgeons to benchmark outcomes in noncardiac surgery.29 Because hospital performance on these childbirth measures based on data from a preceding year is strongly associated with hospital performance in the subsequent year, patients can use these measures to choose where they want to deliver their babies. These measures can be reported in a patient-friendly format using an interactive web-based approach that allows patients to simultaneously examine hospital outcomes for both the mother and the baby (https://www.urmc.rochester.edu/sites/glance/).
Our study has important limitations. The most important limitation concerns the use of International Classification of Diseases, Ninth Revision–Clinical Modification coding and birth certificate data to identify adverse outcomes and risk factors. Coding accuracy for most of the individual conditions in the Centers for Disease Control and Prevention algorithm for severe maternal morbidity is high, with positive predictive values exceeding 80% in most cases.45 Main et al.46 have validated the use of the Centers for Disease Control and Prevention International Classification of Diseases, Ninth Revision–Clinical Modification algorithm for severe maternal morbidity, showing that the Centers for Disease Control and Prevention algorithm displays moderate-to-excellent discrimination for identifying severe maternal morbidity. The most common cause of false positives for the Centers for Disease Control and Prevention algorithm is patients who received fewer than 4 units of blood,46 because the generally accepted threshold for transfusion-related severe maternal morbidity is transfusion of 4 or more units of blood,47 and International Classification of Diseases codes can only identify whether patients received a blood transfusion, not how much blood they received. Nonetheless, other claims-based performance metrics for severe maternal morbidity used for public reporting,48 such as the Adverse Outcome Index10 and the Weighted Adverse Outcome Score,10 are also based on the International Classification of Diseases–based Centers for Disease Control and Prevention algorithm, and the Centers for Disease Control and Prevention publicly reports severe maternal morbidity using its International Classification of Diseases–based mapping algorithm.49
More generally, other studies have also validated the accuracy of birth certificate and administrative data against data from the medical record.50,51 Nevertheless, it is likely that a significant portion of the observed variation in hospital childbirth outcomes is due to the lack of specificity of International Classification of Diseases codes and to differences in coding practices across hospitals, as opposed to true differences in hospital performance. By using the present-on-admission indicator to differentiate complications from preexisting conditions, we have mitigated the risk that (1) preexisting conditions are counted as hospital complications and (2) hospital complications are misclassified as risk factors.52 In the absence of manually extracted clinical data, administrative data are the most feasible source of data for performance reporting in obstetrics in the near term. Although manually collected clinical data are more accurate than administrative data, it is unlikely that national reporting of obstetrical outcomes will be based on clinical data because of cost considerations.53 Despite the well-recognized limitations of administrative data for performance reporting,54 the Centers for Medicare and Medicaid widely uses claims-based measures for public reporting55 and for pay-for-performance programs.56,57 There are ongoing efforts by the American Society of Anesthesiologists and the American College of Surgeons to incorporate discrete data fields in commercial electronic medical records (R. P. Dutton, M.D., M.B.A., Chief Quality Officer, US Anesthesia Partners; written communication, October 2017). However, the accuracy of electronic medical record data may not necessarily be better than that of current administrative data.58–60
Second, our risk-adjustment models are not based on International Classification of Diseases, Tenth Revision codes, which hospitals have been using since October 2015. International Classification of Diseases, Tenth Revision data were not available when our analyses were conducted, and even now researchers have access to only 3 months of International Classification of Diseases, Tenth Revision data from California. Although our models, as specified, cannot be used to report risk-adjusted childbirth outcomes, they can serve as a template for developing International Classification of Diseases, Tenth Revision–based models and could help accelerate the development of comprehensive risk-adjustment models for the Maternal Quality Improvement Program.9
Third, the use of composite outcomes for severe maternal morbidity and severe newborn morbidity assumes that all complications are equally important. Composite outcomes are used because most individual complications are so uncommon that it is not feasible to create a separate performance metric for each one. Composite outcomes have the added advantage of reflecting overall hospital performance. Other registries, such as those of the American College of Surgeons and the Society of Thoracic Surgeons, also use composite metrics to report hospital performance.61,62 Fourth, our measures are based on linked administrative and birth certificate data that may not be available in many states. However, claims-based measures may be feasible to implement if the Maternal Quality Improvement Program requires hospitals to submit administrative and birth certificate data as part of its minimal data set. Fifth, our measures excluded maternal–infant dyads with a gestational age of less than 28 weeks because these newborns almost uniformly experience severe newborn morbidity. We also excluded childbirths with congenital disorders because they constitute a very heterogeneous population for whom risk adjustment might not be adequate. We also excluded about 6.5% of cases because of missing data, with gestational age, height, and weight being the most common sources of missing data. Because these data are unlikely to be missing at random, multiple imputation would not be appropriate.63 Finally, our approach will need to be replicated using national data to ensure generalizability.
We should also recognize that public reporting may have unintended consequences. For example, some hospitals may become less willing to offer a trial of labor after a cesarean delivery because a failed trial of labor is associated with a greater risk of serious maternal and newborn morbidity.64 Furthermore, it is possible that using the Centers for Disease Control and Prevention algorithm to identify severe maternal morbidity could in some cases discourage the use of appropriate medical interventions and lead to worse outcomes.
Our findings may have important policy implications. The Centers for Medicare and Medicaid has committed itself to transforming the traditional fee-for-service payment model into a value-based system that incentivizes higher-quality care.65 The Centers for Medicare and Medicaid is likely to expand hospital- and physician-based incentive plans to include obstetrics66 because Medicaid pays for nearly 50% of deliveries in the United States.42 Existing Centers for Medicare and Medicaid measures are based on administrative data because clinical data are currently too expensive for most hospitals to collect. Basing obstetrical quality measures on administrative and birth certificate data submitted by hospitals to a clinical registry such as the Maternal Quality Improvement Program could serve as a transitional step toward establishing quality measures based on clinical electronic medical record data and could help lay the initial foundation for value-based purchasing in obstetrics.
In addition to promoting value-based purchasing, performance reporting could serve to promote more patient-centric care by providing the parturient with the information she needs to select where she delivers. Even if public reporting of maternal and newborn outcomes were available, it would be challenging for parturients to identify hospitals with good maternal and newborn outcomes unless childbirth outcomes were presented using a composite measure, because hospital maternal and newborn complication rates are poorly correlated. Using a web-based interface, information on overall childbirth outcomes (the composite measure) could be linked to measures that focus separately on maternal and newborn outcomes, as we demonstrate. This could allow a parturient to more easily make a decision that involves both her and her newborn.
Among the world’s developed nations, the United States stands alone as the only country where maternal mortality rates are increasing.67 Hospital rates of severe maternal morbidity and severe newborn morbidity vary widely. Closing this performance gap could lead to better childbirth outcomes. Performance feedback to all U.S. hospitals could be rapidly implemented using administrative and birth certificate data if hospitals were to submit these data to a national registry. We believe that this could be an important step toward reducing the rates of severe complications in mothers and their newborns and reversing the maternal mortality trend in the United States.
Supported by funding from the Department of Anesthesiology and Perioperative Medicine at the University of Rochester School of Medicine and Dentistry, Rochester, New York.
Drs. Glance, Dick, and Hasley serve on the Steering Committee for the Maternal Quality Improvement Program. The other authors declare no competing interests.