“In an era of ‘precision medicine,’... it is surprising that large scale personalized transfusion risk systems are not widely used.”

Image: Adobe Stock/J. P. Rathmell.

Of the many roles an anesthesiologist fills each day, managing intraoperative transfusion remains one of the most important and complex. The evidence supporting different transfusion thresholds and consensus guidelines continues to evolve.1–3 The need to ensure availability of crossmatched blood is counterbalanced by the increasing need to avoid unnecessary testing, waste of scarce blood products, and excess healthcare costs, particularly during a pandemic.4 In this issue of Anesthesiology, Lou et al.5 present an important advancement in transfusion risk prediction by incorporating patient-specific factors and offering a publicly available open-source machine learning algorithm for implementation. Using more than 3 million records from the American College of Surgeons National Surgical Quality Improvement Program registry between 2016 and 2018, they developed an algorithm that combined historical procedure-specific transfusion rates with patient-specific demographics, comorbidities, and laboratory values to predict erythrocyte transfusion on the day of surgery. The algorithm was then validated against 2019 registry data for 1 million patients and 2020 local academic medical center data. More importantly, they compared their algorithm’s recommendations to the current standard of care, the Maximum Surgical Blood Ordering System, which uses only historical procedure-specific transfusion data.6 The algorithm reduced the number of predicted transfusions by a third while maintaining 96% sensitivity, matching the current standard. These predictions can be used to guide preoperative type and screen orders.

The work has many strengths, including the reliability of the underlying scientific techniques and facilitators to clinical implementation. The use of national data across the hundreds of centers participating in the National Surgical Quality Improvement Program registry improves generalizability and decreases the sampling bias likely with algorithms based upon smaller, more homogenous data sets. This data set includes not only academic medical centers but also hundreds of community hospitals reflecting a more diverse range of care settings. For example, although the single-center external validation at the authors’ tertiary care center demonstrated a 6.7% transfusion incidence, the national data set used to train and internally validate the algorithm demonstrated a 2.4 and 2.2% transfusion incidence.5  Next, although the authors use a “black box” gradient boosting machine algorithm, they make the model and its predictions accessible through an interactive notebook.7  In addition, they implement Shapley values. Shapley values demystify the algorithm’s prediction for a given patient; instead of simply providing a probability of transfusion, they help explain which predictors increased or decreased the likelihood of transfusion and by how much. The authors provide a graphical example, known as a “force plot.” A theoretical force plot for a patient is presented in figure 1. It also demonstrates some of the patient-specific factors included in the algorithm, such as age, preoperative hematocrit, coagulation studies, platelet count, serum creatinine, and patient weight.

Fig. 1.

Use of Shapley values to quantify relative contributions of model features for an individual prediction. Shown is an illustrative example of Shapley values informing the predicted probability of transfusion for a single patient undergoing a laparoscopic robotic-assisted partial nephrectomy. Shapley values are properties of the model with respect to an individual prediction within a data set. Shapley values can be used to improve the transparency and explainability of machine learning model predictions by quantifying the direction and magnitude of salient factors influencing the individual prediction. In this illustrative example, a procedure-specific transfusion rate of 1.3% is a factor having a strong contribution to decreased risk of transfusion (large width, blue), whereas a serum albumin level equal to 4.0 g/dl is a factor having a weak contribution to decreased risk (small width, blue). Conversely, a serum creatinine equal to 1.7 mg/dl is a factor having a strong contribution to increased risk of transfusion (large width, red), whereas age equal to 73 yr is a factor having a weak contribution to increased risk (small width, red).
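The additive logic behind a force plot can be illustrated with a minimal sketch. It assumes a hypothetical three-feature logistic risk function (not the authors' gradient boosting machine) and a single hypothetical reference patient as the baseline, whereas Shapley-based explanation tools typically average over a background data set. The creatinine and procedure-specific rate values echo the illustrative figure; the hematocrit value, reference patient, and all coefficients are invented for illustration.

```python
from itertools import permutations
from math import exp


def toy_risk(features):
    """Hypothetical toy model: logistic function of three preoperative features."""
    score = (
        -4.0
        + 40.0 * features["procedure_rate"]
        + 1.2 * (features["creatinine"] - 1.0)
        - 0.15 * (features["hematocrit"] - 40.0)
    )
    return 1.0 / (1.0 + exp(-score))


def shapley_values(model, patient, baseline):
    """Exact Shapley values relative to a single baseline patient.

    Each feature's value is its marginal change in the prediction, averaged
    over every order in which features can be switched from the baseline
    value to the patient's value.
    """
    names = list(patient)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = patient[name]
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


# Hypothetical reference patient and patient of interest.
baseline = {"procedure_rate": 0.024, "creatinine": 1.0, "hematocrit": 40.0}
patient = {"procedure_rate": 0.013, "creatinine": 1.7, "hematocrit": 36.0}

phi = shapley_values(toy_risk, patient, baseline)
for name, value in phi.items():
    direction = "raises" if value > 0 else "lowers"
    print(f"{name}: {direction} predicted risk by {abs(value):.4f}")

# The contributions account for the whole gap between the two predictions.
gap = toy_risk(patient) - toy_risk(baseline)
print(f"sum of contributions: {sum(phi.values()):.4f}, prediction gap: {gap:.4f}")
```

In this sketch, the sign of each value corresponds to the color of a force plot segment and its magnitude to the segment width. Exact enumeration is feasible only for a handful of features; models with many predictors rely on approximations or tree-specific algorithms.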

Three factors increase the potential implementation value of the current work: the use of preoperative data commonly available in routine clinical practice, comparison with existing transfusion prediction systems, and the open sharing of the data preparation code (in the statistical software R) and the algorithm derivation and evaluation code (in Python). The current work follows good practices for building prediction models intended for clinical use. Rather than using all available data, the authors carefully identified the time point of prediction—the day of surgery before surgery start—and restricted the variables in the analysis to those that would typically be available at that point in time in most electronic health records. For example, although estimated or predicted blood loss may improve model performance, it is unlikely to be available in a typical preoperative electronic health record. Instead, the authors focused on available data, such as comorbidities, commonly used preoperative laboratory testing, and the historical procedure-specific transfusion incidence. Next, the authors carefully compared their algorithm’s recommendations to the current standard of care, the Maximum Surgical Blood Ordering System.6  This system, commonly used by blood banks at most modern medical centers, is based upon the historical incidence of transfusion for a surgical procedure at each institution. No patient factors, such as age, sex, or preoperative hematocrit, are used to establish the predicted probability of transfusion. Next, the current analysis elegantly recognizes the differential harm between false positives and false negatives. A patient requiring a transfusion but not having an active type and screen or crossmatched blood is a potentially more harmful situation than a patient undergoing a type and screen but not requiring any transfusion. 
The authors compared their proposed personalized algorithm with the standard of care and found that the personalized algorithm would recommend one-third fewer type and screen orders while maintaining the same false-negative rate. In their single-center external validation, the personalized algorithm would have recommended 2,360 fewer type and screen orders among 16,053 patients. Finally, to maximize scientific reproducibility and increase the ease of implementation, the authors have provided all of the code used to derive and implement the personalized algorithm in a public repository, as well as an interactive notebook to generate predictions on new patient data. To optimize the prediction at each facility, they recommend incorporating each center’s historical transfusion rate into the algorithm rather than using a national average. By making the underlying code and model publicly available, the authors facilitate the local model tuning and implementation process. Overall, these analytic and dissemination decisions demonstrate a new standard in transparency, reproducibility, personalization, and readiness for implementation.
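The comparison at a fixed false-negative rate can be sketched as follows. This is a minimal illustration on an invented ten-patient cohort, not the authors' evaluation code: each scoring approach gets the highest probability cutoff that still reaches the target sensitivity, and the number of patients at or above that cutoff is the number of type and screen orders it would recommend. The function names, scores, and outcomes are all hypothetical.

```python
import math


def threshold_for_sensitivity(probabilities, transfused, target=0.96):
    """Return the highest cutoff whose sensitivity is at least `target`.

    A patient is flagged for a type and screen when probability >= cutoff.
    """
    positives = sorted(
        (p for p, y in zip(probabilities, transfused) if y), reverse=True
    )
    if not positives:
        raise ValueError("need at least one transfused patient")
    # Number of transfused patients that must be flagged to reach the target
    # (the small epsilon guards against floating-point overshoot in ceil).
    needed = max(1, math.ceil(target * len(positives) - 1e-9))
    return positives[needed - 1]


def orders_at_threshold(probabilities, threshold):
    """Count patients who would receive a type and screen order."""
    return sum(p >= threshold for p in probabilities)


# Invented cohort: 1 = transfused on the day of surgery, 0 = not transfused.
transfused = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# Procedure-only scores: every patient inherits the procedure's historical rate.
procedure_only = [0.10] * 5 + [0.01] * 5
# Personalized scores: hypothetical patient-level predictions.
personalized = [0.30, 0.12, 0.15, 0.02, 0.05, 0.01, 0.01, 0.02, 0.01, 0.01]

results = {}
for label, scores in (("procedure-only", procedure_only),
                      ("personalized", personalized)):
    cutoff = threshold_for_sensitivity(scores, transfused)
    results[label] = orders_at_threshold(scores, cutoff)
    print(f"{label}: cutoff {cutoff:.2f}, orders {results[label]}")
```

Holding sensitivity fixed keeps the more harmful error (a transfused patient without a type and screen) equally rare under both approaches, so any difference appears only in the number of orders. For scale, the 2,360 fewer orders among 16,053 patients reported in the external validation amount to roughly 15 fewer orders per 100 patients.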

However, several limitations warrant further consideration before widespread implementation. First, although a national data set was used for derivation and internal validation, the data set lacks hospital-level groupings, identifiers, or characteristics. As a result, it is unclear whether the model may perform more or less well depending upon specific hospital characteristics. This issue of model performance heterogeneity has been observed previously in sepsis prediction8  and may drive future research to optimize the personalized transfusion risk prediction across the wide range of hospitals performing surgery. Next, since the primary outcome of the algorithm is the clinician decision to transfuse rather than a patient-centered clinical outcome, all the biases inherent in clinician decision-making are incorporated into the algorithm. The algorithm does not establish whether a patient would benefit from a transfusion, simply whether or not they received one. As a result, the algorithm simply reflects the prevailing clinician transfusion behavior rather than “optimal” transfusion practices. In addition, any unconscious biases related to patient race, ethnicity, age, comorbidities, or social determinants of health would be incorporated into the algorithm and potentially propagated.9  Next, the underlying data sets have limitations that decrease the generalizability of the algorithm. For example, the external validation using single-center data could only be applied to 53% of cases at the institution because of limited sample size to calculate a historical transfusion rate. How to incorporate site-specific variation in procedure volumes should be considered in future work, particularly as smaller hospitals may not have sufficient volumes to calculate historical transfusion rates across the board. 
Moreover, subtle nuances regarding the patient (e.g., transfusion preferences, previous positive antibody screen, history of transfusion) or operation (e.g., proximity of a tumor to major vascular structure) that may influence transfusion or blood management cannot be captured in a structured electronic health record and were absent from the current analysis. Finally, clinical workflow realities must be considered; preoperative hematocrit, one of the most important factors in the algorithm, may be ordered concurrently with a type and screen rather than before it. Although the algorithm is resilient to missing laboratory, demographic, or comorbidity data, delaying a type and screen while awaiting such results for optimal use of the algorithm may cause surgical delays or blood product unavailability.10 
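The coverage limitation described above, where a historical transfusion rate could be calculated for only part of a center's caseload, can be made concrete with a short sketch. It assumes a hypothetical minimum case count per procedure (the editorial does not state one) and invented case data; the function names are illustrative only.

```python
from collections import Counter

MIN_CASES = 50  # hypothetical minimum sample size; not taken from the source


def historical_rates(cases, min_cases=MIN_CASES):
    """Compute procedure-specific transfusion rates.

    `cases` is a list of (procedure_code, was_transfused) pairs. Procedures
    with fewer than `min_cases` cases are omitted because their rates would
    rest on too few observations.
    """
    counts = Counter()
    transfusions = Counter()
    for code, was_transfused in cases:
        counts[code] += 1
        transfusions[code] += int(was_transfused)
    return {
        code: transfusions[code] / n
        for code, n in counts.items()
        if n >= min_cases
    }


def coverage(cases, rates):
    """Fraction of cases whose procedure has a usable historical rate."""
    covered = sum(1 for code, _ in cases if code in rates)
    return covered / len(cases)


# Invented caseload: one common procedure and two less common ones.
cases = (
    [("A", True)] * 6 + [("A", False)] * 54      # 60 cases, 6 transfused
    + [("B", True)] * 3 + [("B", False)] * 27    # 30 cases, below the minimum
    + [("C", False)] * 10                        # 10 cases, below the minimum
)

rates = historical_rates(cases)
print(rates)
print(f"coverage: {coverage(cases, rates):.0%}")
```

Smaller hospitals would fall below any such minimum for more procedures, which is why the handling of low-volume procedures is flagged as a topic for future work.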

Despite these limitations, the exciting work by Lou et al.5 should inspire us. In an era of “precision medicine” where whole exome sequencing costs less than a unit of packed red blood cells, it is surprising that large-scale personalized transfusion risk systems are not widely used. The authors have given the perioperative community their expertise and diligence via the publicly available code and algorithm. Now, as with all novel technology, we must begin the hard work of assessing the algorithm more broadly. Prospective validation of the algorithm’s impact, potentially using electronic health record adaptive platform randomized controlled trial methodologies, is necessary. Such efforts require integrating the algorithm into perioperative electronic health records, decision support systems, and clinical processes. Finally, development of clinical algorithm governance structures capable of robust, ongoing algorithm evaluation and mitigation against clinician and data set shift is essential for ensuring such algorithms do not do more harm than good.11–13 In summary, personalized surgical transfusion risk prediction algorithms and other similar efforts, if used wisely within a mature implementation infrastructure, offer us an opportunity to use machine learning in the real world to help our patients and clinicians.

Supported by National Institutes of Health grant No. K01HL141701 (to Dr. Mathis). All other funding comes from institutional and/or departmental sources.

Dr. Kheterpal is listed as a co-inventor on patent No. 11,288,445 B2 entitled “Automated System and Method for Assigning Billing Codes to Medical Procedures,” related to the use of machine learning techniques for anesthesia procedure billing. Unrelated to the topic of this article, Dr. Kheterpal has served as an investigator on grants and contracts to the University of Michigan sponsored by Apple Inc. (Cupertino, California), Merck Inc. (Kenilworth, New Jersey), and Blue Cross Blue Shield of Michigan (Detroit, Michigan). Dr. Singh serves on a scientific advisory board for Flatiron Health (New York, New York), an independent affiliate of Roche. Flatiron is a healthcare technology company focused on clinical research and data science for improving cancer care. Unrelated to the topic of this article, Dr. Singh has served as an investigator on grants and contracts to the University of Michigan sponsored by Teva Pharmaceuticals (Parsippany, New Jersey) and Blue Cross Blue Shield of Michigan. Dr. Mathis is not supported by, nor maintains any financial interest in, any commercial activity that may be associated with the topic of this article.

1. American Society of Anesthesiologists Task Force on Perioperative Blood Management: Practice guidelines for perioperative blood management: An updated report by the American Society of Anesthesiologists Task Force on Perioperative Blood Management. Anesthesiology 2015; 122:241–75
2. Zeroual N, Blin C, Saour M, David H, Aouinti S, Picot MC, Colson PH, Gaudard P: Restrictive transfusion strategy after cardiac surgery. Anesthesiology 2021; 134:370–80
3. Hovaguimian F, Myles PS: Restrictive versus liberal transfusion strategy in the perioperative and acute care settings: A context-specific systematic review and meta-analysis of randomized controlled trials. Anesthesiology 2016; 125:46–61
4. Stanworth SJ, New HV, Apelseth TO, Brunskill S, Cardigan R, Doree C, Germain M, Goldman M, Massey E, Prati D, Shehata N, So-Osman C, Thachil J: Effects of the COVID-19 pandemic on supply and use of blood for transfusion. Lancet Haematol 2020; 7:e756–64
5. Lou SS, Liu H, Lu C, Wildes TS, Hall BL, Kannampallil T: Personalized surgical transfusion risk prediction using machine learning to guide preoperative type and screen orders. Anesthesiology 2022; 137:55–66
6. Frank SM, Rothschild JA, Masear CG, Rivers RJ, Merritt WT, Savage WJ, Ness PM: Optimizing preoperative blood ordering with data acquired from an anesthesia information management system. Anesthesiology 2013; 118:1286–97
7. Blood product notebook, Google Colaboratory 2022.
8. Wong A, Cao J, Lyons PG, Dutta S, Major VJ, Ötles E, Singh K: Quantification of sepsis model alerts in 24 US hospitals before and during the COVID-19 pandemic. JAMA Netw Open 2021; 4:e2135286
9. Obermeyer Z, Powers B, Vogeli C, Mullainathan S: Dissecting racial bias in an algorithm used to manage the health of populations. Science 2019; 366:447–53
10. McWilliams B, Yazer MH, Cramer J, Triulzi DJ, Waters JH: Incomplete pretransfusion testing leads to surgical delays. Transfusion 2012; 52:2139–45
11. Finlayson SG, Subbaswamy A, Singh K, Bowers J, Kupke A, Zittrain J, Kohane IS, Saria S: The clinician and dataset shift in artificial intelligence. N Engl J Med 2021; 385:283–6
12. Gulati G, Upshaw J, Wessler BS, Brazil RJ, Nelson J, van Klaveren D, Lundquist CM, Park JG, McGinnes H, Steyerberg EW, Van Calster B, Kent DM: Generalizability of cardiovascular disease clinical prediction models: 158 independent external validations of 104 unique models. Circ Cardiovasc Qual Outcomes 2022; 15:e008487
13. Burns ML, Kheterpal S: Machine learning comes of age: Local impact versus national generalizability. Anesthesiology 2020; 132:939–41