Machine learning, the quintessential tool currently driving the development of artificial intelligence, was discovered and developed decades ago. Nevertheless, only recently has machine learning seen a rapid rise in adoption, sophistication, and influence. Recent success stories outside of health care are numerous, including Facebook’s DeepFace, unveiled in 2014, which is a machine-learning technology capable of identifying faces with 97.25% accuracy (compared to human accuracy of 97.53%).1 In 2016, Google adopted a deep learning approach to language translation, using an algorithm that is fed massive amounts of data to effectively train itself to recognize patterns in speech, reducing translation errors by 87%.2
Machine-learning techniques like these may be coming soon to an operating room near you; in this issue are three examples of machine learning applied to our field. These include articles by Lee et al.,3 where machine-learning techniques are used to predict postoperative mortality from electronic health record data, as well as articles by Kendale et al.4 and Hatib et al.,5 in which hypotension is predicted via machine-learning algorithms that leverage data available during induction of anesthesia and high-fidelity arterial line waveforms, respectively. Previously, in the March 2018 issue of Anesthesiology, Lee et al.6 used machine learning to predict bispectral index values produced by target-controlled infusions of propofol and remifentanil. An accompanying editorial provided a valuable summary of the history of artificial intelligence and an introduction to machine learning, the component of artificial intelligence that allows computers to make what humans describe as intelligent choices and predictions.7 Although disagreement exists as to whether artificial intelligence, as driven by machine-learning algorithms, portends an optimistic or ominous future, it is indisputable that machine-learning paradigms have gained widespread traction in every industry.
Within the articles featured in this issue, a rich underlying digital health dataset enabled the authors to leverage properties of machine learning to study old problems in new ways. These machine-learning properties include an ability to capture numerous variables, better known as machine-learning model features, which would otherwise elude human abilities to perceive or simultaneously consider (as is the case for the 2.6 million arterial waveform combinatorial features described by Hatib et al.5). These also include the ability of machine learning to model complex relationships between model features that otherwise eclipse human understanding (as is the case for the deep neural network model described by Lee et al.3).
Although some “transparent” machine-learning methods provide insight into the associations they discover, machine-learning predictive models by nature do not require human comprehension in order to work. An ensuing challenge for scientific progress over the next decade will be to create and enforce standards for evaluating these methods, so that they do not outstrip the ability of authors to explain them or of readers to understand them. Concurrent with the rise of Big Data has been a rise in the inconsistency and uncertainty with which machine-learning concepts are applied to datasets. If not kept in check, spurious conclusions drawn from methodologically unsound studies threaten the credibility of this science. Answering this call to action, and importantly recognized by all three featured articles, is a set of multidisciplinary guidelines for developing and reporting machine-learning predictive models in biomedical research, which is well worth the read.8
Beyond a dire need for reporting standards in machine-learning predictive models, practitioners bear an equal burden to acquire a basic literacy in machine-learning concepts in order to appraise machine-learning–based investigations, much as the current biomedical literature demands a basic literacy in classical statistics and study design. These machine-learning concepts include the use of training, testing, and validation datasets, which are used to develop machine-learning algorithms, assess their internal performance, and externally validate them, respectively (fig. 1). Additionally, just as clinicians are familiar with conventional statistical analyses such as logistic regression (which incidentally happens to be one simple type of algorithm supported by machine learning), it may behoove the perioperative clinician to be familiar with other machine-learning techniques, including naïve Bayes, support vector machines, and random forests, to name a few; others are highlighted by Kendale et al.4 in this issue.
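The three dataset roles described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used by any of the featured articles: the `split_internal` helper, the 80/20 ratio, the fixed seed, and the two hospital cohorts are all hypothetical.

```python
import random

def split_internal(records, test_fraction=0.2, seed=0):
    """Shuffle one institution's records and split them into training and testing sets."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)               # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical cohorts: case identifiers from two different hospitals.
internal_cases = [f"hospital_A_case_{i}" for i in range(100)]
external_cases = [f"hospital_B_case_{i}" for i in range(40)]

# Training set: used to develop the algorithm.
# Testing set: held out from the same source to assess internal performance.
training_set, testing_set = split_internal(internal_cases)

# Validation set: an entirely separate cohort, reserved for external validation.
validation_set = external_cases

print(len(training_set), len(testing_set), len(validation_set))
```

The point of the sketch is that no case appears in more than one set, and that the validation cohort never touches model development.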
As demonstrated by the studies in this issue, the principal advantage of machine learning is the boost in performance it achieves when attempting to predict an observed outcome for which the range of explanatory features is large, or the depth of interactions between features is overwhelmingly complex. To predict hypotension, Hatib et al.5 brilliantly tap into vast arrays of data within the arterial line waveform, extending far beyond simple characteristics such as heart rate and blood pressure (and furthermore, far beyond “complex” characteristics such as pulse pressure variation, systolic pressure rise [dP/dt], and waveform area). When faced with an analytic task in which potential predictive features number in the thousands or millions, or are of nuanced complexity, the flexibility of machine-learning techniques to accommodate inputs simply outmatches any traditional analytic method. In biomedical literature, other fields leveraging machine learning to tackle complex tasks include image processing (e.g., computer vision) of radiographic9 or whole-slide pathology10 images, as well as text analysis (natural language processing) of clinical notes.11,12
In contrast, for predictive analytic tasks in which features remain countable, or relationships explainable, machine learning may still prove useful, but will likely be of more modest benefit. In the article by Kendale et al.,4 an ensemble of machine-learning methods indeed outperformed a classic logistic regression approach for predicting hypotension, but the overall performance of the machine-learning model remained far from perfect. In the case of the best-performing algorithm (gradient boosting machines), Kendale et al.4 demonstrate a relatively small improvement compared to a classic logistic regression approach. Similarly, whereas Lee et al.3 successfully demonstrate a deep learning approach to predicting postoperative mortality from intraoperative data, the authors fail to demonstrate improvement compared to logistic regression, a recurring issue in studies promoting the use of deep learning.
As with all methodologic approaches, machine learning is not without drawbacks. The most hotly contested is the difficulty of understanding mechanisms driving the prediction models presented. Herein lies the “black magic” of machine learning: although the predictive performance of a machine-learning algorithm can be precisely quantified (and sometimes, this performance is staggering), the question of how to interpret and act upon the information generated remains wholly unanswered. In cases where mechanisms are of limited concern, or penalties for incorrect predictions low (such as facial recognition in family photos), machine-learning techniques deftly succeed in their purpose. Conversely, in cases where mechanisms are critical, and penalties for error are high (as is often the case in health care, and particularly in anesthesiology), a machine-learning approach falling anywhere short of nearly perfect remains unviable. Hatib et al.5 importantly note that although prediction of hypotension can be established with high fidelity, it remains entirely unclear how a clinician should respond to such an alert. This issue is even more critical, considering the generalizability and reproducibility concerns of such models. In many studies leveraging machine learning, insufficient testing and validation of complex models, particularly those using deep learning, can lead to overfitting, even with the largest of datasets.
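The overfitting concern can be made concrete with a deliberately extreme toy example, sketched here in Python. Everything in it is an assumption for illustration: the data are synthetic coin flips with no relationship to the features, and the `Memorizer` class is a hypothetical stand-in for an overly flexible model, not a technique used in any of the featured articles.

```python
import random

rng = random.Random(42)

def make_cases(n):
    """Return (features, label) pairs whose labels are coin flips unrelated to the features."""
    return [((rng.random(), rng.random()), rng.randint(0, 1)) for _ in range(n)]

class Memorizer:
    """Deliberately overfit 'model': stores every training case verbatim."""

    def fit(self, cases):
        self.table = {features: label for features, label in cases}
        labels = [label for _, label in cases]
        # For cases it has never seen, fall back to the majority training label.
        self.fallback = int(2 * sum(labels) >= len(labels))
        return self

    def predict(self, features):
        return self.table.get(features, self.fallback)

def accuracy(model, cases):
    """Fraction of cases for which the model's prediction matches the label."""
    return sum(model.predict(f) == y for f, y in cases) / len(cases)

training_cases = make_cases(500)
testing_cases = make_cases(500)

model = Memorizer().fit(training_cases)
train_accuracy = accuracy(model, training_cases)
test_accuracy = accuracy(model, testing_cases)

print(f"training accuracy: {train_accuracy:.2f}")
print(f"testing accuracy:  {test_accuracy:.2f}")
```

Because the labels are pure noise, the perfect training accuracy reflects memorization alone, and accuracy on the held-out cases stays near chance. Only evaluation on data the model has not seen exposes the gap.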
Despite such limitations, the articles in this issue3–5 make courageous shifts in methodologic approach, and unmistakably establish that machine-learning applications to anesthesiology are not just a fad. The authors should be commended as exemplars for assertively applying new scientific paradigms to our field. How such machine-learning techniques are harnessed to improve anesthesia, and more broadly to advance health sciences, remains a challenge for decades to come.
All work and partial funding are attributed to the Department of Anesthesiology, University of Michigan Medical School (Ann Arbor, Michigan). The work described in this editorial was supported in part by the National Heart, Lung, and Blood Institute of the National Institutes of Health (Bethesda, Maryland), grant No. 1K01HL141701-01. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Dr. Najarian has developed algorithmic technologies for prediction of cardiac events that have been licensed to Trove Analytics, Inc. (Silicon Valley, California), and New Vital Signs, Inc. (Ann Arbor, Michigan). The other authors are not supported by, nor maintain any financial interest in, any commercial activity that may be associated with the topic of this article.