It has been suggested that predicting difficult tracheal intubation is useless because of the poor predictive capacity of individual signs and scores. The authors tested the hypothesis that an accurate prediction of difficult tracheal intubation using simple clinical signs is possible using a computer-assisted model.
In a cohort of 1,655 patients, the authors analyzed the ability of each of the main signs (Mallampati score, mouth opening, thyromental distance, and body mass index) to predict difficult tracheal intubation. They built the best score possible using a simple logistic model (SCOREClinic) and compared it with the most recently described score in the literature (SCORENaguib). They then used a boosted tree analysis to build the best score possible using computer-assisted calculation (SCOREComputer).
Difficult tracheal intubation occurred in 101 patients (6.1%). The predictive properties of each individual sign were low (maximum area under the receiver operating characteristic curve, 0.70). On receiver operating characteristic curve analysis, the global prediction of the SCOREClinic (0.74, 95% CI: 0.72–0.76) was greater than that of the SCORENaguib (0.66, 95% CI: 0.60–0.72, P < 0.001) but significantly lower than that of the SCOREComputer (0.86, 95% CI: 0.84–0.91, P < 0.001). The proportion of patients in the inconclusive zone was 71% using SCORENaguib, 56% using SCOREClinic, and only 32% using SCOREComputer (all P < 0.001).
Computer-assisted models using complex interactions between variables enable an accurate prediction of difficult tracheal intubation with a low proportion of patients in the inconclusive zone. An external validation of the model is now required.
What We Already Know about This Topic
The ability to predict difficult endotracheal intubation from a combination of simple physical characteristics is poor and, some have argued, nearly useless
What This Article Tells Us That Is New
In a cohort of 1,655 patients, 6.1% of whom had difficult endotracheal intubation, application of a computer-assisted model using complex interactions yielded excellent prediction
Whether the model generated from this group of patients can be generally applied to yield similarly excellent prediction awaits further research
Difficult tracheal intubation (DTI) remains an important cause of anesthesia-related hypoxic brain damage and death, although a significant decrease has recently been observed.1 This decrease has been related to improved airway management through the adoption of guidelines with airway management algorithms,2,3 widespread use of devices such as the gum elastic bougie,4 laryngeal mask, fiberoptic intubation, and new optical devices,5 and the recommendation to prioritize oxygenation.6 However, among the strategies proposed to decrease morbidity and mortality related to DTI, the role of prediction remains a matter of debate. Although many algorithms include preoperative assessment, some authors have suggested that attempting to predict DTI is unlikely to be useful.7
Several clinical signs have been identified as predictors of difficult laryngoscopy or DTI, including the Mallampati score, mouth opening (MO), thyromental distance (TMD), and body mass index (BMI).8 However, the sensitivity and positive predictive values of these signs are low, precluding an accurate prediction of DTI, as confirmed by a recent meta-analysis.9 Several studies have derived scores from multivariate analysis. Although the predictive properties of these scores were higher than those of individual signs, they remained modest.10–12 Moreover, these previous studies had several methodological weaknesses: (1) some scores were developed to predict difficult laryngoscopy and not DTI;10 (2) the calibration of the score was not assessed; (3) internal or external validation was absent;11 and (4) a dichotomous approach was used, although a separation into three classes (high, intermediate, and low risk) is now recommended.13 Most of these studies did not verify that a logistic model accurately depicts the relationship between variables and outcomes. Last, the possibility that interactions between variables play an important role was ignored, although the key issue might be the proportionality of the variables rather than the variables themselves: a small individual (with a small MO and TMD) may not be difficult to intubate. In contrast to some authors,7,9 we consider that a new paradigm is required for the accurate prediction of DTI, one that incorporates the most recent developments in biostatistical analysis13 and provides a valuable tool based on a dynamic approach including interactions between variables.
Therefore, we conducted an analysis of variables associated with DTI in a large multicenter cohort. First, we analyzed the ability of each main sign to predict DTI. Second, we built the best score possible using a simple logistic model and compared it with the best score previously described in the literature. Third, we used a more sophisticated, computer-assisted technique. We consider that such scores could also be determined at the bedside with currently available technologies. In this study, we tested the hypothesis that a better prediction of DTI from simple clinical signs (i.e., a higher area under the receiver operating characteristic [ROC] curve) is possible, provided that the paradigm of prediction and risk assessment of difficult airway management changes to take into account both interactions between variables and the concept of the inconclusive zone (i.e., fewer patients in the “gray zone”).13
Materials and Methods
Institutional review board (Comité de Protection des Personnes Pitié-Salpêtrière, Paris, France) approval was obtained. Because there was no randomization and only routine care was performed, the requirement for informed consent was waived.
Consecutive adult patients (aged 18 yr and older) scheduled for surgery requiring general anesthesia and tracheal intubation with a standard laryngoscope were prospectively included, including emergency cases and those requiring rapid sequence induction. Patients undergoing regional anesthesia and those undergoing general anesthesia without tracheal intubation (face mask ventilation, laryngeal mask) were excluded. We also excluded patients with an indication for fiberoptic intubation and those in whom another method (videolaryngoscope, fiberoptic intubation) was used first. All patients were included over a 19-week period (February 13, 2012–June 24, 2012) in three centers (Paris, Nîmes, and Beaune, all in France).
Information was collected by the anesthesiologists on a standard form: age, sex, weight, height, and BMI; Mallampati classification as modified by Samsoon and Young,14 performed with the patient in the sitting position with the head in extension, mouth fully opened, tongue out, and with phonation whenever possible;15 TMD, measured with the patient in the sitting position with head extension;15 MO, measured as the interincisor distance;16 and the presence of a receding mandible, macroglossia, beard, and lack of teeth. Patients were also asked whether or not they were habitual snorers (snoring almost every night or every night),16 without assessment of snoring loudness.
DTI was defined as correct insertion of the endotracheal tube with conventional laryngoscopy requiring more than two attempts or lasting more than 10 min, or requiring an alternative technique (bougie, videolaryngoscope, Fastrach® [Laryngeal Mask Company Ltd., Le Rocher, Seychelles], fiberoptic).16 Data collected concerning tracheal intubation were: use of paralyzing agents, scheduled versus urgent surgery, Sellick maneuver, characterization of tracheal intubation (easy, difficult, impossible), and grading of the best laryngoscopic view according to the Cormack and Lehane classification.17 To minimize uncertainty and inaccuracy of the numerical grading systems, schematic diagrams of the view of the oropharynx (Mallampati as modified by Samsoon and Young) and of the glottis (Cormack and Lehane) were provided directly in the data chart, together with a tape measure (in mm), as previously described.16 We also recorded the category of the individual who made the first intubation attempt (nurse anesthetist, resident in anesthesiology, anesthesiologist).
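The DTI definition above is a disjunction of three criteria, and several can apply to the same patient. A minimal Python sketch of how it could be applied to recorded data follows; the record fields (`attempts`, `duration_min`, `alternative_technique`) are hypothetical names, not the authors' data dictionary.

```python
from dataclasses import dataclass


@dataclass
class IntubationRecord:
    """One intubation episode (hypothetical field names)."""
    attempts: int                # number of laryngoscopy attempts
    duration_min: float          # time to tube insertion, in minutes
    alternative_technique: bool  # bougie, videolaryngoscope, Fastrach, or fiberoptic used


def dti_criteria(rec: IntubationRecord) -> list[str]:
    """Return every DTI criterion met; more than one may apply to a patient."""
    met = []
    if rec.attempts > 2:              # more than two attempts
        met.append("attempts")
    if rec.duration_min > 10:         # lasting more than 10 min
        met.append("duration")
    if rec.alternative_technique:     # alternative technique required
        met.append("alternative")
    return met


def is_dti(rec: IntubationRecord) -> bool:
    """DTI if at least one of the three criteria is met."""
    return bool(dti_criteria(rec))
```

Because the criteria overlap, counting patients per criterion can give totals above the number of DTI cases, as the Results section notes.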
In our institutions, the routine procedure for tracheal intubation is standardized. The patient’s head and neck were placed in the optimal (sniffing) position18 to improve laryngoscopy and intubation outcome. Each patient was preoxygenated for 4 min with 100% oxygen by bag and mask without ventilation, and oxygenation was continued during mask ventilation, except for rapid sequence induction in emergency cases. Each patient was monitored throughout the procedure by electrocardiography, pulse oximetry, and end-tidal carbon dioxide. Because the type of laryngoscope blade may influence the difficulty of tracheal intubation, at least in emergency conditions,19,20 all patients were intubated using standard reusable metal Macintosh blades (Heine classic+®; Heine, Herrsching, Germany). In patients requiring emergency surgery with a full stomach, rapid sequence induction with the Sellick maneuver and succinylcholine (1 mg/kg) was performed.19,20 After tracheal intubation, correct positioning of the endotracheal tube was confirmed by the anesthesiologist, first by detection and curve analysis of end-tidal carbon dioxide and then by bilateral auscultation of the lungs. When facing a difficult airway, the French Society of Anesthesiology and Critical Care algorithm was followed.3
Diagnostic Properties of Clinical Tests or Scores
We determined the diagnostic properties of each main variable previously described in the literature (MO, TMD, Mallampati class, BMI) using classical diagnostic indices (sensitivity, specificity, positive and negative predictive values) as well as the positive and negative likelihood ratios. The discriminative power of the different predictors of difficult intubation was assessed in the study population using ROC curves. These curves were obtained by averaging 1,000 populations bootstrapped (sampled with replacement) from the original study population. This method limits the impact of outliers and provides more robust representations. The CIs of the average ROC curves were depicted using box plots. Finally, to allow comparison between studies, the (sensitivity, 1 − specificity) pairs reported in previous studies were plotted on the ROC curves of the study population.
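As a hedged illustration of the indices and bootstrap described above, the following Python sketch computes the classical diagnostic indices from a 2×2 table, the area under the ROC curve through its rank (Mann–Whitney) formulation with ties counted as one half, and a bootstrap mean AUC with percentile limits. It is a simplification: it averages the AUC across resamples rather than averaging whole ROC curves as the authors did, and all function names are our own.

```python
import random


def diagnostic_indices(tp, fp, fn, tn):
    """Classical indices from a 2x2 table (test positive/negative vs DTI yes/no)."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sens / (1 - spec),   # positive likelihood ratio
        "lr_neg": (1 - sens) / spec,   # negative likelihood ratio
    }


def auc(scores, labels):
    """AUC = P(score of a DTI patient > score of a non-DTI patient), ties = 1/2.

    Computed from average ranks; labels must be 0/1 with both classes present.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):          # tied run gets its average 1-based rank
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    r_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (r_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc(scores, labels, n_boot=1000, seed=0):
    """Mean AUC and 2.5th/97.5th percentiles over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    values = []
    while len(values) < n_boot:
        idx = rng.choices(range(n), k=n)
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:                # skip resamples lacking one class
            values.append(auc([scores[i] for i in idx], ys))
    values.sort()
    mean = sum(values) / n_boot
    return mean, values[int(0.025 * n_boot)], values[int(0.975 * n_boot) - 1]
```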
Threshold and Inconclusive Zone Determination
A gray zone approach was used to describe the range of values for which the variables of interest did not provide conclusive information, that is, the inconclusive zone.13 To determine this interval, we first assessed the optimal threshold using the Youden index (J = sensitivity + specificity − 1 = true-positive rate − false-positive rate).21 Maximizing J maximizes the sum of the correct classification rates in patients with and without DTI and thus minimizes the sum of the misclassification rates.13 The Youden index was determined in each bootstrapped population, yielding a set of 1,000 “optimal” thresholds. The mean of these optimal values and its 95% CI were then estimated, defining an inconclusive zone (with its 95% CI) for each predictor. This first approach was complemented by an alternative approach defining three classes of response: negative, inconclusive, and positive. Inconclusive responses were defined as values with either a sensitivity lower than 90% or a specificity lower than 90% (10% diagnostic tolerance). To better present these results, a two-curve (sensitivity and specificity) representation was provided. These two approaches are not exclusive but complementary; the wider of the two intervals was finally retained as the inconclusive zone.
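The two threshold approaches can be sketched as follows, assuming that a higher score means a higher risk and that a value is called positive when it is at or above the threshold. The bootstrapped Youden thresholds give one interval; the cutoffs at 90% sensitivity and 90% specificity give another; the wider of the two is retained. The function names and the percentile interval are our assumptions, and the code favors clarity over speed.

```python
import random


def sens_spec(scores, labels, t):
    """Sensitivity and specificity when 'score >= t' is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
    n_pos = sum(labels)
    return tp / n_pos, tn / (len(labels) - n_pos)


def youden_threshold(scores, labels):
    """Observed value maximizing J = sensitivity + specificity - 1."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        se, sp = sens_spec(scores, labels, t)
        if se + sp - 1 > best_j:
            best_t, best_j = t, se + sp - 1
    return best_t


def youden_ci(scores, labels, n_boot=1000, seed=0):
    """Mean bootstrapped Youden threshold with its 2.5th-97.5th percentile interval."""
    rng = random.Random(seed)
    n = len(scores)
    ts = []
    while len(ts) < n_boot:
        idx = rng.choices(range(n), k=n)
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:                 # both classes needed
            ts.append(youden_threshold([scores[i] for i in idx], ys))
    ts.sort()
    return sum(ts) / n_boot, ts[int(0.025 * n_boot)], ts[int(0.975 * n_boot) - 1]


def tolerance_zone(scores, labels, tol=0.10):
    """From the highest cutoff keeping sensitivity >= 1 - tol
    to the lowest cutoff reaching specificity >= 1 - tol.
    (If the first exceeds the second, there is no gray zone for this score.)"""
    cands = sorted(set(scores))
    low = max((t for t in cands if sens_spec(scores, labels, t)[0] >= 1 - tol),
              default=cands[0])
    high = min((t for t in cands if sens_spec(scores, labels, t)[1] >= 1 - tol),
               default=cands[-1])
    return low, high


def inconclusive_zone(scores, labels, n_boot=1000, seed=0):
    """Keep the wider of the Youden CI and the 90% tolerance interval."""
    _, y_lo, y_hi = youden_ci(scores, labels, n_boot, seed)
    t_lo, t_hi = tolerance_zone(scores, labels)
    return max([(y_lo, y_hi), (t_lo, t_hi)], key=lambda iv: iv[1] - iv[0])
```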
Construction of the Predictive Models
We used three scores in our study. The first was the score described by Naguib et al.12 (SCORENaguib), considered the best score previously described in the literature:
$$ l = 0.2262 - 0.4621 \times \mathrm{TMD} + 2.5516 \times \mathrm{Mallampati\ score} - 1.1461 \times \mathrm{MO} + 0.0433 \times \mathrm{height} $$
Intubation is predicted to be easy if the numerical value l is less than zero (i.e., negative) and difficult if it is greater than zero (i.e., positive).12 Some previous studies have provided better predictive scores, but they included as a risk factor a history of previous difficult intubation.11 We consider that such patients should be excluded from any prediction attempt, which would be useless by definition, and simply be considered difficult to intubate. The second score was the best score that could be built from our population while remaining simple to calculate at the bedside (SCOREClinic), as previously described.22 Last, without regard to ease of calculation, we determined the best model feasible using generalized boosted modeling (SCOREComputer).23 This approach takes into consideration not only single variables (age, sex, BMI, Mallampati classification, MO, TMD, macroglossia, receding mandible, and snoring were finally retained in the model; see appendix) but also interactions between variables (up to four-way interactions), and it requires no hypothesis regarding the linearity of the relationship between continuous variables and the endpoint. Generalized boosted modeling is considered to outperform standard logistic regression.24 We used this approach to demonstrate that interactions and nonlinearity of the relationship between predictors and the endpoint are of major importance in predicting DTI.
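The Naguib equation given earlier can be evaluated directly. This sketch encodes only the published coefficients and the sign rule; the units of TMD, MO, and height must be taken from the original publication12 and are not restated here, so inputs are assumed to already be in those units.

```python
def naguib_l(tmd, mallampati, mo, height):
    """Linear predictor l of SCORENaguib (inputs in the units of the original publication)."""
    return 0.2262 - 0.4621 * tmd + 2.5516 * mallampati - 1.1461 * mo + 0.0433 * height


def naguib_predicts_difficult(tmd, mallampati, mo, height):
    """True if l > 0 (difficult), False if l < 0 (easy), None if l is exactly 0."""
    l = naguib_l(tmd, mallampati, mo, height)
    if l > 0:
        return True
    if l < 0:
        return False
    return None
```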
Three steps were followed in the construction of SCOREClinic. First, multiple forward stepwise logistic regression was performed to identify variables associated with difficult tracheal intubation. We used a semiparsimonious approach, and only unbiased variables that were available (table 1) were included. Discrimination of the final models was assessed by the c-statistic and calibration by the Hosmer–Lemeshow statistic. Internal validation was performed using 10-fold crossvalidation.25 Two terms were defined to present the internal validation of the models: the optimism, that is, the difference between the c-statistic observed in the complete population and that observed in the crossvalidation samples; and the optimism-corrected c-statistic (the apparent c-statistic minus the optimism). Second, we tried to categorize the continuous variables selected by the model. Third, we simplified as much as possible the weight allocated to each variable retained in the model. Because ROC analyses showed an advantage for a score with variable weights over a score with equal weights for all variables (data not shown), the weight of each variable included in the score was derived from the logistic regression coefficients. Several methods were tested: direct sum of the odds ratios, sum of the logistic coefficients, and the process of converting the logistic regression output into a risk index.26
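The third step (turning regression coefficients into bedside weights) can be illustrated with a common points-based conversion: each coefficient is divided by a reference coefficient and rounded to an integer. This is a hypothetical sketch; the category names and coefficients in the test are invented, and the authors' exact conversion (reference 26) may differ.

```python
import math


def coefficients_to_points(coefs, reference=None):
    """Integer points from logistic coefficients (log odds ratios).

    Each coefficient is divided by a reference value (by default the smallest
    nonzero absolute coefficient) and rounded to the nearest integer.
    """
    if reference is None:
        reference = min(abs(b) for b in coefs.values() if b != 0)
    return {name: round(b / reference) for name, b in coefs.items()}


def total_points(points, patient):
    """Sum the points of every category present (patient: dict of 0/1 flags)."""
    return sum(pts * patient.get(name, 0) for name, pts in points.items())


def predicted_risk(intercept, coefs, patient):
    """Risk from the unsimplified logistic model, to compare with the points score."""
    lp = intercept + sum(b * patient.get(name, 0) for name, b in coefs.items())
    return 1 / (1 + math.exp(-lp))
```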
Because it is important to assess a risk score beyond a dichotomous approach,13 the population was divided into three categories of patients at low, intermediate, and high risk for difficult tracheal intubation, the intermediate-risk category corresponding to the inconclusive zone. To provide an unbiased estimate (with its 95% CI) of the endpoint rate in each risk stratum, the bootstrap method (random sampling with replacement) was used.
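A minimal sketch of the bootstrap estimate of the event rate in each risk stratum, assuming that each patient already carries a stratum label and a 0/1 outcome; the 2.5th and 97.5th percentiles of the resampled rates serve as the 95% CI. The function name and input format are our own.

```python
import random


def stratum_rates(strata, outcomes, n_boot=1000, seed=0):
    """Bootstrap mean event rate and 95% percentile CI per risk stratum.

    strata: labels such as 'low', 'intermediate', 'high'; outcomes: 0/1 (1 = DTI).
    """
    rng = random.Random(seed)
    n = len(strata)
    draws = {s: [] for s in set(strata)}
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)          # resample patients with replacement
        counts = {s: [0, 0] for s in draws}       # [events, patients] per stratum
        for i in idx:
            counts[strata[i]][0] += outcomes[i]
            counts[strata[i]][1] += 1
        for s, (events, total) in counts.items():
            if total:                             # a small stratum may be absent
                draws[s].append(events / total)
    result = {}
    for s, rates in draws.items():
        if not rates:
            continue
        rates.sort()
        k = len(rates)
        result[s] = (sum(rates) / k, rates[int(0.025 * k)], rates[max(int(0.975 * k) - 1, 0)])
    return result
```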
The sample size was calculated to obtain at least 100 events (i.e., 100 DTIs). Data are expressed as mean ± SD, or median (95% CI) for nonnormally distributed variables, and as numbers (percentages) for proportions. Normality was assessed with the D’Agostino–Pearson omnibus test. Two means were compared using the unpaired Student t test, two medians using the Mann–Whitney test, and proportions using the Fisher exact test. Two areas under ROC curves were compared using the nonparametric technique described by DeLong et al.27 All P values were two-tailed, and a P value of less than 0.05 was considered significant. Statistical analysis was performed with R software** and specific packages.28
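For readers unfamiliar with the DeLong comparison, the following Python sketch implements the nonparametric test for two correlated AUCs measured on the same patients, using the per-patient structural components (placement values) of DeLong et al. It is an illustration, not the software used in the study (which relied on R), and it assumes at least two patients in each outcome group and a nonzero variance of the AUC difference.

```python
import math


def _components(scores, labels):
    """Placement values: V10 for each DTI patient, V01 for each non-DTI patient."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]

    def psi(x, y):
        return 1.0 if x > y else 0.5 if x == y else 0.0

    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance (denominator len - 1)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores1, scores2, labels):
    """Two-sided DeLong test; returns (AUC1, AUC2, z, P)."""
    v10_1, v01_1 = _components(scores1, labels)
    v10_2, v01_2 = _components(scores2, labels)
    m, n = len(v10_1), len(v01_1)
    auc1, auc2 = sum(v10_1) / m, sum(v10_2) / m
    var = ((_cov(v10_1, v10_1) + _cov(v10_2, v10_2) - 2 * _cov(v10_1, v10_2)) / m
           + (_cov(v01_1, v01_1) + _cov(v01_2, v01_2) - 2 * _cov(v01_1, v01_2)) / n)
    z = (auc1 - auc2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal P value
    return auc1, auc2, z, p
```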
Results
Analysis of Main Variables
We included 2,048 consecutive patients. Patients were excluded because a plastic blade was used (n = 146), an alternative technique (videolaryngoscope, fiberoptic) was used at the first attempt (n = 5), or age was below 18 yr (n = 2). Substantial data were missing in 240 patients. Thus, 1,655 patients were retained in the final analysis (fig. 1). DTI occurred in 101 patients (6.1%, 95% CI: 5.0–7.4%). Among these patients, DTI was determined because of prolonged intubation (more than 10 min) in 20 patients (20%), more than two attempts in 51 patients (50%), and use of an alternative method in 74 patients (73%) (the sum exceeds 100% because several criteria could occur in the same patient). The alternative methods were the gum elastic bougie (n = 46), videolaryngoscope (n = 24), fiberoptic bronchoscope (n = 6), and Fastrach® (n = 2) (the sum differs from 74 because several techniques could be used in the same patient). No tracheotomy was performed, and only one case of impossible tracheal intubation occurred; in this patient, general anesthesia was performed with a laryngeal mask.
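The article does not state how the 95% CI of the DTI incidence was computed. As an assumption, the Wilson score interval, one common choice for a binomial proportion, reproduces the reported 5.0–7.4% for 101 events in 1,655 patients when rounded to one decimal:

```python
import math


def wilson_ci(events, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion (95% by default)."""
    p = events / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


low, high = wilson_ci(101, 1655)   # roughly 0.0505 to 0.0736
```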
Table 1 shows the main characteristics of patients with and without DTI. For each main variable previously described in the literature (MO, TMD, Mallampati class, BMI), we determined the diagnostic properties as well as the proportion of patients in and out of the inconclusive zone (table 2). Figure 2 emphasizes the poor global diagnostic properties of each of them, reflected by the low, although significant, areas under the ROC curve, and shows the importance of the inconclusive zone.
Construction and Comparison of Scores
Table 3 reports the logistic regression model with continuous variables. Seven independent predictors of difficult tracheal intubation were selected: BMI, Mallampati class, MO, TMD, sex, receding mandible, and the Sellick maneuver. The difference between the c-statistic in the entire derivation population and that obtained after 10-fold crossvalidation was below 0.02. Table 3 also reports the results of the logistic regression using stratified variables. BMI was divided into three categories (less than 25, 25–34, and more than 35 kg/m2), as were MO (35 or less, 36–50, 50 or more mm) and TMD (60 or less, 61–99, 90 or more mm). Mallampati class was entered unchanged in the model. SCOREClinic was defined using the results of this logistic regression (table 3). This transformation of the variables was associated with a nonsignificant reduction of the c-statistic between the model with continuous variables and SCOREClinic (0.75 [95% CI: 0.72–0.77] and 0.74 [95% CI: 0.72–0.76], respectively, P = 0.22) and was justified by the ease of obtaining SCOREClinic compared with the model with continuous variables.
The global predictive ability of SCOREClinic on ROC curve analysis (0.74, 95% CI: 0.72–0.76) was significantly greater than that of SCORENaguib (0.66, 95% CI: 0.60–0.72, P < 0.001) but significantly lower than that of SCOREComputer (0.86, 95% CI: 0.84–0.91, P < 0.001) (fig. 3). Figure 4 shows the relative influence of each variable used in SCOREComputer (i.e., the reduction of squared error attributable to each variable). Tracheal intubation was performed without muscle relaxation in 7.6% of our patients, a condition that might favor DTI. DTI was less frequent when a muscle relaxant was used (5.7% vs. 11.1%, P = 0.01), but this variable was not significant in the multivariate model. Moreover, a sensitivity analysis excluding patients intubated without muscle relaxation did not significantly modify the results (data not shown).
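Relative influence, as displayed in figure 4, can be obtained by summing, for each variable, the squared-error improvement of every split on that variable across all trees, then rescaling so that the total is 100. The sketch below assumes the split improvements have already been extracted as (variable, improvement) pairs; to our understanding it mirrors the normalization reported by the gbm package's summary, but it is not that package's code.

```python
from collections import defaultdict


def relative_influence(splits):
    """Relative influence (%) per variable, sorted from most to least influential.

    splits: (variable, improvement) pairs, one per split node across all trees,
    where improvement is the reduction in squared error produced by that split.
    """
    totals = defaultdict(float)
    for var, improvement in splits:
        totals[var] += improvement
    grand = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: -kv[1])
    return {var: 100 * v / grand for var, v in ranked}
```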
Determination of the Inconclusive Zone for the Scores
Using the inconclusive zone, we defined three groups of patients at low, intermediate, and high risk for difficult tracheal intubation (fig. 5). With this approach, 56% of patients had SCOREClinic values in the inconclusive zone, compared with 71% for SCORENaguib (P < 0.001) and 32% for SCOREComputer (P < 0.001).
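Classifying patients with two cutoffs and counting those in the intermediate class is straightforward. The sketch below also includes a McNemar test (with continuity correction) as one possible way to compare the proportion of patients in the inconclusive zone between two scores applied to the same patients; the article does not state which test produced its P values, so this choice is an assumption.

```python
import math


def risk_class(score, low_cut, high_cut):
    """'low' below the gray zone, 'high' above it, 'intermediate' within [low_cut, high_cut]."""
    if score < low_cut:
        return "low"
    if score > high_cut:
        return "high"
    return "intermediate"


def inconclusive_proportion(scores, low_cut, high_cut):
    """Fraction of patients whose score falls in the inconclusive zone."""
    classes = [risk_class(s, low_cut, high_cut) for s in scores]
    return classes.count("intermediate") / len(classes)


def mcnemar(in_zone_a, in_zone_b):
    """McNemar chi-square (1 df, continuity-corrected) and P value for paired flags."""
    b = sum(1 for x, y in zip(in_zone_a, in_zone_b) if x and not y)
    c = sum(1 for x, y in zip(in_zone_a, in_zone_b) if y and not x)
    if b + c == 0:
        return 0.0, 1.0
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    return chi2, math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square, 1 df
```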
Discussion
The goal of our study was to better understand the mechanisms involved in the prediction of DTI. We observed that the main variables usually used (MO, TMD, Mallampati class, BMI) are poor individual predictors with a very large inconclusive zone. The diagnostic accuracy of simple scores built using a multivariate logistic model is good but probably not sufficient for clinical practice, and these scores are associated with a relatively large inconclusive zone. Only a sophisticated model integrating interaction terms and taking into account the nonlinearity of the main predictors provided an acceptable prediction tool. Boosted trees are the approach we retained in this study, but other similar approaches might also be adapted to predict difficult intubation.
In our study, MO, TMD, and BMI were associated with DTI, as previously reported.2,3,8–12 We did not observe that emergency surgery or the Sellick maneuver was associated with DTI (table 1). Intubation in emergency conditions has been reported to be more difficult,5,29 although some studies have reported the opposite.30 Although cricoid pressure during the Sellick maneuver may increase the difficulty of laryngoscopy and thus of tracheal intubation, it is also the only risk factor that can be immediately modified when difficulties arise, simply by releasing it. In our study, the Sellick maneuver was not associated with DTI, in agreement with a previous study.30 Most patients in our cohort were intubated by a nurse anesthetist, which corresponds to routine practice in our institutions. However, in the univariate analysis, intubation by a nurse anesthetist or resident was not associated with more frequent DTI (table 1). A small proportion of patients (7.6%) were intubated without muscle relaxation, using local anesthesia of the vocal cords, and this practice was significantly associated with DTI (table 1). However, a sensitivity analysis showed that this did not influence our results.
The poor predictive value of each individual variable has been well recognized in previous studies and a meta-analysis.7,9 Our study confirms this finding and helps explain why it occurs. Importantly, the inconclusive zone was wide for each of these variables, and the relationship between these variables and the outcome could not be described by a simple linear relationship (fig. 2). Several previous studies demonstrated that a multivariate composite risk score performs better than individual variables.10–12,31 However, these studies used simple logistic models, which in most cases imply both an extensive simplification of the complex relationship between continuous variables and outcome and a disregard for interactions between variables. In our study, a more sophisticated model markedly improved both the global predictive property, as shown by an increase in the area under the ROC curve (fig. 2), and the proportion of patients in the inconclusive zone, which decreased (fig. 3). These new data strongly support the hypothesis that individual variables have complex interactions and/or complex relationships with the outcome, and thus that their respective proportions might be as important as their absolute values. This can be illustrated by a small patient who has a small MO and a small TMD but is nevertheless easy to intubate. Height is not recognized as a risk factor for difficult intubation, whereas some of the individual variables are obviously linked to the patient’s height.
A promising option for clinical discrimination is to avoid a single cutoff that dichotomizes the population and instead to propose two cutoffs separated by an inconclusive zone, the so-called gray zone.13,32 The first cutoff is chosen to exclude the diagnosis with near-certainty, whereas the second is chosen to include the diagnosis with near-certainty. This approach is increasingly used and results in less loss of information and less distortion than a single cutoff, thereby improving clinical decision making. It might be particularly important when the ROC curve is relatively flat and a single cutoff is therefore difficult to choose, as was the case for most of our individual variables (fig. 2). Our study shows that the proportion of patients in the inconclusive zone was large for each individual variable (table 2) and even for most scores, explaining why most of them fail to be accurate predictors of difficult tracheal intubation. Only SCOREComputer was associated with a highly significant decrease in this proportion (fig. 3). We therefore strongly suggest that the risk of difficult tracheal intubation be assessed beyond a dichotomous approach, dividing the population into three categories at low, intermediate, and high risk so that the airway management strategy can be adapted to the identified risk. Moreover, a useful score should not only delineate these three risk categories appropriately but also minimize the proportion of patients in the intermediate-risk category.
Some limitations of our study deserve consideration. First, this score was built in an adult population and thus may not apply to pediatric patients. Second, some variables were not taken into account in the predictive model because they concerned too few patients to provide sufficient power. This is probably the case for radiotherapy sequelae or burns, which modify skin properties, and for airway abnormalities that may modify the anatomy of the larynx. We also did not consider a history of previous DTI, which has been identified as an independent risk factor.33 Nevertheless, because patients either report this information or are unaware of it, documented information may be only incompletely retrieved. Moreover, we think that tracheal intubation should be considered undoubtedly difficult when such an event is reported, and no further attempt to predict it is warranted. Third, we did not assess the reproducibility of the measurement of each variable and thus of the different scores. Reproducibility may be limited by significant interobserver variability, which can be expected to worsen in some clinical situations such as emergencies. Fourth, boosting is one of the most powerful modeling approaches developed to date and has shown considerable success in predictive accuracy. Its strength is that it requires neither specification of the nature of the interactions between variables nor hypotheses regarding the linearity of the included variables. However, we do not wish to promote boosting per se, but only to suggest that accurate prediction of DTI should take into account interactions and nonlinearities, whatever the statistical tool used.
Last, further studies should be performed in other countries to externally validate our scores.34 However, our study was mainly devoted to better understanding the mechanisms involved in the prediction of difficult intubation and to opening a new paradigm, rather than to proposing a definitive scoring system. Overall, our cohort appeared to behave like most previously described populations,9 and the crossvalidation optimism was low, suggesting that our results are robust. Nevertheless, because our model was data driven, an external validation remains mandatory before its clinical application.
In conclusion, although each individual sign of difficult intubation has a poor predictive value, computer-assisted models using complex interactions among simple variables (BMI, MO, TMD, Mallampati class, presence of a receding mandible) enable an accurate prediction of difficult tracheal intubation with a low proportion of patients in the inconclusive zone. This paradigm change in the prediction of difficult tracheal intubation might also apply to other important predictions such as difficult mask ventilation.16,35 Further studies are needed to test this hypothesis.
The authors thank David Baker, M.D., F.R.C.A. (Department of Anesthesiology and Critical Care, CHU Necker-Enfants Malades, Paris, France), for reviewing the article, Hinda Meddahi and Djalila Saichi (Research Technicians, Department of Anesthesiology and Critical Care, CHU Pitié-Salpêtrière, Paris, France) for excellent assistance in database management, and colleagues and nurses from our institutions for their help in this study.
Appendix: Generalized Boosted Regression Modeling
Boosting is the process of iteratively adding basis functions in a greedy fashion so that each additional basis function further reduces the selected loss function. We used an implementation close to Friedman’s gradient boosting machine.25 This procedure is part of the “gbm” package for R statistical software, available from the CRAN package repository.
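To make the boosting idea concrete, the following self-contained Python sketch fits a gradient boosting model for a 0/1 outcome with Bernoulli deviance: start from the overall log-odds, fit a small tree to the residuals y − p, take one Newton step per leaf, and add the result scaled by the shrinkage. For brevity it uses depth-1 trees (stumps), so unlike the depth-4 model used in this study it cannot represent interactions, and being pure Python it is far too slow for 10,000 trees on a real dataset. Function names are our own; this is not the gbm package's code.

```python
import math


def _sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f))


def _best_stump(X, r):
    """(feature, threshold) maximizing the squared-error reduction on residuals r."""
    best_gain, best = -1.0, None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[1:]:   # both sides stay nonempty
            left = [ri for row, ri in zip(X, r) if row[j] < t]
            right = [ri for row, ri in zip(X, r) if row[j] >= t]
            gain = sum(left) ** 2 / len(left) + sum(right) ** 2 / len(right)
            if gain > best_gain:
                best_gain, best = gain, (j, t)
    return best


def fit_boosted_stumps(X, y, n_trees=100, shrinkage=0.1):
    """Gradient boosting, Bernoulli deviance, depth-1 trees.

    X: list of numeric feature rows; y: 0/1 outcomes containing both classes.
    """
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1 - p0))                    # start from the overall log-odds
    F = [f0] * len(y)
    trees = []
    for _ in range(n_trees):
        p = [_sigmoid(f) for f in F]
        r = [yi - pi for yi, pi in zip(y, p)]       # negative gradient of the deviance
        split = _best_stump(X, r)
        if split is None:                           # no feature with two distinct values
            break
        j, t = split
        leaf = {}
        for goes_left in (True, False):
            idx = [i for i, row in enumerate(X) if (row[j] < t) == goes_left]
            num = sum(r[i] for i in idx)
            den = sum(p[i] * (1 - p[i]) for i in idx)
            leaf[goes_left] = num / den if den > 0 else 0.0   # one Newton step
        trees.append((j, t, leaf[True], leaf[False]))
        for i, row in enumerate(X):
            F[i] += shrinkage * leaf[row[j] < t]    # shrunken update of the log-odds
    return {"f0": f0, "shrinkage": shrinkage, "trees": trees}


def predict_proba(model, x):
    """Predicted probability of the outcome for one feature row."""
    f = model["f0"]
    for j, t, left, right in model["trees"]:
        f += model["shrinkage"] * (left if x[j] < t else right)
    return _sigmoid(f)
```

The shrinkage and number of trees play the same roles as the `shrinkage` and `n.trees` parameters listed below; a smaller shrinkage needs more trees.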
To improve the reproducibility of this analysis, we share the parameters retained: (1) the total number of trees to fit was set to 10,000; (2) the maximum depth of variable interactions was set to four, which implies a model with up to four-way interactions; and (3) the shrinkage parameter applied to each tree in the expansion was set to 0.001. The following lines of code estimate the GBM model, reproduce figure 4 (i.e., the relative influence of each variable), and draw a ROC curve of SCOREComputer.
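# Load libraries of statistical packages within R
library(gbm)   # generalized boosted regression models
library(pROC)  # ROC curves; this package choice is an assumption, the original call was not shown
# Read in your dataset (placeholder file name)
DATA <- read.csv("yourdataset.csv", header = TRUE)
# Run the GBM model; Your.endpoint must be coded 0/1 (1 = DTI)
# variable.1 ... variable.n are placeholders: list your predictors explicitly
SCOREcomputer <- gbm(Your.endpoint ~ variable.1 + variable.2 + … + variable.n,
                     data = DATA,
                     distribution = "bernoulli",
                     interaction.depth = 4,
                     n.trees = 10000,
                     shrinkage = 0.001,
                     keep.data = TRUE,
                     verbose = TRUE)
# Draw a figure with relative influence and ROC curve of the model
par(mfrow = c(1, 2))
summary(SCOREcomputer, n.trees = 10000)  # bar plot of relative influence (figure 4)
pred <- predict(SCOREcomputer, newdata = DATA, n.trees = 10000, type = "response")
plot(roc(DATA$Your.endpoint, pred))      # ROC curve of SCOREComputer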
** R software: http://cran.r-project.org. Accessed November 2, 2012.
CRAN = Comprehensive R Archive Network (R Foundation for Statistical Computing); GBM = Generalized Boosted Regression Modeling; ROC = receiver operating characteristic.