Existing methods to predict recovery after severe traumatic brain injury lack accuracy. The aim of this study is to determine the prognostic value of quantitative diffusion tensor imaging (DTI).
In a multicenter study, the authors prospectively enrolled 105 patients who remained comatose at least 7 days after traumatic brain injury. Patients underwent brain magnetic resonance imaging, including DTI in 20 preselected white matter tracts. Patients were evaluated at 1 yr with a modified Glasgow Outcome Scale. A composite DTI score was constructed for outcome prognostication on this training database and then validated on an independent database (n=38). DTI score was compared with the International Mission for Prognosis and Analysis of Clinical Trials Score.
Using the DTI score for prediction of unfavorable outcome on the training database, the area under the receiver operating characteristic curve was 0.84 (95% CI: 0.75-0.91). The DTI score had a sensitivity of 64% and a specificity of 95% for the prediction of unfavorable outcome. On the independent validation database, the area under the receiver operating characteristic curve was 0.80 (95% CI: 0.54-0.94). On the training database, reclassification methods showed significant improvement of classification accuracy (P < 0.05) compared with the International Mission for Prognosis and Analysis of Clinical Trials score. Similar results were observed on the validation database.
White matter assessment with quantitative DTI increases the accuracy of long-term outcome prediction compared with the available clinical/radiographic prognostic score.
What We Already Know about This Topic
Traumatic brain injury is a major public health problem, and current methods to predict long-term outcome and resource utilization are not strong
Measuring white matter injury using magnetic resonance diffusion tensor imaging might improve prediction but has not been studied in a multicenter fashion
What This Article Tells Us That Is New
In a multicenter study of 105 patients with traumatic brain injury, diffusion tensor imaging, using a normalization process across different machine types, increased the accuracy of long-term outcome prediction compared with standard clinical and imaging approaches
Severe traumatic brain injury (TBI) represents a major public health burden, generally requiring resuscitation in an intensive care unit (ICU) and prolonged rehabilitation. For patients with TBI, there is considerable uncertainty regarding long-term outcome in terms of a broad range of cognitive, behavioral, and functional impairments. Available methods for the prediction of long-term outcome are inaccurate and unreliable, in part because clinical and electrophysiological evaluations are limited by coma or sedation. As a result, decisions regarding therapeutic intensity and goals of care are commonly made on the basis of limited evidence, leading to a potential mismatch between outcomes and resources mobilized to care for a patient, with associated psychological and financial burdens on patients, their families, and society.1
The most extensively validated scoring system for TBI outcome is the International Mission for Prognosis and Analysis of Clinical Trials (IMPACT) score.2 It is based on a multivariate model that combines clinical, biochemical, and computed tomography (CT) variables at admission to provide a probabilistic estimate of the outcome at 6 months. The IMPACT score is accurate in predicting outcomes in populations of patients with moderate and severe TBI but has limited utility in making decisions regarding any individual patient. It has been proposed that clinical decision-making must be based on additional information that reflects the biological heterogeneity of TBI.3
White matter damage, a key feature of TBI, can be identified and quantified with a magnetic resonance imaging (MRI) sequence called diffusion tensor imaging (DTI). Single-center studies have demonstrated the diagnostic and prognostic value of DTI in patients with TBI.4–6 However, for these results to be widely applicable, quantitative MRI methods must account for hardware and software disparities within and across institutions. The goal of the current study was to develop and validate, as a first step, an algorithm based on DTI for outcome prediction in severe TBI in a multicenter setup after implementation of a normalization process. We hypothesized that DTI would significantly increase our predictive ability to discriminate between favorable and unfavorable (death, vegetative state, or minimally conscious state) outcomes at 1 yr compared with the IMPACT score.
Material and Methods
The institutional review boards of participating institutions approved the study. Written informed consent was obtained for all study participants (patient’s next of kin during the acute stage, and patients themselves after recovery of consciousness). The protocol was registered in December 2007 (NCT00577954).
We enrolled patients in a prospective observational multicenter cohort between October 2006 and March 2010. Imaging and clinical data from comatose patients with TBI were collected at predetermined time points using agreed-upon shared data elements. An outcome prediction model was developed using the information available in the acute setting. We tested the hypothesis that this model can differentiate between patients with favorable and unfavorable outcome with greater accuracy than the IMPACT model alone.
Patients were enrolled in ICUs at 10 participating institutions. Inclusion criteria were (1) adult patient between 18 and 75 yr of age; and (2) inability to follow simple commands that could not be explained by sedation at least 7 days, and not more than 45 days, after TBI. Exclusion criteria were (1) moribund patients (expected survival < 24 h); (2) physiological instability (e.g., due to hemodynamic instability, increased intracranial pressure, and/or rapidly deteriorating respiratory function) that would preclude MRI scanning; (3) contraindication to the MRI; (4) penetrating head injury; and (5) a central nervous system condition such as stroke, brain tumor, or a neurodegenerative disease preceding TBI. Five to 10 healthy volunteers were recruited at each center to serve as control subjects to account for potential variations in DTI values across centers.7
Clinical Data Collection
Using standardized case report forms, data were collected and stored in a central, web-based, encrypted database. These included patient characteristics, initial clinical status, and cranial CT scan; adverse events associated with MRI scanning; 1-yr outcome using the Glasgow Outcome Scale (GOS), the Disability Rating Scale, the extended Glasgow Outcome Score (GOSE), and the modified Rankin Scale. A central study monitor verified all data for accuracy, consistency, and completeness.
Head CT Scan
Head CT scans were performed within 48 h after ICU admission and rated using the Marshall score.8 When more than one CT scan was obtained, the scan showing the worst radiologic findings was selected and scored.
An MRI scan was acquired as soon as a patient met inclusion criteria and the scan was clinically feasible. During MRI acquisition, patient sedation, if any, was continued. At the 10 different sites, MRI scans were performed using 12 scanners with either 1.5 or 3.0 Tesla field strengths and from three manufacturers: GE Healthcare (Milwaukee, WI), Siemens Medical Solutions (Erlangen, Germany), and Philips Medical Systems (Eindhoven, The Netherlands). The precise parameters of each sequence were adapted to the individual scanner type, field strength, coil used, and departmental protocol. The following morphologic sequences were acquired: sagittal localizer, axial T2/FLAIR (Fluid Attenuated Inversion Recovery), axial T2, T2*, and 3D inversion recovery T1. In addition, DTI was acquired in an axial plane perpendicular to the main field B0. The DTI parameters used were field of view of 300 mm, matrix size 96 × 96, and slice thickness 3 mm (resulting in nearly isotropic voxels). Gradient (B1) was applied in at least 12 directions (range 12–50) with a value of 1000 mT/m. A series without the diffusion gradient (the B-zero image) was also acquired.
MRI results, including the morphologic sequences and unprocessed DTI, were assembled in a centrally administered imaging core laboratory at the Pitié-Salpêtrière Hospital (Paris, France). The clinical teams treating the patients had access to all MRI results, with the exception of the DTI data.
All MRI scans were reviewed to check for motion and other artifacts. DTI images were preprocessed using the FSL software.9 The diffusion tensor was estimated, and the local diffusion parameters, namely fractional anisotropy (FA), mean diffusivity (MD), axial diffusivity (L1), and radial diffusivity (Lt), were calculated for the entire brain in each patient and control. These parameters were computed from the three estimated eigenvalues that quantify the parameters of water diffusion in three orthogonal directions.10 Correction for distortions caused by eddy currents was performed using the B-zero images.
To make diffusion measures comparable between individuals, the FA, MD, L1, and Lt maps were registered on a 1 × 1 × 1 mm³ standard space image (MNI152 space) using the Tract-based Spatial Statistics procedure.11 The whole brain was registered using a nonlinear technique, and individual FA, MD, L1, and Lt values were projected on an alignment-invariant template for the brain. This procedure maps all available information to a common brain template and avoids misalignment between subjects.
The regions-of-interest (ROIs) for DTI analysis were selected from the atlas designed by Mori et al.,12 the so-called ICBM-DTI-81 white matter atlas. This atlas, which is included in FSL, consists of 48 white matter tracts. For the purpose of this analysis, these were merged into 20 larger regions shown in figure 1 and were used to extract the diffusion parameters. For each patient and control, the average values of FA, MD, L1, and Lt in these 20 ROIs resulted in 80 DTI biomarkers for each subject.
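As an illustration of how the four diffusion parameters follow from the three eigenvalues, the sketch below uses the standard textbook definitions of FA, MD, axial diffusivity, and radial diffusivity. It is not the FSL implementation used in the study; the function name `diffusion_scalars` and the example eigenvalues are hypothetical.

```python
import math


def diffusion_scalars(e1, e2, e3):
    """Return (FA, MD, axial, radial) from three diffusion tensor eigenvalues.

    Standard definitions, shown for illustration only:
      MD     = mean of the three eigenvalues
      axial  = largest eigenvalue (L1)
      radial = mean of the two smaller eigenvalues (Lt)
      FA     = sqrt(3/2) * ||lambda - MD|| / ||lambda||
    """
    l1, l2, l3 = sorted((e1, e2, e3), reverse=True)
    md = (l1 + l2 + l3) / 3.0
    axial = l1
    radial = (l2 + l3) / 2.0
    numerator = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    denominator = l1 ** 2 + l2 ** 2 + l3 ** 2
    fa = math.sqrt(1.5 * numerator / denominator) if denominator > 0 else 0.0
    return fa, md, axial, radial


# Hypothetical eigenvalues (mm^2/s) for a single white matter voxel.
fa, md, axial, radial = diffusion_scalars(1.7e-3, 0.3e-3, 0.2e-3)
```

FA is 0 when diffusion is identical in all directions and approaches 1 when diffusion occurs along a single axis, which is why a loss of FA is read as a loss of white matter organization.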
The regional DTI parameter extraction consisted of three steps: a nonlinear registration of FA map to a template (provided by FSL), a projection of FA onto the FA template skeleton representing the centers of all tracts (also provided by FSL), and averaging of FA measures within the 20 ROIs restricted to the skeleton. The resulting maps were checked by Dr. Galanaud; any patient with major distortion was excluded.
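The final averaging step can be sketched as follows. This is a simplified illustration, not the FSL pipeline: it assumes the registered parameter map, the ROI label map (0 for background, 1 to 20 for the merged regions), and the skeleton mask have already been flattened to equal-length lists, and the name `roi_means` is hypothetical.

```python
def roi_means(values, roi_labels, skeleton_mask, n_rois=20):
    """Average a diffusion parameter within each ROI, restricted to the skeleton.

    values        : per-voxel parameter values (e.g., FA) in standard space
    roi_labels    : per-voxel ROI index, 0 for background, 1..n_rois otherwise
    skeleton_mask : per-voxel boolean, True where the voxel lies on the skeleton
    Returns a dict {roi_index: mean value}; ROIs with no skeleton voxel are omitted.
    """
    sums = [0.0] * (n_rois + 1)
    counts = [0] * (n_rois + 1)
    for value, label, on_skeleton in zip(values, roi_labels, skeleton_mask):
        if on_skeleton and 1 <= label <= n_rois:
            sums[label] += value
            counts[label] += 1
    return {
        label: sums[label] / counts[label]
        for label in range(1, n_rois + 1)
        if counts[label] > 0
    }
```

Running the same function on the FA, MD, L1, and Lt maps would give 4 × 20 = 80 values per subject, matching the number of DTI biomarkers described above.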
DTI Parameters Normalization
Ninety-nine normal controls underwent the same imaging protocol as that used for the patients. All were free of previous neurological diseases and gave written informed consent to participate in the study. Given the variability in raw DTI values, which arises because of differences in MRI vendors, scanners, and field strengths across centers, a normalization procedure had to be performed. The raw value of each derived diffusion parameter was standardized using the data from control subjects for each center as described below.
The raw DTI parameter values for each patient were normalized with respect to the average of each parameter measured in the control group for that center. Specifically, for each patient, the raw FA, MD, L1, and Lt values in each of the 20 preselected ROIs were divided by the corresponding mean value for the control group from his or her center.
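A minimal sketch of this center-specific normalization is shown below. The data layout (one dictionary of ROI values per subject) and the function names are assumptions made for illustration.

```python
from statistics import mean


def control_means(control_values):
    """Mean raw value per ROI over the control subjects of ONE center.

    control_values: list of dicts {roi_name: raw value}, one dict per control.
    """
    rois = control_values[0].keys()
    return {roi: mean(subject[roi] for subject in control_values) for roi in rois}


def normalize_patient(patient_values, center_control_means):
    """Divide each raw patient value by the control mean of the same center."""
    return {
        roi: patient_values[roi] / center_control_means[roi]
        for roi in patient_values
    }


# Hypothetical FA values in one ROI for two controls and one patient
# scanned at the same center.
means = control_means([{"roi_01": 0.60}, {"roi_01": 0.80}])
normalized = normalize_patient({"roi_01": 0.35}, means)
```

A normalized value close to 1 means the patient resembles the controls from the same scanner environment; with the hypothetical numbers above, the patient's FA is about half the local control mean.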
Outcomes were determined via a telephone interview conducted by the investigative team at each participating center. The principal outcome was modified GOS 1 yr after injury. For this research, we modified the GOS (as previously described)6 by dividing the score of 3, which is a heterogeneous category of severe motor and cognitive disabilities, into 3− and 3+ subcategories. The 3− subcategory denotes the minimally conscious state (as defined by Giacino et al.13); the 3+ score represents severe disability excluding minimally conscious state. A score of 1, 2, or 3− at 1 yr was classified as unfavorable, whereas higher scores (3+, 4, and 5) were classified as favorable.
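The dichotomization rule can be written as a small lookup. The sketch below is illustrative only; it assumes scores are recorded as text and uses an ASCII hyphen for the 3− subcategory.

```python
# Modified GOS categories, encoded as strings (an assumption for illustration).
UNFAVORABLE_SCORES = {"1", "2", "3-"}
FAVORABLE_SCORES = {"3+", "4", "5"}


def classify_outcome(modified_gos):
    """Map a modified GOS score at 1 yr to 'unfavorable' or 'favorable'."""
    if modified_gos in UNFAVORABLE_SCORES:
        return "unfavorable"
    if modified_gos in FAVORABLE_SCORES:
        return "favorable"
    raise ValueError(f"unknown modified GOS score: {modified_gos!r}")
```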
Three investigators (Drs. Galanaud, Puybasset, and Sanchez) performed an audit of recorded clinical outcomes to check for data completeness, accuracy, and consistency.
Outcome Prediction Algorithm
DTI variables (specifically, radial, axial, and mean diffusivity, and FA) from the 20 preselected regions were integrated with the following eight IMPACT score variables: age, motor score, pupillary reactivity, hypoxia, hypotension, CT classification, subarachnoid hemorrhage, and mass effect from epidural hematoma. Each patient was associated with a class label (favorable or unfavorable outcome), and the above 88 DTI and clinical variables.
Support Vector Machine (SVM), a supervised learning method, was used for classification.14 This classification method is known to maintain its reliability when the number of features is close to the number of subjects. SVM classification, implemented via the libsvm library,14 is based on two main concepts: decision planes and nonlinear mapping. Given a set of input vectors, where each vector is also associated with one of two class labels (e.g., favorable and unfavorable), the goal of the classification algorithm is to find the optimal surface that maximizes the margin between the two classes. When this surface is a plane, we get a linear classifier. However, the surface that separates the two classes may not always be linear. To overcome this limitation, SVM transforms the input vectors into a high-dimensional space using a kernel function in such a way that the two classes are separable by a linear hyperplane. The algorithm fits the maximum-margin hyperplane in the transformed feature space. The goal of the SVM algorithm is then to optimize the parameters of the kernel function to enable the search for such a maximum-margin hyperplane.
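For reference, the maximum-margin problem described above is commonly written in the soft-margin (C-SVC) form shown below, where x_i are the feature vectors, y_i ∈ {−1, +1} the class labels, φ the kernel-induced mapping, C the penalty parameter, and ξ_i the slack variables. The radial basis function kernel on the last line is only one common choice and is an assumption here; the text does not state which kernel was used.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\, w^{\top} w + C \sum_{i=1}^{n} \xi_i \\
\text{subject to}\quad & y_i \bigl( w^{\top} \phi(x_i) + b \bigr) \ge 1 - \xi_i,
  \qquad \xi_i \ge 0, \qquad i = 1, \dots, n \\[6pt]
f(x) &= \operatorname{sgn}\Bigl( \sum_{i=1}^{n} \alpha_i \, y_i \, K(x_i, x) + b \Bigr),
  \qquad K(x, x') = \phi(x)^{\top} \phi(x') \\[6pt]
K_{\mathrm{RBF}}(x, x') &= \exp\bigl( -\gamma \, \lVert x - x' \rVert^{2} \bigr)
\end{aligned}
```

In this form, the parameters tuned during training would be C and, for the radial basis function kernel, γ.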
Because of the heterogeneity of the GOS 3 patients, the SVM training process was computed on the multicentric cohort without the GOS 3 patients (n = 73 patients). The relevant variables and the optimal SVM kernel parameters were selected by a joint stepwise and grid search procedure using cross-validation (leave-one-out) maximizing the classification accuracy. In addition to the predicted class label, the classification process also assigned to each patient an estimated probability that he or she belongs to the class of favorable or unfavorable outcome. This probability was termed the DTI score. The IMPACT score was also computed for each patient and compared with the DTI score.
Thirty-eight patients and 15 controls described in a previous study6 were used as a validation dataset. MRI examinations were performed using the same DTI acquisition parameters but on a different MRI unit. They were blindly processed by an external observer (Dr. Dinkel) using the DTI classification method. The classification model selected with the training database (see previous paragraph) was used to evaluate the DTI score of the patients of this validation dataset. IMPACT scores were also calculated for all patients of this dataset.
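The combination of leave-one-out cross-validation and grid search can be sketched generically as follows. To keep the example self-contained, a one-dimensional k-nearest-neighbor classifier stands in for the SVM, and the stepwise variable selection is omitted; all names and data are hypothetical.

```python
def leave_one_out_accuracy(samples, labels, fit, predict):
    """Hold out each subject once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def grid_search(samples, labels, make_fit, predict, grid):
    """Return (best parameter, accuracy), maximizing leave-one-out accuracy."""
    scored = [
        (leave_one_out_accuracy(samples, labels, make_fit(p), predict), p)
        for p in grid
    ]
    best_accuracy = max(accuracy for accuracy, _ in scored)
    # On ties, keep the first parameter listed in the grid.
    best_param = next(p for accuracy, p in scored if accuracy == best_accuracy)
    return best_param, best_accuracy


# Stand-in classifier (NOT an SVM): k-nearest neighbors on a single feature.
def make_knn_fit(k):
    def fit(xs, ys):
        return k, list(zip(xs, ys))
    return fit


def knn_predict(model, x):
    k, data = model
    nearest = sorted(data, key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(1 for _, y in nearest if y == 1)
    return 1 if 2 * votes > len(nearest) else 0


xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]   # hypothetical normalized feature
ys = [0, 0, 0, 1, 1, 1]               # hypothetical class labels
best_k, best_accuracy = grid_search(xs, ys, make_knn_fit, knn_predict, [1, 3, 5])
```

With an SVM, the grid would typically range over the penalty and kernel parameters rather than k. Leave-one-out is a natural choice for a cohort of this size because each model is trained on all but one subject.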
Data are expressed as mean ± SD or median (scores). Comparison of two proportions was performed using the chi-square test, comparison of two means was performed using the Student t test, and comparison of several means was performed using one-way multivariate analysis of variance.
The ability of the IMPACT and DTI scores to discriminate between favorable and unfavorable outcomes at 1 yr was evaluated and compared by the area under the receiver operating characteristic (ROC) curve analysis.15 The sensitivity of the classifier was calculated for 95% specificity for unfavorable outcome prediction. In addition, we computed net reclassification improvement and integrated discrimination improvement indices to compare our DTI score with IMPACT16 by using R software. The 95% confidence interval (95% CI) of each index (area under the ROC curve, net reclassification improvement, and integrated discrimination improvement) was obtained by bootstrapping the studied populations. This provided a large sample of each index, and thus the median and its associated 95% CI.
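The three quantities named above (area under the ROC curve, sensitivity at a fixed specificity, and a bootstrap confidence interval) can be sketched in plain Python as below. This is an illustration, not the R code used in the study; it assumes label 1 denotes unfavorable outcome, that higher scores indicate higher risk, and a simple percentile bootstrap. All function names are hypothetical.

```python
import random


def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(scores, labels, min_specificity=0.95):
    """Highest sensitivity among thresholds with specificity >= min_specificity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for threshold in sorted(set(scores)):
        specificity = sum(n < threshold for n in neg) / len(neg)
        if specificity >= min_specificity:
            sensitivity = sum(p >= threshold for p in pos) / len(pos)
            best = max(best, sensitivity)
    return best


def bootstrap_ci(scores, labels, statistic, n_boot=2000, seed=0):
    """Percentile bootstrap: return (lower 2.5%, median, upper 97.5%) of a statistic.

    Assumes both classes are present in the data; resamples that contain a
    single class are redrawn because the statistic is undefined for them.
    """
    rng = random.Random(seed)
    n = len(scores)
    values = []
    while len(values) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            values.append(statistic([scores[i] for i in idx], ys))
    values.sort()
    return values[int(0.025 * n_boot)], values[n_boot // 2], values[int(0.975 * n_boot) - 1]
```

Fixing specificity at 95% before reading off sensitivity reflects the clinical priority of rarely labeling a patient as having an unfavorable prognosis when the actual outcome is favorable.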
All comparisons were two tailed, and a P value of less than 0.05 was considered significant.
Study Population and Enrolment Pattern
Of 167 patients enrolled, 33 were excluded from analysis because of suboptimal MRI acquisitions; 19 could not be processed because of a lack of acquisitions of healthy volunteers in two centers. An additional 10 patients were lost to follow-up at 1 yr (fig. 2). Clinical and imaging characteristics of patients are summarized in tables 1 and 2. Patients were predominantly young adult males with severe TBI at admission. MRI was performed on average 21 ± 9 days after trauma (20 ± 9 days for patients with favorable outcome and 23 ± 11 days for patients with unfavorable outcome, P = 0.08). The automatic segmentation software accurately recognized the 20 ROIs in all 105 patients, even in the presence of intracranial or subdural hematoma, midline shift or decompressive craniotomy. No adverse events related to MRI scanning were reported.
During the 12-month follow-up period, 21 patients (20%) died, 14 in the ICU and 7 after ICU discharge. At 1 yr, 40 patients (38%) had an unfavorable GOS; of these, 21 were dead, 5 were in a vegetative state, and 14 were minimally conscious.
For the 99 controls, DTI measures were extracted in each of the 20 preselected regions. The mean DTI measures for the controls were significantly different (P < 0.001) between centers before the normalization procedure (data not shown). This variability in raw DTI values was due to differences in MRI vendors, scanners, and field strengths.
Unfavorable Outcome Prediction
The ROC curves for the prediction of unfavorable GOS are shown in figure 3. The best model for prediction of unfavorable outcome used only 32 of the 88 parameters available (80 DTI and 8 clinical) that are summarized in table 3. As can be seen, all parameters used by this model were indeed DTI metrics.
For the cohort of patients used for deriving the prediction model (training base), area under the ROC curve increased from 0.64 to 0.84 (table 4) when the DTI score was used instead of the IMPACT score (P < 0.001). The sensitivity for predicting unfavorable outcome, with a specificity of 95%, was 64%. Net reclassification improvement and integrated discrimination improvement also indicated a significant improvement with the DTI score compared with the IMPACT score (P < 0.001).
The ROC curve was also plotted using the above-derived prediction model on the independently acquired database of 38 patients with severe TBI (testing base). The area under this ROC curve was 0.80, and the overall shape of the curve was similar to the one obtained from the training database. Significant improvements from IMPACT to DTI score were also observed for net reclassification improvement (P = 0.04) and integrated discrimination improvement (P = 0.009). Figure 4 shows the likelihood of unfavorable outcome as a function of the DTI score in the training database. The sigmoidal shape of the DTI score is unmistakable and suggests that, at low and high scores, an outcome can be assigned with high specificity.
The DTI scores of the patients who survived with unfavorable outcome were not significantly different from the scores of those who died in the ICU (P = 0.75). This indicates a lack of systematic selection bias introduced in the ICU by the availability of morphologic MRI.
Figure 5 shows morphologic MRI scans, DTI images, and individual FA values for the 20 regions in two patients, one with favorable outcome and the other with unfavorable outcome. The morphologic images for both patients show widespread signal abnormalities secondary to TBI. The FA values for the patient with favorable outcome are nearly normal, with the exception of subcortical regions 9, 10, and 18. On the other hand, the patient with unfavorable outcome has markedly decreased FA values (i.e., in the lowest quartile of patients with unfavorable outcome) in all 20 regions. The DTI score for the first patient was 0.17 compared with 0.92 for the patient with unfavorable outcome.
The task of predicting long-term outcome in severe TBI is challenging. Patients with similar clinical and radiologic characteristics in the acute phase may have markedly different outcomes ranging from death to complete recovery, with intermediate states of impaired consciousness or neuropsychological dysfunction. Neurological examination is hindered in the acute setting by factors such as endotracheal intubation, sedation, and systemic metabolic alterations. Such examinations, therefore, lack accuracy and reliability in measuring the severity of TBI and in predicting outcome. The prognostic value of cranial CT scan is also limited.16
The IMPACT score, based on a multivariate model combining clinical, biochemical, and CT scan characteristics, has been developed from a dataset of more than 9,000 patients and has been validated externally against similar large patient populations.17 However, this score does not have the discriminative power needed for clinical decision-making in individual patients.3
A number of studies have described the utility of conventional MRI sequences,18–20 diffusion-weighted imaging,21,22 DTI,23,24 and susceptibility-weighted imaging25,26 in patients with TBI. This literature demonstrates the superiority of MRI over CT for the visualization of lesions.18 It is also known that damage to critical areas such as the brainstem5,27 or corpus callosum24 is indicative of poor prognosis. However, current imaging methods do not allow one to reliably predict the long-term clinical outcome of an individual comatose patient.
There is extensive evidence that diffuse axonal injury is a hallmark of severe TBI. Recent experimental data indicate that white matter abnormalities detected using DTI correlate closely with neuropathological evidence of diffuse axonal injury.28,29 In a prospective study, Sidaros et al. evaluated 30 patients with severe TBI at 8 weeks and 12 months after injury. They found that decreased regional FA at 8 weeks was predictive of unfavorable outcome at 12 months. DTI evidence of white matter damage is more marked in subjects with moderate and severe TBI compared with those with mild TBI, with long-lasting changes detectable years after injury.30,31
Our prospective multicenter cohort is the largest study so far to demonstrate that the extent and severity of white matter damage evaluated in the acute setting is a major predictor of outcome after severe TBI. Based on this observation, we have developed a prognostic model that integrates quantitative diffusion variables into a composite DTI score for predicting outcome. This score, which is based solely on DTI variables, has a better prognostic accuracy than the IMPACT score that uses clinical and CT variables.
For prediction of unfavorable outcome, the area under the ROC curve was 0.64 for the IMPACT score; this metric increased to 0.84 when DTI score was used for prognostication. A high specificity, potentially at the expense of a lower sensitivity, is critical for the prediction of unfavorable outcome. The ROC analyses presented in this article have these characteristics and suggest that patient-level predictions of outcome are feasible in the acute setting.
Ideally, brain tissue diffusion measurements should be independent of the MRI scanner and image acquisition parameters. However, we noted significant variations in apparent diffusion coefficient and FA values across individual scanners and at different sites. The variance was significant enough to seriously undermine the generalizability of our results and led us to implement a normalization step via control subjects at each center. Our normalization approach differs from the more widely used practice of scanning the same control subject(s) on all scanners, which is impractical in a multicenter study spanning a large geographical area. In addition, use of the same control subjects does not lend itself to the development of a broadly applicable outcome prediction algorithm that any center can implement using its own set of control subjects.
Some limitations of this study should be noted. First, the variables used in the IMPACT score were collected in the hyperacute phase (<48 h after injury), whereas MRI data were acquired at a later time point (an average of 3 weeks after TBI). Second, outcome evaluation at 1 yr was performed through a telephone interview conducted by each participating center. A central review was performed on the final data to correct for center-specific bias. Although direct clinical examination would have provided more detailed and accurate data, this was logistically prohibitive because of cost, geographical distance, and severe handicaps affecting some of the patients. Other studies have shown that remote evaluation of patients with altered consciousness provides acceptable accuracy while minimizing time and cost in data gathering.32 Third, we used a modified version of the GOS that has not been independently validated; nevertheless, we believe that this modification is essential to account for the fact that the “severe disability” category (GOS 3) is inherently heterogeneous. Despite the difficulty of dichotomization, an issue that is intrinsic to much clinical outcome research, it is not a fundamental limitation. The methodology used here can also be used to predict the outcome probability on any given measurement scale. A scale with finer gradations and granularity (e.g., the extended GOS) will need an increased cohort size and will necessitate a larger, multicenter research consortium.
This study demonstrates that it is feasible to build a standardized, generalizable prediction system for long-term neurological outcome in critically ill comatose patients. It further establishes the feasibility of multicenter normalization to overcome data heterogeneity in quantitative DTI studies. Based on these results, one can envision a public-domain expert system that would allow a clinician anywhere in the world to upload basic clinical information and DTI data pertaining to a particular case; the system could check the completeness and acceptability of the information and provide the probability of various outcomes on a user-selected outcome scale. Other models will need to be evaluated including ones exploiting the differential association between selected white matter tracts and specific outcomes. In addition to outcome prediction, such a database could also be analyzed for other purposes including research on the genetic basis for variance in brain injury outcomes and mechanisms of postinjury repair and plasticity.
Appendix: Supplementary List of Investigators
Didier Dormont, M.D. (Professor, Department Neuroradiology, Pitié Salpêtrière Hospital, Paris, France); Lamine Abdennour, M.D. (Doctor, Neurosurgical ICU, Pitié Salpêtrière Hospital, Paris, France); Delphine Leclercq, M.D. (Doctor, Department Neuroradiology, Pitié Salpêtrière Hospital, Paris, France); Pascale Poete, M.D. (Doctor, Neurosurgical ICU, Pitié Salpêtrière Hospital, Paris, France); Bernard Riegel, M.D. (Doctor, Neurosurgical ICU, Roger Salengro Hospital, Lille, France); Benoit Tavernier, M.D., Ph.D. (Professor, Neurosurgical ICU, Roger Salengro Hospital, Lille, France); Patrice Jissendi, M.D. (Doctor, Department of Neuroradiology, Roger Salengro Hospital, Lille, France); Christine Delmaire, M.D., Ph.D. (Doctor, Department of Neuroradiology, Roger Salengro Hospital, Lille, France); Jean-Pierre Pruvo, M.D., Ph.D. (Professor, Department of Neuroradiology, Roger Salengro Hospital, Lille, France); Philippe Gouin, M.D. (Neurosurgical ICU, Centre Hospitalier Universitaire, Rouen, France); Pierre Gildas Guitard, M.D. (Neurosurgical ICU, Centre Hospitalier Universitaire, Rouen, France); Emmanuel Gérardin, M.D., Ph.D. (Professor, Department of Neuroradiology, Centre Hospitalier Universitaire, Rouen, France); Guillaume Perot, M.D. (Doctor, Department of Neuroradiology, Centre Hospitalier Universitaire, Rouen, France); François Sztark, M.D. (Neurosurgical ICU, Centre Hospitalier Universitaire, Bordeaux, France); Vincent Dousset, M.D., Ph.D. (Professor, Department of Neuroradiology, Centre Hospitalier Universitaire, Bordeaux, France); Alain Boularan, M.D. (Neurosurgical ICU, Guy de Chauliac Hospital, Montpellier, France); Pierre François Perrigault, M.D. (Neurosurgical ICU, Guy de Chauliac Hospital, Montpellier, France); Emmanuelle Le Bars, Ph.D. (Neuroradiology, Guy de Chauliac Hospital, Montpellier, France); Alain Bonafé, M.D., Ph.D. (Neuroradiology, Guy de Chauliac Hospital, Montpellier, France); Claire Charpentier, M.D. (Neurosurgical ICU, Centre Hospitalier Universitaire, Nancy, France); Antoine Baumann, M.D. (Neurosurgical ICU, Centre Hospitalier Universitaire, Nancy, France); Claudio Di Roio, M.D. (Neurosurgical ICU, Pierre Wertheimer Hospital, Lyon, France); Dominique Sappey-Marinier, Ph.D. (Research Director, CERMEP, Lyon, France).