The Neurologic and Adaptive Capacity Score (NACS) was described in 1982 by Amiel-Tison et al. 1 It was developed to evaluate the neurobehavior of term, healthy newborns; specifically, to detect central nervous system depression from drugs administered to the mother during labor and delivery and to differentiate these effects from those associated with perinatal asphyxia and trauma at birth.
The 20-item instrument contains items from the Brazelton Neonatal Behavioral Assessment Scale (NBAS), 2 the Scanlon Early Neonatal Neurobehavioral Scale (ENNS), 3 and the Amiel-Tison Neurologic Evaluation. 4 Several review articles have described these tests in detail and compared the NBAS and ENNS to the NACS. 5–7 The NACS emphasizes muscle tone more than the ENNS and NBAS. It also takes less time to complete and does not subject neonates to aversive stimuli such as pinpricks and repeated Moro maneuvers.
The original article 1 provided instructions on how to use the scale, outlining the appropriate testing environment, the state of the neonate at the time of testing, and the order in which to conduct the test items. Instructional text and photographs illustrated the correct technique for performing the assessments.
The NACS items are organized into two scales: adaptive capacity and neurologic assessment. The latter is further divided into four subscales: passive tone, active tone, primary reflexes, and a general neurologic status assessment. Items are scored 0 (absent or grossly abnormal), 1 (mediocre or slightly abnormal), or 2 (normal), for a maximum score of 40. A score of ≥ 35 was arbitrarily deemed to be normal. 1
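The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (20 items, each scored 0, 1, or 2, summed to a maximum of 40, with ≥ 35 deemed normal); the function names and the example neonate are hypothetical and are not part of the original instrument.

```python
from typing import Sequence

N_ITEMS = 20               # number of NACS items, per the text
MAX_SCORE = 2 * N_ITEMS    # each item scores at most 2, so the maximum is 40
NORMAL_CUTOFF = 35         # arbitrary threshold from the original description


def total_nacs(item_scores: Sequence[int]) -> int:
    """Sum the 20 item scores after checking that each is 0, 1, or 2."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(item_scores)}")
    if any(s not in (0, 1, 2) for s in item_scores):
        raise ValueError("each item must be scored 0, 1, or 2")
    return sum(item_scores)


def is_normal(total: int) -> bool:
    """Apply the arbitrary cutoff: a total of 35 or more is deemed normal."""
    return total >= NORMAL_CUTOFF


# Hypothetical neonate: 16 normal items, 3 slightly abnormal, 1 absent.
example_items = [2] * 16 + [1] * 3 + [0]
example_total = total_nacs(example_items)
print(example_total, is_normal(example_total))
```

The sketch treats every item as equally weighted, which is how the total is defined in the text; it says nothing about whether equal weighting is appropriate.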
The NACS has been used extensively for research purposes during the past 16 yr to evaluate the neurobehavior of neonates. We undertook this systematic literature review to examine how it has been used in obstetric anesthesia research and to determine whether there is evidence that it is a reliable and valid tool for detecting drug effects in neonates. In addition, we assessed whether the scale had been used as originally described, paying particular attention to the timing of the tests and the neonatal environment.
The review included published articles that met the following inclusion criteria: English language, NACS administered in its original form, and NACS used to assess neonatal outcome after maternal administration of medications during labor and vaginal or cesarean delivery. The following databases were used for computer-assisted searches of published literature from 1982 to 1997: MEDLINE using PubMed, EMBASE, Scientific Citation Index, the Cochrane Library, CINAHL, and HealthStar. “Neurologic and Adaptive Capacity” and “NACS” were searched as text words, with NACS limited to human studies. The subjects of neurobehavior and neurologic examination were limited to studies on newborn infants. The “see related articles” option was used in PubMed when relevant articles were identified. The references of selected articles and textbook chapters on neonatal neurobehavioral assessment were useful in identifying some articles not found through computer searches. The authors’ personal files were also reviewed.
A computerized database was developed to record the data of interest in the articles. This consisted of the publication date and country of origin, type of medication administered, method of delivery, and which neonatal outcomes were measured. Details of study design, including sample size, number of comparison groups, number of study dropouts, randomization of subjects, and blinding of data collectors, were recorded.
To determine whether the instrument was used as the original article intended, we compared information about the testing times and environment with the instructions provided in the article. We also noted whether preterm or breech neonates were included in the analyses.
To establish the reliability of the test, we recorded information about the training of those who performed the NACS, including the training process, inter-rater reliability, and whether a single scorer was used for the entire study.
We sought to verify the validity of the scale by determining whether a dose–response relationship could be demonstrated between the dose of medication administered and the NACS. We noted if differences in the NACS were detected in studies with a no-drug control group. When opioids were administered to the treatment group, we observed whether the NACS differed significantly between control and treatment groups, because opioids are known to depress the central nervous system. We examined whether studies correlated the NACS with umbilical concentrations of the drugs administered to the treatment group. Finally, we observed whether the NACS increased over time in the same sample of newborns.
To determine whether the NACS was a sensitive test that could differentiate between groups of subjects, we sought statistically significant differences in the NACS and other indicators of neonatal well-being. The results were tabulated, and descriptive statistics were used to summarize the characteristics of the studies.
The NACS has been used to evaluate fetal exposure to a variety of medications used in labor and delivery (table 2). Thirty studies (42%) included neonates that had been delivered by cesarean section alone, 8 (11%) included spontaneous vaginal or instrumental deliveries alone, 28 (39%) included a combination of vaginal and operative deliveries, and the remaining 5 (7%) did not report the method of delivery. Breech presentation or malpresentation was listed as an indication for cesarean section in 7 (10%) of the investigations. 8–14 It was not stated how the data from these newborns were used in the analysis of the NACS results.
The sample sizes for the studies are presented in table 3. The number of neonates tested was not reported in 15 (21%) of the studies. 15–29 In the studies in which the number of neonates tested was reported, 20 (36%) did not test all babies whose mothers participated in the study. 10,11,30–47 Both mothers and neonates were divided among two to eight groups in 66 studies, with a two-group design being the most common (n = 44; 67%). Five case series (7%) contained only one treatment group. All newborns that underwent NACS testing were considered to be at term gestation. However, term gestation was not defined in 41 (58%) of the studies, and some included neonates at 36 weeks’ gestation. 10,18,31–36,38,48
Sixty-one (86%) of the studies were randomized controlled trials. The methods used to randomize subjects were described in 15 reports (25%). 18,25,36,40,42,48–57 Of the remaining studies, 5 (7%) were clinical trials without randomization of subjects, 9,13,58–60 and 5 (7%) were case series. 8,16,61–63
Twenty-one (30%) studies gave no details about the NACS administration other than the times that assessments were completed. 20,28–31,33,34,38,43,45,48,50,53,61,63–69 The most common schedule was at 2 and 24 h after birth, omitting the recommended measurement in the delivery room at 15 min 1 (table 4). Specific details about the testing environment were provided in one article. 15 The investigators stated that attempts had been made to conduct all tests in a dimly lit room at a constant temperature. The neonate’s initial spontaneous state was also noted, and the 24-h assessment was performed between feedings. In another study, the 24-h assessment was postponed until 2 h after the last meal. 10
The NACS examiner was blinded to treatment group and obstetric history in 57 (80%) of the studies. The qualifications or training of the NACS examiner was described in 46 (65%) of the reports. The most common examiners were pediatricians or pediatric nurses (n = 18; 25%), or anesthesiologists, anesthesia research fellows, and anesthesia residents (n = 12; 17%). The examiners were said to be “trained” in 13 (18%) of the studies, 12,22,26,27,40,44,46,47,52,58,59,62,70 with two (3%) providing details about the training. 22,58 An additional three (4%) described the observers as “qualified,” 54,56,60 and one (1%) reported that the observer was “experienced” in performing NACS assessments. 55 In 16 articles (23%), it was explicitly stated that one examiner conducted all NACS testing. 9,11,14,21,35,36,39,54,55,70–76 None of the remaining studies reported the NACS inter-rater reliability.
We identified three studies in which the control group of neonates had not been exposed to any medications. The unmedicated groups consisted of 112, 19, and 15 neonates, respectively. 47,59,60 In the first study, the investigators reported the mean NACS without specifying the range or SD. 60 At 2–4 h after birth, the mean NACS was 33.5, and at 24 h it was 36.1. The NACS for unmedicated neonates was not significantly different when compared with neonates whose mothers had received various doses of intravenous fentanyl from 50 μg to > 200 μg. The other two studies reported the percentage of good scores (2/2) on each test item. No differences were found between neonates whose mothers had received no medications and those who had received one of three local anesthetics 47 or alphaprodine. 59 In all three studies, the total NACS for control and treatment groups improved from the 2-h to the 24-h assessment. By the 24-h assessment, control and treatment groups achieved similar scores.
Three studies looked for a relationship between umbilical drug concentration and the NACS. In one investigation, there was a negative correlation between the NACS and umbilical vein propofol concentration at 15 min (r not reported; P = 0.01) but not at 2 and 24 h. 21 There was no correlation between the NACS and umbilical vein fentanyl levels with the combined spinal–epidural technique (r = 0.06; P > 0.05). 61 In the third study, there was a weak positive correlation between the fetal plasma concentration of rocuronium and the NACS, which was not statistically significant (r = 0.23; P > 0.05). 63
A group of researchers demonstrated a relationship between the intravenous fentanyl dose received by the mother and the umbilical cord fentanyl concentration (r not reported; P < 0.05). 60 However, NACS were similar for neonates in the control group that had no drug exposure and the treatment groups that had been exposed to varying doses of fentanyl. Other investigators found a weak relationship between the maternal epidural dose of fentanyl per kilogram of body weight and the NACS at 15 min (r = 0.12; P = 0.30), 2 h (r = 0.11; P = 0.37), and 24 h (r = 0.10; P = 0.39), 10 but these results were not statistically significant.
In the 31 opioid studies, few significant differences were found between treatment and control groups. One study reported superior NACS at 2 h with intrapartum administration of epidural bupivacaine alone compared with bupivacaine with fentanyl added (P < 0.05 by chi-square test, comparing the number of babies with NACS < 35 in both groups). 33 In this same study, median NACS results did not differ by the Mann–Whitney U test. Another group of investigators found that epidural bupivacaine alone and bupivacaine with sufentanil yielded higher NACS than bupivacaine with fentanyl. 53 This difference was found at the 24-h assessment (P = 0.02), with no differences noted at 15 min or 2 h. The NACS of the group that received bupivacaine with fentanyl decreased slightly from the 2-h to the 24-h assessment.
It was not possible to assess whether the results of other opioid studies trended toward statistical significance because P values were not reported. There were instances where the group that received opioids had higher, although not statistically different, neurobehavioral scores than controls. 10,45,72
Sixty-six (93%) of the studies included more than one NACS examination. Of these, 10 studies (14%) did not report the NACS results. The NACS improved over time in 46 (82%) of the remaining studies. Scores decreased over time in at least one of the groups in 10 studies (18%). 18,22,26,27,33,45,53,58,59,64
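The percentages in this paragraph use two different denominators, which is easy to miss on a first reading. The short sketch below reproduces them from the counts given in the text; the variable names are illustrative only.

```python
def pct(count: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * count / denominator)


total_studies = 71        # all studies reviewed
repeated_exams = 66       # studies with more than one NACS examination
results_missing = 10      # of those, studies that did not report NACS results
results_reported = repeated_exams - results_missing   # studies left to compare

improved = 46             # NACS improved over time
decreased = 10            # NACS decreased in at least one group

# The first two percentages are fractions of all 71 studies.
print(pct(repeated_exams, total_studies), pct(results_missing, total_studies))
# The last two are fractions of the 56 studies that reported results.
print(pct(improved, results_reported), pct(decreased, results_reported))
```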
Of the 71 studies reviewed, nine (13%) reported statistically significant differences in total NACS between groups, 14,33,53,55,58,66,71,75,76 an additional two (3%) reported significance in individual items in the scale, 15,59 and one (1%) found significance when comparing subscales. 10
Most investigators obtained Apgar scores (n = 70; 99%) and umbilical cord acid-base measurements (n = 57; 80%) in addition to neurobehavioral testing to assess neonatal condition. Less frequently, fetal heart rate, umbilical drug concentrations, time to sustained respirations, need for ventilatory support, naloxone requirements, and presence of meconium were reported. Significant differences in these indicators are shown in table 5.
With the exception of one study that reported significantly higher NACS along with significantly higher Apgar scores, 71 those that found significant differences in the NACS between groups did not report differences in other indicators of neonatal well-being.
Publication of the original description of the NACS was accompanied by two editorials. 79,80 Tronick 79 predicted that researchers would find that the scale lacked sensitivity and that it would fail as a research tool. Michenfelder 80 recognized that the work was in its developmental phase and challenged readers to apply and evaluate the examination such that its validity, sensitivity, and merits could be determined. Since then, only one study has specifically assessed the validity of the scale, 58 and another assessed inter-rater reliability 81; however, both of these studies were small and limited in scope.
The use of the NACS as a research tool to evaluate the neurobehavior of term, healthy newborns (specifically, to detect central nervous system depression after intrapartum drug exposure) is widespread. However, few studies have reported statistically significant differences in the total NACS. We propose three explanations for this finding: (1) the simplest explanation is that differences did not exist, i.e., that the medications used did not influence neonatal neurobehavior; (2) limitations in the design, execution, and sample size of the studies may have resulted in type II errors, the incorrect conclusion that no differences existed; and (3) the tool itself may be too insensitive to detect small effect sizes or deviations from normal when they exist.
Design issues in some of the studies may have compromised the ability of the NACS to detect small drug effects if they existed. Limitations included small sample sizes, the inclusion of fewer neonates than mothers, inappropriate inclusion of preterm and breech newborns, and lack of attention to the environment and timing of the tests.
Kuhnert et al. 7 noted that in studies of neonatal behavior, the influences of obstetric and pharmacologic confounding variables on neonatal outcome are often not considered. Examples of obstetric variables include maternal age, gravidity, length of labor, cord vein pH, and delivery methods. Examples of pharmacologic variables include the interval between drug administration and delivery, cord vein drug levels, and the presence of active metabolites.
Because the NACS was not the primary outcome being measured in many of the studies, exclusion criteria for conducting this assessment were seldom specified. For instance, according to Amiel-Tison et al., 1 a fetal breech presentation contraindicates the testing of passive tone in the lower extremity and affects neonatal reflexes such as automatic walking. It was more common for investigators to list exclusion criteria for admitting cesarean section patients into the study than to mention the indications for operative deliveries. However, when the indication for cesarean section was explicitly stated, breech presentation was commonly included, with no mention that these neonates were excluded or treated differently in terms of administering or analyzing the NACS. 8–14
It is important that the neonate be at term gestation when using the NACS because the development of reflexes and muscle tone depends on postconceptual age. Term gestation was defined by Amiel-Tison et al. 1 as > 37 weeks. The inclusion of neonates as young as 36 weeks’ gestation may have compromised the validity of the assessments by reflecting deficits related to age rather than drug effects. We recommend that if the NACS is applied to newborns at < 38 weeks’ gestation, care must be taken to ensure that all groups have an equal proportion of newborns that are not full term.
Although no clinical or scientific rationale for the proposed assessment times is offered in the original article, more than half of the studies deviated from these times without reporting the motivation for doing so. Amiel-Tison et al. 1 recommend that the 24-h assessment be performed only if abnormalities are detected at 2 h. It seems likely that if drug effects are present in the neonate from drugs administered to the mother during labor and delivery, they will more likely be detected shortly after birth than after 24 h have passed. However, it was the 15-min assessment that was omitted most commonly. Perhaps investigators believed that this delivery-room evaluation would have conflicted with the normal newborn care that takes place during that time or interfered with early bonding between mother and baby.
Only one of the studies reported that the environment where testing took place was controlled. 15 However, the potential influence of the environment on test results cannot be ignored. Another group of researchers retrospectively noted that the low NACS in all groups of their study were likely related to environmental factors at the hospital, including a high level of ambient noise and light stimulation that caused rapid habituation to light and sound. 52
A 20-item scale is optimal in terms of its reliability. 82 Having too few items threatens reliability, and too many results in increased time, expense, and rater and subject fatigue. Inter-rater reliability was assessed in 61 newborn infants by two observers who were trained by the original investigators. 1 The blinded observers simultaneously and independently scored these newborns at three time points (15 min, 2 h, and 24 h). They reported an inter-rater reliability of 92.8%. The method for calculating this value was not mentioned, nor were the confidence limits provided. No other form of reliability was tested.
In addition to the reliability testing conducted by the instrument’s developers, one other study tested the inter-rater reliability of the NACS. In this report, the NACS was not used to detect drug effects as it was originally intended, but, rather, to examine the neurologic and adaptive capacity of newborn infants born to mothers with clinical depression. Four pediatricians independently rated five newborn infants, resulting in intraclass correlation coefficients for the subscales between 0.91 and 1.00. 81 However, the small sample size of this study limits the ability to determine the 95% confidence interval for the lower limit of reliability.
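Because the method behind the 92.8% figure was not stated, it may help to see how much the choice of statistic can matter. The sketch below compares two common measures, simple percent agreement and Cohen's kappa (which corrects for agreement expected by chance), on invented item scores from two raters. The data are hypothetical, and nothing here is claimed to be the calculation the original investigators used.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of items on which the two raters gave the same score."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Chance-corrected agreement between two raters (Cohen's kappa)."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Agreement expected by chance, from each rater's marginal frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


# Hypothetical scores (0, 1, or 2) on ten items; the raters differ on one item.
rater_a = [2, 2, 2, 2, 2, 2, 2, 2, 1, 0]
rater_b = [2, 2, 2, 2, 2, 2, 2, 2, 2, 0]

print(f"percent agreement: {percent_agreement(rater_a, rater_b):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(rater_a, rater_b):.2f}")
```

In this invented example the raters agree on 90% of items, yet kappa is about 0.63, because most scores are 2 and a high level of agreement would be expected by chance alone. A reported reliability percentage is therefore hard to interpret without knowing which statistic it represents.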
A potential threat to the reliability of the NACS is the possibility of individual interpretation of the written instructions. For example, the article states that the neurologic portion of the examination “can always be completed even if the infant is lethargic, irritable, or almost inconsolable.” For testing passive tone, however, it is suggested that the infant should be quiet, and for active tone it is recommended that testing of the neck extensors be postponed until the child is consoled. 1
To maintain objectivity with testing, the examiner should be blinded to the obstetric history and treatment group. Research data will be more reliable if a single examiner is used throughout the study or if multiple examiners who have demonstrated inter-rater reliability are trained to perform the assessment.
The NACS developers assessed validity by comparing the NACS to the ENNS, the most commonly used neurobehavioral test at the time. 1 Of the examinations in which neonates scored high on the NACS, 92% scored “equally well” with the ENNS. The investigators attributed this to similarity between items contained in the two scales. The methodology was not provided, and the confidence limits were not stated. Correlations of the NACS with clinical outcomes to confirm validity were not reported.
Although the validity of the NACS when compared with the ENNS by the instrument’s developers was satisfactory, Tronick 79 criticized this approach and argued that given the similarity of the two instruments, it did not provide strong evidence of validity. The NACS has 9 of its 20 items in common with the NBAS and 6 in common with the ENNS. Four subsequent studies have administered the NACS along with the NBAS 52 or the ENNS. 15,22,51 Three of these did not detect significant differences between treatment and control groups with either of the instruments used. 22,51,52 These results support the validity of the NACS, i.e., similar results were reported using the experimental instrument and the established instruments (NBAS and ENNS). In contrast, the fourth study reported significant differences in 10 of the 20 NACS items between groups, whereas no significant differences were found with the ENNS, despite the two instruments being similar. 15
The developers of the NACS had hoped to offer one benefit over existing instruments: “a single number (like the Apgar score) that [could] immediately identify a depressed or vigorous neonate.” 1 When the NACS was first published, a score of 35–40 was arbitrarily chosen to indicate a neurologically vigorous neonate. The investigators explained that future experience with large numbers of neonates might necessitate alterations in these indicators. 1 Since 1982, several investigators have questioned the clinical significance of a NACS < 35, 14,21,53,68,74,75 even when differences between groups were statistically significant. When as many as half of the NACS were less than the arbitrary cutoff of 35, neonatal outcome has been described as excellent 9; other investigators have described scores > 30 as “reassuring.” 25 To date, standardization of the NACS instrument with normal neonates that have not been exposed to any medications in utero has not been achieved, and the normal range for the NACS has not been determined.
The NACS was devised to “differentiate between the infant who has drug-induced depression and one whose depression results from asphyxia, birth trauma, or neurologic disease.” 1 It is not clear from the instrument’s original directions how this differentiation is to be made. In a critique of the NACS, it was observed that a score of < 35 might suggest drug effects, intracranial hypertension, or prolonged head compression. 6
In general, comparisons between studies were difficult because the NACS results were analyzed and reported in numerous ways. Some investigators reported the mean, median, or modal scores, whereas others calculated the percentage of good scores (2/2) on all tests or the number or percentage of scores greater than or less than a given value (usually 35). With the latter method, information is lost when a range of scores is restricted to two categories (greater than or less than). This is concerning because the cutoff value was selected arbitrarily.
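The loss of information from dichotomizing can be shown with a small, entirely hypothetical example. In the sketch below, two invented groups have the same proportion of scores below 35, so a comparison based on the cutoff would see no difference, even though their medians differ by four points.

```python
from statistics import median
from typing import Sequence

CUTOFF = 35  # the arbitrary threshold discussed in the text


def n_below_cutoff(scores: Sequence[int], cutoff: int = CUTOFF) -> int:
    """Count the scores that fall below the cutoff."""
    return sum(s < cutoff for s in scores)


# Hypothetical total NACS for two groups of five neonates each.
group_a = [36, 36, 36, 36, 34]
group_b = [40, 40, 40, 40, 30]

for name, scores in (("A", group_a), ("B", group_b)):
    print(f"group {name}: median = {median(scores)}, "
          f"below {CUTOFF}: {n_below_cutoff(scores)} of {len(scores)}")
```

The reverse can also occur: groups with similar medians may differ in the number of scores below the cutoff, so the two reporting styles are not interchangeable when comparing studies.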
If the NACS is a valid tool for detecting drug effects, the umbilical drug concentrations of central nervous system–depressing drugs should correlate negatively with the score. A study that evaluated propofol clearly demonstrates such a relationship. 21 Studies evaluating combined spinal–epidural opioid 61 and rocuronium 63 did not.
We would expect that the effects of maternally administered medications on neonates would decrease with time after birth. Correspondingly, the NACS would increase. We found that the NACS decreased over the 24-h testing period in 10 studies. Although P values were not reported, and measurement error may be a factor, this inconsistent result undermines the instrument’s validity.
Of the 31 opioid studies examined, 2 provided some additional support for the validity of the NACS. One study compared neonates who were exposed to fentanyl in utero with those that were not. 33 Median NACS results were similar (35 vs. 36; P = not significant) in the two groups. When the number of babies who had “low” NACS (< 35) was compared between groups (13 of 46 in the fentanyl group, 6 of 50 in the unexposed group), a statistical difference was found (P = 0.046). However, the marginal P value and the fact that two statistical tests were performed on the same data undermine the validity of this finding. A second study compared newborn infants who were exposed to either fentanyl (n = 14) or sufentanil (n = 9) in utero to those whose mothers received no epidural opioid (n = 13). 53 In all groups, the NACS increased slightly over time, although statistical comparisons were not performed. This study reported a statistically significant difference in the NACS between infants exposed to fentanyl and those who were not at 24 h (P = 0.02), but not at 15 min or 2 h of age, when the fentanyl concentrations should have been highest. There was no difference between NACS in neonates exposed to sufentanil and the control group. The small number of subjects and the large number of comparisons between groups (nine total) limit interpretation of the data.
With the breech neonate, Amiel-Tison et al. 1 suggest that the scores for popliteal angle and recoil of the lower limbs be omitted and, instead, scores for scarf sign and recoil of elbows be included twice. This is justified by the statement “…if passive tone is normal in the upper limbs, one can assume it would also have been normal in the lower limbs had the baby been born in the vertex position.” If this is true, there may be redundancy within the scale. However, if the statement were false, including breech neonates would compromise the validity of the results.
The subscales for the NACS were based on the developers’ clinical expertise and observations. 1 They questioned whether future experience with the instrument would indicate that low scores within a particular category of assessments may be more revealing than the total NACS. Examination of individual items or subscales may be preferable, because there is no evidence that all assessments should be assigned the same value or weight. To date, the factor structure of the instrument has not been confirmed, and homogeneity within the subscales has not been reported, nor has the correlation of the subscales to the total NACS.
Recently, investigators evaluated whether intrapartum analgesia reduces the effectiveness of breast-feeding. Some investigators have suggested that breast-feeding may be more successful if the frequency of epidural analgesia was reduced. 83 One investigator proposed that the opioid sufentanil may be superior to fentanyl with respect to breast-feeding. 84 The results of neurobehavioral testing in general, and the NACS in particular, have been used to support these contentions. Therefore, it is as important today, as in the past, to have a valid measure of neurobehavior.
In 1985, Abboud et al. 58 noted that none of the studies reporting NACS results detected an adverse response to anesthesia. They evaluated the validity and the sensitivity of the NACS examination by comparing the scores of neonates whose mothers had received general or regional anesthesia. The study had a three-group design: general, n = 20; spinal, n = 18; epidural, n = 14. NACS were significantly lower at 15 min and 2 h (P < 0.05) in neonates delivered with general anesthesia than those delivered with epidural or spinal anesthesia. Based on this finding, the investigators concluded that the scale was valid and sensitive. Subjects were not randomized, and the NACS examiner was not blinded to the treatment group.
A similar study performed in 1992 randomized equal numbers of women to receive epidural, spinal, or general anesthesia. 55 The sample size was larger (N = 90), and a single, experienced examiner who was blinded to the treatment group conducted all NACS evaluations. This study also found significantly lower NACS at 15 min (P < 0.001) and 2 h (P < 0.01) with general anesthesia. These results support the sensitivity of the NACS by demonstrating its ability to differentiate between the neurobehavioral effects of general and regional anesthetic techniques.
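The two statistical concerns raised here can be illustrated numerically. The sketch below computes a Pearson chi-square test for the 2 × 2 table given in the text (13 of 46 vs. 6 of 50 with NACS < 35), assuming no continuity correction, because the cited study's exact procedure is not described here. It then shows how the chance of at least one false-positive result grows with nine comparisons, under the simplifying assumptions that each is tested at α = 0.05 and that the comparisons are independent.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its P value on 1 degree of freedom.
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Rows: fentanyl, unexposed. Columns: NACS < 35, NACS >= 35.
chi2, p = pearson_chi2_2x2(13, 46 - 13, 6, 50 - 6)
print(f"chi-square = {chi2:.2f}, P = {p:.3f}")

# Probability of at least one false-positive among nine independent
# comparisons, each tested at alpha = 0.05 (an illustrative assumption).
alpha, comparisons = 0.05, 9
familywise_error = 1 - (1 - alpha) ** comparisons
print(f"familywise error rate = {familywise_error:.2f}")
```

Under these assumptions the uncorrected test gives a P value that rounds to the reported 0.046, which sits very close to the conventional threshold, and nine comparisons would carry roughly a one-in-three chance of at least one spurious significant result.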
It is not clear how the sensitivity of the NACS compares with other measures of neonatal well-being. Because the NACS was also devised to differentiate between depression resulting from drugs and the effects of birth trauma and asphyxia, Apgar scores, fetal heart rate, or umbilical acid/base measures might also vary with the NACS. It would be reasonable to expect that a significantly depressed baby would be more likely to require naloxone and ventilatory support, have a longer time to sustained respirations, and a higher umbilical drug concentration.
Only one study, which compared two volumes of intravenous preload for cesarean section, detected both significantly higher NACS and higher Apgar scores in the same group of babies. 71 However, this result was not clinically meaningful because the mean 5-min Apgar scores only differed by 0.4 (mean of 9.4 and range of 8–10 vs. mean of 9.8 and range of 9–10). No other studies have found differences in measures of neonatal well-being where differences in the NACS were reported, nor were there differences in the NACS when the other measures detected differences.
As Michenfelder expressed in his editorial, 80 the NACS was published as a work in progress. However, researchers have been using the NACS as a finished product without further testing its reliability and validity beyond the preliminary work performed before publication of the scale’s description.
The two reports 1,81 describing the NACS reliability indicate that it may be a reliable instrument. Further research with larger samples is needed to confirm these results. Studies verifying that the NACS can differentiate between the neurobehavioral effects of regional and general anesthetics have supported the validity of the scale. Other indicators of validity have had equivocal results that must be investigated further.
This review shows that researchers in obstetric anesthesia have not consistently used the tool as it was originally intended. Furthermore, the NACS has not been regarded as a preliminary work and evaluated thoroughly. Research is needed to establish the reliability and validity of the NACS. Future testing should include inter-rater reliability, internal consistency reliability, and an analysis of the NACS factor structure. Studies that correlate umbilical cord drug levels of drugs known to depress the central nervous system with the NACS will provide insight into the construct validity of the scale. Correlating the NACS with clinical outcomes such as umbilical cord pH, Apgar scores, and breast-feeding can assess validity. This information is necessary to determine if the NACS is a sensitive test for detecting subtle neurobehavioral effects when they exist.