Accepted for publication September 14, 1999.
One of the primary concerns of obstetric anesthesia is its safety for both mother and neonate. Much has been written about this issue, in particular about the consequences for the neonate. Clinical and laboratory measurement scales, including Apgar scores,1 umbilical venous and arterial blood acid-base balance analysis,2 and neonatal neurobehavioral testing scales,3 have been developed to assess neonatal well-being. In 1982, a report by Amiel-Tison et al.4 was published in ANESTHESIOLOGY that described an assessment scale called the Neonatal Neurologic and Adaptive Capacity Score (NACS). The NACS was proposed as a simple, noninvasive, quick neurobehavioral examination to assess subtle effects of drugs on neonates and to distinguish such drug effects from birth trauma, perinatal asphyxia, or neurologic disease. This publication was accompanied by a critical editorial that claimed the test to be deficient as a valid research instrument.5 It has now been almost 20 yr since the NACS was described, and initial criticism notwithstanding, it has been widely embraced by the obstetric anesthesia community and used worldwide by investigators examining neonatal effects of peripartum medications. In this issue of ANESTHESIOLOGY, Brockhurst et al.6 conduct a systematic review of the NACS in obstetric anesthesia research and conclude that the reliability and validity of this test have still not been established. Here we examine this issue in greater depth.
Why has the NACS become so popular? The answer: simplicity. The test is easy and quick (< 5 min per examination), it can be performed with minimal training, it is non-noxious (thus easily performed in the presence of parents), and it lends itself to simple statistical analysis. More traditional measures of neonatal performance, such as the Brazelton Neurobehavioral Assessment Score,3 require approximately 20 min for a trained examiner to perform; include a large number of items (or clusters), each scored on a nine-point scale; and involve statistical analysis that can be complex. Many studies using the Brazelton Neurobehavioral Assessment Score also include testing at age 14 and 30 days, allowing for integration into a variety of infant developmental paradigms.7 Such follow-up testing is virtually never performed with the NACS. In contrast, the NACS has 20 items, each scored as 0, 1, or 2, for a total possible score of 40. Individual items are summed, and a single score is assigned to the neonate. No special training or certification is required to perform the NACS. This enticing simplicity was part of the editorialist’s original concern in 1982: “For such an instrument, speed of administration is hardly the primary concern (should it be clinically?), but rather its ability to find or not to find effects of the variables of concern on the functioning neonate.”5 As noted by Brockhurst et al., virtually all of the studies using the NACS show no differences between groups of infants. In the few studies in which differences are noted, the circumstances are such that differences would be expected to occur and to be obvious, e.g., general versus regional anesthesia for cesarean delivery. Studies using the NACS to assess neonatal effects of maternally administered local anesthetics or opioids for either vaginal or cesarean delivery have yielded inconsistent results, frequently showing no differences between groups or differences that may be questioned on statistical grounds.
Why was the test so controversial? It is noteworthy that the original publication of the NACS was accompanied by not one, but two editorials. One editorial, by a researcher prominent in infant developmental psychology, criticized the NACS as being statistically flawed, improperly conceived, overly simplistic, and inappropriate as a research tool.5 The other editorial, by the then Editor-in-Chief of ANESTHESIOLOGY, John Michenfelder, lamented the difficult position of an editor considering a manuscript for which there are widely varying recommendations by the editorial review board.8 Michenfelder noted that outright rejection might result in premature condemnation, whereas publication requires that the readers be informed of the limitations of the work. He concluded that “determination of the validity, sensitivity, and merits of the examination will follow.”8 In other words, punt: let the chips fall where they may, and challenge the scientific community to determine whether the initial criticisms were valid. Brockhurst et al. conclude that such validation is still lacking despite widespread use of the NACS examination, and we concur. Widespread use of a test is not evidence of validity, and investigators should use caution and discretion in interpreting its results. Moreover, in some instances, this test has been used (or misused) on the assumption that validation has been established. In our opinion, this misuse has resulted in some interesting conclusions, examples of which follow.
Consider the definition of a “normal” NACS result. The authors of the original article on the NACS arbitrarily claim that a score of ≥ 35 (of a possible 40) is “normal.”4 They also acknowledge that validation of this figure requires additional data. Such data do not exist. No study to our knowledge has correlated specific NACS results, a score of 35 or otherwise, with any other measure of neonatal or early childhood performance. The short-, medium-, and long-term consequences for neonates scoring, e.g., 25, 30, 35, or otherwise on the NACS are not known. A recent study compared the effects of labor epidural analgesia using ropivacaine versus bupivacaine on neonatal outcome.9 The NACS was performed on all infants at 2 and 24 h after birth; the results were analyzed by a comparison of median scores and a comparison of the number of infants with scores > versus < 35. No differences in median NACS were noted at 2 or 24 h, but there were more infants at 24 h (not at 2 h) with NACS > 35 in the ropivacaine group. Based on this finding, advertisements for obstetric use of ropivacaine claim better neonatal performance versus bupivacaine. Given that there is no meaningful justification for a NACS of 35 as an appropriate measure of “normality” and no difference in median NACS at any time in that study, this claim must be viewed with caution: caveat emptor.
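The statistical point can be illustrated with a minimal sketch. The scores below are invented for illustration only and are not taken from any study; the function name `summarize` and the use of ≥ 35 as the cutoff (following the original report's definition of “normal”) are assumptions made for this example. It shows how two groups can share the same median NACS while differing markedly in the proportion of infants at or above an arbitrary threshold.

```python
from statistics import median

NORMAL_CUTOFF = 35  # the "normal" threshold proposed in the original NACS report


def summarize(scores, cutoff=NORMAL_CUTOFF):
    """Return (median score, fraction of infants scoring at or above cutoff)."""
    at_or_above = sum(1 for s in scores if s >= cutoff)
    return median(scores), at_or_above / len(scores)


# Hypothetical total scores (0-40 scale) for two groups of 10 infants.
# These numbers are made up for illustration; they are not real data.
group_a = [33, 34, 34, 34, 36, 36, 37, 37, 38, 38]
group_b = [35, 35, 35, 35, 36, 36, 36, 36, 37, 37]

for name, scores in (("A", group_a), ("B", group_b)):
    med, frac = summarize(scores)
    print(f"group {name}: median = {med}, fraction >= {NORMAL_CUTOFF}: {frac:.0%}")
```

In this invented example both groups have a median of 36, yet only 60% of group A reaches the cutoff compared with 100% of group B. Whether such a difference means anything depends entirely on whether the cutoff itself has been validated, which is the question at issue.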
Now consider the analysis of individual portions of the NACS. An overall score of 30 or 35 or 38 does not reveal which items resulted in lost points, just as an Apgar score of 6 or an American Society of Anesthesiologists Physical Status classification of III does not reveal the specifics of the underlying abnormalities. Very few studies using the NACS report individual subscores; usually only the total NACS is reported. Because the NACS has items related to habituation, active tone, passive tone, and reflexes, it may be useful to know which items, if any, are consistently affected by any perinatal intervention. Such subgroup analysis might allow the NACS to differentiate drug effects from insults such as birth trauma or perinatal asphyxia. Nonetheless, the original report on the NACS4 does not tell how such distinctions are to be made, and Brockhurst et al. note that we still do not know how to use the NACS to make such distinctions. Consider a recent publication claiming that epidural analgesia reduces the efficacy of breast-feeding.10
This diatribe against epidural analgesia assumes (based on no data and no specific examples) that even infants scoring in the “normal” range (as if we know what normal is) on neurobehavioral tests may have specific subgroup deficiencies that could impair breast-feeding. A curious finding indeed, because so few studies actually report subgroup scores on the NACS. Moreover, the evidence that epidural analgesia has any effect on breast-feeding outcomes is anecdotal at best. As the author of that article readily admits, no studies examined breast-feeding specifically as an outcome correlated with intrapartum analgesia. Rather, the admonition against epidural analgesia is based on a conjecture about what might occur if certain items are depressed, despite not knowing which items these are or whether depression of any specific items (such as muscle tone), transiently or otherwise, actually has any effect on breast-feeding. Again, caveat emptor.
What can one conclude? Babies are complex and subject to a constellation of parental, socioeconomic, and environmental factors that have the potential to modify any intrauterine effects that may have occurred. To hope that any one assessment tool (e.g., an Apgar score, acid-base balance, or, in this context, neurobehavioral testing) can predict developmental outcome (e.g., breast-feeding success, early parental bonding, and growth, or later outcomes such as learning difficulties, behavioral problems, school performance, intelligence quotient, or even adult personality qualities) is overly optimistic. A statistical adage is relevant here: A statistically significant difference is only a difference if it makes a clinically important difference. One must first show, in a scientifically rigorous manner, that meaningful outcomes relevant to families and society are actually affected by intrapartum analgesia before the results of machinations like the NACS are to be taken seriously. The publication of the NACS in 1982 was accompanied by strong claims of lack of validity and applicability. The review by Brockhurst et al. in this issue of ANESTHESIOLOGY claims that additional work is still necessary to establish this validity. For now, the NACS will certainly continue to appear, like a barnacle on a ship’s masthead, in many studies of obstetric anesthetics. If the NACS does nothing else, at least it forces us to remember that neonatal concerns are an important part of obstetric anesthesia. That in itself is a worthwhile goal.