We would like to thank Dr. Gunter for his interest in our article¹ and for his thoughts regarding alternative analytical approaches to the data. In fact, we presented our data in a manner that would permit such a reanalysis, and we are pleased to see him make the effort. We do not disagree with what we take to be the overarching theme of Gunter’s letter: our results do not provide general support for the belief that anesthesia and surgery during infancy produce detrimental effects on academic performance during childhood. However, we are concerned that there may be a small group of infants in whom anesthesia and surgery are associated with subsequent performance, although, as we clearly stated, it is impossible to draw any cause-and-effect conclusions. This was the reason for one of our two predefined questions: do a disproportionate number of children who had anesthesia and surgery during infancy subsequently have very low achievement test scores (for instance, because anesthesia and surgery might be associated with large adverse effects on a small percentage of patients)? Our second question was closer to that proposed by Gunter: do children who had anesthesia and surgery during infancy subsequently have lower mean test scores (for instance, because anesthesia and surgery might be associated with mild adverse effects on a large percentage of patients)? Again, both questions were formulated a priori, and we have no idea why Gunter suspects that the first question was formulated on a post hoc basis. We cannot find anything in our article that would give this impression. Neither question was explicitly stated as a hypothesis in the Introduction, but we dealt with the two questions in parallel in the Abstract, Statistical Analysis subsection, and Results section.
As we discussed at some length in a recent review article concerning assessment of cognition after anesthesia and surgery in the elderly,² physicians commonly categorize continuous variables, setting cutoffs for extreme values to label individual patients as having or not having certain attributes, such as “hypertension.” It is appropriate to analyze both continuous variables (blood pressure) and dichotomized variables (percentage of patients with hypertension), each type of analysis having advantages and disadvantages. For example, another recent study of effects of anesthesia and surgery during early childhood analyzed both continuous variables (achievement and cognitive test scores) and dichotomous variables (diagnoses of learning disabilities and needs for individualized educational programs).³ There is no consensus among psychological or educational researchers concerning a standard cutoff for a “very low” score on achievement or cognitive tests. In our study, we used scores below the fifth percentile as the cutoff. This cutoff classified 1 in 20 scores in the normative population as “very low” and seemed remotely analogous to the 1 in 20 criterion for Type I errors embodied in the conventional significance level of P less than 0.05 for statistical tests.
Gunter presents a post hoc analysis of the data in our figure 1 classified by deciles and states that the distribution across deciles does not differ according to a chi-square test. This is not surprising, considering that we reported that the mean score of these patients was not significantly lower than expected, relative to the normative population, and the overrepresentation of low scores diminished at the level of median performance. However, it is easy to imagine that anesthesia and surgery might be associated with large adverse effects on scores of a small percentage of patients without necessarily shifting the entire distribution of scores toward lower values, because many drugs and surgical procedures can produce diverse, serious adverse effects that are restricted to small percentages of patients. Only a much larger data set would be able to reveal a change in the overall median score.
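The point that a small affected subgroup can inflate the low tail without much shifting the center of the distribution can be illustrated with a back-of-the-envelope calculation. The sketch below is purely hypothetical and is not an analysis of our data. It assumes that scores are percentile ranks distributed uniformly from 0 to 100 in the normative population, and that an invented fraction `f` of patients all score below the cutoff while the remaining patients follow the normative distribution. The function name and parameters are illustrative only.

```python
def mixture_summary(f, cutoff=5.0):
    """Tail share and median for a hypothetical two-group mixture.

    Assumptions (illustrative only, not taken from the study data):
      - scores are percentile ranks, uniform on 0..100 in the normative population
      - a fraction f of patients all score below `cutoff`
      - the remaining 1 - f follow the normative (uniform) distribution
    """
    if not 0.0 <= f <= 0.45:
        # Beyond this range the median could fall below the cutoff,
        # and the simple closed form used here would no longer apply.
        raise ValueError("f must be between 0 and 0.45 for this sketch")
    # Share of all patients scoring below the cutoff.
    below = f + (1.0 - f) * cutoff / 100.0
    # Median m solves: f + (1 - f) * m / 100 = 0.5
    median = 100.0 * (0.5 - f) / (1.0 - f)
    return below, median


if __name__ == "__main__":
    for f in (0.0, 0.02, 0.05):
        below, median = mixture_summary(f)
        print(f"f = {f:.2f}: share below cutoff = {below:.4f}, median = {median:.1f}")
```

Under these assumptions, with `f = 0.05` the share of scores below the fifth percentile nearly doubles (0.0975 instead of 0.05), whereas the median moves only from 50 to about 47.4, a shift that a small sample would be unlikely to detect.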
Gunter indicates that it would be inappropriate to focus on a post hoc basis on particular, arbitrary quantiles of a distribution that have excessive numbers of cases and thereby reach conclusions about significant differences. He illustrates this with the excess number of 4’s in his imaginary example of rolling a die or the excess number of patients between the 60th and 70th percentiles in our figure 1. This is correct, but irrelevant, because (1) we focused on very low scores on an a priori basis, and (2) focusing on very low scores is meaningful and commonplace, in contrast to arbitrarily focusing on scores between the 60th and 70th percentiles.
Gunter presents an additional chi-square test based on removing the four patients who had additional operations outside the selected groups of operations on additional dates during infancy. The article indicates that, when these four patients were excluded, significantly more of the remaining 54 patients scored below the fifth percentile, relative to a normative population.
Gunter requests some additional information: in addition to the selected groups of operations (inguinal hernia repair/orchiopexy, pyloromyotomy, and circumcision), 5 (9%) of the 58 patients without central nervous system problems or potential risk factors had additional procedures done on the same date: orchiectomy for two patients, and removal of skin tags, diagnostic laryngoscopy, and removal of implantable venous access port for one patient each. Four patients (7%) had additional operations outside the selected groups of operations on additional dates during infancy. These additional operations for these four individuals were (1) tympanostomy; (2) removal of abdominal lesion, needle biopsy of liver, and insertion of implantable venous access port; (3) anal and bladder surgery; and (4) patent ductus arteriosus repair, atrial septal defect repair, appendectomy, and colon surgery (the insertion and removal of the implantable venous access port occurred in the same individual). This information was included in our original manuscript as Supplemental Digital Content, which the journal, during the review process, decided to omit.
Gunter requests some additional analyses: when we exclude from figure 3 the patients who had additional procedures done on the same date as the selected groups of operations, the correlation for the remaining 50 patients (not 45, as Gunter erroneously states) is unchanged at r = −0.33; P = 0.0189; 95% CI = −0.56 to −0.06. When we exclude instead the patient who had the longest duration of anesthesia in figure 3 (who did not have an additional procedure done on the same date), the correlation for the remaining 53 patients is weaker and no longer significant at the conventional 0.05 level, r = −0.23; P = 0.0991; 95% CI = −0.47 to 0.04. This patient’s duration of anesthesia appears somewhat less extreme in figure 2 (where it is 1.0 SD above the mean) than in figure 3. All patients in figure 2 met all predefined inclusion criteria. The inclusion criterion with respect to surgery was that patients had one or more of three stipulated groups of operations during infancy. There was no criterion that excluded patients who had some other, additional operation during infancy. We reported an additional analysis in figure 3 and in the text excluding the four patients who had additional operations outside the selected groups of operations on additional dates during infancy. From one viewpoint, Gunter’s suggestion to exclude the patient who had the longest duration of anesthesia in figure 3 yields an interesting result, illustrating the influence of this patient on the magnitude of the correlation. From another viewpoint, his suggestion seems to be an example of the post hoc, cherry-picking analytical approach that his letter decries, in this case aimed at rendering a result nonsignificant, rather than significant.
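Readers who wish to check the confidence intervals quoted above can do so from the reported correlation and sample size alone. The sketch below assumes the intervals follow the standard Fisher z transformation for a Pearson correlation; this is an assumption for illustration, since the method is not stated in this reply, and the function name `fisher_ci` is hypothetical.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson correlation via the Fisher z transformation.

    r      : sample correlation coefficient
    n      : number of paired observations (must exceed 3)
    z_crit : standard normal quantile for the desired coverage (1.959964 for 95%)
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)                # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)      # approximate standard error on the z scale
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


if __name__ == "__main__":
    # Values of r and n as quoted in the text above.
    for r, n in [(-0.33, 50), (-0.23, 53)]:
        lower, upper = fisher_ci(r, n)
        print(f"r = {r:+.2f}, n = {n}: 95% CI = {lower:.2f} to {upper:.2f}")
```

Under this assumption, the computed intervals agree to two decimal places with those quoted above (−0.56 to −0.06 for the 50 patients, and −0.47 to 0.04 for the 53 patients); the second interval spans zero, consistent with a P value above 0.05.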
The article already includes information pertinent to the question that Gunter poses as to whether the larger data sets, in addition to including excess numbers of subjects with the very lowest scores, are otherwise indistinguishable from the expected normal distribution. Both the larger groups of 133 patients and 287 patients were distinguishable from a normative population in another respect: for both, the mean scores were significantly lower than the expected value of 50.
Finally, because we disagree with Gunter’s arguments as detailed above, we disagree with his statement that our “ … results only support a conclusion that the distribution of academic achievement scores in otherwise neurologically normal children with a single exposure to anesthesia in the first year of life for minor, peripheral surgery is completely consistent with that seen in the population at large.” However, for numerous reasons detailed in the Discussion section of our article, we do not believe that our results established that exposure to anesthesia during infancy was causally related to the disproportionate number of children who had very low test scores. We made clear in the article that causation could not be determined from our study and that the findings should be considered tentative until further verification.