This article describes a systematic review on the research into postoperative cognitive dysfunction (POCD) in noncardiac surgery to ascertain the status of the evidence and to examine the methodologies used in studies. The review demonstrated that in the early weeks after major noncardiac surgery, a significant proportion of people show POCD, with the elderly being more at risk. Minimal evidence was found that patients continue to show POCD up to 6 months and beyond. Studies on regional versus general anesthesia have not found differences in POCD. Many studies were found to be underpowered, and a number of other methodologic difficulties were identified. These include the different types of surgery in studies and variations in the number and range of neuropsychological tests used. A particular issue is the variety of definitions used to classify individuals as having POCD.
FIFTY years ago, prompted by the number of anecdotal reports from his patients and their families regarding problems with cognitive function after surgery, Bedford1 published a retrospective observational report of 251 older patients who underwent surgery with anesthesia. He noted that although minor degrees of dementia were common in this group of patients, 7% experienced extreme dementia, giving rise to his conclusion that “Operations on elderly people should be confined to unequivocally necessary cases.”
This study encouraged investigators to conduct more rigorous prospective studies examining changes in cognitive performance from before to after surgery as assessed by neuropsychological tests. The change in cognition, when “significant,” is now commonly referred to as postoperative cognitive dysfunction (POCD). POCD is to be distinguished from postoperative delirium, which is a transient and fluctuating disturbance of consciousness that tends to occur shortly after surgery, whereas POCD is a more persistent change in cognitive performance as assessed by neuropsychological tests.2,3
Until recently, the majority of the research in this field had focused on cardiac surgery, where studies have indicated that a proportion of patients have POCD manifesting as problems with memory, attention, concentration, speed of motor and mental response, and difficulties with learning.4 The proportion found to have POCD after cardiac surgery varies as a result of a number of issues, including patient-related factors (e.g., age), how soon after surgery the tests are administered, the tests used, and the analysis and criteria for determining deficits.5 Although the causes of POCD in cardiac surgery are multifactorial, the use of cardiopulmonary bypass has often been cited as the major contributor to the problem. However, evidence is accumulating that off-pump cardiac surgery produces a similar effect on neuropsychological performance to that with the use of cardiopulmonary bypass.6–8
In contrast to cardiac surgery and other investigations of cognitive function and deterioration in diseases such as human immunodeficiency virus/acquired immunodeficiency syndrome and Alzheimer disease, the study of POCD in noncardiac surgery is in its infancy. Because the field is relatively new, a number of the studies on this topic are speculative and descriptive and often on small samples. Nonetheless, we believe that it is important to bring these together with the more recent research in a systematic fashion where the extent of the evidence can be assessed. Consequently, the aim of this article is to bring together the studies on this newer field in a systematic review to examine the evidence in relation to POCD in noncardiac surgery.
Methods: Search Strategy and Selection Criteria
A review of citations from MEDLINE, EMBASE, PsychInfo, and the Cochrane Library (CDSR, DARE, CENTRAL) was conducted without time limits until December 2005. Full-text articles were retrieved of any citations that were considered potentially relevant. Supplementary methods of retrieving studies included a review of relevant article bibliographies. Our search strategy was as follows: (surg* or operat* or anaesth* or anesthes* or postoperat* or postoperat*) and (neurocogniti* or cogniti* or neuropsycholog* or cerebr* or neurobehaviour* or microemboli*) and (effect* or outcome or decline or dysfunction or impairment or function or production). Journal articles were also searched by hand for relevant articles.
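The Boolean structure of this search can be illustrated with a short sketch. It is not the database query that was actually run: it assumes that `*` denotes right-hand truncation, that the three bracketed groups are combined with AND, and that the terms within a group are combined with OR. The function names are hypothetical, and the term that appears twice in the first group is listed once.

```python
import re

# The three concept groups of the search strategy; "*" marks truncation.
# ASSUMPTION: terms are OR-ed within a group and groups are AND-ed together.
CONCEPT_GROUPS = [
    ["surg*", "operat*", "anaesth*", "anesthes*", "postoperat*"],
    ["neurocogniti*", "cogniti*", "neuropsycholog*", "cerebr*",
     "neurobehaviour*", "microemboli*"],
    ["effect*", "outcome", "decline", "dysfunction", "impairment",
     "function", "production"],
]


def term_to_regex(term):
    """Translate one search term into a word-anchored regular expression."""
    if term.endswith("*"):
        # Truncated term: match the stem followed by any word characters.
        return r"\b" + re.escape(term[:-1]) + r"\w*"
    # Plain term: match the whole word only.
    return r"\b" + re.escape(term) + r"\b"


def matches_strategy(text):
    """Return True if the text contains at least one term from every group."""
    lowered = text.lower()
    return all(
        any(re.search(term_to_regex(term), lowered) for term in group)
        for group in CONCEPT_GROUPS
    )


print(matches_strategy("Cognitive dysfunction after major surgery"))
print(matches_strategy("Cardiac output monitoring"))
```

A title or abstract is retained only when all three concepts (surgery or anesthesia, cognition, and outcome) are present, which is why the second example is rejected.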
Randomized controlled trials and observational studies were included subject to description of a study population of greater than 10 patients with an analysis of postoperative cognitive decline after surgery, as assessed by preoperative neuropsychological assessment and postoperative neuropsychological assessment at not less than 7 days after surgery. We have limited the article to studies that performed postoperative assessments after at least 7 days for two reasons: first, to avoid any confusion with delirium after surgery, and second, in an attempt to avoid the general effects of any anesthetic agents. We think it is unlikely that any anesthetic agent may affect neuropsychological assessment after 7 days, although this remains unproven. Articles were included if authors performed statistical analyses over time or between groups or made comparisons with normative data.
Exclusion criteria were surgery on the heart or the brain, including carotid artery surgery or angioplasty. In addition, we excluded noncardiac transplantation and surgery for thyroid disease because these are known to have a significant effect on the brain/cognition, but this is normally an improvement in cognition.9–11 In addition, studies with unclear timing of test administration and/or articles describing the same or an overlapping patient sample as other articles already included in the review were excluded. Studies investigating only subjective reports of cognitive dysfunction or observational ratings of cognition were also excluded because the relation between these reports and formally assessed cognition is either not apparent or not clear.12–15
Articles retrieved were limited to the English language and peer-review publications. To assess the quality of the search strategy, eight studies that were known to be relevant to this field were sampled.16–23 The search strategy was able to identify all these articles. Forty-six articles met the inclusion criteria. A further article by Abildstrom et al.24 assessed a subgroup of the Moller et al.16 study at 1–2 yr after surgery. These two studies have therefore been combined, and reference to Abildstrom et al.24 only appears in the cohort studies at more than 1 yr.
Table 1 describes the papers identified. The studies are divided into three categories:
Single-group and controlled studies: Twenty studies that examined a single group and a control group to estimate POCD and, in some cases, factors associated with POCD.
Comparison between general (GA) and regional anesthesia (RA): Seventeen studies compared RA and GA.
Comparisons between other techniques: Nine studies compared two groups in which a comparison was made that was considered to have a possible influence on the development of POCD.
Eight of the cohort studies examined a single group and applied a definition of change to estimate the proportion of patients showing POCD (table 1). Twelve studies compared the performance of the group of interest with a control group (table 1). Of these 12, 10 compared the findings with a contemporaneously gathered control/comparison group, and 2 used data from a previously collected study.25,26
Although three studies from the cohort group27–29 also compared the effects of different types of anesthesia on cognition, 17 studies were specifically designed to compare GA with RA (table 1). In 15 of these studies, patients were randomized. Nonsurgical control groups were also used in two studies.30,31 Flatt et al.30 used a group of 23 nonpatient individuals age and sex matched with the GA group, and Jones et al.31 assessed 50 patients on the waiting list for major joint replacement. In the other technique comparison studies (table 1), seven studies used random allocation to groups,21,23,32–36 one used successive allocation,37 and in one study, allocation to groups was not clearly stated.38
Number of Participants.
In studies without a control group, the mean number of patients was 111 (range, 29–288). The largest samples were in those undergoing cataract surgery (mean, 254).
The mean number of patients in the 12 studies that used controls was 235 (range, 35–1,218). Of the RA/GA comparison studies, the mean number of participants was 100 (range, 20–428). The mean number of participants in the studies that compared different techniques was 169, with a range of 27–861.
Type of Surgery and Anesthesia.
The type of surgery examined in the studies ranged from minor, such as cataract surgery, to major vascular and thoracic. Of the cohort studies, three of the eight studies that examined single group changes in performance over time included patients undergoing orthopedic surgery, a further two examined vascular and thoracic surgery, two assessed those undergoing cataract surgery, and one assessed abdominal surgery (table 1). In studies where the findings of the index group were compared with a control group, the type of surgery varied and included three studies with patients undergoing abdominal surgery, two studies that included some orthopedic patients, two cataract surgery, one gynecologic, one vascular, and three that described surgery as minor or mixed. In addition to being minor, cataract surgery has the potential confounding variable of individuals having improved visual acuity that in turn may lead to improvements on some cognitive tests.28,39–41
General anesthesia alone was used in the majority (14 of 20; 70%) of the cohort studies, whereas one study combined GA and epidural anesthesia (EA),29 two studies combined GA and local anesthesia (LA),40 and one study combined GA, LA, and what they termed neuroleptanalgesia.28 Two studies did not report the type of anesthesia used.23,39 In the studies where different types of anesthesia were combined, it is not possible to attribute any findings to the effect of specific anesthetic techniques.
Specific comparisons between types of anesthesia were examined in 17 studies. Seven (41%) compared GA with spinal anesthesia (SA),31,42–47 3 (18%) with EA,19,22,48 3 with LA,20,30,49 and 3 with SA/EA.17,18,50 One study compared GA with EA and EA plus GA.51 Again, the type of surgery investigated varied widely and included three transurethral resection of prostate,22,44,46 one transurethral resection of prostate/pelvic floor repair,45 two cataract,20,49 six orthopedic,19,31,42,47,48,51 one plastics,30 and three mixed.17,18,50
Each of the nine technique comparison studies examined a different surgical group. In four of the studies, the participants underwent GA32,33,35,38; in one, EA21; and two studies used GA for study patients and examined patients receiving LA as a comparison group.36,37 One included patients receiving GA or RA, and one did not report the type of anesthesia used.23
Number and Timing of Assessments.
The timing of assessments is an important issue because early assessments may identify a transitory cognitive problem (i.e., postoperative delirium), whereas those assessing patients at more remote times from the surgical intervention are able to establish POCD that may be persistent or permanent. In cardiac surgery, the timing of the assessments after surgery has been found to be one of the most significant factors in the number of patients found with POCD or the extent of postsurgical changes.5 In the cohort studies reported here, 11 (55%) performed a follow-up within 10 days of surgery. At the other extreme, one study conducted the first follow-up approximately 1 yr after surgery. Fifteen studies (75%) conducted more than one follow-up; conversely, in the RA versus GA studies and the technique comparison studies, the majority (69%) conducted only a single postoperative assessment, with 50% of these being performed during the first 10 days after surgery. Whereas 35% of cohort studies examined patients 300 or more days postoperatively, the latest assessment in the RA versus GA studies was 6 months, and in the studies that compared different techniques, the latest assessment was 4 months after surgery. Conducting more than one follow-up assessment enables an evaluation to be made of the time course of the progression of POCD. This is, however, complicated by the confounding effect that learning may have where repeated assessments are performed.
Age and Sex of Participants.
Early reports of cognitive change after surgery implicated “old people.”1 Both for this reason and because most surgical interventions occur in the latter years of life, the bulk of studies examined individuals with a mean or median age over 60 yr. It is also of note that age is the patient-related factor that has been found to be associated with greatest change in neuropsychological test performance in cardiac surgery.5 In single-group cohort studies (table 1), the age for participants ranged between 22 and 93 yr, with all but one of the studies reporting a mean or median of 60 yr or greater for their sample. Where age was reported in the cohort studies with a control group (table 1), the majority of samples also had a mean or median age of over 60 yr (7 of 12), with the study on gynecologic surgery recruiting the youngest group (mean, 41.4; SD, 5.2).52 With the RA versus GA studies (table 1), the age range, which was only documented in 5 studies (42%), was 18–93 yr, and, where reported, the mean age was over 60 yr in all but one of the studies. The youngest participants (with a mean age of 42.5 yr) were in a group of patients undergoing plastic surgery. In 66% of the studies that compared techniques, the participants’ mean age exceeded 60 yr. Four studies (44%) reported an age range, which was 18–89 yr.
Many studies specifically selected “older” participants over a specific age,25,26,28,29,31,40,51,53–55 although the age cutoff varied between studies. One study56 selected “patients between 40 and 59 yr of age to assess POCD in middle aged patients,” and in one study57 a specific comparison was made between two age groups. In comparing age groups, it is difficult to remove other confounders such as comorbidities or concurrent medications, with the result that comparisons are not made on age alone. For example, Chung et al.57 compared those younger than 60 yr to those 60 yr and older and documented that the latter group had more medical problems. Some other studies examined whether age had an influence on the extent of decline, whereas others controlled for age effects.53
Of all the studies in table 1, eight focused on single-sex surgical groups; six focused on males,22,32,37,44,46,55 whereas two dealt with females,49,52 of which one study52 was designed to assess the differences found in surgical as opposed to physiologic menopause. Four studies did not report sex.18,42,47,51
Assessment and Definition of POCD.
Many studies have chosen to define POCD using “individual change” scores. In this type of analysis, each participant acts as his or her own control, and a classification is made as to whether a particular participant showed evidence of sufficient decline to be defined as having POCD. The advantage of this approach is that it defines and categorizes individual performance. However, regardless of the definition chosen, this will be a statistically defined criterion that has no intrinsic meaning or reference to brain damage. The difficulty with these conventional definitions of POCD is that “sufficient decline” is variously defined; e.g., Treasure et al.58 defined a decrease in performance equal to or greater than 1 SD from the preoperative score on two or more tests to indicate POCD, whereas Shaw et al.59 regarded the same decline as indicative of POCD, but it only had to be present in one or more tests. Deficit was rated as being either 1 SD decline in 1 of 21 tests or 1 SD decline in 4 of 21 tests in one study21 and as being 20% decline in 20% of the tests in another study.60 Williams-Russo et al.19,21 examined individual change that was based on establishing a clinically important difference score for each test and then converted the participant’s raw within-subject change score to a −1, 0, or +1 score reflecting whether the observed change was worse than the clinically important difference, within one clinically important difference, or better than the clinically important difference. This score was then summed, and any participant with a score of −3 or less was defined as having a deficit. One study57 also compared participants postoperatively with normative data to examine whether there was a significant difference. Five studies used a standardized cutoff score as an indication of decline on a screening measure.23,40,44,46,61
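To make the contrast between these definitions concrete, the following sketch applies two of them to the same hypothetical participant. It is an illustration under stated assumptions rather than any study's actual scoring procedure: higher scores are taken to mean better performance, the SD is the preoperative SD of each test, and all function names and data values are invented for the example.

```python
def sd_decline_flags(pre, post, baseline_sd):
    """Per test: True where the score fell by at least 1 preoperative SD.

    ASSUMPTION: higher scores mean better performance on every test.
    """
    return [(before - after) >= sd
            for before, after, sd in zip(pre, post, baseline_sd)]


def pocd_by_sd_rule(pre, post, baseline_sd, min_tests):
    """POCD if a decline of >= 1 SD is present on at least `min_tests` tests."""
    return sum(sd_decline_flags(pre, post, baseline_sd)) >= min_tests


def pocd_by_cid_sum(changes, cid, threshold=-3):
    """Summed -1/0/+1 scoring against a clinically important difference (CID).

    Each change (post minus pre) scores -1 if it is worse than the CID,
    +1 if it is better than the CID, and 0 otherwise. A total at or below
    the threshold is classed as a deficit.
    """
    total = 0
    for change, important_difference in zip(changes, cid):
        if change < -important_difference:
            total -= 1
        elif change > important_difference:
            total += 1
    return total <= threshold


# Hypothetical participant: three tests, with a clear decline on only one.
pre = [50, 30, 20]
post = [44, 29, 20]
baseline_sd = [5, 4, 3]

print(pocd_by_sd_rule(pre, post, baseline_sd, min_tests=1))  # one-test rule
print(pocd_by_sd_rule(pre, post, baseline_sd, min_tests=2))  # two-test rule
print(pocd_by_cid_sum([-6, -5, -4, 1], cid=[5, 4, 3, 2]))
```

The same participant is classified as having POCD under the one-test rule but not under the two-test rule, which is the kind of definitional sensitivity described above.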
A number of studies used group change scores in neuropsychological tests to determine whether surgery or a comparison between two or more surgical groups resulted in differences.
Studies applied various statistical procedures (e.g., t tests, analysis of variance, analysis of covariance) to examine group differences. In addition, others used multivariate techniques such as multiple regression to explore variables that influence POCD or cognition after surgery.
Number of Tests Used.
Because of the time constraints of the surgical environment, the neuropsychological assessments are limited when contrasted with a clinical neuropsychological assessment that would take approximately 2.5 h and attempt to cover most cognitive domains.62 As a result, the tests selected end up being a compromise to fit within the restrictions imposed by the environment.
Establishing the number of tests used in studies is made more complex by the fact that in some studies, researchers used a comprehensive battery to assess a wide range of cognitive domains. In some cases, the tests in these batteries (e.g., Mini-Mental State Examination [MMSE]) are accumulated to produce a single score. In others, a number of scores are produced.29 Nineteen (41%) of the studies in table 1 used a comprehensive test battery either alone or in combination with other neuropsychological tests (see appendix for key). As can be seen from tables 2–4, there was a wide range in the number of tests used in studies.
Domains and Types of Tests Selected.
When neuropsychological tests were first introduced into the study of POCD, they tended to be traditional “intelligence tests” or screening tests such as the MMSE. A problem with screening tests is that some are liable to show ceiling effects if cutoffs are applied.40 The overall batteries tend to be highly reliable but are unlikely to have the sensitivity required to detect the subtle (but important) changes after surgery. For example, the Wechsler Adult Intelligence Scale has proved itself to be insensitive to assess change after cardiac surgery.63 Recently, a number of tests have been specifically designed for repeated administration, and some have been computerized to improve the standardization and ease of administration.
Overall, in the studies reported here, 70 different neuropsychological tests have been used in this area along with 9 composite batteries (appendix). The domains assessed by these tests in the studies are displayed in table 1. The domain most assessed was memory and learning (B), where 33 of the studies applied specific tests. To this must be added those studies where composite batteries were used, because these also examine some aspects of memory and learning. Comparisons between studies are made extremely difficult because of the differences in the tests selected. Although different tests may assess a similar domain, their sensitivity to assess change is likely to differ.
Dealing with Learning.
Despite attempts to restrict learning on repeated administration of neuropsychological tests, it is customary for some learning to be found. This can occur as a result of increased familiarity with the test structure and alterations in strategy in relation to the test. In studies of POCD, patients undergo at least two assessments, frequently with only a fairly short time separation. Many researchers have specifically selected tests that keep learning effects to a minimum, and parallel equivalent forms have also been used to reduce learning effects. Nonetheless, learning is apparent in most studies under review (e.g., 40,55). In studies with two groups, the control group enables the impact of learning to be assessed. One approach by the multicenter International Study of Post-Operative Cognitive Dysfunction (ISPOCD) group16,26,53,56 has been to analyze their data by comparing the mean of the neuropsychological change score from a healthy control group over three assessments corresponding to the assessment intervals of the surgical group. The mean of the control group changes was used as an estimation of learning and was subtracted from the surgical participants’ change score, and the result was divided by the control group SD to obtain a Z score for each test. This calculation allows for each test to be analyzed separately and also enables the scores to be combined into a total neuropsychological score because the difference in dispersion of scores on each test is removed by scoring them in SD units from the mean. However, the authors did apply a cutoff score to define POCD by rating the participants as having POCD when Z scores on two individual tests or the combined Z score reached 1.96 or more (the higher the score the more deterioration).
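The Z score calculation described above can be sketched as follows. This is a minimal illustration, not the ISPOCD group's own code. It assumes that each change score is signed so that positive values indicate deterioration, and that the combined score is the sum of a participant's Z scores divided by the SD of the same sum in the control group. The text does not specify how the combined score was normalized, so that step, like the function names and example data, is an assumption.

```python
from statistics import mean, stdev


def z_scores(patient_change, control_changes):
    """One Z score per test: (patient change - control mean) / control SD.

    patient_change:  per-test change scores, positive = deterioration.
    control_changes: for each test, the change scores of all controls.
    The control mean serves as the estimate of the learning effect.
    """
    return [(change - mean(controls)) / stdev(controls)
            for change, controls in zip(patient_change, control_changes)]


def combined_z(patient_change, control_changes):
    """Composite Z score across tests.

    ASSUMPTION: the summed Z score is normalized by the SD of the summed
    Z scores of the controls; the normalization is not given in the text.
    """
    patient_sum = sum(z_scores(patient_change, control_changes))
    n_controls = len(control_changes[0])
    control_sums = []
    for i in range(n_controls):
        one_control = [test[i] for test in control_changes]
        control_sums.append(sum(z_scores(one_control, control_changes)))
    return patient_sum / stdev(control_sums)


def has_pocd(patient_change, control_changes, cutoff=1.96):
    """POCD if two tests, or the combined score, reach the cutoff."""
    individual = z_scores(patient_change, control_changes)
    tests_at_cutoff = sum(z >= cutoff for z in individual)
    return (tests_at_cutoff >= 2
            or combined_z(patient_change, control_changes) >= cutoff)


# Hypothetical data: two tests, three control participants per test.
controls = [[-1, 0, 1], [-2, 0, 2]]
print(z_scores([3, 5], controls))
print(has_pocd([3, 5], controls))
print(has_pocd([0, 0], controls))
```

Because every test is expressed in control-group SD units, tests with very different raw score ranges contribute on the same scale, which is what allows them to be combined.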
Controlling for Alternative Explanations: Education, Intelligence, and Mood.
In a number of studies, either an estimate of general intelligence was performed before surgery or level of education was recorded. This was done to examine either whether individuals with high or low intelligence or education are particularly susceptible to the negative effects of surgery or, in studies with more than one group, to ensure the groups are balanced on this potential confounder or whether there is a need to use education/intelligence quotient (IQ) as a control in the analyses. The “cognitive reserve” hypothesis suggests that individuals with relatively low intelligence should be more susceptible to an equivalent brain injury than individuals with higher intelligence or education (e.g., Elkins et al.64). On the basis of this hypothesis, it would be expected that a higher rate of POCD should occur in those with lower intelligence or limited education. There is little evidence to support an association of general intelligence with decline after cardiac surgery, although there is some evidence to suggest that higher levels of education protect against decline after cardiac surgery.65 As displayed in table 1, 18 of the cohort studies (90%) assessed either education or performed an assessment of IQ before surgery. However, these assessments were conducted in only 41% (7 of 17) of the RA versus GA studies and 33% (3 of 9) of the intervention studies.
Two related factors are used to justify the need to make assessments of mood in studies of cognitive change after surgery. The first is that mood changes may occur from before to after surgery55 and that mood, in particular depression and anxiety, has been found to correlate in some studies with performance on some neuropsychological tests.62 In the articles under review, 70% (14 of 20) of the cohort studies, 59% (10 of 17) of the GA versus RA studies, and 33% of the studies that compared different techniques assessed mood. Both education/IQ and mood were assessed in 70% (14 of 20) of the cohort studies, but the percentage decreases to 29% (5 of 17) in the GA versus RA studies and 22% (2 of 9) of the technique comparison studies.
Because of the diversity of types of assessments of both cognitive function and mood or psychiatric state, it would be unlikely that a clear picture would emerge from the studies performed. In addition, many researchers have selected neuropsychological tests that are not particularly susceptible to the effects of, or changes in, mood.
In table 2, the cohort studies are divided into five periods according to the time of postoperative assessment and, within these periods, by the design of the study (no controls and with controls).
The number of those recruited and completing the follow-up assessment is indicated in table 2. It shows that overall attrition rates, where reported, were lower with shorter follow-ups: 5.4% for assessments between 7 and 21 days; 19% for assessments between 22 and 132 days; and for the few studies reporting attrition at the times beyond 6 months, 17%. It is unclear whether this attrition is selective, that is, whether participants with certain characteristics were lost to follow-up. Selective attrition raises questions regarding the validity of findings, and this is particularly pertinent when considering POCD because there is a need to establish whether those lost to follow-up are more or less likely to have had POCD. Some reports on POCD after cardiac surgery have suggested that there may be selective attrition, with sicker patients being more likely to be unavailable for follow-up66 (see also Newman and Stygall5). Long follow-up studies in cardiac surgery have shown that attrition is higher among those with lower IQ67 and lower education.68
7- to 21-Day Assessments.
Of the eight cohort studies conducted without a control group, four conducted the first assessment within 21 days of surgery. Three reported a decline in performance ranging from 41%27 to 71%.28 However, the classification of POCD varied between these studies, i.e., Rodriguez27 adopted a decrease in performance of ≥ 0.5 SD in 20% of tests, Treasure30 a > 1 SD drop in two or more tests, and Ancelin29 a > 1 SD drop in 1 of 21 summary scores. POCD was not defined in one study.28
In the cohort studies with a control group, seven conducted an assessment at 7 days after surgery. One study reported no change in performance, and the other six described decline occurring in 6.8%26 to 31%.59 Three definitions of decline were used: ≥ 1 SD drop in one test,59 a comparison between preoperative and postoperative scores by analysis of variance,40 and the remainder defined decline as Z scores of 2 from seven tests or a combined score of 1.96 or greater. Of those that made specific comparisons of the prevalence of POCD, only one study26 found no significant difference in performance between the control and surgical groups.
These early findings showed a tendency for the studies with the least stringent definitions to report a greater proportion of patients with POCD,27,29 greater deficits occurring in the more severe forms of surgery,58,59 and the most minor forms of surgery showing no40 or minimal POCD.26 It is also of note that the controlled studies produced deterioration rates between 3.4% and 6% in the control group. Only in the case of minor surgery were differences between the control and the study group found to be not significant. This relatively clear pattern of results attests to the robustness of POCD soon after surgery, given all the differences in methods of assessment between the studies (e.g., number, type, and sensitivity of the neuropsychological tests). Differences in rates of early POCD between minor and major surgery are reinforced by comparing the ISPOCD studies of Canet26 on minor surgery with those of Moller16 and Rasmussen,25 who both assessed major surgery using similar methodologies and neuropsychological tests. This indicated that major surgery produced between 26% and 33% POCD compared with 7% for minor surgery.
It is of note that two studies used the MMSE either alone28 or with two other neuropsychological tests.40 In both of these studies, no overall differences were found between preoperative and early postoperative performance, with the exception of the oldest age group (85 yr and older) in the Stockton28 study. These findings further suggest that screening tests such as the MMSE do not have the sensitivity to examine for POCD.
The findings do suggest that older people are more likely to have early POCD. Two of the single-group studies reported that older patients were more susceptible to early decline,28,29 and one of the controlled studies26 found that age over 70 yr was a risk factor for early POCD. Further support that older age is associated with early POCD comes from a comparison between ISPOCD group studies where an identical methodology was used. Johnson56 examined a middle-aged sample (40–59 yr) and found POCD in 19.2%, and Moller16 and Rasmussen25 found POCD in 25.8% and 32.7% of their samples who were older than 60 yr (see also Rasmussen et al.69).
In addition, there is some suggestion in two studies that patients who may have been sicker or requiring more extensive surgery may be more likely to have POCD. In one study,26 those selected by the hospital to undergo inpatient rather than outpatient surgery were more likely to show POCD. In the other study,27 an association was found between postoperative complications and cognitive dysfunction at 7 days. It is possible that these increased rates of POCD in those with complications may reflect this and the additional medication to deal with the complications.
22-Day up to 6-Month Assessments.
The majority of studies reported no evidence of POCD, or no decline or an improvement in neuropsychological performance. Where reported, POCD prevalence in the surgical groups ranged between 6.2% and 56%. Ignoring the one study with a high incidence of POCD,70 the other studies produced POCD rates of between 6.2% and 9.4% in the surgical group and between 2% and 4% in the control groups studied. In most cases, the scores in the surgical group were greater than those found in the controls, but in only two studies did this reach significance.
By examining the eight studies that performed an assessment at both this and the previous time point, it is possible to assess the changing rates of POCD with increasing intervals after surgery. In all but one of these,58 POCD decreased from the first to the second time point. The percentage decrease in POCD ranged between 3% and 71%. Besides the study by Treasure,58 the lowest change occurred in the study by Canet26 on minor surgery that did not find any differences in the surgery and control groups at 7–21 days postoperatively. The four other ISPOCD studies using the same methodology showed POCD rates at 12 weeks after surgery of between 62% and 71% lower than were found at 7 days. These data provide clear evidence that rates of POCD decline from the acute phase (7 days) to longer periods after surgery.
Although a number of studies examined the possible effect of education and/or IQ on the occurrence of POCD, few effects were found. In one study,29 those with low educational attainment had more POCD, and another study60 found that those with lower education showed greater declines after surgery. Whereas no relation was found between mood and cognitive decline in two studies,30,24 depression or the risk of depression before surgery was found to be associated with decline in a number of studies.28,29
Significant differences in POCD between inpatient and outpatient treatment reported soon after surgery by Canet26 were not apparent at this later assessment, but the inpatient group had higher POCD than the controls. The association of POCD with postoperative complications reported by Rodriguez27 at the early assessment was found to persist at this later assessment.
Assessments > 6 Months.
Eight studies reported assessments at 6 months or longer after surgery, with one study55 having four assessments over this period and two other studies assessing patients on two occasions.28,40 The bulk of studies reported no decline or an improvement from before surgery. Importantly, none of the studies with a control found any difference from the control group.
Abildstrom,24 who examined a subset of the ISPOCD study of Moller at 1–2 yr after surgery, found no differences between the elderly group undergoing surgery and the controls. The authors estimated that POCD persists to this time period in only approximately 1% of patients. They did identify age as a risk factor and, in common with work on the long-term impact of cardiac surgery,71 showed that an early deterioration increased the likelihood of long-term POCD. One difficulty identified by the authors is that only 3 of the 35 patients with POCD at 1–2 yr had POCD at the earlier assessment points.
General versus Regional Anesthesia.
One hypothesis regarding POCD after noncardiac surgery is that the mechanism of damage occurs through the use of GA. Consequently, the use of alternative methods of anesthesia for the same procedure should result in a reduction or a removal of POCD. A number of studies have considered this issue, and their findings are displayed in table 3, organized by whether random allocation to groups was used and the time of the assessment after surgery.
7- to 21-Day Assessments.
One nonrandomized study and nine randomized studies performed follow-up assessments at 7–21 days after surgery. At this time, one study49 of patients undergoing cataract surgery found differences between GA and LA, but it is unclear whether the analysis in this study took account of the higher preoperative scores of the LA group. Higher baseline scores would make this group more likely to show a greater decline for purely statistical reasons.
The study by Rasmussen and the ISPOCD investigators17 examined patients aged 60 yr and older undergoing a range of surgeries requiring a hospital stay of at least 4 days. The ISPOCD investigators’ analysis protocol included accounting for learning by subtracting from the performance of the GA and RA groups the changes in performance of healthy controls that were collected in an earlier study. The investigators’ intention-to-treat analysis showed a higher incidence of POCD in GA (19.7%) compared with RA (12.5%), which just failed to reach significance (P < 0.06). However, a further per-protocol analysis that excluded 56 participants showed the difference between GA (21.2%) and RA (12.7%) to be statistically significant (P < 0.04).
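The learning correction described above can be illustrated with a minimal sketch. It assumes only what the text states, namely that the mean change observed in healthy controls is subtracted from each patient's change score. The function name and all scores are hypothetical, and any further standardization used by the ISPOCD investigators is not reproduced here.

```python
from statistics import mean

def practice_corrected_changes(patient_pre, patient_post, control_pre, control_post):
    """Subtract the mean control change (the practice effect) from each patient's change."""
    control_change = mean(post - pre for pre, post in zip(control_pre, control_post))
    return [(post - pre) - control_change
            for pre, post in zip(patient_pre, patient_post)]

# Hypothetical test scores (higher = better performance).
controls_pre = [20, 22, 25, 19]
controls_post = [22, 24, 26, 22]   # controls improve on retest through learning
patients_pre = [21, 24, 18]
patients_post = [22, 21, 20]

corrected = practice_corrected_changes(
    patients_pre, patients_post, controls_pre, controls_post
)
print(corrected)
```

In this illustration the first patient's raw score improves by one point, yet the corrected change is negative because the patient gained less than the controls did through practice alone.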
22-Day up to 6-Month Assessments.
None of the 12 studies (2 nonrandomized) that assessed patients between 1 and up to 6 months after surgery found differences between the performance of those undergoing GA or RA. Rasmussen,17 who had reported differences at 7 days, assessed participants 3 months postoperatively and detected cognitive dysfunction in approximately 20% of their sample at 3 months, but with no differences between RA and GA (intention-to-treat GA 20.4%/RA 20.2% and per-protocol GA 19.7%/RA 21%). Of the remaining 11 studies, some reported no decline in both RA and GA groups,44–47,50 whereas others reported some improvements in performance.30,31,43,51 The study with the longest follow-up19 of 6 months found modest improvement from earlier declines in both anesthetic (GA and EA) groups.
The evidence suggests that using RA as an alternative to GA does not result in any reduction in POCD. The one large well-designed study whose per-protocol analysis of early changes suggested a better outcome with RA did not show any differences at the 3-month assessment.17
Studies Comparing Different Techniques.
Normotensive versus Hypotensive.
Hypotensive anesthesia offers advantages of a dry surgical field and potential reductions in blood loss. However, it has been suggested that hypotensive surgery may increase the likelihood of ischemic damage to the brain. Three studies have examined the effects of deliberate hypotensive anesthesia on POCD21,32,38 and satisfied the inclusion criteria of this systematic review. GA was used in two studies,32,38 and EA was used in one.21 Each study examined a different form of surgery (joint replacement, prostatectomy, maxillofacial), and the specification of “hypotensive” differed between studies. The times of follow-up assessments also varied with each study. None of the studies found any differences in cognition between hypotensive and normotensive anesthesia.
Intravenous versus Inhalation Anesthesia.
Enlund et al.33 compared the effect of isoflurane or propofol on neuropsychological performance after major orthognathic surgery. At 4–8 weeks after surgery, they detected a significant decline compared with baseline in the Luria verbal learning test and a significant improvement in the Taylor-Rey-Osterreith (copying), with no differences between groups.
Various techniques have been introduced to reduce the occurrence of hypoxemia during surgery. In a randomized study, Moller et al.34 found that using pulse oximetry during and after surgery to identify hypoxemia and indicate the need to intervene did not affect neuropsychological performance at discharge (2–16 days after surgery). Forty patients who had poor memory performance were followed up 3 months later, and at that point, their median scores had returned to baseline. Casati et al.,35 using a decrease of 2 or more points in the MMSE as a definition of decline, found no difference between groups when comparing those undergoing surgery using pulse oximetry and those without oximetry. However, when comparing those patients who had an intraoperative episode of desaturation, a decline of cognitive function was observed in 10 patients in the control group only (P = 0.001). Prior et al.37 assessed 60 prostatectomy patients before and 7 days after surgery. The participants were divided into four groups: (1) extradural and air, (2) air and ether, (3) air–trichloroethylene, and (4) nitrous oxide–oxygen. Improvement was detected in all groups; however, there was no difference between groups.
Normocapnia versus Hypocapnia.
The neuropsychological effects of hypocapnia were investigated in one study.36 Comparing patients undergoing cataract surgery who received either ventilation to a mean arterial carbon dioxide tension (Paco2) of 4.9 kPa, hyperventilation to a mean Paco2 of 2.9 kPa, or LA, no decline was found in neuropsychological performance in any group after surgery, and no difference was found between groups.
The cognitive effect of the intravenous administration of vitamins (B complex and C) given to patients undergoing surgery for a fractured femur was compared with that of randomized nonsupplemented controls by Day et al.23 They assessed participants on three occasions, 7, 14, and 84 days postoperatively, and found no decline and no difference between groups on any assessment.
General Discussion and Conclusions
This article reviewed studies of postoperative cognitive decline after noncardiac surgery. However, a major difficulty in trying to compare investigations or establish an incidence of POCD was the diversity in participants, types of surgery and anesthesia, methods of assessment, definition of POCD, and mode of analysis. Despite all the diversity, the findings in cohort studies present relatively clear evidence of POCD 1 week after major surgery. In the large well-designed studies (largely the ISPOCD group), the data suggest that POCD is only evident after major surgery. This conclusion is supported by the reanalysis of the ISPOCD data set that attempted to control for the variability in performance by taking account of improvements and deteriorations after surgery.69
At periods between 22 and 132 days, only two studies found evidence of greater declines than control groups. Although these data are persuasive that well-controlled studies are able to demonstrate POCD at this later time, it is of note that further analysis has indicated that at this time the number of participants showing significant improvements in their performance is similar to the number showing declines. On the basis of this, Rasmussen and Siersma69 suggest that the findings may reflect random variation rather than POCD.
One area that requires further examination is the possibility that symptoms such as pain and/or some types of postoperative medication may lead to poorer neuropsychological performance. It is possible that these factors may also lead to larger declines in the days after surgery, when pain and the use of medication may be at their greatest, and to less POCD at later assessment times.
In interpreting these findings, it is important to recognize that the numbers of participants in many studies were well below what may be considered adequate to assess POCD in noncardiac surgery. This is not surprising in a new field, but it is instructive to consider the numbers required for an adequately powered study. Assuming 80% power and an α of 0.05, where one group showed 10% of patients to have POCD and the second group to have twice that proportion (20%), the sample size for each group would need to be 199, assuming groups of equal size. If the background incidence of POCD was 50% and the index group had an incidence of 60%, the numbers per group would need to be 388. Therefore, many of the studies are underpowered (table 1). Only five studies recruited groups of 200 or more. In the cohort studies, 40% (8 of 20) recruited 50 or fewer. In the studies comparing different anesthetic methods with GA, this percentage is 70% (12 of 17), and in the studies that compared different techniques, it is 66% (6 of 9).
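The per-group sample sizes quoted above can be checked with a short calculation. The article does not state which formula was used, so the sketch below is an assumption: it applies the standard normal-approximation formula for comparing two independent proportions (two-sided test, equal group sizes, no continuity correction), and the function name `n_per_group` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Participants needed per group to detect p1 vs p2 (two-sided, equal groups)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for the test
    z_beta = NormalDist().inv_cdf(power)            # quantile for the desired power
    p_bar = (p1 + p2) / 2                           # pooled proportion under the null
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

print(n_per_group(0.10, 0.20))  # 199
print(n_per_group(0.50, 0.60))  # 388
```

Under these assumptions the calculation reproduces both figures given in the text. The same 10-percentage-point difference requires almost twice as many participants when the background incidence is 50% rather than 10%, because the variance of a proportion is largest near 50%.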
The importance of sample size in relation to the timing of the assessments is illustrated by the odds ratios and 95% confidence intervals of the cohort studies with controls that provided data on the percentage of participants with POCD. Figure 1A shows these findings for the early assessments. It is apparent that the effects at this early period are sufficiently large to produce 95% confidence intervals that do not cross unity even with relatively small samples. The one study whose interval does cross the line of unity (indicating no difference from controls) is the one that examined minor surgery. In contrast, in studies with postoperative assessments from 22 days up to 6 months, it was only in the largest study, which assessed patients at 84 days, that the confidence interval did not cross unity (fig. 1B). These figures demonstrate that to get a clear signal in this area, studies with large samples are required.
The studies that compared GA and RA as well as those comparing other techniques provide little evidence as to what may be responsible for any changes in cognition observed after surgery. It is only in orthopedic surgery that a putative mechanism has been identified in the form of microemboli (probably fat) that have been identified through transcranial Doppler studies of the middle cerebral artery during surgery.72 The one study that did use transcranial Doppler in orthopedic surgery did not find any relation between the numbers of microemboli and changes in neuropsychological performance.27 However, great caution must be exercised in interpreting the studies that varied aspects of anesthesia or surgery, because the majority were significantly underpowered.
The research has concentrated mainly on an older age group, with only nine studies30,33,34,38,52,56,59,60,70 examining participants with a mean age of less than 60 yr. The evidence suggests that older participants are more likely to show POCD. In a large study with a control group (ISPOCD2 group), Johnson et al.56 compared patients aged 40–60 yr with a previous group aged over 60 yr and concluded that the younger group showed significantly less POCD at both 7 days (P = 0.0064) and 3 months (P = 0.026).
Designs that used a single group and examined change in performance over time cannot control for the influence of extraneous variables, especially the learning that occurs in many neuropsychological tests. This is an important consideration because any learning and resultant improvement in performance will have the effect of reducing the prevalence of POCD. The use of a design that involves a control group makes it possible to control for alternative explanations and specifically for any improved performance through learning. These designs do, however, raise the question of what constitutes an appropriate control group for this type of study. In this review, some studies have selected healthy individuals as controls53 or relatives of the individuals under study,52 whereas others selected patients with conditions other than those under study,54 patients with the same condition but who did not undergo surgery as determined by the surgeon,55 or patients who elected to have or not have surgery. Each of these approaches to select a control group has its strengths and weaknesses. In the case of healthy controls, the ability to learn is controlled for, but it is assumed that the patients under study would evidence an equivalent rate of learning. Studies using patients with another condition as controls assume equivalence between patients with different conditions, whereas those that use controls with the same condition are able to control for the illness. The problem for the latter is the ethical difficulty of random allocation to receive surgery or not to receive surgery. Where group allocation is determined by the surgeon, clinical factors may introduce bias. Whether the patients elect for surgery or not, it is likely that patient-related factors would differ between groups. The ethical issues of randomizing to receive or not receive surgery are obvious in cohort studies, but in studies comparing different techniques, this is more easily achieved (tables 3 and 4).
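The dependence of the confidence interval on sample size can be illustrated with a minimal sketch. The counts below are hypothetical and are not taken from any reviewed study, and the interval uses the common log (Woolf) approximation, which is an assumption because the article does not say how its intervals were computed.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio and confidence interval from a 2x2 table.

    a = surgical patients with POCD,  b = surgical patients without POCD,
    c = controls with POCD,           d = controls without POCD.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    odds_ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of log(OR)
    lower = exp(log(odds_ratio) - z * se_log)
    upper = exp(log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical: 20% POCD after surgery versus 10% in controls, at two sample sizes.
small = odds_ratio_ci(10, 40, 5, 45)       # 50 participants per group
large = odds_ratio_ci(100, 400, 50, 450)   # 500 participants per group
print(small)
print(large)
```

Both hypothetical studies have the same odds ratio, but only the larger one yields an interval that excludes unity.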
Neuropsychological assessments have been found to have sufficient sensitivity to be able to detect small and subtle cognitive changes that may occur after surgical procedures or medical treatment.62 By necessity, the neuropsychological batteries chosen for investigations into the impact of surgery on the brain are often a compromise, balancing the time constraints imposed by the clinical environment with the selection of sensitive and reliable tests. Ideally, to gain information on cognitive change these tests should be comprehensive and assess more than one domain. Where the definition of POCD involves a deterioration on a specified number of tests, conducting more tests will increase the probability of finding deficits, not only because of the number of tests used but also because more domains will be assessed.73,74 The number of tests used in the studies under review varied widely, and these differences make it difficult to compare studies. Not only is the number of tests important, but whether they were drawn from separate domains can influence the findings. For example, one study22 used five tests but only examined memory. Tests from the same domain are likely to show a greater correlation than tests drawn from different domains. Seventy different neuropsychological tests were used in these studies (appendix). Studies used anywhere between 1 and 13 tests. Six studies28,39,41,43–45 used only a generic screening test such as the MMSE or Abbreviated Mental Test, and seven further studies23,40,46,52,54,57,61 used these assessments in conjunction with other tests; however, five of these studies26,40,46,54,61 based their primary definition of decline solely on the generic measure. Nineteen (46.3%) explored three or more domains. Memory was assessed in all studies (including batteries), with 4 studies assessing memory only.
The extent of decline in neuropsychological scores necessary to be defined as POCD in the studies reviewed has made comparisons of the percentage of individuals with POCD across studies particularly complex. This is especially important because the numbers identified by different techniques show little agreement. For example, Mahanna et al.75 used five different criteria to define neuropsychological deficits after cardiac surgery and found a sixfold difference in the incidence of deficits (3.4–19.4%). Where two surgical groups or a surgical and a control group are compared within studies with the same criteria for POCD, the relative incidence of POCD can be established. However, the use of conventional cutoffs even in studies with two or more groups results in detailed continuous measures being reduced to a binary decision of POCD or no POCD. This is especially problematic because the point of demarcation is arbitrary and, if increased or decreased, may lead to different findings. It also applies to studies where it has been found that individuals assessed at different times may move from having POCD at one point to not having it at a later point and vice versa (e.g., Rasmussen and Siersma69). Small changes for those on the boundary of one category are likely to lead to significant shifts in the individuals identified as having POCD. At a more general level, there are a host of difficulties in making binary classifications of continuous data in general76 and in examining POCD in particular.63
An alternative approach in studies where more than one surgical group is assessed is to consider postoperative cognitive change and to examine differences in scores between groups without applying a cutoff. This approach recognizes that some learning with repetition should be expected with most neuropsychological tests and assumes that this is the background against which the impact of surgery needs to be considered. Even when parallel forms are used to minimize learning, participants have been found to develop different strategies that can lead to an enhancement of their accuracy or speed on that test. One measure of cerebral damage is the inability to demonstrate learning on neuropsychological tests with repeated administrations. In this way, the presence or extent of learning can be used as the index of the relative success of an intervention to reduce the impact of surgery on neuropsychological function. This approach accepts that the retention of learning ability, coupled with a reduction in the extent of deficits, may lead to the intervention group showing greater learning than the surgical control group. In this approach, both learning and deterioration are taken into account in examining group differences in performance.77 The use of group scores, however, does not enable individual differences in change in cognitive performance to be considered.
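The effect of battery size on a definition of the form "decline on at least k tests" can be sketched with a binomial calculation. This is a simplification under stated assumptions: each test is taken to flag a decline by chance with the same hypothetical probability (5% here), and the tests are treated as independent, which tests from the same domain are unlikely to be.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability that at least k of n independent tests flag a decline by chance."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical 5% chance of a spurious decline on any single test;
# POCD defined as decline on two or more tests.
for n_tests in (2, 5, 9, 13):
    print(n_tests, round(prob_at_least(2, n_tests, 0.05), 4))
```

With these assumed values, the chance of meeting a two-test criterion with no true change rises steadily as the battery grows from 2 to 13 tests, which is one reason incidence figures from batteries of different sizes are hard to compare.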
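The sensitivity of incidence figures to the point of demarcation can be shown with a small sketch. The standardized change scores and the three cutoffs below are hypothetical and are chosen only to illustrate the argument; they do not correspond to any study in this review.

```python
def pocd_incidence(z_changes, cutoff):
    """Proportion of participants whose standardized decline meets or exceeds the cutoff."""
    flagged = [z <= -cutoff for z in z_changes]
    return sum(flagged) / len(z_changes)

# Hypothetical standardized change scores (negative = decline after surgery).
z_changes = [-2.4, -2.1, -1.9, -1.6, -1.4, -0.9, -0.3, 0.0, 0.4, 1.1]

incidence_by_cutoff = {c: pocd_incidence(z_changes, c) for c in (1.0, 1.5, 2.0)}
print(incidence_by_cutoff)
```

The same ten participants yield an incidence of 50%, 40%, or 20% depending on whether the cutoff is set at 1.0, 1.5, or 2.0 standard deviations, and participants with scores near a boundary change category with only a small shift in their score.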
Overall, the research in this review has demonstrated that in the early weeks after major noncardiac surgery, a significant proportion of people show POCD, with the elderly being more at risk. Although the research here is generally negative, there is a little evidence that a reduced proportion of patients continue to show POCD up to 6 months after major surgery, although it has been suggested that this finding may be due to random variation. None of the studies have elucidated the possible mechanisms for any cognitive changes.
The research area suffers from a large number of underpowered studies and a range of other methodologic difficulties. These include the differences in surgery, participants, the diversity, number, and range of neuropsychological tests used with varying sensitivity to change and learning, and the variety of definitions used to classify individuals as having POCD. These differences make it difficult to compare across studies. To overcome some of the methodologic issues, it would be useful to recognize the arbitrariness of any definition of POCD and the difficulties that a binary definition introduces into a continuous measure of cognition. It may be useful to consider whether the term postoperative cognitive dysfunction has outlived its usefulness and acknowledge a need to examine cognition and cognitive change as a continuous measure such that changes in scores may be analyzed.
Given the difficulty of funding adequately powered studies, it is useful to consider whether it is timely to establish a consensus that specifies a limited number of tests to be used in all studies and the value of pooling data across studies to increase power in secondary analyses.
Appendix: Measures Used
Verbal and Language Skills
A1 Wechsler Adult Intelligence Scale–Revised (WAIS-R) Vocabulary
A2 WAIS-R Information
A3 WAIS-R Similarities
A4 Speed of writing
A5 Controlled oral word association test
A6 Boston Naming Test
Memory and Learning
B1 Rey Auditory Verbal Learning Test
B2 Continuous paired associate learning
B3 Buschke verbal selective reminding test
B4 Wechsler Memory Scale (WMS)–Logical memory
B5 WMS–Visual reproduction
B6 WMS–Associative Learning
B7 WMS–Mental control
B9 WMS–Digit Total
B10 WMS–Personal and current information
B12 Taylor-Rey-Osterreith test battery
B13 Word list
B14 Randt memory test
B15 Prose passage/story recall
B16 Benton visual retention test
B17 Chandigarh memory scale
B18 Luria memory test
B19 Delayed recall test
B20 Visual Gestalt learning
B21 Picture recognition
B22 Recognition memory task
B23 Mattis-Kovner verbal recall
B24 Mattis-Kovner verbal recognition
B25 Benton visual recognition test
B26 Object learning test
B27 WAIS Digit Span
B28 Visual verbal learning test
B29 Memory scanning test
B30 Unknown or unclear memory test (self-devised)
B31 Rivermead behavioral memory
B32 Fuld object memory
B33 Free recall task
B34 Barbizet and Cany visual recognition
Attention, Concentration, and Perception
C1 Attention and Concentration Index
C2 WAIS-R digit–symbol or similar
C3 Symbol digit modalities test
C4 Trailmaking test A
C5 Trailmaking test B
C6 Unclear vigilance task
C7 Letter or symbol cancellation
C8 Reaction time tests
C9 Digit vigilance
C10 Ishihara color plates
C11 Concept shifting tasks (trails)
C12 Flicker fusion threshold
C13 Two-point discrimination
C14 Visual search
Visual and Spatial Skills
D1 Hooper test (visual organization)
D2 WAIS-R block design
D3 Stroop color word interference
D4 Line drawings
D5 Bender-Gestalt test
D6 WAIS-R object assembly
Visuomotor and Manual Skills
E1 Finger tapping
E2 Purdue pegboard
E3 Digit/words copying tests
F2 Serial sevens subtraction
G1 Maze test
G2 Card sort test
H1 Mini-Mental State Examination
H2 Shipley Hartford examination
H3 Wechsler Adult Intelligence Scale–Revised
H4 Abbreviated Mental Test
H5 Examen Cognitif per Ordinateur
H6 Mattis Organic Mental Screening Examination
H7 Wechsler Memory Scale
H8 Iowa Battery of Mental Decline
H9 Rivermead Memory Scale