Good postoperative recovery is increasingly recognized as an important outcome after surgery. The authors created a new Post-operative Quality Recovery Scale (PQRS) that tracks multiple domains of recovery from immediate to long-term time periods in patients of varying ages, languages, and cultures.
The parameters of importance to both clinicians and patients were identified. After an initial pilot study of 133 patients, the PQRS was refined. It consists of six domains (physiologic, nociceptive, emotive, activities of daily living, cognitive, and overall patient perspective). An observational study of 701 patients was performed with the refined PQRS to assess its capacity to evaluate and track recovery and to discriminate between patients. It was conducted in eight countries and in five languages, involving patients aged 6 yr or older undergoing elective surgery with general anesthesia. Recovery was assessed before surgery and at multiple time periods postoperatively. Recovery was defined as return to baseline values or better.
Seven hundred one patients completed the PQRS. Mean completion time was 4.8 (SD 2.8) min. Recovery scores improved with time. Physiologic recovery was complete in 34% of subjects by 40 min. By the third postoperative day, complete recovery was obtained in 11% of cases (all domains): 48.7% nociceptive, 81.8% emotive, 68.8% activities of daily living, and only 33.5% cognitive. Overall, 95.8% of the patients reported that they were "satisfied or totally satisfied" with their anesthetic care.
The scores on the PQRS demonstrated an improvement over time, consistent with an expected recovery after surgery and anesthesia, and an ability to discriminate between individuals. Many patients had incomplete recovery by the third postoperative day.
What We Already Know about This Topic
❖ Multiple domains of recovery, including physiologic, cognitive, and functional recovery, are important to patients and their caregivers, yet a single and simple assessment tool to include these domains has not been validated
What This Article Tells Us That Is New
❖ In more than 700 postoperative patients in eight countries, a Post-operative Quality Recovery Scale was applied, and recovery was distinguished among individuals across multiple domains
ANESTHESIOLOGISTS have frequently sought to evaluate the quality of anesthesia care that they provide. In the early years, specialty efforts were focused on the assessment of mortality, with the development of national surveys.1,2 These surveys were initially based on voluntary reporting but later evolved into compulsory national studies, which in recent years have been focused on specific surgical areas.3,4 As the science in the delivery of anesthesia improved and the rate of anesthetic-related mortality and morbidity declined, it became difficult to demonstrate improvements using these outcomes.5
With the development of ambulatory surgery, recovery from anesthesia has focused on a return to acceptable physiologic parameters, and simple scoring systems have been devised to denote readiness for discharge from hospital.6–9 It has become increasingly evident, however, that to capture the broader and potentially long-term impact of anesthesia and surgery, these recovery scales are limited.10
At the start of this century, Myles et al. examined the concept of the functional quality of recovery. The approach adopted was to use patient ratings as the key to assessing the quality of recovery. This was followed by the increase in the use of patient-reported outcomes in recent years.11 Although the reports given by patients are an important approach in assessing recovery, other crucial aspects are in dimensions likely or known to be influenced by surgery and anesthesia but which may not fall within the patient's conscious experience. For example, the patient-reported scales do not address the important issue of cognitive recovery.12–15 The importance of neurocognitive decline after cardiac surgery has been identified for many years.16 There is also controversy over whether neurocognitive decline can occur after noncardiac surgery and whether it may be related to anesthesia delivery. This has raised new interest in recovery as a possible measure of the quality of anesthesia care and a target toward which innovation and improvement can be directed.17–19 Other scales have been developed to assess postoperative recovery; some have focused on particular forms of surgery, and others have approached recovery in a more general way.10
In 2007, a group of anesthesiologists with an interest in recovery commenced the development of a brief measurement tool to assess multiple domains of recovery, including cognition, over time. It was also intended for the tool to be applicable to patients with a wide age range, diverse languages, cultures, and physical abilities. The intention was to produce an instrument for assessment of recovery for multiple time periods to assess early and long-term recovery. This report describes the development and the initial feasibility study of the Post-operative Quality Recovery Scale (PQRS).
Materials and Methods
The use of the PQRS protocol was approved by the relevant human research ethics committees of each participating center, and written informed consent was obtained from all patients.
Research Group
The research group comprised nine anesthesiologists and two neuropsychologists, and it was assisted by a statistician. The project was funded by a research grant from Baxter Healthcare (Deerfield, IL). The first group meeting occurred in March 2007. This was followed by regular face-to-face meetings, several times per year, and teleconference and e-mail communication.
Development of the Tool
The group recognized that adoption of a tool would be related to its perceived value, ease of use, and brevity. At the first meeting, the group defined its objective to develop a measurement tool to evaluate postoperative recovery that could be performed by minimally trained staff and be suitable for repeated measurements that would be brief yet sufficiently complex to measure recovery in these multiple domains. The process of the development of the PQRS and arrival at the final scale is displayed in table 1.
The PQRS Tool
An example of the PQRS tool data collection sheet is shown in the appendix. Six domains of recovery (physiologic, nociceptive, emotive, activities of daily living (ADL), cognitive, and overall patient perspective) are described in table 2. Each domain comprised a series of questions. Nociceptive, emotive, ADL, and overall patient perspective contain observations that can be scored in a categorical fashion. In the cognitive domain, tasks receive a performance score. In the physiologic domain, values are transformed and categorized as acceptable, somewhat or far outside of the desirable boundaries, based on normative population data.
The multiple time points allow the assessment of recovery over time and at periods of critical clinical decision-making. Baseline testing in all domains is performed between 1 and 14 days preoperatively. Time zero (T0) was defined as the point after which anesthesia is no longer required. This is a complex definition as it is not the same for all operations. Five criteria were defined for the user to choose from. These include last skin stitch or painful stimulus, plaster dressing set, removal of endoscopic device, removal of intravascular device and completion of arterial compression, or application of final dressing. The immediate time point postoperatively is performed at 15 min (T15) and is principally designed to assess issues reflective of physiologic recovery, with relevance to patient safety and triage. The extent of early recovery has significant impact on perioperative workflow, respiratory complications, and rare adverse events.20–23 Early measurement is performed at 40 min (T40) and is principally designed to assess recovery at the point of discharge from the postoperative anesthesia care unit. Late recovery refers to the measurements performed in the first week after surgery, and in the current study, it was performed at 1 and 3 days postoperatively (D1 and D3). Long-term recovery, not reported in this article, is assessed at 3 months postoperatively (M3). In late and long-term measurements, the focus changes from physiologic and home-readiness recovery to cognitive recovery and return to previous or expected level of functioning at home or workplace.
After an initial pilot study of 133 patients, the PQRS was revised. This process identified that some aspects of recovery could not be satisfactorily performed at T15 and T40. These included the ADL and overall patient perspective items. Equally, after hospital discharge, the physiologic domain was meaningless. Accordingly, the assessments were constrained with the physiologic domain being tested at T15 and T40, and the ADL and overall patient perspective domains from D1 onward. A number of items, including one cognitive test (letter deletion test), that were both time consuming and difficult to complete, were removed. The ADL tests were decreased from eight to four items to reduce duplication. The group considered that a reasonable minimum age to answer the PQRS questions would be 6 yr, subject to pilot and feasibility studies. Experience from the pilot study showed that children as young as 6 yr could answer the questions satisfactorily, although a greater incidence of failure to complete the questionnaire occurred in this group in the early time periods.
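The resulting assessment schedule can be written down as a small lookup, shown here as a minimal Python sketch. The restrictions on the physiologic, ADL, and overall patient perspective domains follow the text; that the nociceptive, emotive, and cognitive domains are assessed at every time point is an assumption made for illustration, and the names TIME_POINTS, RESTRICTED, and domains_at are hypothetical.

```python
# Time points described for the PQRS, in chronological order.
TIME_POINTS = ["baseline", "T15", "T40", "D1", "D3", "M3"]

DOMAINS = [
    "physiologic",
    "nociceptive",
    "emotive",
    "activities of daily living",
    "cognitive",
    "overall patient perspective",
]

# Domains limited to particular time points after the pilot revision.
# ASSUMPTION: domains not listed here are assessed at every time point.
RESTRICTED = {
    "physiologic": {"baseline", "T15", "T40"},
    "activities of daily living": {"baseline", "D1", "D3", "M3"},
    # The overall patient perspective has no baseline measure.
    "overall patient perspective": {"D1", "D3", "M3"},
}


def domains_at(time_point):
    """Return the domains assessed at the given time point."""
    if time_point not in TIME_POINTS:
        raise ValueError(f"unknown time point: {time_point}")
    return [
        d for d in DOMAINS
        if d not in RESTRICTED or time_point in RESTRICTED[d]
    ]


if __name__ == "__main__":
    for tp in TIME_POINTS:
        print(tp, "->", ", ".join(domains_at(tp)))
```

Keeping the schedule in one table makes it easy to adapt the time points to a particular surgical cohort.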
Scoring the Tool
Baseline measurements are critical to the use of the tool as the definition of recovery adopted by the PQRS group is “return to baseline values or better.” The overall patient perspective has no baseline measure and is not included in the assessment of recovery. It is a patient-reported outcome and as such mirrors previous work that attempted to assess recovery.13–15 Measurements at each time point in the five other domains are scored and compared with values assessed before surgery. This is a conservative definition of recovery in the cognitive domain, as repeated measurement may enhance performance through learning effects.24 This definition required further refinement in the physiologic domain as physiologic variables have broadly defined normal values. Recovery in this domain was classified into three levels that were scored at baseline and at all subsequent time points. They were scored as 3 if their values fell into accepted ranges, 2 if the values were abnormal, and 1 if they were extremely abnormal. (These levels were derived from the literature and are displayed in the appendix.25–34)
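The three-level physiologic scoring can be illustrated with a short Python sketch. The function physiologic_level and its two range parameters are hypothetical, and the numeric bounds in the example are placeholders chosen only to exercise the code; they are not the values given in the appendix.

```python
def physiologic_level(value, acceptable, tolerable):
    """Score one physiologic variable as 3, 2, or 1.

    acceptable: (low, high) range scored 3 (value within accepted range).
    tolerable:  (low, high) wider range; values inside it but outside
                the acceptable range are scored 2 (abnormal).
    Anything outside the tolerable range is scored 1 (extremely abnormal).
    """
    low_ok, high_ok = acceptable
    low_tol, high_tol = tolerable
    if not (low_tol <= low_ok <= high_ok <= high_tol):
        raise ValueError("tolerable range must enclose acceptable range")
    if low_ok <= value <= high_ok:
        return 3
    if low_tol <= value <= high_tol:
        return 2
    return 1


if __name__ == "__main__":
    # PLACEHOLDER bounds for illustration only, not the appendix values.
    acceptable = (50, 100)
    tolerable = (40, 120)
    for reading in (72, 110, 130):
        print(reading, "->", physiologic_level(reading, acceptable, tolerable))
```

Because the same levels are scored at baseline and at each later time point, a patient whose baseline level is 2 counts as recovered once the level returns to 2 or 3.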
For each patient, values at each measurement time point are compared with baseline values and classified as either recovered (return to baseline values or better) or not recovered. This is scored for all test items and then grouped by domain or by all domains. Failure to recover on any question within a domain renders the whole domain “not recovered.” An example of this is shown in figure 1.
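The all-or-nothing rule can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the scoring software used in the study: every item is assumed to be coded so that a higher number is better, and the item names and scores in the example are invented.

```python
def item_recovered(baseline, follow_up):
    """An item is recovered if it returns to the baseline value or better.

    ASSUMPTION: items are coded so that a higher score is better.
    """
    return follow_up >= baseline


def domain_recovered(baseline_items, follow_up_items):
    """A domain is recovered only if every item in it is recovered."""
    return all(
        item_recovered(baseline_items[name], follow_up_items[name])
        for name in baseline_items
    )


def score_patient(baseline, follow_up):
    """Return (recovery by domain, recovery across all scored domains)."""
    by_domain = {
        domain: domain_recovered(items, follow_up[domain])
        for domain, items in baseline.items()
        if domain in follow_up
    }
    return by_domain, all(by_domain.values())


if __name__ == "__main__":
    # Hypothetical item names and scores, recoded so higher is better.
    baseline = {
        "nociceptive": {"pain": 5, "nausea": 5},
        "cognitive": {"word_recall": 6, "digits_forward": 5},
    }
    day_3 = {
        "nociceptive": {"pain": 5, "nausea": 5},
        "cognitive": {"word_recall": 7, "digits_forward": 4},
    }
    by_domain, overall = score_patient(baseline, day_3)
    print(by_domain)  # cognitive fails because one test is below baseline
    print(overall)
```

In this example the patient improves on one cognitive test but falls below baseline on another, so the cognitive domain, and therefore overall recovery, is scored as not recovered.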
Fundamental to the design of the PQRS tool is that it is a flexible instrument. The assessment of recovery can be expanded or collapsed from a simplistic indicator of recovery (return to baseline values in all tests and all domains) or identify the failure of recovery in particular domains. Alternatively, it can be used to probe deeper and identify which aspect of a domain is problematic or even assess severity indicators within each domain. In addition, the timing of the assessments can be adjusted to the focus of interest. The PQRS is not a summative score, but is rather indicative of whether patients have either recovered or not recovered. This may apply for all domains, or by domains, depending on the research question. Comparisons can be made between groups on the incidence of recovery.
Feasibility Study
A prospective observational study of 701 patients was conducted to measure recovery with the PQRS for repeated time periods and provide the initial feasibility and “face validity” data on its use. One of the purposes of the study was to establish the feasibility of assessing patients over the repeated time points. Eight centers in Australia, Canada, China, France, Germany, Mexico, the United Kingdom, and the United States were involved.
The PQRS was used by research staff trained in its conduct, who were not responsible for the direct care of the patients. Patients were included if they were 6 yr or older, undergoing elective surgery with general anesthesia, and able to complete the PQRS testing in the provided language at baseline. Patients were not enrolled if they had any current psychiatric disturbance or were undergoing neurosurgery that could impair their ability to participate with the assessment. Sampling was “convenience sampling” with patients being recruited from the member institutions of the PQRS group and when research staff were available to conduct the testing.
Baseline measurement of the PQRS was conducted up to 14 days before surgery. Baseline demographic and intraoperative data were recorded by the attending anesthesiologist. After anesthesia was no longer required, the PQRS was repeated at 15 min, 40 min, first day, third day, and 3 months postoperatively. The 3-month dataset is not reported here and will be the subject of a subsequent publication. The PQRS was conducted face-to-face for baseline, T15, T40, and at D1 and D3 if the patient was still in hospital; otherwise it was conducted via telephone interview once the patient was discharged. This approach has been used with other assessments.35 To help standardize the telephone assessment, the “faces” diagrams pertaining to questions were supplied to the patients to use at home. The questions and answers were read from the prescribed PQRS script. Data were recorded and scored according to the definitions and rating scales used in the PQRS (appendix; table 2).
If patients were unable to complete all or part of the PQRS, the missing data were scored as patient refusal, assessor unable to initiate, or patient unavailable. In a number of cases, the assessors were unavailable to complete all the time points of the assessments. These data were excluded from the respective time-point analyses for each domain, and the numbers completing the assessments at each time point are displayed in the relevant tables. We considered this to be reflective of the feasibility of gathering data at all the time points. If a patient attempted a question in the cognitive domain but was unable to answer, then a score of 0 was assigned. This article presents the initial descriptive analyses of the data of 701 patients up to 3 days postanesthesia and surgery.
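The handling of missing data can be sketched as follows. This is a minimal Python illustration, not the study's analysis code: None is used here to stand for a missing assessment, and the function names are hypothetical.

```python
def cognitive_item_score(attempted, answer_score):
    """Score one cognitive item.

    Not attempted (refusal, assessor unable to initiate, patient
    unavailable) is treated as missing; attempted but unable to
    answer is scored 0, as described in the text.
    """
    if not attempted:
        return None
    if answer_score is None:
        return 0
    return answer_score


def percent_recovered(outcomes):
    """Percentage recovered at one time point for one domain.

    outcomes: list with True (recovered), False (not recovered), or
    None (missing) per patient. Missing assessments are excluded from
    the denominator. Returns (percent, number assessed), or
    (None, 0) if nobody was assessed.
    """
    assessed = [o for o in outcomes if o is not None]
    if not assessed:
        return None, 0
    return 100.0 * sum(assessed) / len(assessed), len(assessed)


if __name__ == "__main__":
    # Invented outcomes for four patients; one assessment is missing.
    pct, n = percent_recovered([True, False, None, True])
    print(f"{pct:.1f}% recovered of {n} assessed")
```

Reporting the number assessed alongside each percentage mirrors the way the tables give the numbers completing each time point.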
Statistical Methods
Data were collected and verified at each participating center before submission to the data manager for analysis. Data are presented as mean and SD or as range and percent. Univariate analyses were conducted using chi-square analysis or the Fisher exact test where appropriate. All analyses were performed using SPSS version 14.0 for Windows (SPSS Inc, Chicago, IL).
Results
From April 2008 to January 2009, 701 patients from eight countries were enrolled and participated using the PQRS (Australia, 92; Canada, 242; France, 40; Germany, 143; Mexico, 89; United Kingdom, 46; United States, 42; and China, 7). Baseline and demographic details of the cohort are shown in table 3.
The time taken to complete the scale, rate of refusal, and those unable to complete testing are shown in table 4. The percentage of patients unable to answer at least 50% of the questions within each domain and categorized by age bands, gender, and language, for each time point is shown in table 5.
The percentage of patients returning to baseline or better (recovery) is shown for each test within domains in table 6. The percentage returning to baseline by domain and on the full PQRS is displayed in figure 2. The responses for overall patient perspective are shown in figure 3. Note that this measure did not have a baseline.
To determine the scale's ability to distinguish between individuals experiencing good versus poor recoveries, specific cases were examined. Examples of individual patients' performance demonstrating differences between good and poor recovery are shown in figure 4.
An example of the discriminant ability of the PQRS was examined by determining the relationship between the duration of surgery and the various subscale performances and is shown in figure 5. For this purpose, the duration of surgery was divided into three ranges based on the tertiles of data (< 60 min, 60–120 min, and > 120 min). By using the agreed definition of return to baseline or better, each test within each domain's score was dichotomously characterized as having returned to baseline or not. The resulting score of the number of tests returning to baseline was used to compare recovery in relation to the three durations of surgery and anesthesia. Analysis was performed at each time of assessment to account for the differing numbers of participants at each assessment point.
The physiologic domain consists of nine items. Chi-square analyses were conducted to examine the relationship between recovery and the three categories of anesthetic duration. A significant relationship between recovery and anesthetic duration was apparent at both time points where physiology was assessed, T15: chi-square = 34.4 (df = 12), P < 0.001 and T40: chi-square = 37.1 (df = 10), P < 0.001. Figure 5 shows this relationship for T40. The nociceptive and emotive domains showed a trend toward improved recovery with shorter durations of anesthesia and surgery at each time point. These were statistically significant at the day 1 assessment of the nociceptive domain, chi-square = 18.11 (df = 4), P = 0.001 and the emotive domain, chi-square = 22.9 (df = 4), P < 0.001 (fig. 5). ADLs were assessed at days 1 and 3, and both time points showed significant differences between the extent of recovery and the anesthetic duration, day 1: chi-square = 60.8 (df = 8), P < 0.001 and day 3: chi-square = 54.15 (df = 8), P < 0.001 (fig. 5 indicates the day 1 relationship). In the cognitive domain, the assessment of orientation showed limited ability to discriminate between individuals. Consequently, the cognitive domain was assessed using the remaining four tests, and this indicated a significant difference in cognitive recovery in relation to the duration of anesthesia and surgery for all measurements except T15 (T15: P = 0.51; T40: chi-square = 57.06 (df = 8), P < 0.001; day 1: chi-square = 16.37 (df = 8), P = 0.037; day 3: chi-square = 19.82 (df = 8), P = 0.011; day 1 is shown in fig. 5).
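As a cross-check, a P value can be recomputed from a reported chi-square statistic and its degrees of freedom. The Python sketch below is not the SPSS procedure used in the study; it relies on the closed-form upper-tail probability of the chi-square distribution, which holds only when the degrees of freedom are even (as they are for every statistic reported here). The function name is hypothetical.

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail probability of a chi-square statistic for EVEN df.

    For df = 2k the survival function has the closed form
        exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    Odd df would need the incomplete gamma function and is rejected.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = statistic / 2.0
    term = 1.0    # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    reported = [
        ("nociceptive, day 1", 18.11, 4),
        ("cognitive, day 1", 16.37, 8),
        ("cognitive, day 3", 19.82, 8),
    ]
    for label, statistic, df in reported:
        p = chi_square_p_value(statistic, df)
        print(f"{label}: chi-square = {statistic}, df = {df}, P = {p:.3f}")
```

For the two cognitive results, the recomputed values round to 0.037 and 0.011, in agreement with the P values reported in the text.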
Discussion
Defining “recovery,” after surgery and anesthesia, is a necessary step in the development of a recovery scale. The definition of recovery used in the PQRS is the concept of return to the presurgical state or even improvement. It is expected that patients will deviate from their presurgical state and then progressively recover over time. Integral to using this definition is the requirement to perform baseline testing before surgery. Postoperative values are then compared with baseline values to determine whether recovery has occurred. This definition caters to the wide range of baseline scores that will occur between patients. Many of the recovery scales developed to date do not include any assessment before surgery and anesthesia.
To establish face validity, the PQRS was developed over a period of time by a consensus of experts with diverse backgrounds and amended on the basis of empirical data to remove overlapping items and items that had floor or ceiling effects. Face validity means that the scale should show changes that are known and expected from clinical experience. The scale and its constituent parts demonstrated improved recovery over time, which is consistent with clinical experience.
This study shows that the PQRS is able to track recovery in multiple domains over time, in a wide age range of patients, and in multiple languages. With minimal additional training, researchers are able to perform the testing in a relatively brief time, and the use of face-to-face and telephone interviews provided minimal disruption to both patient and staff time.
One of the strengths of the PQRS is that it is a brief test to apply (approximately 5 min), making it feasible to use in many environments. It is acceptable to patients across a wide range of ages, languages, and types of surgery and has a low patient refusal rate. The percentage of patients unable to attempt the scale was very low for all time points other than the first time point T15. This demonstrates the feasibility of performing the test. Other than the very young patients, there were no major differences in the usability of the test across ages, gender, and languages. In contrast to older children, very young patients (6–10 yr) were most likely to be unable to answer questions at T15 and T40. The usability in very young children needs more research to identify the optimal minimal age, as these data are small in number and predominantly from one institution conducting ear, nose, and throat surgery, where emergence agitation in young children is common. Further comparisons between languages are also necessary, but these require more homogeneous cohorts, as the current dataset is too heterogeneous for meaningful comparisons between languages and cultures, other than to demonstrate that the PQRS can be completed in languages other than English.
An inability to answer questions at T15 should not be seen as an inability to use the test at an early time period, but rather it is reflective of the state of recovery at the time. At this very early time period, many patients were still unconscious at 15 min postsurgery (for example, the 35 patients undergoing cardiac surgery). The time points for assessment used in this feasibility study are not prescriptive but were chosen to assess recovery at time periods relevant to clinical recovery, such as emergence, postanesthesia care unit discharge, and return to home. It is recommended that researchers using this tool choose time periods that are practical and appropriate to the surgical cohort and research question. For example, T15 and T40 are very important for ambulatory surgery but meaningless for recovery after cardiac surgery where patients may still be intubated.
Some scales have examined internal consistency and structure by performing tests to examine the relationship between items and to ensure that items cohere around meaningful constructs.9,14 The approach adopted in the development of the PQRS was to exclude items that demonstrated significant correlation with each other within domains, so as to get maximum diversity and retain the brevity of the scale.
The inclusion of both face-to-face and telephone interviews was a deliberate attempt to increase the ability of the tool to capture time points after hospital discharge, without the additional burden and cost to patients of attending in person to perform the tests. This technique follows the development of other brief assessments that have also used a telephone or mail assessment for patients no longer easily accessible.13–15,35–38
The richness of data captured by the PQRS differentiates it from other recovery scales. The multiple aspects of recovery from both clinician and patient perspectives allow for a far more complex assessment of recovery processes than is currently available in other scales of recovery.6,9,13–15,36–38 These other scales most notably lack the formal assessment of cognition. Although it is only possible to perform a limited screening test of cognition in this brief assessment of recovery, the relatively high incidence of patients failing to recover by day 3 in the cognitive domain was surprising to us. Only a third of the patients returned to baseline values in all cognitive tests by day 3, although recovery on individual tests occurred in approximately two thirds of attempts. It is important for us to emphasize that the cognitive tests in the PQRS are not formally assessing postoperative cognitive decline (POCD), but rather cognitive recovery. POCD is a formal definition using a comprehensive battery of neurocognitive tests and with defined limits of deviation from baseline values to define whether POCD exists.18,19 It is possible that delayed cognitive recovery, especially at the D3 time period, may be a harbinger of POCD.17,39 These data are consistent with other publications associating duration of anesthesia with recovery parameters or morbidity indicators and add to the face validity of the PQRS.
In the International Study of Postoperative Cognitive Dysfunction-1, there was a modest but significant increase in POCD with each additional hour of anesthesia at 1 week.40 In volunteers subjected to 2, 4, or 8 h of anesthesia, performance on cognitive tests was better in the first 2 h of recovery with the shorter anesthetic duration.41 Other studies have shown an association between duration of anesthesia and pain,42 nausea,43 and physiologic recovery and safety.44 Further, prospective research with defined cohorts is required to elucidate whether impaired recovery is because of the anesthetic or the type of surgery, as the current dataset is too heterogeneous for that level of analysis.
The four postoperative time periods were designed to capture recovery over time. The specific timing was designed to capture rate of recovery and maximize differences in recovery. It is also broadly aligned with clinical decision making such as discharge from the postanesthesia care unit, return to work, and long-term issues of cognitive function and return to normalcy. It is clear from the data that capturing all patients at all time points does place a significant burden on the assessors and requires a dedicated person to perform the assessments.
The recovery data described in this study will reflect the patient cohorts and surgical case load. In general, there was a bias toward outpatient and ear, nose, and throat surgery and toward younger and fitter patients, with most patients discharged home by D3. For many clinicians, once the patients are discharged, their recovery is not further evaluated. The potential value of the PQRS is reinforced by the number of patients who demonstrate a failure to recover. Forty-seven percent of patients still reported pain at D3, 12% still reported nausea, 31.2% had not returned to full ADL, and 66.5% of patients had not returned to baseline cognitive function on all tests. Emotive recovery was high and occurred early, although it is possible that the emotive domain may have greater deviations from baseline in more acute or major procedures. Despite failure to recover in many aspects, patient satisfaction with their operative experience was very high. Some other assessments of recovery assume that assessed cognition will reflect subjective reports of cognition or understanding and assume that individuals have insight into their cognition or changes in cognition.13 Other postsurgical data have demonstrated that there is a poor correlation between subjective report and objective assessment of cognitive decline,45,46 and the former seems to reflect mood state rather than any insight into cognitive performance.
It is important to identify some limitations of this article. First, the definition of recovery involves forcing a binary outcome based on preoperative assessment. Where patients score very low values at baseline, then by definition, they are likely to demonstrate recovery. Our recommendation is that researchers using the PQRS with this definition of recovery will need to identify whether these patients should be included in their trial. Poor baseline performance may be an exclusion criterion to enrollment or reason for post hoc exclusion.
The data reported do not analyze performance over time as at this stage not all data were collected on all participants at each time point. At this stage of development of the scale, it was intended to demonstrate its applicability to assess recovery and to assess the capacity to perform assessments at each time point. The findings suggest that the intensity of measurement may not be applicable in all studies of specific forms of surgery. For short anesthesia and minor surgery, the earlier assessments may be applicable, but for more major surgery (e.g., cardiac surgery), the later assessments may be more important.
The patient sampling was “by convenience” as the primary focus of the study was to assess feasibility and face validity. This produces a heterogeneous group of patients, which has restricted the extent of data analysis that can be performed. Hypotheses relating to causation of failed recovery are the subject of specific future studies, which are more tightly controlled or randomized, depending on the question being asked. The analysis investigating anesthesia duration and cognitive recovery was used to illustrate the potential of the PQRS to discriminate between what may be considered the “severity” of the procedure. This study was not intended to comprehensively define causation of failed recovery. The findings of this analysis should be considered as hypothesis generating, and the basis for future research, rather than hypothesis proving.
There are many forms of validation that can apply to a new scale, of which feasibility and face validation have been primarily addressed in this article. In the ideal setting, any new scale would be compared against “a gold standard.” Unfortunately, there is no gold standard for measuring recovery, which was the impetus for the group to develop a more sophisticated scale. Current tests, such as the Aldrete Scale, have never been validated. Despite this, the Aldrete Scale, which was developed from an observational study, and published without validation, has been widely adopted.6
A number of future studies are planned in the development of the PQRS. Studies are currently under way to examine discriminant validity, validate interview methods, provide nonoperative data, and provide test–retest reliability data. Comprehensive validation of the tool is a process rather than a single entity. Larger scale use of the PQRS by other researchers will allow hypothesis-driven research on the causation, management, or prevention of poor recovery. It is envisaged that the PQRS will primarily be for research and specific audits, rather than applied as a routine audit tool, as it is still too time consuming for individual practitioners to complete while conducting their anesthetic practice. It is better suited to having a dedicated person completing the testing. Once the group has completed the further validation studies, however, the aim is to investigate the concept of a shortened version of the PQRS for the purpose of individual audit.
The inclusion of baseline data is integral to the concept of recovery used in the PQRS. It provides individual patient change data and is much more robust than current scales. It is, however, a clear limitation of the ease with which the scale could be used in a busy clinical environment. The logistics required to perform the PQRS in this way could interfere with the workflow of a busy anesthesiologist. The balance between brevity and richness of data is delicate, and to exclude baseline data would negate the ability of the scale to account for individual changes and the variety of performance of individuals. It is likely, therefore, that for many anesthesiologists, someone will have to be allocated to perform these assessments.
Conclusion
The PQRS is a brief tool that enables the assessment of recovery in multiple domains and over multiple time periods. It requires preoperative assessments that form the basis of later assessments and the scale demonstrates good face validity. It is of note that the data in this study demonstrated that recovery improves over time, but many patients still have delayed recovery by the third day postsurgery.
The authors thank the many research staff, nursing staff, and surgical and anesthesiology colleagues who have helped with the conduct of the study. In particular: Zelda Williams, M.Cur., Clinical Research Nurse, University of Melbourne, Melbourne, Victoria, Australia; Emma Farcas, M.B.B.S., and Smita Gupta, M.B.B.S., Research Fellows, University of Toronto, Toronto, Ontario, Canada; Mariana Herrera-Guerrero, M.D., Anesthesiologist, American British Cowdray Medical Center, Mexico City, Mexico; Patrick Doyle, M.B.Bch., Consultant Anesthetist, Charing Cross Hospital, London, United Kingdom; Andrew Smith, M.B.B.S., and Roger Cordery, M.B.B.S., Consultant Anesthetists, Heart Hospital, University College London Hospitals, London, United Kingdom; Robert Kong, M.B.B.S., Consultant Anesthetist, Royal Sussex County Hospital, Brighton, United Kingdom; and Shashi Hirani, Ph.D., Biomedical Statistician, University College London, London, United Kingdom, for statistical advice.