Abstract
Quality of recovery (QoR) after anesthesia is an important measure of the early postoperative health status of patients. The aim was to develop a short-form postoperative QoR score, and test its validity, reliability, responsiveness, and clinical acceptability and feasibility.
Based on extensive clinical and research experience with the 40-item QoR-40, the strongest psychometrically performing items from each of the five dimensions of the QoR-40 were selected to create a short-form version, the QoR-15. This was then evaluated in 127 adult patients after general anesthesia and surgery.
There was good convergent validity between the QoR-15 and a global QoR visual analog scale (r = 0.68, P < 0.0005). Construct validity was supported by a negative correlation with duration of surgery (r = −0.49, P < 0.0005), time spent in the postanesthesia care unit (r = −0.41, P < 0.0005), and duration of hospital stay (r = −0.53, P < 0.0005). There was also excellent internal consistency (0.85), split-half reliability (0.78), and test–retest reliability (ri = 0.99), all P < 0.0005. Responsiveness was excellent with an effect size of 1.35 and a standardized response mean of 1.04. The mean ± SD time to complete the QoR-15 was 2.4 ± 0.8 min.
The QoR-15 provides a valid, extensive, and yet efficient evaluation of postoperative QoR.
Quality of recovery (QoR) after anesthesia is an important measure of the early postoperative health status of patients
The authors developed and tested a short-form version of the 40-question QoR-40
A 15-question version of the QoR-40 was tested in 127 adult surgical patients
The short version performed well in all dimensions and took only about 2.5 min to complete
RECOVERY after surgery and anesthesia is a complex process dependent on patient, surgical, and anesthetic characteristics, as well as the presence of any of numerous adverse sequelae. Most studies evaluating recovery after anesthesia and surgery have focused primarily on physiological endpoints, recovery times, and the incidence of adverse events, such as major morbidity and mortality. Although these parameters are important and should be measured, they mostly ignore quality of recovery (QoR) from the patient’s perspective and therefore a variety of measurement tools have been developed.1–8
Myles et al.5 have previously developed and psychometrically evaluated two patient-rated postoperative QoR instruments: a brief nine-item QoR score, and a more comprehensive 40-item score, the QoR-40.6 The QoR-40 is a global measure of QoR.
Psychometric evaluation of the QoR Score revealed moderate validity and reliability, with most patients able to complete the questionnaire in less than 2 min,5 indicating that it should be reserved for conditions where a simple, rapid evaluation is required. Evaluation of the longer QoR-40 instrument demonstrated superior validity and reliability compared with the QoR Score, with most patients able to complete the questionnaire in less than 10 min.6 It incorporates five dimensions of health: patient support, comfort, emotions, physical independence, and pain, with a score range from 40 to 200.6,9 It has been found to provide a more extensive evaluation of a patient’s QoR; however, we have found the feasibility of administering a 40-item questionnaire to be problematic in some circumstances.
The aim of this study was to develop a short-form 15-item postoperative QoR score, and test its validity, reliability, responsiveness, and clinical acceptability and feasibility in surgical patients with general anesthesia. Our hypothesis was that this instrument would retain the excellent psychometric properties of the QoR-40, while improving its clinical acceptability and feasibility, allowing it to be applied more widely in research and clinical practice.
Materials and Methods
Based on our previous experience and that of others,5,8–11 it was expected that a 15-item short-form instrument could reproduce the psychometric properties of the QoR-40 and yet be more user friendly. After literature review and consultation with experienced anesthetic and research nursing staff, items from the QoR-40 were selected based on their clinical importance, ease of interpretation, and relevance to a good patient-centered outcome. We included items from each of the five dimensions of the QoR-40. All QoR-40 items had been identified previously by patients, their relatives, nursing, and medical staff as important during the postoperative recovery period.5 Items were further selected on the basis of their correlation with QoR and their representation of the five QoR-40 dimensions (fig. 1).6
This process resulted in the QoR-15 questionnaire (fig. 2). We undertook a post hoc factor analysis, which selected 12 of 15 items (of the 19 items used to create the QoR-15), with an eigenvalue of 85%; the 15 items we ultimately selected to create the QoR-15 (fig. 1) had an adjusted R² (explained variance) of 97% for the total QoR-40 score.
To optimize scaling properties, and to allow verbal numerical responses of the kind already familiar to patients and staff from pain rating scales, an 11-point numerical rating scale was constructed (for positive items, 0 = “none of the time” to 10 = “all of the time”; for negative items the scoring was reversed; maximum score 150).
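The scoring rule described above can be sketched as follows. This is a minimal illustration that assumes 15 items, each rated 0 to 10. The text here does not say which items are negatively worded, so the `negative_items` argument, the item positions used in the example, and the function name `qor15_total` are hypothetical.

```python
def qor15_total(responses, negative_items):
    """Sum 15 item ratings (0-10), reversing negatively worded items.

    responses: list of 15 integers, each 0..10, as marked by the patient.
    negative_items: set of 0-based indices of negatively worded items
        (hypothetical parameter; the item wording is not given here).
    """
    if len(responses) != 15:
        raise ValueError("expected 15 item responses")
    total = 0
    for index, rating in enumerate(responses):
        if not 0 <= rating <= 10:
            raise ValueError(f"item {index} rating {rating} is outside 0-10")
        # Reverse negative items so that 10 is always the best outcome.
        total += (10 - rating) if index in negative_items else rating
    return total


# Made-up patient giving the best possible answer on every item.
negative = {10, 11, 12, 13, 14}  # assumed positions, for illustration only
best = [10] * 10 + [0] * 5
print(qor15_total(best, negative))  # 150
```

Reversing the negative items before summing keeps a higher total equal to a better recovery, which is why the maximum is 15 × 10 = 150.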
After obtaining institutional review board approval (Melbourne, Victoria, Australia), a prospective observational cohort study of adult surgical patients undergoing surgery and general anesthesia was conducted. Patients included were required to be able to provide informed verbal consent and be available for follow-up at 24 h (in person or by telephone). Patients were excluded if they had poor English comprehension, a psychiatric disturbance that precluded complete cooperation, a known history of alcohol or drug dependence, any severe preexisting medical condition that limited objective assessment after operation, the presence of any life-threatening postoperative complication, or were undergoing emergency surgery.
Patient demographic and preoperative data were collected at the time of enrollment. Intraoperative data regarding the type, nature, extent, and duration of surgery were collected from the patient’s anesthetic record and the hospital’s perioperative clinical information system. The type of surgery was classified according to surgical subspecialty. The nature of surgery was classified as elective or nonelective. The extent of surgery was classified as minor, intermediate, or major depending on the degree of and expected duration of surgery, as well as the expected postoperative inflammatory response. The duration of surgery was determined using the surgery start and stop times obtained from the hospital’s perioperative clinical information system.
Before or on the day of surgery, patients were asked by the investigators to complete the QoR-15 questionnaire as a measure of baseline (relatively healthy) status. They were then asked to repeat the questionnaire 24 h postoperatively and also rate their overall postoperative recovery using a 100-mm visual analog scale (VAS), marked from “poor recovery” to “excellent recovery.” This provided an alternative global assessment of recovery. A subset of patients was asked to repeat the QoR-15 questionnaire 30–60 min later (as a measure of repeatability). Patients who were discharged home on the day of surgery were contacted by telephone to complete the questionnaire. Patient demographic and perioperative data were also collected.
A full psychometric evaluation of the postoperative QoR-15 was then performed.12–14 This included:
Validity—This describes accuracy and was assessed using the following criteria:
Convergent validity: The QoR-15 was compared with the global QoR VAS score and interitem correlations were measured.
Construct validity: It was deemed that there would be an association between the QoR-15 and age, gender, duration of surgery, duration of stay in the postanesthesia care unit, duration of hospital stay, and time required for completion of the questionnaire.
Discriminant validity: Patients with complications, and those who had a poor postoperative recovery (defined as a global VAS of <70 mm, with ≥70 mm indicating a good recovery), would have a lower QoR-15 score.
Reliability—This describes consistency and was assessed on the basis of the following:
Internal consistency: The averaged correlation between each of the items within the QoR-15.
Split-half reliability: The correlation between random split segments of the QoR-15.
Test–retest reliability: A subset of patients (n = 25) was asked to repeat the QoR-15 a second time around 30–60 min later and the correlation between measurements was assessed.
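Responsiveness—This describes an instrument’s sensitivity or ability to detect clinically important change. This was quantified using the Cohen effect size (mean change in score divided by the SD at baseline) and the standardized response mean (mean change in score divided by the SD of the change).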
Acceptability and feasibility—These are measures of clinical user-friendliness and were assessed using:
Patient recruitment rate.
Successful completion rate.
Time taken for patients to complete the 15-item questionnaire.
Statistical Analysis
The sample size selected for this study was guided by our previous studies, as power calculations cannot be reliably determined with correlation analysis. Data are presented as mean ± SD, median (interquartile range), number (%) or 95% CI. All percentages are rounded up to the nearest integer. Associations were measured using Pearson (r) or Spearman rank (ρ) correlation coefficients. Internal consistency was measured using Cronbach α.16 Test–retest reliability was measured using the intraclass correlation coefficient (ri).17 Repeatability was also calculated from the within-subjects SD, based on the Bland-Altman method.18 Changes from baseline were compared using the paired t test. All statistical analyses were performed using SPSS for Windows v19.0 (SPSS Inc., Chicago, IL). The null hypothesis was rejected if two-tailed P < 0.05.
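For readers unfamiliar with the internal consistency statistic named above, the following is a minimal sketch of the standard Cronbach α formula, α = k/(k − 1) × (1 − Σ item variances / variance of total score). It is not the SPSS routine used in the study; the function name `cronbach_alpha` and the demonstration ratings are made up for illustration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach alpha for a list of per-patient item-score lists."""
    k = len(rows[0])  # number of items in the questionnaire
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the per-respondent total score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Made-up ratings for 4 respondents on 3 items (not study data).
demo = [[8, 7, 9], [5, 6, 5], [9, 9, 10], [3, 4, 2]]
print(round(cronbach_alpha(demo), 2))
```

When every item moves together across respondents, the item variances account for little of the total-score variance and α approaches 1; unrelated items push α toward 0.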
Results
Of the 146 patients approached in this study, 10 were ineligible and there were two refusals, resulting in a 99% recruitment rate; seven patients were excluded after recruitment (95% completion rate). Thus, there were 127 evaluable patients (age range 18–85 yr) recovering from many types of surgery. Patient demographic and clinical characteristics are presented in table 1. Perioperative characteristics of those undergoing minor, intermediate, and major surgery are available as a web-based table (see Supplemental Digital Content 1, https://links.lww.com/ALN/A916).
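The patient flow above can be checked with simple arithmetic. The sketch below assumes that the recruitment rate is taken over eligible patients and the completion rate over recruited patients; these denominators are inferred from the reported percentages rather than stated explicitly, and the variable names are illustrative.

```python
approached = 146
ineligible = 10
refused = 2
excluded_after_recruitment = 7

eligible = approached - ineligible                  # 136
recruited = eligible - refused                      # 134
evaluable = recruited - excluded_after_recruitment  # 127

# Assumed denominators: eligible patients, then recruited patients.
recruitment_rate = recruited / eligible   # about 0.985
completion_rate = evaluable / recruited   # about 0.948

print(evaluable, round(100 * recruitment_rate), round(100 * completion_rate))
# 127 99 95
```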
The mean time of assessment was 26 h after surgery (range 22–49 h). The mean time taken to complete the postoperative QoR-15 questionnaire was 2.4±0.8 (range 1–6) min. There was a correlation between the time taken to complete the QoR-15 and age, r = 0.30 (P = 0.001), as well as with extent of surgery, ρ = 0.18 (P = 0.04), but not American Society of Anesthesiologists physical status score, ρ = 0.06 (P = 0.56). These correlations support the construct validity of the score. The lack of association with physical status indicates ease of use irrespective of the level of patient comorbidity.
Convergent validity was assessed by the correlation between the QoR-15 and VAS, with r = 0.68, P < 0.0005. The interitem correlation matrix is shown in table 2.
Construct validity was tested by comparing the QoR-15 score of patients having minor, intermediate, and major surgery, showing a significant decrease in QoR-15 score according to the extent of surgery; 118±20 versus 106±21 versus 92±23, respectively, P < 0.0001. Men had a higher QoR-15 score than women; 102±23 versus 97±25, P = 0.047. Patients who experienced postoperative complications had a lower score than those who did not; 91±13 versus 103±25, P = 0.002. There was a significant negative correlation between the QoR-15 and duration of surgery (ρ = −0.49, P < 0.0005), time spent in the postanesthesia care or intensive care units (ρ = −0.41, P < 0.0005), duration of hospital stay (ρ = −0.53, P < 0.0005), and time taken to complete the questionnaire (ρ = −0.28, P = 0.001). There was no relation between QoR-15 score and patient age (r = −0.02, P = 0.81).
Discriminant validity was determined by comparing patients who had a good or poor postoperative recovery, as defined by a global VAS of ≥70 or <70 mm, respectively. The QoR-15 score differed significantly between these groups, 115±18 versus 85±20, mean difference 30 (95% CI, 23–36), P < 0.0001.
Reliability indices were high: internal consistency α = 0.85; split-half coefficient 0.78; and test–retest intraclass coefficient ri = 0.99 (all P < 0.0005). For agreement, the mean bias was small, −0.3 (95% CI, −1.9 to 1.3), P = 0.72.
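The agreement analysis can be illustrated with a short sketch of the conventional Bland-Altman calculation on paired test and retest scores. This is an assumption-laden illustration, not the study's SPSS analysis: the function name `bland_altman` is hypothetical, and the 95% limits of agreement it returns (bias ± 1.96 SD of the differences) are a different quantity from the 95% CI of the mean bias reported above.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    # Retest minus test, one difference per patient.
    diffs = [b - a for a, b in zip(first, second)]
    bias = mean(diffs)   # mean difference between the two administrations
    sd = stdev(diffs)    # sample SD of the differences
    # Conventional 95% limits of agreement: bias +/- 1.96 SD.
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Made-up test and retest totals for 5 patients (not study data).
test_scores = [101, 88, 120, 95, 110]
retest_scores = [100, 90, 119, 95, 109]
print(bland_altman(test_scores, retest_scores))
```

A bias near zero with narrow limits suggests that repeated administration gives nearly the same score for the same patient.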
The baseline and postoperative QoR-15 scores were 123±16 and 101±24, respectively. This indicates excellent responsiveness, with a Cohen effect size of 1.35 and a standardized response mean of 1.04. For the ambulatory or minor surgical subgroup (defined as surgery <60 min), the Cohen effect size was 1.7. Changes in perioperative health status and responsiveness are summarized in table 3.
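The two responsiveness indices can be sketched as below, using their conventional definitions: the Cohen effect size is the mean change divided by the baseline SD, and the standardized response mean is the mean change divided by the SD of the individual changes. The function name `responsiveness` is hypothetical, and the final line uses only the rounded summary values reported above, so it approximates rather than reproduces the published figure.

```python
from statistics import mean, stdev


def responsiveness(baseline, followup):
    """Return (Cohen effect size, standardized response mean) for paired scores."""
    if len(baseline) != len(followup) or len(baseline) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    # Decline from baseline, one change score per patient.
    changes = [b - f for b, f in zip(baseline, followup)]
    mean_change = mean(changes)
    effect_size = mean_change / stdev(baseline)  # scaled by baseline SD
    srm = mean_change / stdev(changes)           # scaled by SD of the changes
    return effect_size, srm


# Rough check from the rounded summary statistics: (123 - 101) / 16.
print(round((123 - 101) / 16, 2))  # 1.38, close to the reported 1.35
```

The small gap between 1.38 and 1.35 is what one would expect from working with rounded means and SDs instead of patient-level data.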
The QoR-15 had very good scaling properties. The 10th, 25th, 50th, 75th, and 90th centiles were 66, 83, 103, 118, and 130, respectively (fig. 3). For the 21 ambulatory surgical patients who needed to be contacted by telephone, the mean (SD) score was 118 (20) and kurtosis 1.29, consistent with a normal distribution.
Discussion
This study consisted of the development and prospective evaluation of a short-form 15-item patient-rated postoperative QoR score, the QoR-15. We chose a broad range of surgeries to maximally test the performance of the QoR-15 to demonstrate utility in many settings, including ambulatory surgery. The validity, reliability, responsiveness, and clinical acceptability and feasibility of the score were excellent, with most patients able to complete the questionnaire in less than 3 min. We found that the QoR-15, with a score range of 0–150, was not limited in its capacity to discriminate those patients at the extremes of good or poor recovery, with only one patient with a score less than 50, and two with a score more than 145. Floor or ceiling effects are considered to be present if more than 15% of the subjects achieved the lowest or highest possible score, respectively.19
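The floor and ceiling criterion stated above translates directly into a check on the share of patients at each end of the scale. The sketch below is illustrative only; the function name `floor_ceiling` and the example scores are made up, and the default bounds assume the 0–150 range of the QoR-15.

```python
def floor_ceiling(scores, lowest=0, highest=150, threshold=0.15):
    """Return (floor_effect, ceiling_effect) as booleans.

    An effect is flagged when more than `threshold` of respondents
    achieve the lowest or highest possible score, respectively.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_share = sum(s == lowest for s in scores) / n
    ceiling_share = sum(s == highest for s in scores) / n
    return floor_share > threshold, ceiling_share > threshold


# Made-up totals for 10 patients (not study data): one at the maximum.
example = [150, 92, 101, 118, 66, 130, 83, 103, 110, 97]
print(floor_ceiling(example))  # (False, False)
```

In the example, 1 of 10 patients (10%) scores 150, which is below the 15% threshold, so no ceiling effect is flagged.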
The QoR-15 was validated using a variety of endpoints. All of these measures support its ability to measure QoR. Content validity has been demonstrated previously.5 Convergent validity was good and comparable to the more extensive QoR-40.6 This exceeds published recommendations (correlation >0.60),14 despite being constrained by the use of a global VAS as an alternative assessment of recovery. The VAS is an imperfect scale without psychometric evaluation that overlooks the individual components of recovery and is prone to overrating. There is, however, no gold standard with which to compare the QoR-15. The QoR-40 cannot be used for this purpose because the two instruments share items, and the resulting collinearity would produce a spuriously high correlation.20
The evidence of construct validity was strong, with the QoR-15 able to differentiate between known determinants of postoperative recovery. The QoR-15 was able to discriminate between men and women, for it is known that women generally have a worse postoperative recovery.5,21 A negative association was demonstrated between the QoR-15 and duration of surgery, duration of time spent in the postanesthesia care unit, duration of hospital stay, and time taken to complete the questionnaire. There was no relation between the QoR-15 score and patient age. This finding is understandable as older people generally report less pain, nausea, and vomiting, and are more likely to score their health status and satisfaction with care more favorably.22,23
Discriminant validity was determined by comparing patients who had undergone minor, intermediate, and major surgery. The QoR-15 clearly distinguished patient groups and there was a significant decrease in QoR-15 scores among those having more extensive surgery. Discriminant validity was further confirmed by comparing patients who had a good or poor postoperative recovery, as defined by a global QoR VAS score. This classification of “good” and “poor” recovery is in part arbitrary. Although it might seem preferable to ask clinical staff to make this judgment, that approach devalues actual patient experience.
Internal consistency was measured using Cronbach α and split-half reliability. Both of these coefficients were high and satisfied published recommendations (0.70–0.90).14 These results were comparable to those obtained with the QoR-40,6 and indicate that the QoR-15 should provide reliable assessment for both group and individual measurements or comparisons. Internal consistency was also established using interitem correlation. Each item was internally consistent and correlated well with the total QoR-15 score. The coefficient values (0.29–0.85) indicate that there is little redundancy among the items, and that each item addresses a unique aspect of the recovery process.
Reproducibility (test–retest reliability) was excellent, and exceeded that reported for the QoR-40. We also used analysis of variance to determine how much of the total variability in scores is due to true differences between individuals and how much is due to variability in measurement.24 Our findings demonstrate that the QoR-15 is able to yield consistent results when evaluating test–retest reliability, allowing for confident interpretation of the QoR-15 score. There is no consensus about the length of time that should elapse between tests.12,14 We believe that the 30–60-min time period used in this study was a sufficient duration that patients were unlikely to recall their previous answers, but not so long that actual changes in their postoperative health status had occurred. It is possible that the test–retest coefficient is an underestimate in view of the general ongoing improvement in patients’ health status after operation; however, this contention is unlikely given the short duration between assessments and the negative test–retest bias.
The responsiveness of the QoR-15 was assessed using Cohen effect size and standardized response mean.24 Both of these measures are expressed in standardized units (0.2 being considered small, 0.5 as medium, and 0.8 or greater as large) that permit assessment of the relative size of a change, in this case, overall QoR.15 The QoR-15 had an effect size of 1.35 and a standardized response mean of 1.04. These values exceed those obtained with the QoR-40,6 and suggest a very strong ability to detect a clinically important change in QoR, even for small numbers of patients. It is thus an eminently suitable patient-centered outcome measure for clinical trials. Responsiveness is the most important psychometric index for evaluative instruments,12 that is, those intended to measure a change in health status “outcome.”
For individual items, effect size values ranged from 0.04 to 3.09 and standardized response mean values from 0.03 to 1.08. All items were affected by surgery and anesthesia, and almost all displayed moderate to excellent responsiveness. The one exception was the item relating to the support provided to the patient by hospital staff. This finding is understandable, as support from staff should be consistently high, irrespective of any adverse effects of surgery or anesthesia. It is, however, an important component of the patient’s experience after surgery5 and warrants inclusion in the QoR-15 instrument. Some of the baseline values obtained preoperatively may be underestimates given that many patients were probably anxious, medically unstable, or in pain before their procedure. These circumstances do not provide an ideal baseline for comparison. A QoR-15 measurement after complete recovery may have provided a better comparator, but this assumes that a complete recovery will occur in all cases.
The acceptability and feasibility of the QoR-15 were assessed using recruitment rate, successful completion rate, and time taken to complete the questionnaire. There was a high rate of participation and successful completion, and most patients were able to complete the questionnaire in less than 3 min. These findings represent a significant improvement when compared with the QoR-40 and other instruments.6–8 This highlights the clinical usefulness of the QoR-15.
That is, patients did not find the questionnaire difficult to understand or burdensome. We attribute this to the reduction in the number of items and the use of a simple 11-point numerical rating scale. Acceptability of health status instruments is important to ensure high response rates, making results of trials easier to interpret, more generalizable, and less prone to bias from nonresponse.25
These features mean that the QoR-15 can be printed on a single page, read, and completed quickly. This minimizes the time required to train staff to use the QoR-15 and represents increased feasibility when compared to the longer and slightly more complex QoR-40. This is important, as excessive burden to staff may jeopardize trial conduct and disrupt clinical care.25 Furthermore, staff attitudes and acceptance of the value of an instrument can make a substantial difference to its ultimate acceptability by patients.25
Limitations of the Study
This study was conducted in a single, university-affiliated tertiary hospital in Australia. We excluded patients with poor English comprehension, those with severe preexisting medical conditions, and those having emergency surgery. There were relatively few ambulatory surgical, gynecological, and urological patients. The QoR-15 thus needs further validation in these and other settings.
The QoR-15 provides a valid, reliable, responsive, and easy-to-use method of measuring the quality of a patient’s postoperative recovery. When compared with the QoR-40, the QoR-15 provides an equally extensive, yet more efficient evaluation of a patient’s QoR after anesthesia and surgery. The QoR-15 can be a valuable outcome measure in perioperative clinical trials, and for assessing the impact of changes in health care delivery for quality assurance purposes.