Several quality of recovery (QoR) health status scales have been developed to quantify the patient’s experience after anesthesia and surgery, but to date, it is unclear what constitutes the minimal clinically important difference (MCID). That is, what minimal change in score would indicate a meaningful change in a patient’s health status?
The authors enrolled a sequential, unselected cohort of patients recovering from surgery and used three QoR scales (the 9-item QoR score, the 15-item QoR-15, and the 40-item QoR-40) to quantify a patient’s recovery after surgery and anesthesia. The authors compared changes in patient QoR scores with a global rating of change questionnaire using an anchor-based method and three distribution-based methods (0.3 SD, standard error of the measurement, and 5% range). The authors then averaged the change estimates to determine the MCID for each QoR scale.
The authors enrolled 204 patients at the first postoperative visit; 199 were available for a second interview, and 24 were available for a third interview. The QoR scores improved significantly between the first two interviews. Triangulation of distribution- and anchor-based methods resulted in an MCID of 0.92, 8.0, and 6.3 for the QoR score, QoR-15, and QoR-40, respectively.
Perioperative interventions that result in a change of 0.9 for the QoR score, 8.0 for the QoR-15, or 6.3 for the QoR-40 signify a clinically important improvement or deterioration.
In a study of 204 patients, the minimum clinically meaningful differences in the quality of recovery scales were 0.9, 8.0, and 6.3 for the 9-, 15-, and 40-item versions, respectively.
Supplemental Digital Content is available in the text.
Several quality of recovery health status scales have been developed to assess patient experience after surgery, but the size of difference within scales representing the minimum clinically meaningful difference to patients has not been defined.
The effectiveness and safety of modern anesthesia have led to a stronger emphasis on patient-centered outcome measures.1–3 These outcome measures assess quality of life, patient satisfaction, disability-free survival,4 and aspects of the postoperative experience that encompass well-being, physical functioning, and comfort.5,6
Our group has developed a range of quality of recovery (QoR) scores that provide a patient-centered global measure of overall health status after surgery and anesthesia. The first was the 9-item QoR score,2,7 the next was a more comprehensive 40-item QoR-40 scale,8,9 and the most recent was the 15-item QoR-15 scale.10 These instruments have undergone extensive psychometric testing, but to date, we have yet to ascertain the smallest change in score that still constitutes a meaningful change in health status. This has been referred to as the minimal clinically important difference (MCID),11,12 the “smallest real difference,”13 and the “tolerance interval.”14 MCID can be estimated using distribution-based and anchor-based methods.11,15
Distribution-based methods are based on the statistical variability of assessment scales.11 The simplest has been the SD/2 rule, previously recommended for quality of life instruments.15 This concept was intended for those with chronic disease and so may not apply to those with acute health conditions and varied expectations, such as typically occur after surgery.
Another approach is to use an anchor-based method, which relies on repeat patient ratings that quantify the extent of change (i.e., improvement or deterioration) with treatment or a recovery period using a Likert scale.11,16–20 This method calibrates (“anchors”) the change in condition perceived by patients relative to their baseline state.
The aim of this study was to determine the MCID for the QoR score, the QoR-15 scale, and the QoR-40 scale, in order to assist physicians to interpret the results of perioperative studies and to plan future trials.
Materials and Methods
This observational study enrolled a sequential, unselected cohort of patients recovering from surgery at three Australian hospitals (a university-based inner city hospital with a complete variety of all adult, nonobstetric services; a university-based inner city obstetric and gynecology hospital; and a rural hospital with mixed surgical [excluding cardiothoracic and neurosurgery] and obstetric services). Institutional review board approval was obtained at each site.
Male or female patients aged 18 yr or older were eligible for enrollment if they were recovering within 4 to 48 h of surgery that included general or neuraxial anesthesia. Exclusion criteria were poor English comprehension, drug or alcohol dependence, psychiatric disorder, or any current disorder impairing accurate and objective completion of the questionnaires (e.g., delirium or uncontrolled pain). Patients were informed that three brief questionnaires were to be completed on 2 separate days, with assistance if required. The option of a third visit was offered in order to further evaluate change scores. Patients who agreed were offered a plain language statement and asked to provide informed consent.
Data collection included relevant variables known to affect health status or QoR. This included preoperative demographic, medical, and surgical details. Extent of surgery was classified by the anesthesiologist-researcher and judged according to the expected tissue damage and neurohumoral stress response (Supplemental Digital Content, table 1, http://links.lww.com/ALN/B274).
Measurement of Quality of Recovery
Three QoR scales were used (provided as Supplemental Digital Content, http://links.lww.com/ALN/B274, pages 2 to 7). The order of presentation was according to the number of items in each scale (QoR score, QoR-15, and then QoR-40). The QoR score, QoR-15, and QoR-40 have ranges in scores of 0 (extremely poor recovery) to 18, 150, and 200 (all, excellent recovery), respectively.2,8,10 The QoR-40 has five dimensions: (i) physical comfort (12 items), (ii) emotional state (9 items), (iii) physical independence (5 items), (iv) psychologic support (7 items), and (v) pain (7 items).8 We surveyed patients using each of these scales on two occasions, and if the patient remained in hospital and was available for interview, then the patient was surveyed on three occasions, each approximately one day apart.
Measurement of Change in Health Status
Anchor-based determination of the MCID was calculated as the change in mean score of each QoR scale according to the patient’s assessment of his or her change in health status.18,21 This consisted of asking patients, “How would you rate your overall recovery from surgery since yesterday?” We used a 15-point scale, ranging from –7 to +7, to measure their response:11,18,21
−7: A very great deal worse
−6: A great deal worse
−5: A good deal worse
−4: Moderately worse
−3: Somewhat worse
−2: A little worse
−1: Almost the same, hardly any worse at all
0: No change
1: Almost the same, hardly any better at all
2: A little better
3: Somewhat better
4: Moderately better
5: A good deal better
6: A great deal better
7: A very great deal better
In addition, we asked patients, “In your opinion, have you made a good recovery from your operation?” with response options of yes, no, or unsure. These responses were used to classify patients as having made a good recovery or otherwise.
A formal sample size calculation could not be reliably performed for the determination of MCID. Previous influential studies have enrolled 40 to 100 subjects.16,18 We planned to enroll at least 150 subjects to enable sensitivity testing to evaluate heterogeneity (variations of MCID across gender and obstetrics).
There is no consensus in the literature on the optimal method of MCID estimation, and so several methodologies were employed. Recommendations for distribution-based determination of the MCID include calculation of 0.5 SD,15 0.2 SD,22 0.3 SD,17 and standard error of the measurement (SEM).23 Others have used 5 to 10% of the instrument range,24 but this is likely to be dependent on the scaling properties and study population. In view of these discrepancies, we planned to use 0.3 SD, SEM, and 5% range.19 The SEM was calculated as the SD multiplied by the square root of 1 minus the intraclass correlation coefficient.23
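The three distribution-based rules described above can be illustrated with a minimal sketch. It assumes the 5% rule is applied to the full instrument range (the text does not state whether the instrument range or the observed range was used), and the function name, example scores, and intraclass correlation coefficient are hypothetical, not study data.

```python
import math
import statistics


def distribution_based_estimates(scores, icc, scale_min, scale_max):
    """Return the three distribution-based MCID estimates for one scale.

    scores    : QoR scores from one interview (one value per patient)
    icc       : intraclass correlation coefficient (test-retest reliability)
    scale_min : lowest possible score on the instrument (assumed range basis)
    scale_max : highest possible score on the instrument
    """
    sd = statistics.stdev(scores)  # sample SD of the scores
    return {
        "0.3 SD": 0.3 * sd,
        # SEM = SD multiplied by the square root of (1 - ICC)
        "SEM": sd * math.sqrt(1.0 - icc),
        "5% range": 0.05 * (scale_max - scale_min),
    }


# Hypothetical QoR-15 scores (instrument range 0 to 150) and a hypothetical ICC.
example_scores = [90, 105, 110, 120, 125, 130, 135, 140]
estimates = distribution_based_estimates(
    example_scores, icc=0.90, scale_min=0, scale_max=150
)
for rule, value in estimates.items():
    print(f"{rule}: {value:.2f}")
```

Note that the SEM shrinks as reliability rises, so a highly reliable scale yields a smaller SEM-based estimate than the 0.3 SD rule applied to the same data.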
Patients whose score on the global rating of change questionnaire was 0, 1, or −1 were classified as unchanged.18 Patients whose score was 2, 3, −2, or −3 were considered to have experienced a small change equivalent to the MCID; those with scores of 4, 5, −4, and −5 were considered to have experienced moderate change, and those with scores of 6, 7, −6, and −7 were considered to have experienced large change.18 Absolute mean changes in QoR scores (i.e., with the sign reversed for those who deteriorated) according to patient-rated change in postoperative recovery health status were then calculated. MCID was determined a priori by triangulating (averaging) the estimates from the 0.3 SD, SEM, and 5% range rules together with the minimal change determined by the anchor-based method.
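The anchor-based classification and the triangulation step might be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the study (which was run in SPSS): the function names are hypothetical, and the example ratings and score changes are invented.

```python
from statistics import mean


def classify_change(rating):
    """Map a global rating of change (-7 to +7) to a change category."""
    magnitude = abs(rating)
    if magnitude > 7:
        raise ValueError("rating must lie between -7 and +7")
    if magnitude <= 1:
        return "unchanged"
    if magnitude <= 3:
        return "small"      # treated as equivalent to the MCID
    if magnitude <= 5:
        return "moderate"
    return "large"


def anchor_based_mcid(ratings, score_changes):
    """Mean change in QoR score among patients reporting a small change.

    The sign of the score change is reversed for patients who rated
    themselves as worse, so improvement and deterioration are pooled.
    """
    small = [
        delta if rating > 0 else -delta
        for rating, delta in zip(ratings, score_changes)
        if classify_change(rating) == "small"
    ]
    return mean(small)


def triangulate(estimates):
    """Average the distribution-based and anchor-based estimates."""
    return mean(estimates)


# Invented example: four patients, two of whom report a small change.
ratings = [2, -3, 5, 0]
score_changes = [6, -4, 20, 1]
print(anchor_based_mcid(ratings, score_changes))  # mean of 6 and 4
```

Reversing the sign rather than taking the absolute value matters when a patient who reports deterioration nonetheless has a higher score: that change then counts against the estimate instead of inflating it.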
Data are presented as mean ± SD or number (%) unless otherwise specified. Changes in QoR scores at each interview were compared with paired Student’s t test. Both the unpaired Student’s t test and independent samples median test were used to compare QoR scores in those with a good recovery or otherwise. An estimate of the minimum absolute score for each QoR scale (the patient-acceptable symptom state21,25) was obtained using the 25th centile for each QoR scale in those who rated their recovery as good at the second postoperative interview. The association between changes in the anchor values and changes in scores was quantified by the Pearson correlation coefficient (r).
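As an illustration of the last two calculations, the sketch below estimates a patient-acceptable symptom state as the 25th centile among patients reporting a good recovery and computes a Pearson correlation coefficient. The centile interpolation method is an assumption (statistical packages differ in how they interpolate), and the function names and data are hypothetical.

```python
import math
from statistics import mean, quantiles


def patient_acceptable_symptom_state(scores, good_recovery):
    """25th centile of scores among patients who reported a good recovery.

    scores        : QoR scores at the second postoperative interview
    good_recovery : parallel list of booleans (True = answered "yes")
    """
    good_scores = [s for s, good in zip(scores, good_recovery) if good]
    # quantiles(..., n=4) returns the three quartile cut points;
    # the "inclusive" interpolation method is an assumption here.
    return quantiles(good_scores, n=4, method="inclusive")[0]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Invented example data.
scores = [100, 110, 120, 130, 140, 95]
good = [True, True, True, True, True, False]
print(patient_acceptable_symptom_state(scores, good))
```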
For each of the QoR scales, we determined test–retest reliability using the intraclass correlation coefficient and internal consistency using Cronbach α.26 Responsiveness was measured in those with global rating of change scores of at least ±4, using standardized response means, calculated as the mean change divided by its SD27,28; accepted thresholds for moderate and large effect sizes are 0.5 and 0.8, respectively.28 All statistical analyses were performed using SPSS for Windows V20.0 (SPSS Australasia Ltd., Australia). A P value less than 0.05 was considered significant; no correction was made for multiple comparisons.
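The two psychometric indices defined above can be written out in a few lines. This is a sketch using the standard textbook formulas, not a reproduction of the SPSS procedures used in the study; the function names and example values are hypothetical.

```python
from statistics import mean, stdev, variance


def standardized_response_mean(changes):
    """Mean change divided by the SD of the change.

    `changes` should already be restricted to the patients of interest
    (in the study, those with a global rating of change of at least +/-4).
    """
    return mean(changes) / stdev(changes)


def cronbach_alpha(item_rows):
    """Cronbach alpha for a list of per-patient item-score rows.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(item_rows[0])
    item_vars = [variance([row[i] for row in item_rows]) for i in range(k)]
    total_var = variance([sum(row) for row in item_rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented example: three patients, two items that move together perfectly.
rows = [[1, 1], [2, 2], [3, 3]]
print(cronbach_alpha(rows))
print(standardized_response_mean([2, 4, 6]))
```

Against the thresholds quoted above, a standardized response mean of 0.5 would indicate moderate responsiveness and 0.8 large responsiveness.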
Results
Patients were enrolled between August 2014 and July 2015. Of the 204 patients enrolled at the first postoperative visit, 199 were available for a second interview, and 24 patients were available for a third interview. There were no patient refusals, but one patient withdrew participation during the first visit and another four were unavailable for follow-up.
The study population included a broad range of patients and surgical procedures (table 1). Mean QoR, QoR-15, and QoR-40 scores tended to be higher in those recovering from less extensive surgery (table 2); all P < 0.02. The frequency of each level of change in health status between the first and second postoperative visits is presented in table 3. The QoR scores improved significantly between the first and second visits (table 4); all P < 0.0005.
There was a moderate correlation between the patient-rated change in postoperative health status (global score) and the change in the QoR, QoR-40, and QoR-15 scores between the first two postoperative visits (r = 0.41, r = 0.40, and r = 0.38, respectively; all P < 0.0005). Similar results were obtained between second and third visits (Supplemental Digital Content, http://links.lww.com/ALN/B274, page 8).
The distribution-based estimates of SD, score range, and SEM are presented in table 5. The changes in QoR scores according to patient-rated change in postoperative recovery health status are presented in table 6. The change scores tended to be larger in patients who had a small improvement as opposed to a small deterioration in their health status (table 6). Triangulation resulted in an MCID of 0.92, 8.0, and 6.3 for the QoR score, QoR-15, and QoR-40, respectively. Sensitivity analyses found that male and female patients, and women recovering from cesarean section, have similar MCID scores for each QoR scale: 0.80, 5.5, and 4.3; 0.72, 6.0, and 4.4; and 0.5, 3.6, and 2.9, respectively.
Cronbach α (internal consistency) of the QoR score, QoR-15, and QoR-40 at the first postoperative visit (n = 204) was 0.66, 0.81, and 0.88, respectively. The standardized response mean (responsiveness) of the QoR score, QoR-15, and QoR-40 was 0.58, 0.86, and 0.69, respectively.
Those who did not report a good QoR (11%) had significantly lower QoR scores at the second postoperative visit compared with those who reported a good QoR (89%), see table 7. The 25th centile for the QoR score, QoR-15, and QoR-40 in those with a good recovery was 16, 118, and 180, respectively. The 75th centile for the QoR score, QoR-15, and QoR-40 in those who did not have a good recovery was 16, 112, and 179, respectively.
Discussion
We found that the MCID for the QoR score, QoR-15, and QoR-40 is 0.9, 8.0, and 6.3, respectively. That is, perioperative interventions that result in such a change can be interpreted to signify a clinically important improvement or deterioration in QoR. This facilitates interpretation of studies using any of these scales as outcome measures.29–33
Our findings were consistent in male and female patients and in those recovering from nonobstetric surgeries and cesarean section, the latter done using spinal anesthesia. The distribution-based methods were quite consistent across each derived statistic and for each QoR scale. However, the anchor-based method did not display clear discrimination in that there was partial overlap across the change categories and their variance was large relative to the point estimates. This is a common finding using the anchor-based method.11,16,19,34 There is likely to have been some recall bias and perhaps cognitive (memory) impairment. This probably weakened the anchor-based approach somewhat, but we believe that inclusion of this method properly weights the patient’s experience. QoR and the anchor-based method measure slightly different constructs, in that the former is focused on comfort and functional independence, whereas the latter is a subjective rating of the patient’s recovery. This is an unavoidable issue when using a single anchor-based question and comparing it to a multidimensional health status questionnaire. One cannot measure recovery status with a single question. We, thus, used several methods of MCID estimation, and this is one of the strengths of our study.17 Furthermore, there is likely to be a ceiling effect of each scale in that many rated their recovery very highly and so could not rate their further recovery with a higher score.
Each of the QoR scales offers very good evaluative and discriminatory ability for quantifying changes in postoperative health status,27,28 and so each is an ideal measure of patient-reported QoR. We were specifically interested in estimating the between-person MCID in order to inform comparative studies of groups of individuals in future studies and thus included several distribution-based methods in our triangulation approach to calculate the MCID.
The main strengths of our study are that we included a broad range of patients and surgical settings, including those receiving general and/or regional anesthesia, and women recovering from cesarean section. We confirmed the excellent psychometric indices of the QoR scales, with both the QoR-15 and QoR-40 in particular having excellent reliability and responsiveness in the postoperative setting.
In conjunction with an assumed SD, the MCID is often used to identify the appropriate effect size,35 including for equivalence or noninferiority,36 when determining the sample size required to achieve adequate statistical power in clinical trials. Our results offer reliable values to inform these calculations.
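To show how an MCID feeds into such a calculation, the sketch below uses the usual normal-approximation formula for comparing two group means, n per group = 2(z₁₋α/₂ + z₁₋β)² SD² / MCID². The choice of formula, the function name, and the SD of 16 are illustrative assumptions and are not taken from the study; only the QoR-15 MCID of 8.0 comes from the results above.

```python
import math
from statistics import NormalDist


def sample_size_per_group(mcid, sd, alpha=0.05, power=0.90):
    """Patients needed per group to detect a difference equal to the MCID.

    Two-sided test of two independent means, normal approximation:
        n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / mcid^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * (sd / mcid) ** 2
    return math.ceil(n)  # round up to whole patients


# QoR-15 MCID of 8.0 with a hypothetical SD of 16 points.
print(sample_size_per_group(mcid=8.0, sd=16.0))
```

Because the required sample size scales with (SD / MCID)², halving the assumed SD cuts the sample size to roughly a quarter, which is why the SD assumption deserves as much scrutiny as the MCID itself.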
Our results suggest that the patient acceptable symptom state for the QoR score, QoR-15, and QoR-40 is 16, 118, and 180, respectively. Each of these does not overlap with the 75th centile of scores in those who rated their recovery as less than “good” (table 7). If we accept that the main objective for the patient is to reach a state they consider acceptable,25 then these cut-off scores can be considered a benchmark for perioperative care. These cutoff scores can be used to dichotomize patient outcome in future perioperative clinical studies, including an opportunity to calculate the number needed to treat with effective interventions.25
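A minimal sketch of that dichotomization follows, assuming a patient counts as reaching the acceptable state when the score is at or above the cutoff (the text does not specify whether the comparison is inclusive). The function names and the trial scores are hypothetical; only the QoR-15 cutoff of 118 comes from the results above.

```python
import math


def proportion_reaching(scores, cutoff):
    """Fraction of patients whose score is at or above the cutoff."""
    return sum(score >= cutoff for score in scores) / len(scores)


def number_needed_to_treat(treated_scores, control_scores, cutoff):
    """NNT = 1 / absolute difference in proportions reaching the cutoff."""
    benefit = proportion_reaching(treated_scores, cutoff) - proportion_reaching(
        control_scores, cutoff
    )
    if benefit <= 0:
        raise ValueError("treatment shows no benefit at this cutoff")
    return math.ceil(1 / benefit)  # round up to whole patients


# Invented QoR-15 scores for two small groups, using the cutoff of 118.
treated = [120, 130, 110, 125]
control = [100, 119, 90, 117]
print(number_needed_to_treat(treated, control, cutoff=118))
```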
This study has several limitations. Other studies have evaluated different anchor-based approaches.16,37 The responsiveness of the MCID scores needs to be further evaluated in the setting of new and effective therapies. The study was undertaken in Australia, and despite including patients from non-English-speaking backgrounds, results may not translate to other settings. We made numerous comparisons when testing three scores across three time periods (table 4). Such multiple testing will inflate the α error. The data from the third follow-up visit (n = 24) were sparse and likely underpowered.
In conclusion, we have determined the MCID and the patient-acceptable symptom state of several QoR scales; this information can be used to design and interpret perioperative clinical studies utilizing any of these instruments as patient-centered outcomes.
The authors thank Andrea Ditoro, B.Sc.(nurs) and Catherine Farrington, B.Sc.(nurs), Department of Anaesthesia and Perioperative Medicine, Alfred Hospital, Melbourne, Victoria, Australia, and Elizabeth Leeton, B.Sc.(nurs), Department of Anaesthesia, Royal Women’s Hospital, Parkville, Victoria, Australia, for their assistance with data collection.
Supported by an Australian National Health and Medical Research Council Practitioner Fellowship (APP1042462, Canberra, Australian Capital Territory, Australia; to Dr. Myles) and by internal department funds.
The authors declare no competing interests.