Anesthesiologists need tools to accurately track postoperative outcomes. The accuracy of patient report in identifying a wide variety of postoperative complications after diverse surgical procedures has not previously been investigated.
In this cohort study, 1,578 adult surgical patients completed a survey at least 30 days after their procedure asking if they had experienced any of 18 complications while in the hospital after surgery. Patient responses were compared to the results of an automated electronic chart review and (for a random subset of 750 patients) to a manual chart review. Results from automated chart review were also compared to those from manual chart review. Forty-two randomly selected patients were contacted by telephone to explore reasons for discrepancies between patient report and manual chart review.
Comparisons between patient report, automated chart review, and manual chart review demonstrated poor-to-moderate positive agreement (range, 0 to 58%) and excellent negative agreement (range, 82 to 100%). Discordance between patient report and manual chart review was frequently explicable by patients reporting events that happened outside the time period of interest.
Patient report can provide information about subjective experiences or events that happen after hospital discharge, but often yields different results from chart review for specific in-hospital complications. Effective in-hospital communication with patients and thoughtful survey design may increase the quality of patient-reported complication data.
In a study of over 1,500 subjects more than 30 days after surgery, patient-reported outcomes, compared to automated or manual chart review, demonstrated poor-to-moderate positive agreement (0 to 58%) and excellent negative agreement (82 to 100%). Discrepancies frequently reflected patients reporting events that occurred outside the time period of interest, suggesting that more effective in-hospital communication and thoughtful survey design may improve the value of patient-reported outcomes.
Supplemental Digital Content is available in the text.
Patient-reported outcomes will play an important role in healthcare research and reimbursement, but their concordance with medical record data in the perioperative period has not been examined.
To assess and improve perioperative care, it is essential to track postoperative patient outcomes. Together with other healthcare professionals, anesthesiologists have a strong incentive to monitor these outcomes. This is becoming ever more pressing with the imminent introduction of pay-for-performance models and with the advent of the perioperative surgical home concept, in which the role of the anesthesiologist at some centers is expanding to involve all aspects of perioperative patient management, including postoperative care. To justify this expanded role, anesthesiologists must provide evidence that the perioperative surgical home model improves patient outcomes, including reduced complication rates both before and after hospital discharge.1,2 Patient-reported outcomes (PROs) will likely play a role in achieving this aim. Ever since Guyatt et al.3 published an influential 1993 article on the measurement of health-related quality of life, a variety of PROs have been developed to measure overall health or to measure disease-specific domains. The proliferation of studies reporting PROs led to an extension of the Consolidated Standards of Reporting Trials guidelines to specifically address the use of PROs in randomized trials4 and to instructions from the Food and Drug Administration regarding the use of PROs in trials for labeling of medical products.5 Wu et al.6 recently published a checklist to help clinicians interpret research studies that use PROs. In fact, PROs have become so widespread that multiple major journals have published viewpoint articles suggesting that PROs be collected regularly in all clinical settings.7,8
Although PROs are widely used to measure largely subjective outcomes, such as quality of life, functionality, and pain, they have not commonly been used to measure more objective medical outcomes such as myocardial infarction. Obtaining such outcomes by patient report rather than by chart review is desirable because chart review is unable to capture events that happen at home or events for which patients do not seek care at affiliated medical facilities. When compared to the medical record, nonsurgical patients self-report elements of their medical history with high negative agreement (range, 64 to greater than 99%) but with low positive agreement (range, 1 to 85%).9–16 This means that one modality almost always reports a symptom as absent when the other modality reports that symptom as absent, but the two modalities frequently disagree when one modality reports a symptom as present.17 Similar results were obtained in a study examining patient report of medical complications after bone marrow transplant.18 The accuracy of patient reports of medical complications has not been investigated as thoroughly among surgical patients. Existing studies have been limited to either a single type of surgery19–21 or to a single postoperative complication.22–24 Additional characterization of PROs in a diverse surgical population is necessary if they are to be used to identify postoperative complications for clinical care, research, or quality improvement. The purpose of this study was to examine the convergent validity of PROs and medical record review in detecting postoperative complications in the first 30 days after a wide variety of surgical procedures.
Materials and Methods
The Human Research Protection Office at Washington University approved this study. This analysis is a part of the Systematic Assessment and Targeted Improvement of Services Following Yearlong Surgical Outcomes Surveys (SATISFY-SOS) project (NCT02032030). All patients who completed surveys provided written, informed consent when visiting the Center for Preoperative Assessment and Planning. A waiver of consent was obtained to extract data from the medical record of surgical patients who did not visit the preoperative clinic or did not enroll in SATISFY-SOS during their preoperative clinic visit. Patients aged 18 yr or older were eligible if they were receiving anesthesia services for any procedure at Barnes-Jewish Hospital (St. Louis, Missouri) or an affiliated hospital between July 2012 and June 2013. This includes patients receiving general anesthesia, regional anesthesia, monitored anesthesia care, and procedural sedation.
Patients received a questionnaire approximately 30 days after their procedure. In addition to previously validated instruments such as the Veterans RAND 12-item short form25 and the Barthel Index,26 the survey included a question asking what complications patients experienced while in the hospital recovering from their procedure. Patients could select from a list of 18 complications, write in another complication, indicate that they preferred not to answer, or indicate that they experienced no complications (table 1). Patients could select more than one complication. The survey contained a separate question about complications after hospital discharge, which was not included as part of this investigation. The survey was conducted in English only.
The survey was sent by e-mail if the patient provided an e-mail address at the time of consent. If the patient did not provide an e-mail address or did not respond to the e-mail within 14 days, then a paper copy of the questionnaire was sent through the mail. If the patient did not return the mailed survey within 21 days, a second copy was mailed. If the patient did not return the second mailed survey within 14 days, then the patient was telephoned and asked the questions over the phone. If there was initially no answer at the telephone number, then the patient was telephoned at least once per week, up to five times. Surveys returned and electronically processed by June 2013 were included in the analysis.
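The contact sequence described above can be laid out as a simple timeline. The following is a minimal sketch, not the study's actual tracking software: the function name and step labels are hypothetical, day 0 is taken to be the day the survey is first sent, the patient is assumed never to respond, and telephone attempts are assumed to be exactly 7 days apart (the text says only "at least once per week").

```python
def contact_schedule(has_email, phone_attempts=5):
    """Return (day, action) steps for a patient who never responds.

    Day 0 is the day the survey is first sent. All names are illustrative.
    """
    day = 0
    steps = []
    if has_email:
        steps.append((day, "e-mail survey"))
        day += 14  # no e-mail response within 14 days
    steps.append((day, "mail first paper copy"))
    day += 21  # first mailed survey not returned within 21 days
    steps.append((day, "mail second paper copy"))
    day += 14  # second mailed survey not returned within 14 days
    for attempt in range(phone_attempts):
        # Assumption: weekly calls, up to five attempts
        steps.append((day + 7 * attempt, "telephone call"))
    return steps


for step in contact_schedule(has_email=True):
    print(step)
```

Under these assumptions, a patient with an e-mail address who never responds would receive the first telephone call 49 days after the initial e-mail.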
Automated Chart Review
An automated computer algorithm was used to review the patient’s electronic medical record from the time of the procedure until hospital discharge. Researchers were blinded to survey responses during development of the algorithm. The algorithm is described in full in the supplemental appendix (Supplemental Digital Content 1, http://links.lww.com/ALN/B268). Most of the complications were identified using International Classification of Diseases, 9th edition, codes present at the time of discharge, after the exclusion of admitting diagnoses. Arrhythmias were identified using new rhythm abnormalities on electrocardiogram reports, a new prescription for amiodarone, or the performance of a cardioversion. Angina was defined as pain with documented location of “substernal” and severity of at least 7/10. Respiratory failure was defined using procedure codes for mechanical ventilation. Severe pain was defined as a pain score of at least 7/10 on postoperative day 1 or later. Severe nausea and vomiting was defined as the administration of an antiemetic (ondansetron, metoclopramide, scopolamine, prochlorperazine, or dexamethasone) on postoperative day 1 or later. The automated chart review was performed for all patients who returned the survey.
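Two of the rule-based definitions above (severe pain and severe nausea and vomiting) can be illustrated with a short sketch. This is not the algorithm from the supplemental appendix: the `Stay` record, its field layout, and the function names are hypothetical, and the data are assumed to be already limited to the period between the procedure and hospital discharge.

```python
from dataclasses import dataclass, field

# Antiemetics named in the text
ANTIEMETICS = {
    "ondansetron", "metoclopramide", "scopolamine",
    "prochlorperazine", "dexamethasone",
}


@dataclass
class Stay:
    """Hypothetical per-admission record (illustrative only)."""
    pain_scores: list = field(default_factory=list)   # (postoperative_day, score 0-10)
    medications: list = field(default_factory=list)   # (postoperative_day, drug name)


def severe_pain(stay, threshold=7, first_day=1):
    # Pain score of at least 7/10 on postoperative day 1 or later
    return any(day >= first_day and score >= threshold
               for day, score in stay.pain_scores)


def severe_nausea_vomiting(stay, first_day=1):
    # Any listed antiemetic given on postoperative day 1 or later
    return any(day >= first_day and drug.lower() in ANTIEMETICS
               for day, drug in stay.medications)


stay = Stay(pain_scores=[(0, 9), (1, 5)], medications=[(2, "Ondansetron")])
print(severe_pain(stay), severe_nausea_vomiting(stay))
```

In this example the pain score of 9 on the day of surgery does not count, because the rule applies only from postoperative day 1 onward, whereas the ondansetron dose on day 2 does trigger the nausea and vomiting flag.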
To allow for assessment of selection bias, the automated chart review was also performed for a matched cohort of patients who did not enroll in this study. To allow for assessment of follow-up bias, the automated chart review was also performed for a matched cohort of patients who consented to the study but did not return the survey. Patients were matched using the propensity score matching technique based on age, sex, race, body mass index, American Society of Anesthesiologists physical status, and comorbidities (coronary artery disease, previous myocardial infarction, congestive heart failure, heart valve disease, atrial fibrillation, history of pacemaker or defibrillator, previous cerebrovascular accident, previous aortic disease, history of venous thromboembolism, diabetes, pulmonary hypertension, end-stage renal disease requiring dialysis, chronic obstructive pulmonary disease, obstructive sleep apnea, liver cirrhosis, previous or current cancer, gastroesophageal reflux disease, previous anemia, previous thrombocytopenia, previous positive Coombs test, and dementia). Propensity scores were calculated using logistic regression in which the dependent variable was enrollment in the study (for the first matched cohort) or return of the survey (for the second matched cohort). Matching was performed using the nearest neighbor method with replacement using a caliper of 0.10.
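The matching step can be sketched as follows. This is an illustration only: the authors performed the analysis in SAS, the propensity scores are assumed to have been estimated already, the function and variable names are hypothetical, and the caliper is assumed to be an absolute difference on the propensity score scale (the text does not state the scale).

```python
def match_with_replacement(respondents, candidates, caliper=0.10):
    """Nearest-neighbor matching on propensity score, with replacement.

    respondents, candidates: dicts mapping patient id -> propensity score.
    Returns a dict mapping each respondent id to the id of the closest
    candidate, or None when no candidate lies within the caliper.
    """
    matches = {}
    for r_id, r_score in respondents.items():
        best_id, best_dist = None, None
        for c_id, c_score in candidates.items():
            dist = abs(r_score - c_score)
            if best_dist is None or dist < best_dist:
                best_id, best_dist = c_id, dist
        within = best_dist is not None and best_dist <= caliper
        matches[r_id] = best_id if within else None
    return matches


respondents = {"r1": 0.30, "r2": 0.32, "r3": 0.90}
candidates = {"n1": 0.31, "n2": 0.60}
print(match_with_replacement(respondents, candidates))
```

Because matching is with replacement, candidate `n1` serves as the match for both `r1` and `r2`, while `r3` remains unmatched because its nearest candidate lies outside the caliper.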
Manual Chart Review
A subset of the patients who returned the survey was selected for manual chart review. All patients who reported a complication on the survey, as well as a random sample of the remaining survey respondents, were included in this subset. Each chart was systematically reviewed by two independent, trained researchers. The reviewer used the contents of the medical record to determine whether each complication had been diagnosed while the patient was in the hospital (i.e., if a progress note or discharge summary referred to the complication or if diagnostic tests supported the diagnosis). Each researcher discussed potential ambiguities with a third independent researcher. Discrepancies between these two manual data extraction processes were resolved by manual chart review undertaken by a fourth independent researcher. The records reviewed included anesthesiologist and recovery room notes, admission notes, progress notes, discharge summaries, laboratory values, and reports from imaging studies and diagnostic tests. Chart reviewers were blinded to survey results. The two independent manual chart reviews were reliable, achieving near-perfect agreement for eight complications (Cohen’s κ values ranged from 0.5 to 1.0).
The characteristics of the patient population were described using frequency and percentage for categorical variables and using median and interquartile range for nonnormally distributed continuous variables. The incidence of complications found by each modality was described using frequency and percentage. Agreement between the survey results and the automated chart review was quantified using positive agreement, negative agreement, and Cohen’s κ statistic. By definition, the κ statistic was undefined if either modality identified zero cases of a particular complication. The same methods were used to quantify agreement between the survey results and the manual chart review and between the two chart review methods. Cases were excluded from the agreement statistics if the patient marked “prefer not to answer” on the survey or left the question blank. As a sensitivity analysis, the agreement statistics were repeated stratifying patients by survey modality, sex, and age (dichotomized as less than 60 yr and 60 yr or more). To assess for selection bias at the time of enrollment, McNemar test was used to compare the incidence of complications found by automated chart review between survey respondents and a matched cohort of patients who did not enroll in the study. To assess for follow-up bias, McNemar test was used to compare the incidence of complications found by automated chart review between survey respondents and matched nonrespondents. To explore the effects of combining complication detection modalities, the survey results and automated chart review were combined into a composite complication assay. The composite assay was positive if either the survey or the automated chart review indicated a complication. The survey and automated chart review were chosen for the composite assay because these modes of detecting complications are more feasible than manual chart review for scaling to large patient populations. 
Agreement between the composite assay and the manual chart review was quantified using positive agreement, negative agreement, and Cohen’s κ statistic. All analyses were performed using SAS version 9.3 (SAS Institute, Inc., USA). P values less than 0.05 were considered statistically significant.
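The agreement statistics and the composite assay can be illustrated with a short sketch. The authors used SAS; Python is substituted here, and the formulas are assumed to be the commonly used proportions of specific agreement, 2a/(2a + b + c) and 2d/(2d + b + c), together with the standard Cohen's κ. The function names and the example counts are hypothetical and are not study data.

```python
def composite_positive(survey_positive, automated_positive):
    # Composite assay: positive if either modality is positive
    return survey_positive or automated_positive


def agreement(pairs):
    """pairs: iterable of (method_1_positive, method_2_positive) booleans.

    Returns (positive agreement, negative agreement, kappa). Kappa is
    returned as None when either method finds zero cases, mirroring the
    convention stated in the text.
    """
    a = b = c = d = 0
    for m1, m2 in pairs:
        if m1 and m2:
            a += 1          # both positive
        elif m1:
            b += 1          # only method 1 positive
        elif m2:
            c += 1          # only method 2 positive
        else:
            d += 1          # both negative
    n = a + b + c + d
    pos = 2 * a / (2 * a + b + c) if (2 * a + b + c) else None
    neg = 2 * d / (2 * d + b + c) if (2 * d + b + c) else None
    kappa = None
    if n and (a + b) > 0 and (a + c) > 0:
        p_obs = (a + d) / n
        p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
        if p_exp != 1:
            kappa = (p_obs - p_exp) / (1 - p_exp)
    return pos, neg, kappa


# Hypothetical counts: 3 both positive, 7 and 2 discordant, 88 both negative
pairs = ([(True, True)] * 3 + [(True, False)] * 7
         + [(False, True)] * 2 + [(False, False)] * 88)
print(agreement(pairs))
```

With these hypothetical counts, positive agreement is 0.40 while negative agreement is about 0.95, which shows how a rare outcome can produce high negative agreement alongside modest positive agreement.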
To qualitatively explore the reasons for discordance between survey responses and chart review, one researcher phoned 70 randomly selected patients for whom the survey and the manual chart review yielded discordant results. A result was discordant if any complication was positive by one method and negative by the other method. Patient reports of angina, nerve injury, severe pain, and severe nausea/vomiting were not included, as patients might experience these complications without notifying the medical staff. The researcher asked patients to describe their experiences and also conducted an unblinded review of the patient’s entire chart, including past admissions. The reason for each discordant report was clarified where possible. Naturally occurring themes among the reasons for discordance were identified.
Results
Of the approximately 30,000 patients who visited the Center for Preoperative Assessment and Planning between July 2012 and June 2013, 8,792 (30%) were enrolled during their preoperative clinic visit and 5,133 patients (58% of those enrolled) returned the postoperative survey (fig. 1). This analysis includes the 1,578 surveys that had been returned to our department and electronically processed by June 2013. As shown in table 2, these patients underwent a variety of surgical procedures. Patients were enrolled an average of 10 days before surgery (fig. 2).
Incidence of Postoperative Complications
Of those who returned the survey, 446 patients reported at least one complication, 1,010 patients reported no complication, 5 patients indicated that they preferred not to answer, and 117 patients left the question blank. The most commonly reported complication was severe pain lasting more than 1 day (188 patients), followed by abnormal heart rhythm (86 patients) and severe nausea/vomiting lasting more than 1 day (63 patients). The frequency count of each complication is shown in table 3. Pain, abnormal heart rhythm, and nausea/vomiting were also the three most common complications found by each of the chart review modalities.
The automated chart review was performed for patients who returned the survey, for a matched cohort of patients who consented to the study but did not return the survey by June 2013, and (via waiver of consent) for a matched cohort of patients who did not consent to the study. As measured by the automated chart review, patients who returned the survey were less likely to experience a complication (355 of 1,578, 22%) than matched patients who did not return the survey (421 of 1,578, 27%; McNemar chi-square statistic with 1 degree of freedom = 7.75, P = 0.006). The incidence of complications did not differ significantly between patients who returned the survey (355 of 1,578, 22%) and matched patients who did not consent to the study (332 of 1,578, 21%; McNemar chi-square statistic with 1 degree of freedom = 0.93, P = 0.36).
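For readers unfamiliar with the McNemar test, the following minimal sketch shows how the statistic and its P value arise from matched pairs. It assumes the uncorrected form of the statistic (the text does not say whether a continuity correction was applied), the function name is hypothetical, and the counts are illustrative rather than study data, because the discordant pair counts are not reported here.

```python
import math


def mcnemar(b, c):
    """McNemar chi-square test (1 degree of freedom, no continuity correction).

    b: matched pairs in which only the first member had a complication
    c: matched pairs in which only the second member had a complication
    Concordant pairs do not enter the statistic.
    """
    if b + c == 0:
        return 0.0, 1.0
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical discordant pair counts
stat, p_value = mcnemar(10, 20)
print(round(stat, 2), round(p_value, 3))
```

Only the discordant pairs contribute, so two cohorts with very different overall complication counts can still yield a small statistic if the discordance is balanced in both directions.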
Agreement between Surveys and Chart Reviews
The survey and the automated chart review (N = 1,578) showed poor-to-moderate positive agreement (range, 0 to 41%) and excellent negative agreement (range, 93 to greater than 99%; table 3). Kappa values ranged from 0.00 for cardiac arrest and angina to 0.43 for pulmonary embolism. Likewise, the survey and the manual chart review (N = 750) showed poor-to-moderate positive agreement (range, 0 to 55%) and excellent negative agreement (range, 82 to 100%; table 4). Kappa values ranged from 0.00 for stomach or intestinal ulcer to 0.53 for stroke. These results did not change substantially when patients were stratified based on survey modality, sex, or age.
Agreement statistics between the automated chart review and the manual chart review are shown in table 5. The two chart review methods also showed poor-to-moderate positive agreement (range, 0 to 58%) and excellent negative agreement (range, 87 to 100%). Kappa values ranged from 0.00 for angina, gastrointestinal bleed, and severe pain to 0.55 for arrhythmia.
Agreement statistics between the composite assay (which was positive if the survey or the automated chart review was positive) and the manual chart review are shown in table 6. The two methods showed poor-to-moderate positive agreement (range, 0 to 59%) and excellent negative agreement (range, 82 to 100%). Kappa values ranged from 0.00 for stomach or intestinal ulcer to 0.48 for severe nausea and vomiting.
Reasons for Discordant Reports of Complications
Of the 70 randomly selected patients whom we attempted to contact by telephone, we successfully reached 42 patients. Because some patients had more than one discrepancy between self-report and chart review, we were able to investigate 54 pairs of discordant complication reports. This included 49 complications reported on the survey but not found during manual chart review and 5 complications found during chart review but not reported on the survey. Naturally occurring themes among the reasons for discordance are outlined in table 7. The most common themes were patients accurately reporting events that occurred before surgery (n = 22), patients reporting a complication when they did not intend to do so (n = 11), and patients accurately reporting events that occurred after hospital discharge (n = 7). Some patients misinterpreted events, such as reporting ventilator use during general anesthesia as a complication. Among those patients with complications found during chart review that were not reported on the survey, some patients had no recall of the event discovered on chart review even when specifically asked, while other patients did remember the event after being prompted.
Discussion
In this cohort study, patient report of postprocedure complications showed low-to-moderate positive agreement and excellent negative agreement both when compared to an automated chart review and when compared to a manual chart review. Agreement was not improved when patient report and automated chart review were combined into a composite assay. Much of the discordance occurred when a patient reported an event that happened before the procedure or after hospital discharge as a postoperative in-hospital complication. There was also evidence of a follow-up bias, as patients who did not return the survey were more likely to have complications detected by automated chart review than patients who returned the survey.
As in previous studies, our findings suggest that outcomes obtained from patients frequently do not match outcomes obtained from the medical record. Low positive agreement (0 to 71%) and high negative agreement (71 to greater than 99%) for patient-reported postoperative complications have been observed after hernia repair19,20 and after gynecologic oncology surgery.21 Some studies of orthopedic populations provide limited information because the medical record was only consulted if patient report of a postoperative complication was positive.27–29 The only complication that has been investigated in diverse surgical populations is wound infection, for which similar rates of positive agreement (47 to 83%) and negative agreement (95 to 98%) have been found.22–24 The high negative agreement is likely driven by the low incidence of the observed complications. This study extends the existing literature by examining a wide variety of complications in a broad population of surgical patients.
Several factors may lead to discordance between patient report and chart review. Patient report may not accurately reflect postprocedure events if healthcare professionals do not communicate effectively with patients in the hospital, if patients cannot accurately recall postprocedure events when completing the survey, or if patients do not understand the survey question. Chart review may not accurately reflect postprocedure events if complications are inadequately documented in the medical record or if there are flaws in the chart review algorithm. However, a well-refined chart review algorithm can demonstrate excellent sensitivity and specificity.30
The qualitative analysis (table 7) revealed that the most frequent reason for discordance between patient report and chart review was that the patient reported an event that happened either before the procedure or after discharge from the hospital. This suggests that patient report may be unreliable in precisely describing the temporal relationship between events. Patients may have found the question stem confusing or read the question too quickly, not noticing the instruction to limit responses to complications that occurred in the hospital while they were recovering from their procedures. We potentially could have placed even greater emphasis on the time period by using special formatting such as italicized or underlined text. However, part of the problem could stem from patients’ inability to accurately recall the timing of events, rather than from a lack of understanding of the intent of the survey question. Intensive care patients who experience a postoperative complication generally have no memories from the time of surgery until intensive care unit discharge.31 Recent work at our institution confirms that patients have limited memories of the perioperative period.32 Patients who do not have memories of the perioperative period may be unable to accurately describe the temporal relationship among perioperative events.
As indicated in table 7, some discordance between patient report and chart review was caused by patient misinterpretation of hospital events. Nearly half of patients in the United States demonstrate limited literacy, as measured by the National Adult Literacy Survey.33 These individuals are likely to have limited health literacy as well, and healthcare providers face heightened challenges when trying to communicate with patients of limited health literacy.33 Efforts to convey information to patients are not always successful. In fact, only 42% of medical patients at a public hospital were able to state their diagnosis at the time of hospital discharge.34 For this reason, physician–patient communication has been identified as a key domain for improving the discharge process.35 Methods to enhance information transfer to patients with limited health literacy include frequent visits with less information per encounter and use of the “teach-back” approach to actively assess patient understanding.36 Improved communication may prevent patients from reporting routine in-hospital events (e.g., prophylactic antibiotics) as complications and help patients report complications when they do occur.
Some patients unintentionally reported a complication (table 7), which may be caused by survey fatigue. The question analyzed in this study was one item on a 44-item survey. Furthermore, the question included a long list of answer choices (table 1), which can be especially overwhelming when presented over the telephone. When respondents lack motivation to focus on the survey, they begin to answer without carefully optimizing their decision-making—a process known as satisficing.37 Common manifestations of satisficing include increased use of the “prefer not to answer” option, selection of the first reasonable option without consideration of later options, consistent agreement with the interviewer, consistent selection of the midpoint of a Likert scale, and random selection from among multiple choice answers.37 Minimizing fatigue should help improve the quality of survey responses.
Although patient report can be limited by temporal ambiguity, poor communication, or survey fatigue, chart review has limitations of its own. Subjective outcomes may not be documented adequately in the medical record, despite their importance to patients. For example, severe postoperative pain lasting more than 1 day was detected in only four patients on automated chart review, but 188 patients reported this complication on the survey (table 3). Chart review is also unable to detect complications that occur after a patient leaves the hospital, unless the patient returns to the hospital or seeks care at a facility that shares a medical record with the hospital. Minor complications that require no treatment will go unrecognized. Patient report might be able to effectively fill these gaps. The shortcomings of automated chart review are relevant to approaches currently adopted in anesthesiology research, including those of the Multicenter Perioperative Outcomes Group and the Anesthesia Quality Institute.
The reader should note that the authors do not intend to draw generalized conclusions about PROs. Although this article focuses on specific postoperative complications, many PROs ask patients about outcomes such as quality of life, return to work, or functional status. In fact, the survey circulated to patients after surgery as part of the SATISFY-SOS project includes the Veterans RAND 12-item short form25 and the Barthel Index,26 along with other common PROs. These PROs were not presented in this article because they have been previously validated and are well-accepted by the medical community.
The limitations of this study should be noted. For some complications, a small number of cases were observed. Small numbers can limit the precision of the agreement statistics, and κ could not be calculated for some complications because one of the chart review modalities detected zero cases. Even after matching on important characteristics, patients who did not return the survey had a higher incidence of postoperative complications detected by chart review than patients who did return the survey. This could perhaps indicate a response bias. If patient report alone were used to identify complications, then the true incidence of complications would likely be underestimated. On the other hand, the similar incidence of complications between patients who enrolled in the study and patients who did not enroll in the study suggests the absence of a selection bias at the time of enrollment. Thus, the patients who enrolled in our study are likely representative of the patients who attend our preoperative clinic. However, because this study was conducted at a single academic medical center, the generalizability of the findings to other settings may be limited. The question utilized on the survey has not been previously validated, so it may have been misunderstood by patients. Changes to the questionnaire might have improved the convergent validity observed in this study. The manual chart review in this study was limited because several different researchers participated; however, the high level of agreement between the two independent reviews of each record suggests that the manual chart review algorithm was robust. Yet the poor-to-moderate agreement between the manual chart review and the automated chart review demonstrates that the results depend to a great extent upon the methodology used to perform the chart review.
As anesthesiologists focus increased attention on postoperative outcomes beyond the immediate postoperative period, it is essential to have practical tools to measure these outcomes. Patient report can provide information about subjective experiences and about events that happen at home, which cannot be detected by other data collection modalities. To collect data by patient report, it is imperative to communicate effectively with patients before hospital discharge, to clearly explain the time period of interest on the survey, and to minimize survey fatigue. However, patient report of objective complications frequently yields different results from those found by chart review. Thus patient report may be more appropriate for subjective experiences, while chart review may be preferable for objective complications.
The authors thank the staff of the Institute of Quality Improvement, Research, and Informatics at the Department of Anesthesiology, Washington University School of Medicine, St. Louis, Missouri. The authors especially thank Will Godfrey, M.A. (Department of Anesthesiology, Washington University School of Medicine), for informatics support, and Beth Burnside, M.A. (now affiliated with Wright State Research Institute, Wright State University, Dayton, Ohio), for general project management.
Research reported in this publication was supported by the Washington University Institute of Clinical and Translational Sciences (St. Louis, Missouri; grant no. UL1TR000448) from the National Center for Advancing Translational Sciences of the National Institutes of Health (NIH, Bethesda, Maryland). Dr. Fritz was supported by the Washington University Institute of Clinical and Translational Sciences (St. Louis, Missouri; grant nos. UL1TR000448 and TL1TR000449) from the National Center for Advancing Translational Sciences. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. SATISFY-SOS was funded by the Washington University Department of Anesthesiology and the Barnes-Jewish Hospital Foundation (St. Louis, Missouri; award reference no. 7937–77 to Dr. Avidan).
The authors declare no competing interests.