Health administrative (HA) databases are increasingly used to identify surgical patients with obstructive sleep apnea (OSA) for research purposes, primarily through diagnostic codes. However, these methods of identifying patients with OSA have not been validated. The authors determined the accuracy of case-ascertainment algorithms for identifying patients with OSA using HA data.
Clinical data derived from an academic health sciences network within a universal health insurance plan were used as the reference standard. The authors linked patients to HA data and retrieved all claims in the 2 yr before surgery to determine the presence of any diagnostic codes, diagnostic procedures, or therapeutic interventions consistent with OSA.
The authors identified 4,965 patients (2003 to 2012) who underwent preoperative polysomnogram. Of these, 4,353 patients were linked to HA data; 2,427 of these (56%) had OSA based on diagnosis by a sleep physician or the apnea hypopnea index. A claim for a polysomnogram and receipt of a positive airway pressure device had a sensitivity of 19%, a specificity of 98%, and a positive likelihood ratio (+LR) of 10.9 for OSA. An International Classification of Diseases, Tenth Revision, code for sleep apnea in hospitalization abstracts was 9% sensitive and 98% specific (+LR, 4.5). A physician billing claim for OSA (International Classification of Diseases, Ninth Revision, 780.5) was 58% sensitive and 38% specific (+LR, 0.9). The combination of a polysomnogram and a positive airway pressure device, or any code for OSA, was 70% sensitive and 36% specific (+LR, 1.1).
No code or combination of codes provided a +LR high enough to adequately identify patients with OSA. Existing studies using administrative codes to identify OSA should be interpreted with caution.
Considerable research on perioperative risks and outcomes in patients with presumed obstructive sleep apnea (OSA) uses health administrative databases and codes for OSA, but the accuracy of these codes for presence or absence of OSA is unknown
In approximately 5,000 patients who underwent preoperative polysomnography, 56% met criteria for a diagnosis of obstructive sleep apnea (OSA)
In these patients with known or excluded OSA, none of the health administrative diagnostic codes, diagnostic procedures, or therapeutic interventions by themselves or in combination identified OSA with adequately high sensitivity and specificity
Existing studies using administrative codes to identify OSA should be interpreted with caution
OBSTRUCTIVE sleep apnea (OSA) has been identified as a significant issue for physicians caring for patients in the perioperative period.1 Numerous societies provide guidelines to support the care of patients with OSA undergoing surgery,2,3 and the relation between OSA and perioperative outcomes has been extensively studied.4 Despite the large number of studies that examine OSA in the perioperative period, significant limitations exist in the quality of the current literature and in our overall understanding of the impact of OSA on surgical and anesthetic outcomes.
A 2012 meta-analysis found a significant association between OSA and adverse cardiac and pulmonary outcomes after surgery.4 Importantly, this review also highlighted a number of knowledge gaps and methodological limitations in the perioperative OSA literature. First, most studies are single center, and many lack the sample size to (1) detect important differences in mortality and major morbidity; and (2) adequately control for important confounders in the OSA–outcome relation. Second, available studies lack consistent measurement of well-defined clinical outcomes. Recent commentary supports the continued existence of significant knowledge gaps regarding perioperative OSA5 and an ongoing need for studies that provide long-term follow-up and that capture out-of-hospital events.
Population-based studies using health administrative (HA) data have the potential to address specific knowledge gaps related to perioperative OSA outcomes because these data provide a relatively inexpensive and practical means to study disease exposure, processes of care, and health outcomes at a population level. HA data studies are often large and provide the statistical power to study rare outcomes. Accordingly, numerous HA data studies of OSA have been recently published.6–9 Unfortunately, available studies of OSA using HA data may suffer from misclassification bias because single diagnostic codes have been used to identify people with6–8 or without OSA.9 International Classification of Diseases (ICD) codes are frequently used, but the diagnostic accuracy of these codes has not been assessed. Because diagnostic codes within HA data demonstrate high variation in their ability to accurately identify true disease status,10,11 the misclassification of patients with regard to OSA status could influence study results in important ways.
To address the potentially significant methodological issues present in existing HA data studies of perioperative OSA, we undertook a study with three objectives: (1) to measure the accuracy of methods used in the current literature to identify patients with OSA in HA data; (2) to measure the accuracy of other case-ascertainment algorithms based on a combination of diagnostic codes, diagnostic procedures (polysomnogram), and therapeutic interventions (positive airway pressure [PAP] devices) available in HA data, which we predicted would allow highly accurate identification of patients with OSA within HA databases; and (3) to describe the types of patients identified using these approaches to better understand the possible effects of misclassification on study outcomes described in the current literature. We conducted this study in the Canadian province of Ontario because high-quality, linked, population-based HA data exist, which allow identification of chronic disease prevalence cohorts,12–16 provide detailed hospitalization and procedural records, and include validated outcome measures and longitudinal follow-up.
Materials and Methods
This is a validation study of diagnostic test accuracy using clinical data from a multihospital health sciences network as the reference standard linked to population-based HA data. Clinical data are from The Ottawa Hospital, a 900-bed tertiary care academic health sciences network serving a population of approximately 1.2 million people.* The hospital network consists of three geographically distinct campuses, including two inpatient hospitals and a free-standing ambulatory surgical center. This investigation is reported using the Standards for the Reporting of Diagnostic Accuracy Studies (STARD initiative, appendix).17 The study was approved by the Ottawa Health Sciences Network Research Ethics Board (OHSN-REB 2008835, 20120813).
Our study used two distinct sets of linked healthcare databases. The first was The Ottawa Hospital Data Warehouse (OHDW), a combination of administrative and clinical data repositories for The Ottawa Hospital beginning in 1996. The OHDW captures reports of all polysomnograms done at the hospital and records all surgeries. The second set of linked databases is housed at the Institute for Clinical Evaluative Sciences, an independent research institute that holds HA data for the province of Ontario, Canada’s largest province with a population of more than 12 million people. Ontario provides single-payer universal health insurance to all residents, including coverage for polysomnography and at least partial coverage for PAP devices prescribed for OSA. This study used the following databases: the Discharge Abstract Database (DAD), which provides detailed information pertaining to hospitalizations, including diagnoses; the National Ambulatory Care Reporting System, which provides a record (including diagnoses) of all emergency room visits; the Same Day Surgery Database, which provides a record of all ambulatory surgeries; the Ontario Health Insurance Plan (OHIP) database, which records fee-for-service physician claims (including polysomnograms); and the Assistive Devices Program database, which records funding for all durable medical equipment including PAP devices.
We used the OHDW to retrospectively identify all patients who had undergone a polysomnogram at The Ottawa Hospital between 1996 and 2012. From this cohort, we identified all patients who had surgery at our hospital between January 1, 2003 (to coincide with the introduction of the ICD, Tenth Revision, coding system at our hospital) and December 31, 2012 (the latest date for which all data sources were current). We identified only patients who underwent preoperative polysomnography before surgery to ensure that the OSA status of each patient in our reference population was known. We excluded procedures performed exclusively with monitored anesthesia care, specifically gastrointestinal endoscopy and ophthalmologic surgeries, as we felt that these patients could be significantly different from those having more invasive noncardiac surgery.
From the polysomnogram most proximal to and preceding surgery, one of two investigators (D.I.M. and G.L.B.) retrospectively abstracted the apnea hypopnea index (AHI) and the diagnosis or exclusion of OSA from the OHDW. Patients were classified with OSA if the sleep medicine physician reporting the study diagnosed the patient with OSA or if the AHI was 5 or greater. OSA severity was that graded by the sleep physician reporting the study; if no severity was reported, criteria from the American Academy of Sleep Medicine18 were applied based on the AHI (AHI 5 to 14 = mild OSA; AHI 15 to 29 = moderate OSA; and AHI ≥30 = severe OSA). Patients diagnosed with nonobstructive sleep pathology (such as central sleep apnea) were categorized together as having an “other sleep disorder.”
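For context, the AHI on which these severity grades are based is the number of apneas and hypopneas per hour of sleep; the event counts in the illustration below are hypothetical:

```latex
\mathrm{AHI} \;=\; \frac{N_{\mathrm{apneas}} + N_{\mathrm{hypopneas}}}{\text{total sleep time (h)}},
\qquad \text{e.g., } \frac{120~\text{events}}{6~\mathrm{h}} = 20 \;\;\text{(moderate OSA)}
```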
Linking to Provincial Data and Creating the Analytical Dataset
Each patient record was assigned a unique, anonymized identifier and linked to provincial HA databases. We then identified diagnostic codes and claims (for diagnostic procedures and therapeutic interventions) within the provincial datasets, which we hypothesized could indicate a diagnosis of OSA (table 1). For each patient, we identified any claim—within the 2 yr before their surgery—for a diagnostic or therapeutic polysomnogram (from physician billing claims in OHIP); receipt of a PAP device (from the Assistive Devices Program database); and any ICD-9 or ICD-10 code for sleep apnea (from hospitalization data in the DAD or from physician billing claims in OHIP). The ICD-9 codes we used were the same as those used in previous HA data studies of surgical patients with OSA6–9; ICD-10 codes were those that identified obstructive or other sleep apneas. All ICD codes in our analytic dataset were treated separately based on the database of their origin (OHIP vs. DAD).
Patient age and sex were determined from the DAD or Same Day Surgery Database record for the index surgery. Previously described methods were used to identify the Elixhauser comorbidities based on ICD-9 and ICD-10 codes from the DAD in the 2 yr preceding surgery.19 A Charlson comorbidity score was calculated for each patient.20
Diagnostic codes, therapeutic intervention claims, and diagnostic procedure claims were coded as binary variables (present or absent) for each patient. Case-ascertainment algorithms were based on the presence of single codes (as in previous studies6–9) or predetermined combinations of codes and service claims, which we hypothesized might improve accuracy. For each case-ascertainment algorithm, each patient was classified as true positive (OSA present, algorithm positive), false negative (OSA present, algorithm negative), true negative (OSA absent, algorithm negative), or false positive (OSA absent, algorithm positive).
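The operating characteristics reported in the Results follow from these four cell counts in the standard way, where TP, FP, TN, and FN denote the numbers of true positives, false positives, true negatives, and false negatives:

```latex
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}
```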
Because we anticipated that the prevalence of OSA in our reference population would be higher than the general population (as reference individuals were identified by a physician as being at risk of OSA, hence their referral for a polysomnogram), we measured the association of each case-ascertainment algorithm with OSA by calculating the positive and negative likelihood ratios. Compared with the positive and negative predictive values, likelihood ratios are relatively impervious to disease prevalence and should, therefore, provide a more accurate representation of the ability of each case-ascertainment algorithm to predict the true presence or absence of OSA.21 The 95% CIs for these likelihood ratios were calculated according to the method by Simel et al.22 Positive and negative predictive values are not reported as these values would be significantly biased for external populations by the high prevalence of OSA in our reference population.23 To describe operating characteristics of various algorithms within this population, we calculated sensitivity and specificity with 95% CIs using the binomial distribution.
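For reference, the likelihood ratios follow directly from sensitivity (Se) and specificity (Sp), and the Simel et al.22 CIs are computed on the log scale. The standard-error expression shown below is the usual delta-method form, with a, b, c, and d denoting the counts of true positives, false positives, false negatives, and true negatives, respectively:

```latex
+\mathrm{LR} = \frac{\mathrm{Se}}{1 - \mathrm{Sp}}, \qquad
-\mathrm{LR} = \frac{1 - \mathrm{Se}}{\mathrm{Sp}}
```

```latex
95\%~\mathrm{CI} = \exp\!\left[\, \ln(+\mathrm{LR}) \pm 1.96
\sqrt{\frac{1}{a} - \frac{1}{a+c} + \frac{1}{b} - \frac{1}{b+d}} \,\right]
```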
In subgroup analyses, the accuracy of each algorithm was evaluated separately for all patients diagnosed with OSA (i.e., mild, moderate, and severe disease) as well as for prespecified subgroups including patients diagnosed with moderate and severe disease only (i.e., those in whom PAP therapy is recommended), any patient with sleep-disordered breathing (i.e., OSA or other sleep disorders), and categorized by inpatient versus ambulatory surgery. We performed further post hoc analyses restricted to individuals whose polysomnogram had been performed at most 3 yr before surgery to eliminate the bias that might be present due to changes in patient disease status over a long time lag between testing and surgery (a 3-yr time window was chosen because this was the approximate 75th percentile of the polysomnogram-to-surgery time lag for our population and because longitudinal studies support stability or increase in AHI for most individuals over 4- to 5-yr follow-up periods24–26). Finally, we limited analyses to patients whose surgeries were in the first half of our study period (2003 to 2007) or the second half (2008 to 2012) to examine the impact of possible changes in coding patterns over time.
To allow a qualitative assessment of the effect of misclassification presented by different methods to identify OSA in HA data, we also documented the demographic details, severity of OSA, and prevalence of chronic medical conditions that may confound the OSA–outcome relation for each case-ascertainment algorithm based on the test result (i.e., categorized for true positives, false positives, true negatives, and false negatives). The chronic medical conditions considered included heart disease (defined as any history of valvular disease, congestive heart failure, arrhythmia, or history of a myocardial infarction), respiratory disease (defined as any history of chronic obstructive pulmonary disease, asthma, or chronic pulmonary disease), and diabetes (defined as a history of complicated or uncomplicated diabetes). All analyses were conducted using SAS version 9.3 for UNIX (SAS Institute, Inc., USA).
Results

We identified 4,965 patients who underwent a preoperative polysomnogram (fig. 1). Of these, 4,353 patients (88%) were linked successfully to our HA databases; the remaining patients could not be linked to provincial databases because our hospital also serves patients without OHIP coverage, such as patients from Quebec, whose universal health insurance is separate from Ontario’s, or, rarely, patients with other forms of health insurance. Although we could not identify the specific reason for nonlinkage in all patients, even those in figure 1 without a valid OHIP number were likely from Quebec. OSA was diagnosed in 2,427 patients (56%), with the patients distributed evenly among mild, moderate, and severe OSA. Nonobstructive sleep pathology was diagnosed in 77 patients (2%). Forty-five percent of patients underwent ambulatory surgery, with the remainder undergoing inpatient surgery (table 2). Compared with patients without the diagnosis, those with OSA were more likely to be male, diabetic, and hypertensive (table 2), which is in keeping with previous cross-sectional analyses.27 Patients with OSA were less likely to receive ambulatory surgery than those without OSA.
Accuracy of OSA Case-ascertainment Algorithms
None of the diagnostic codes, diagnostic procedures, or therapeutic interventions, by themselves or in combination, identified OSA with adequately high sensitivity and specificity (table 3). The combination of a polysomnogram followed by receipt of a PAP device was highly specific for a true diagnosis of OSA (98%) and had the highest positive likelihood ratio (+LR) of all algorithms (10.9). The sensitivities of all diagnostic codes, by themselves or in combination, were very low. The specificity of both ICD-9 and ICD-10 codes for OSA in the DAD exceeded 97%. In contrast, the specificity of OSA diagnostic codes in OHIP was less than 40%. The combination of diagnostic codes for OSA with polysomnogram + PAP maximized sensitivity at 70% but caused the specificity and +LR to both decrease substantially. Negative LRs for all algorithms were between 0.8 and 1.1.
Changes in LRs, sensitivities, and specificities varied slightly among subgroups (table 4). The polysomnogram + PAP algorithm had a lower +LR in the moderate and severe OSA subgroup, whereas the +LR was minimally different in the ambulatory, inpatient, and sleep-disordered breathing subgroups compared with the full population. Sensitivities and specificities for the polysomnogram + PAP algorithm were also minimally changed in all subgroups. Similarly, ICD-9 and ICD-10 codes for sleep apnea applied to the DAD did not change substantially in any subgroup compared with the full population. The ICD-9 code 780.5 also produced similar LRs and sensitivities/specificities in all subgroups, as did the algorithm based on polysomnogram + PAP or any ICD code. Limiting analysis to those with a polysomnogram within the 3 yr before surgery did not change LRs in any marked way. Likelihood ratios did vary between time periods; however, the changes were not consistent. The polysomnogram + PAP algorithm had higher +LRs after 2007, whereas ICD-10 codes had higher +LRs before 2008. ICD-9 codes performed poorly in all time periods.
As compared with the overall cohort of patients with OSA, true positive cases identified using single diagnostic codes or using the combination of a polysomnogram followed by receipt of a PAP device had a higher prevalence of male sex, moderate-to-severe OSA, and important comorbid diseases. For the ICD-9 code 780.5 and the polysomnogram + PAP case-ascertainment algorithms, true positive cases were older than the average age of the full study cohort diagnosed with OSA (table 5).
Discussion

In this validation study of case-ascertainment algorithms, we found that HA data codes for diagnoses, diagnostic procedures, or therapeutic interventions (or combinations thereof) failed to accurately identify surgical patients with OSA. Although the specificities of single diagnostic codes identified in hospital discharge abstracts, as well as of the combination of a polysomnogram followed by receipt of a PAP device before surgery, were high, sensitivities were low, and false-positive results occurred. Such test characteristics mean that the case-ascertainment algorithms tested in this study are inadequate both for identifying people who truly have OSA and for ruling out disease in people who truly do not.
The usefulness of a diagnostic test is dependent on the combination of disease prevalence and the accuracy of the test. Likelihood ratios are relatively impervious to disease prevalence and can be used to estimate the usefulness of a diagnostic test at various prevalence levels.21 Furthermore, likelihood ratio–based cutoffs have been suggested to guide the assessment of diagnostic tests, such as our case-ascertainment algorithms. Tests with a +LR of 10 or greater and a −LR of 0.1 or less are considered to be very useful; those with a +LR between 5 and 10 and a −LR between 0.1 and 0.2 are considered moderately useful; and those with a +LR between 2 and 5 and a −LR between 0.2 and 0.5 are considered somewhat useful. Tests with a +LR less than 2 or a −LR greater than 0.5 are considered useless.28 Existing validated case-ascertainment algorithms used to identify and study chronic diseases in population-based HA data are typically in the very or moderately useful category (hypertension,29 diabetes,30 previous myocardial infarction,14 and congestive heart failure13). The most accurate case-ascertainment algorithm in our study (the combination of a claim for a preoperative polysomnogram followed by receipt of a PAP device) had a +LR of 10.9 and a −LR of 0.83. For reference, if OSA is conservatively estimated to be present in 10% of surgical patients,31 a +LR of 10.9 would increase the pretest probability of OSA from 10 to 55% (i.e., patients meeting these criteria would have only a 55% probability of truly having OSA); a +LR of 40 would be needed to increase the posttest probability to 80%. No subgroup existed in which these case-ascertainment algorithms were any more accurate. Using the single diagnostic code methods used in the literature, we estimate that the highest probability of identifying true OSA would be less than 40%.
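The posttest probabilities quoted above follow from Bayes’ theorem in odds form; as a worked check at the assumed 10% pretest prevalence:

```latex
\text{pretest odds} = \frac{0.10}{1 - 0.10} \approx 0.111, \qquad
\text{posttest odds} = 0.111 \times 10.9 \approx 1.21
```

```latex
\text{posttest probability} = \frac{1.21}{1 + 1.21} \approx 0.55 \;(55\%), \qquad
\text{with } +\mathrm{LR} = 40\!: \;\frac{0.111 \times 40}{1 + 0.111 \times 40} \approx 0.82 \;(\approx 80\%)
```

By the same arithmetic, a −LR near 1 leaves the pretest odds essentially unchanged, which is why the absence of a code carries almost no diagnostic information.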
Furthermore, because −LRs were approximately 1 for all methods tested, the absence of a diagnostic code for OSA would provide almost no indication that the patient did not have OSA. We would therefore expect a prevalence of OSA of approximately 10% in the unexposed group in existing HA data studies of OSA.
Although one would expect misclassification to bias the results of a study toward the null (i.e., no increased impact of OSA on adverse outcomes), the types of patients with OSA identified by diagnostic codes further cloud study interpretation. Our results demonstrate that the patients who had a diagnostic code for OSA and had a true diagnosis of OSA based on polysomnogram results (i.e., true positives) are those who may be at greater risk of adverse outcomes. These true positive patients had a markedly higher prevalence of diabetes, heart disease, or respiratory disease and were more likely to be male and older. All of these factors are associated with adverse outcomes after surgery.32–34 Furthermore, the majority of these patients had moderate or severe OSA based on their AHI, which is also postulated to increase postoperative risk.35 We believe that the results of existing HA data studies examining OSA in the perioperative period should be interpreted with great caution. In summary, we report that diagnostic codes used to determine OSA status appear to identify patients at higher risk of perioperative adverse events, independent of OSA; this bias may overstate the impact of OSA on postoperative outcomes.
As described in the second paragraph of our introduction, the current literature regarding outcomes of surgical patients with OSA contains significant knowledge gaps. HA data studies have the potential to address these gaps, which could explain the recent publication of numerous HA data studies in this area. However, our data raise extensive skepticism about the ability to accurately identify patients with OSA using HA data. Until an accurate method to identify OSA in HA data is developed, researchers and knowledge consumers should approach such studies cautiously. Other examples exist in perioperative medicine where diagnostic codes with unknown accuracy for identifying true disease status provided quantitatively different and conflicting results compared with studies using higher-quality prospective data. In the case of statins and postoperative delirium, an initial study, performed using HA data, concluded that statin use was associated with a 28% increase in the odds of postoperative delirium.36 A subsequent prospective study demonstrated that statins were actually protective, with the odds of postoperative delirium being decreased by 46% in statin users.37 Validation testing demonstrated that despite high specificity, diagnostic codes for delirium in HA data lack accuracy for identifying delirium, with a +LR of 18 and a −LR of 0.82 for a disease that is approximately 10% prevalent.38 Given this example and the new knowledge generated in our study, we suggest that if OSA is felt to be an important perioperative risk factor, concerted efforts by researchers and health system administrators will be required to allow the accurate identification of patients with OSA in HA data. For researchers, this may involve the development of case-ascertainment algorithms that do not rely exclusively on diagnostic codes, diagnostic procedures, or therapeutic interventions related to OSA.
For health system administrators, this may involve improving the coding accuracy of diagnostic codes for OSA in HA databases.
Strengths and Limitations
Our study has several strengths. We used definitive standard diagnostic methods to determine the OSA status of each patient in our cohort. We used population-based administrative data, which captured all physician and hospital encounters as well as all polysomnograms and the vast majority of PAP devices for all patients. Although many HA data studies of OSA define disease status using administrative data only at the time of surgical hospital admission, we looked back 2 yr before surgery to capture diagnostic codes related to OSA; although we cannot exclude the possibility that a longer look-back window would increase accuracy, accuracy would have to increase substantially to make codes useful for identifying OSA in HA data. Our validation cohort was large, allowing for subgroup analyses. Our algorithms may be applicable to nonsurgical patients captured in the HA data of Ontario and similar health systems, as ICD codes, polysomnography, and PAP devices are not unique to patients with OSA undergoing surgery.
Our study has certain limitations. The sampling frame was a single academic health sciences center and may lack generalizability; however, patient characteristics were consistent with typical patients with OSA from other settings. Although LRs are relatively immune to disease prevalence, there is little guidance available to predict the full impact of disease prevalence on a binary disease indicator (such as the presence or absence of a diagnostic code). A simulation study of normally distributed disease indicators (such as hemoglobin levels indicating anemia) suggests that +LRs may increase in populations with low disease prevalence, whereas −LRs may tend closer to one.23 This finding, however, is not directly generalizable to the current study. Our polysomnogram + PAP algorithm may not be applicable in all health systems. Reliance upon PAP device receipt limits the identification of patients with OSA who were offered other treatment modalities. Our case-ascertainment algorithms contained physician-identified diagnostic codes (which are used for billing submissions and are based on ICD-9); such codes do not require specific diagnostic criteria and could be applied if disease was suspected but not proven (as might be the case in a clinical visit resulting in ordering of a polysomnogram). Finally, 12% of our reference population was not linked to HA data. The majority of unlinked patients had universal health insurance provided by the province of Quebec and lived in the same metropolitan area as our linked patients. It is unlikely that these individuals differed substantively from linked individuals, but such a possibility should be considered when applying our findings.
Conclusions

The use of diagnostic codes, or a combination of diagnostic codes, diagnostic interventions, and therapeutic interventions, did not provide a case-ascertainment algorithm that reliably identified patients with OSA in HA data. Furthermore, the use of diagnostic codes to define OSA may introduce important misclassification bias because patients with OSA identified by these algorithms appear to systematically differ from other patients with OSA with respect to the severity of their OSA and important perioperative prognostic factors. Future research is required to develop methodologies that accurately and consistently identify patients with OSA in HA data. Until such methodologies exist, studies using diagnostic codes to identify OSA exposure should be interpreted with great caution.
Dr. McIsaac acknowledges salary support from The Ottawa Hospital Anesthesia Alternate Funds Association (Ottawa, Ontario, Canada). Dr. Bryson acknowledges support from The Ottawa Hospital Anesthesia Alternate Funds Association (Ottawa, Ontario, Canada). Dr. Wijeysundera acknowledges support, in part, from a Clinician Scientist Award from the Canadian Institutes of Health Research (Ottawa, Ontario, Canada) and a Merit Award from the Department of Anesthesia at the University of Toronto (Toronto, Ontario, Canada).
This study was funded by the Ontario Thoracic Society (Toronto, Ontario, Canada) and the Canadian Anesthesiologists’ Society (Toronto, Ontario, Canada). This study was also supported by the Institute for Clinical Evaluative Sciences (ICES), which is funded by an annual grant from the Ontario Ministry of Health and Long-Term Care (MOHLTC; Toronto, Ontario, Canada). The opinions, results, and conclusions reported in this article are those of the authors and are independent from the funding sources. No endorsement by ICES or the Ontario MOHLTC is intended or should be inferred. These data sets are held securely in a linked, deidentified form and analyzed at ICES.
The authors declare no competing interests.
The Ottawa Hospital: About our hospital. Available at: https://www.ottawahospital.on.ca/wps/portal/Base/TheHospital/AboutOurHospital. Accessed January 31, 2015.