Dr. Scott Reuben allegedly fabricated data. The authors of the current article examined the impact of Reuben reports on the conclusions of systematic reviews.
The authors searched ISI Web of Knowledge for systematic reviews citing Reuben reports. Systematic reviews were grouped into one of three categories: I, only cited but did not include Reuben reports; II, retrieved and considered, but eventually excluded Reuben reports; III, included Reuben reports. For quantitative systematic reviews (i.e., meta-analyses), a relevant difference was defined as a significant result becoming nonsignificant (or vice versa) by excluding Reuben reports. For qualitative systematic reviews, each author decided independently whether noninclusion of Reuben reports would have changed conclusions.
Twenty-five systematic reviews (5 category I, 6 category II, 14 category III) cited 27 Reuben reports (published 1994–2007). Most tested analgesics in surgical patients. One of 6 quantitative category III reviews would have reached different conclusions without Reuben reports. Across all 6 (30 subgroup analyses involving Reuben reports), exclusion of Reuben reports never made a difference when the number of patients from Reuben reports was less than 30% of all patients included in the analysis. Of 8 qualitative category III reviews, all authors agreed that one would certainly have reached different conclusions without Reuben reports. For another 4, the authors’ judgment was not unanimous.
Carefully performed systematic reviews proved robust against the impact of Reuben reports. Quantitative systematic reviews were vulnerable if the fraudulent data were more than 30% of the total. Qualitative systematic reviews seemed at greater risk than quantitative.
SYSTEMATIC reviews of randomized controlled trials (RCTs), with or without meta-analysis, are considered a high level of evidence on which to base clinical practice. However, bias and fraud in original research can threaten the validity of systematic reviews and meta-analyses. For example, when systematic reviews include trials with inadequate concealment of treatment allocation or inappropriate blinding, they are likely to overestimate the benefit of a treatment.1,2 Similarly, accidental inclusion of a covert duplicate publication into a meta-analysis can bias the conclusion of that meta-analysis in favor of an experimental intervention.3
Falsification of data is a grave breach of scientific ethics. Fabricated data may be published in peer-reviewed scientific journals and may subsequently be included in systematic reviews. Recently, a routine audit uncovered perhaps one of the largest research frauds ever reported.4 U.S. anesthesiologist Dr. Scott Reuben allegedly fabricated clinical studies.5 Most of these trials demonstrated benefits from analgesic drugs.
Data from Reuben publications have been included in systematic reviews and meta-analyses. It has been claimed that the retraction of Reuben’s articles compromises every systematic review that included these fabricated findings.6 In this context, two questions are relevant. First, does noninclusion of fraudulent data in systematic reviews change the conclusions of these systematic reviews? Second, are some systematic reviews more robust than others against the impact of included fraudulent data, and, if so, why? We set out to address these issues using the Reuben case as an example.
Materials and Methods
For the purpose of this analysis, we regarded all reports that were coauthored by Reuben as potentially fraudulent, including those not included on the official retraction list.7
We searched for indexed reports of any study architecture that were coauthored by Dr. Scott S. Reuben. We searched ISI Web of Knowledge using the author search term Reuben SS.‡‡ The date of the last search was March 18, 2009. The Create Citation Report tool was used to summarize bibliometric data of the Reuben reports. From all reports that cited Reuben at least once, we selected those that used the term systematic review or meta-analysis in the title. When the title left doubt about the nature of the citing reference, we consulted the abstract.
We checked whether the citing reviews fulfilled at least one of two criteria of a systematic review: (1) a methods section with a description of the search strategy or (2) explicit inclusion criteria for eligible reports. All other reviews were regarded as narrative, nonsystematic reviews and were not considered further.
Systematic reviews were categorized into three subgroups: Category I cited a Reuben report (e.g., in the introduction or the discussion) but did not consider the data for inclusion; category II retrieved and considered a Reuben report but eventually excluded it on the basis of, e.g., quality or validity criteria; and category III included data from a Reuben report either qualitatively or quantitatively (i.e., meta-analytically).
When a Reuben report was included in a qualitative category III review, each author decided independently whether noninclusion of that report would have changed the overall conclusions of the review. When our verdict was unanimous, we assumed that noninclusion of Reuben reports would, or would not, have changed the conclusions of the review.
When data from Reuben reports were included in a quantitative category III systematic review (meta-analysis), we contacted the authors and asked them to repeat, without the Reuben reports, all subgroup analyses that had included data from Reuben reports, using their original statistical software package. When they did not respond to our inquiry, we extracted the relevant data from the text, forest plots, or tables of the meta-analysis or from the original articles, and repeated the analyses without the Reuben reports using Review Manager (RevMan) version 5.0 (The Nordic Cochrane Centre, The Cochrane Collaboration, Copenhagen, Denmark; 2008). There was an a priori agreement that a change from a significant to a nonsignificant result (or vice versa) by excluding Reuben data was a relevant change in the outcome of a meta-analysis.
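The sensitivity analysis described above amounts to pooling the trials with and without the suspect reports and rechecking significance. The sketch below illustrates this with a simple inverse-variance, fixed-effect pooling of mean differences; the trial names, effect sizes, and standard errors are entirely hypothetical and are not taken from any actual report (RevMan also supports more elaborate models):

```python
import math

# Hypothetical subgroup data: (trial_name, mean_difference, standard_error, is_suspect).
# Values are illustrative only, not drawn from any real trial.
trials = [
    ("trial_a", -4.0, 3.0, False),
    ("trial_b", -3.0, 4.0, False),
    ("suspect_1", -12.0, 2.5, True),
    ("suspect_2", -11.0, 2.8, True),
]

def pool_fixed_effect(subset):
    """Inverse-variance fixed-effect pooling of mean differences."""
    weights = [1.0 / se ** 2 for _, _, se, _ in subset]
    pooled = sum(w * md for w, (_, md, _, _) in zip(weights, subset)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    significant = ci[0] > 0 or ci[1] < 0  # the 95% CI excludes zero
    return pooled, ci, significant

all_pooled = pool_fixed_effect(trials)
clean_pooled = pool_fixed_effect([t for t in trials if not t[3]])

for label, (pooled, ci, sig) in [("all trials", all_pooled), ("suspect excluded", clean_pooled)]:
    print(f"{label}: estimate {pooled:.1f}, 95% CI ({ci[0]:.1f}, {ci[1]:.1f}), significant: {sig}")
```

With these invented numbers, the pooled effect is significant when all trials are included but loses significance once the suspect trials are removed, which is the "relevant change" criterion defined above.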
Results
We retrieved 96 Reuben reports. Sixty-four were cited at least once (total number of citations, 1,199; without self-citations, 682 or 57% of all citations). Of the 64 cited Reuben reports, 27 (see Supplemental Digital Content 1, a document that lists all Reuben references cited in this study, http://links.lww.com/ALN/A559) were cited by at least one of 28 reviews (fig. 1).8–35
Two of the citing reviews did not fulfill the minimal criteria of a systematic review and were therefore not considered further.21,34 One was published twice.10,11 Therefore, 27 Reuben reports were cited by 25 systematic reviews (fig. 1). The systematic reviews were published between 1994 and 2007, thus all at least 2 yr before the Reuben case was made public. Consequently, none of them could have taken the subsequent retractions into account.
The 27 Reuben reports were published between 1994 and 2007 (table 1). Twenty-five were RCTs, including data from 1,906 patients (median number of patients per trial, 60 [range, 40–200]). One was a retrospective analysis including data from 434 patients, and 1 was performed in 15 healthy volunteers. Ten reports were officially withdrawn.7 Six acknowledged sponsorship by pharmaceutical companies, although it remained unclear whether the authors were funded by industry or whether the trials were sponsored by industry, as is the case for trials performed for registration. All reports with acknowledged sponsorship by a pharmaceutical company were subsequently withdrawn. Three reports acknowledged sponsorship by institutional funds, and in 18, no sponsorship was acknowledged. Twelve different drugs, mainly analgesics, given by various routes, were tested: classic nonsteroidal antiinflammatory drugs (NSAIDs), cyclooxygenase 2–selective inhibitors (coxibs), morphine, fentanyl, oxycodone, meperidine, bupivacaine, pregabalin, venlafaxine, and droperidol. Seven of the 10 officially withdrawn reports tested a coxib. Types of surgery were mainly orthopedic. Reuben was the first author of 21 reports (78%). Thirty-eight individuals acted as coauthors (median number per report, 2 [range, 1–5]). One individual coauthored eight times, one coauthored six times, and one coauthored five times.
Category I Systematic Reviews
Five systematic reviews (studying antiemetic drugs against morphine-induced emesis,17 tramadol for osteoarthritis,10 perioperative gabapentin and pregabalin,31 perioperative rofecoxib,13 or postdischarge symptoms after outpatient surgery35) cited Reuben reports in the introduction or discussion but did not consider the data for inclusion (fig. 1).
Category II Systematic Reviews
Six systematic reviews (studying rofecoxib for postoperative pain relief,8 opioid sparing with coxibs,28 or efficacy of intraarticular administration of analgesic drugs16,19,24,29) retrieved and considered Reuben reports for inclusion but excluded them from analysis for a variety of reasons (fig. 1). The Reuben reports were mainly excluded because patients received concomitant drugs that potentially interfered with the efficacy of the experimental intervention or because specific endpoints were not reported. On no occasion did the authors of these systematic reviews express concerns about the validity of the excluded Reuben reports.
Category III Systematic Reviews
Quantitative Systematic Reviews (Meta-analyses).
Six quantitative systematic reviews included data from Reuben reports (fig. 1). We had access to all trial data of all meta-analyses except for the analysis by Jirarattanaphochai and Jung,18 who did not respond to our inquiry. For that analysis, we therefore extracted the data ourselves and repeated the analyses with and without Reuben reports.
Five of those six quantitative systematic reviews (studying the efficacy of adding NSAIDs or coxibs to patient-controlled analgesia with morphine after major surgery,14 the impact of NSAIDs or coxibs on morphine-related adverse effects,22 the efficacy of preemptive analgesia,25 or the antiemetic efficacy of droperidol9,32) reported on a total of 14 subgroup analyses that included Reuben reports. Exclusion of Reuben reports did not significantly change any of the results (table 2).
The sixth quantitative systematic review (studying NSAIDs and coxibs for analgesia after spine surgery)18 included data from 17 RCTs (789 patients); 5 (240 patients) were Reuben reports [7,8,16,22,25].§§ A further 2 Reuben reports [23,24] were cited in the introduction or discussion but were not further analyzed. Sixteen subgroup analyses included Reuben reports. For some subgroup analyses, the majority of data came from Reuben reports. For example, the analysis of pain intensity at 24 h with coxibs included data from 234 patients from 5 RCTs; 200 patients came from 4 Reuben reports. A total of 8 subgroup analyses (50%) met our criterion for a relevant difference when the Reuben reports were excluded (table 2). This was mainly due to a decrease in power, i.e., 95% confidence intervals became wider. However, for some subgroup analyses, point estimates also changed. For example, 24-h morphine sparing with coxibs changed from an average of −30.2 mg to −2 mg after exclusion of Reuben reports. Similarly, perioperative blood loss with NSAIDs or celecoxib changed from −19.7 ml to +23.7 ml (table 2).
In the 6 meta-analyses, 30 subgroup analyses involved Reuben reports. In 19 (63%), the ratio of the number of Reuben reports to the number of all trials and the ratio of the number of patients in Reuben reports to the total number of patients were approximately 30% or lower (fig. 2); for none did the exclusion of Reuben reports make any difference. In 8 analyses (27%), the ratios of reports and patients were between approximately 40% and 70%, and for 5 of those, exclusion of Reuben reports did make a relevant difference. In 3 analyses (10%), the ratios of reports and patients were 80% or higher, and for all 3, exclusion of Reuben reports made a relevant difference. All analyses with significant changes in results after exclusion of Reuben reports came from a single meta-analysis.18
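The pattern above can be expressed as a simple screening check: for each subgroup analysis, compute the fraction of patients contributed by a single suspect source and flag analyses above the observed cutoff. The analysis names and patient counts below are illustrative, and the roughly 30% threshold is only the empirical pattern found in this study, not a validated rule:

```python
# Hypothetical subgroup analyses: (analysis_name, patients_from_suspect_reports, total_patients).
# Counts are illustrative, not taken from the reviews discussed here.
analyses = [
    ("pain_24h_coxib", 200, 234),
    ("morphine_sparing", 90, 320),
    ("nausea", 60, 410),
]

# Below this fraction, excluding suspect reports never changed a result in the present data set;
# this is an observation, not a universal cutoff.
THRESHOLD = 0.30

for name, suspect_n, total_n in analyses:
    fraction = suspect_n / total_n
    flag = "re-examine" if fraction > THRESHOLD else "likely robust"
    print(f"{name}: {fraction:.0%} of patients from suspect reports -> {flag}")
```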
Qualitative Systematic Reviews.
It was our unanimous verdict for 3 of 8 qualitative systematic reviews that noninclusion of Reuben reports would not have changed their conclusions.23,26,33 They studied the role of clonidine as an adjuvant to local anesthetics for peripheral nerve blockade,23 adverse effects associated with opioids,33 and controlled-release oxycodone for the treatment of cancer and noncancer pain.26
It was our unanimous verdict that noninclusion of Reuben reports would have had an impact on the results of 1 qualitative systematic review.12 This analysis of adjuvants to local anesthetics for intravenous regional anesthesia (IVRA) included 29 trials (1,217 patients); 6 (325 patients) were from Reuben [4,6,9,11,12,15]. The authors pointed out that only 1 trial (a Reuben report) looked at the potential intraoperative benefit of NSAIDs added to local anesthetics: significantly fewer patients had tourniquet pain when ketorolac was added, and reportedly there were a number of significant postoperative benefits from IVRA with ketorolac compared with systemic control. According to these authors, another Reuben report found that ketorolac was equally analgesic either infiltrated into the surgical site or when given as an adjunct to IVRA. In addition, they thought, based on a Reuben report that was performed in 15 volunteers, that clonidine prolonged tourniquet tolerance and improved postoperative analgesia. According to the authors, these experimental findings were further supported by a clinical study by Reuben. They also stressed that according to various Reuben reports [12,15], a small dose of clonidine as an adjuvant to IVRA seemed to be well tolerated. Finally, they concluded that opioids were disappointing for IVRA and that, based on data from Reuben, only meperidine had substantial postoperative benefit but at the expense of postdeflation side effects. It was our view that noninclusion of the 6 Reuben reports would have substantially changed some of the conclusions of that qualitative systematic review.
For 4 qualitative category III systematic reviews, our verdict on whether noninclusion of Reuben reports would have had an impact on the results or the conclusions of the review was not unanimous.15,20,27,30 These are discussed briefly, and the reasons for our lack of unanimity are explained.
Rømsing and Møiniche27 compared four different coxibs with NSAIDs for postoperative analgesia. They included 3 Reuben reports [16,18,20] and excluded an additional report because pain intensity up to 24 h postoperatively was not reported. Our disagreement concerned one of the main conclusions of the review, which stated that 50 mg rofecoxib provided superior analgesia compared with 200 mg celecoxib. That conclusion was based on data from 4 trials; 2 were from the same group of Merck collaborators, and 1 was from Reuben, and all 3 were in favor of rofecoxib.
Straube et al.30 tested the effect of coxibs on postoperative outcomes. They included 4 Reuben reports that tested celecoxib or rofecoxib [16,18–20]. The authors used a vote counting procedure. Noninclusion of Reuben reports would not have changed the ratio of positive to negative trials. However, in the discussion the authors stated that “With one exception (citing a Reuben report) studies did not address the question of preemptive analgesia, where preoperative coxib was compared with postoperative coxib, though that exception found a large benefit for preoperative over postoperative rofecoxib,” and “Only one small study (citing a Reuben report) evaluated preoperative with postoperative coxib in a standard preemptive design, and did find significant benefit for preoperative use.” Our disagreement was whether, without referring to that Reuben report, the authors would have been able to make a statement in favor of preemptive analgesia with rofecoxib.
Liu and Wu20 provided a summary of postoperative analgesia practices using data from systematic reviews but also from some additional, selected RCTs. The authors described a search strategy; however, selection criteria remained obscure, and the reader was left in the dark as to how the conclusions from, e.g., a systematic review were weighted compared with data from a single RCT. In the results section, the authors suggested that the use of coxibs might result in a reduction in long-term complications after surgery, including chronic pain. As evidence for this assumption, 2 RCTs from Reuben [26,27] were cited that both studied the analgesic efficacy of celecoxib. Although the authors did not retain this hypothesis in the conclusions of the review, our disagreement was whether these Reuben reports would have had an impact on one of the results of that review.
Finally, Fischer et al.15 published evidence-based recommendations for analgesia techniques after total knee arthroplasty. The authors acknowledged support by an educational grant from Pfizer, reimbursement by Pfizer for attending working group meetings, help and expertise in performing literature searches by a Pfizer employee, and editorial assistance by medical writers who were sponsored by Pfizer. The search strategy considered exclusively RCTs. Two Reuben reports were retrieved but were eventually excluded by the authors because surgery was not knee arthroplasty, or postoperative pain scores were not reported. Because no Reuben reports were considered in the actual analysis, we considered whether to classify the review as category II. However, in the recommendations section, the authors unexpectedly referred to 2 further, previously not considered Reuben reports [22,23]. One was a retrospective chart review. Based on those 2 reports, the authors weighted the evidence regarding bone healing against NSAIDs and in favor of coxibs: “Limited data show that conventional NSAID may have dose- and duration-dependent detrimental effects on bone healing,” and “Although there is concern about impairment of bone healing with cyclooxygenase 2–selective inhibitors, limited evidence shows that they have no detrimental effects.” Both Reuben reports were classified as level 1 evidence by the authors, and because both reports were performed in spinal surgery, the evidence was regarded as “transferable.” Our disagreement was whether noninclusion of these Reuben reports would have changed one of the main conclusions of that review.
Discussion
The majority of quantitative systematic reviews (meta-analyses) proved to be robust against the impact of potentially fraudulent Reuben reports. This was not an unexpected result because data from a few trials, even when flawed or fabricated, should have no substantial impact on the conclusions of a meta-analysis, which often includes data from hundreds or thousands of patients from a large number of trials. However, some systematic reviews seemed to be less robust; they would probably have reported different results or would have drawn different conclusions had they not included Reuben reports.12,15,18,20,27,30 There were three main reasons for this lack of robustness. First, the ratio of potentially fraudulent to valid data in a systematic review seems to be crucial (although we were able to show this empirically only for meta-analyses).12,18 It may be inferred from figure 2 that meta-analyses that include mainly trials or patient data from one or only a few authors or centers need to be interpreted cautiously. Whether the cutoff ratio of approximately 30% is universally applicable depends on several factors. For example, here, trial sizes were very similar; one mega-trial would overwhelm even a large number of fabricated trials if they were small. There are no rules as to how many trials a valid meta-analysis should include. The proportion of potentially fraudulent data among valid data needs to be considered.
Our lack of unanimity in estimating the impact of Reuben reports in some qualitative systematic reviews had two main causes. First, some systematic reviewers seemed to give undue weight to evidence emerging from particular Reuben reports.20,30 This problem is inherent to the process of qualitative systematic reviews because they cannot take effect size into account. Second, in one review that claimed to consider exclusively data from RCTs, the authors referred to observational data from Reuben that eventually had a strong impact on the overall conclusions.15 The Improving the Quality of Reports of Meta-Analyses of Randomized Controlled Trials (QUOROM) statement stipulates that the results of a systematic review should be interpreted in the “light of the totality of available evidence.”36 To avoid any misinterpretation, the QUOROM statement may need to be more specific in that the results of a systematic review should be interpreted in the “light of the totality of evidence that was retrieved through the systematic literature search.” Data not directly relevant should be interpreted as hypothesis generating and may be used as a basis for a rational research agenda rather than as transferable evidence.
Of the 96 Reuben reports, one third (n = 32) have never been cited; although their impact on science and clinical practice is impossible to quantify, we may assume that it remains low. Two thirds have been cited almost 1,200 times in indexed journals, and Reuben and his coauthors have actively participated in this dissemination process because almost half of all citations were self-citations. Thirty-seven reports (38%) were cited in articles other than systematic reviews; among those were editorials, guidelines, clinical studies, and conventional, nonsystematic review articles. A previously published opinion statement attempted to estimate the impact of Reuben reports in this literature.6 Finally, 27 reports (28%) were cited in systematic reviews. Clearly, the detrimental impact of Reuben reports on systematic reviews, if there was any, was limited for several reasons. First, Reuben reports were published over a period of 15 yr, with scope for others to confirm or refute the findings. There is empirical evidence that the median survival time of meta-analyses without substantive new evidence is only approximately 5 yr, and that clinically important evidence that alters conclusions about the effectiveness and harms of treatments can accumulate rapidly.37 Second, the Reuben studies were of limited size, minimizing their quantitative impact on meta-analyses. Third, most Reuben reports echoed current knowledge and did not contradict established science.
One particularity of a few Reuben reports was that they purported to add new insights; these were often attractive, welcomed by many, and sometimes seemed revolutionary and to advance science. Among those were the absence of detrimental effects of coxibs on bone healing after spine surgery, the beneficial long-term outcome after preemptive administration of coxibs including an allegedly decreased incidence of chronic pain after surgery, and the analgesic efficacy of ketorolac or clonidine when added to local anesthetics for intravenous regional anesthesia. Clinical algorithms based on this evidence need to be revised.38
Our analysis has limitations. First, we were unable to address the role of Reuben’s coauthors. It remains obscure why a small number of individuals coauthored a large number of Reuben reports without having any suspicion about the nature of the data. Although the publishing role of senior members of a collaboration group has been recognized for a while, the responsibilities of coauthors, and thus each author’s contributions, have only recently begun to receive attention.39 Second, we were also unable to evaluate the impact of sponsorship. Research that is sponsored by industry may draw undue conclusions in favor of the industrial product.40–42 Some Reuben reports acknowledged sponsorship by industry; however, we do not know whether industry simply funded the authors or was actively involved in the design, analysis, and publication of the studies. All reports that acknowledged sponsorship from industry were retracted because the reported data were identified as having been fabricated (table 1). For the majority of Reuben reports, no information on sponsorship was provided; we do not know whether there was none or whether it was simply not reported. Given the concerns about financial ties in research, one would suggest that if a substantial proportion of the evidence was derived from a single study sponsor, the review of that evidence should be considered with greater skepticism. Third, we assumed that all Reuben reports that were cited by systematic reviews were fabricated and that the entire data sets were flawed. We cannot exclude that some of these reports contained valid data. However, it may be argued that, once a fraudulent research article has been discovered, it is dangerous to assume that there were no problems with the author’s previous work.43
All human activity is associated with misconduct.44 It is perhaps naive to believe that fraud can be avoided; it will probably always exist. Almost 2% of scientists admitted to having fabricated, falsified, or modified data at least once, and this estimate was considered conservative.45 A fraudulent article looks much the same as a nonfraudulent one; there seem to be no obvious alert signs to hint that an article is fraudulent.46 One of the strengths of systematic review is that it shifts the emphasis from single studies to multiple studies. Unless a fraudulent study is very large and reports a substantial number of events, it will not have much scope to change the conclusions of a systematic review. Consequently, conclusions from a single study should not be overweighted. Also, every effort must be undertaken to further improve the quality and validity of systematic reviews. Well-conducted systematic reviews allow a more objective appraisal of the evidence than traditional, nonsystematic reviews, provide a more precise estimate of a treatment effect, and may explain heterogeneity between the results of individual studies beyond the play of chance. Ill-conducted systematic reviews, on the other hand, may be biased because of exclusion of relevant studies or inclusion of inadequate studies.
Our study shows clearly that meta-analysis is not an appropriate instrument to detect fraud if the fraudulent data are in line with valid data. Instances where noninclusion of Reuben reports changed the results from pooled analyses were almost always due to a decrease in the power of the analysis related to a decrease in the amount of analyzable data (table 2). Only rarely did a point estimate change significantly.
In conclusion, misleading systematic reviews can generally be avoided if a few basic principles are observed.36 However, the QUOROM statement, intended to improve the quality of reporting of systematic reviews, does not consider the possibility of fraud. Additional issues may be considered to protect systematic reviews against fraudulent data and to further improve their credibility and validity. For example, the QUOROM statement should include a word of caution regarding inappropriate ratios between the amount of data coming from a very limited number of authors or centers and the total amount of data included in a systematic review. Also, systematic reviewers should adhere to a priori defined rules and should not rely on evidence that was not considered for the review itself. Although readers may expect systematic reviewers to put their findings into a wider context, it does not make sense to define explicit inclusion and noninclusion criteria for the review process but then to base the main conclusions on data that were not initially included in the review because they did not fulfill minimal quality criteria. Finally, systematic reviewers must recognize that qualitative systematic reviews are more vulnerable than quantitative systematic reviews and that they are at particular risk of giving undue weight to individual studies. Such reviews must be performed with particular care. The clinical message of our analysis is pragmatic. The overall effect of Reuben’s fraud is weak in areas where there are many other data, but is likely to be stronger in areas with few data. Even after retraction of Reuben reports, classic NSAIDs and coxibs are still analgesic in surgical patients, they still have an opioid-sparing effect, and they still decrease pain intensity in the immediate postoperative period. However, clinical algorithms that have been heavily based on evidence from Reuben reports need to be revised.
Among those are the effects of coxibs on bone healing and preemptive analgesia, and the analgesic efficacy of adjuvants to local anesthetics for IVRA.