We thank Dr. Avram for his comments on our publication and respond as follows. The aim of this study was to “evaluate the quality of study design in clinical trials published in four leading anesthesia journals between 1981 and 2000.”1 We did not intend to establish differences among the four journals (and the individual journals were not identified in the paper); therefore, the data can be legitimately pooled. Even if we had sought to pool the data based on a P value > 0.05, this approach is not novel.2 After pooling the raw data (and not percentages from the four journals), we compared each criterion in the two time periods. Applying the same criteria for correction of type I errors as we used in this study (number of comparisons in excess of 10), we corrected for the 15 comparisons in this study by reducing the significance threshold 5-fold, to P < 0.01. It is important to recall that the threshold for correcting for type I errors is arbitrary. Indeed, Dr. Avram did not find fault with our threshold for defining type I errors in clinical trials.1 Moreover, as we restrict the P value for type I errors, we correspondingly increase the risk of a type II error, and this relationship should not be overlooked.
To provide an up-to-date snapshot of the quality of clinical trials in anesthesia, we included clinical studies from the year 2000. These studies were selected from a much smaller pool of published studies than those selected in 1981–1985 and 1991–1995, and as stated in the methods, they were not included in the analysis.
The primary outcome variable of this study was the mean analysis score, evaluated between 1981–1985 and 1991–1995. This score increased significantly with a P value that was so small (P < 0.00001) that had we corrected for 5,000 comparisons, it would still have yielded a statistically significant change in the mean analysis score. Comments about insufficient power relate only to negative studies, particularly when ethical, moral, and fiscal issues are not at stake.
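As a rough illustration of the arithmetic behind this statement, assume a Bonferroni-style adjustment of a nominal threshold of 0.05 across m = 5,000 comparisons. Both the adjustment method and the symbols α and m are assumptions made only for this sketch and are not the correction described above:

```latex
\[
\alpha_{\text{adjusted}} = \frac{\alpha}{m} = \frac{0.05}{5000} = 0.00001,
\qquad
P < 0.00001 = \alpha_{\text{adjusted}}
\]
```

Under these assumptions, the observed P value would remain below the adjusted threshold.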
Dr. Avram commented on the limitations of the quality criteria that we used. We acknowledged this limitation in our study by stating that we had modified existing quality criteria3 and made some arbitrary decisions about several criteria to ensure that they were relevant and applicable to anesthesia trials. Unfortunately, the original quality criteria by DerSimonian et al. were not tested for validity.3 We agree with Dr. Avram that median values may have been more appropriate measures of central tendency to report than mean values, although with normally distributed scores, the difference between mean and median values would have been moot.