It is widely accepted that the randomized controlled trial is the optimal method to evaluate the efficacy of an intervention.1 Clinical research aiming to inform and improve patient care should evaluate outcomes rated as important by our patients.2–4 Unfortunately, many trials focus on surrogate outcomes that are of questionable significance.5 Given that 5 to 10% of trial patients will suffer a serious postoperative complication, and that perioperative treatments are likely to have only a moderate effect on outcome, large numbers of patients are required to have sufficient trial power to detect a modest but clinically important effect.1,2
A systematic review of head injury trials published before December 1998 identified 203 studies in which 16,613 patients were enrolled.6 Thus, the average number of patients in each trial was 82. Only 4% of these trials were large enough to detect a clinically important difference. We believe similar problems with underpowered studies exist in the anesthesia, perioperative, and pain medicine literature.7–9
Large trials can provide definitive evidence to guide clinical practice.1–3 They provide insights in a broad range of clinical settings and offer an opportunity to identify patient, clinician, and institutional factors that may influence outcome.2 They should have sufficient power to address clinically important questions, balance a range of known and unknown confounding factors, and provide precise estimates of effect, the latter being indicated by narrow confidence intervals. However, large multicenter trials take considerable time and effort to establish and can be expensive to run.10
The sample size needed to reliably demonstrate a clinically important effect should be based on the nominated primary endpoint of the study11; this is sometimes lacking in anesthesia trials.8,9 There may be several secondary study outcomes, and each of these typically relates to mechanisms of action, specific organ effects, and potential adverse effects of therapy. If the primary endpoint of the trial is a serious complication (e.g., death), it may only occur in 1 to 2% of patients, which greatly increases the sample size required to detect a clinically useful effect.12 For this reason, some investigators combine several outcomes into a single composite, or pooled, endpoint.13–15 For example, antithrombotic treatments may reduce nonfatal myocardial infarction (MI), nonfatal stroke, and death. A summary measure of each of these effects can be defined. Similar approaches have been or are being used in perioperative anesthesia trials.16–18 The composite endpoint will have a higher incidence than each of the individual outcomes, and as long as there is a reasonably similar effect across the individual components of the composite, this will reduce the sample size requirement for the trial. Because of the higher event rate, composite endpoints can provide increased statistical precision and efficiency. Clinical trials can be smaller, less costly, and completed earlier, and the results can be made available sooner.
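The effect of event rate on sample size can be illustrated with a rough calculation. The sketch below uses the standard normal-approximation formula for comparing two proportions; the function name, the 25% relative risk reduction, the two-sided α of 0.05, the 80% power, and the 10% composite event rate are all assumptions chosen for illustration, not values taken from any trial discussed here.

```python
from math import ceil
from statistics import NormalDist


def patients_per_group(control_rate, relative_risk_reduction,
                       alpha=0.05, power=0.80):
    """Approximate patients per group for a two-arm trial comparing
    two proportions (normal approximation, equal group sizes)."""
    p1 = control_rate
    p2 = control_rate * (1.0 - relative_risk_reduction)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical scenario: the same 25% relative reduction applied to
# a rare single outcome (2%) and to a composite endpoint (10%).
n_rare = patients_per_group(control_rate=0.02, relative_risk_reduction=0.25)
n_composite = patients_per_group(control_rate=0.10, relative_risk_reduction=0.25)

print(f"Single rare outcome (2%): {n_rare} per group")
print(f"Composite endpoint (10%): {n_composite} per group")
```

Under these assumptions the rare outcome requires on the order of ten thousand patients per group, whereas the composite requires roughly two thousand. The saving holds only if the treatment effect is reasonably similar across the components of the composite.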
However, composite endpoints carry an inherent assumption that each component of the endpoint has a similar burden on health.13–15,19–21 This is often not the case. Endpoints of least importance to patients, such as a single episode of angina, as opposed to MI-induced heart failure, stroke, or death, typically contribute most to trial events.15 A recent systematic review found that in approximately half the trials reviewed, there were large or moderate gradients in both importance to patients and magnitude of effect across components.15 The most serious and highly rated events typically occurred least often, so they contributed the least information (in terms of numerical value) to the composite endpoint. This may lead to misleading impressions of the true clinical value of the treatment.22
Should endpoints be weighted, and if so, how? We believe this should be guided by the perceived clinical importance of each individual endpoint, and emphasis should be placed on patient values. Objective justification can assist in this regard, using patient and clinician surveys, disability scores, and eventual health care utilization. There may be competing risks between endpoints; a positive composite endpoint may conceal a negative individual (more serious) outcome such as death. The most extreme example of this is when early death prevents the possibility of other less serious outcomes, such as subclinical MI or prolonged hospital stay. Clinicians should be wary of post hoc definition of composite endpoints as secondary measures when the primary outcome fails to show a treatment effect. This is one of several reasons why findings from the ENIGMA trial23 are being followed up with a larger, more specifically designed trial focusing on cardiovascular outcomes.18
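How weighting can change the apparent result is shown in the sketch below. Everything in it is hypothetical: the weights are invented placeholders rather than values derived from patient surveys or disability scores, the event counts are made up, and the helper name `weighted_burden` is ours. For simplicity it also counts events rather than patients, so it ignores competing risks and patients with more than one event.

```python
# Hypothetical importance weights (death = 1.0); illustrative only.
WEIGHTS = {"angina": 0.1, "mi": 0.5, "stroke": 0.7, "death": 1.0}

# Invented event counts per arm, 1,000 patients in each arm.
N_PER_ARM = 1000
treatment = {"angina": 60, "mi": 20, "stroke": 12, "death": 15}
control = {"angina": 100, "mi": 24, "stroke": 8, "death": 10}


def unweighted_events(events):
    """Total events, treating every component as equally important."""
    return sum(events.values())


def weighted_burden(events, weights, n):
    """Mean importance-weighted event burden per patient."""
    return sum(weights[name] * count for name, count in events.items()) / n


raw_treatment = unweighted_events(treatment)
raw_control = unweighted_events(control)
burden_treatment = weighted_burden(treatment, WEIGHTS, N_PER_ARM)
burden_control = weighted_burden(control, WEIGHTS, N_PER_ARM)

print(f"Unweighted events: treatment {raw_treatment}, control {raw_control}")
print(f"Weighted burden:   treatment {burden_treatment:.4f}, "
      f"control {burden_control:.4f}")
```

With these invented numbers the treatment arm has fewer events in total, driven by less angina, yet a higher weighted burden because death and stroke are more frequent. The direction of the weighted result depends entirely on the weights chosen, which is why their justification matters.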
Cardiovascular trials commonly use major adverse cardiac events as a composite primary endpoint.14,21 But there is no standard definition for major adverse cardiac events. Most include MI, stroke, and death; others include revascularizations, and some include cardiac arrest, heart failure, or bleeding complications. For a technical term to have value, it must have consistency in interpretation and usage. The term major adverse cardiac events fails in this regard and should be abandoned because it invites erroneous assumptions. This highlights the need to clearly define each component and justify its inclusion in a composite endpoint. Sometimes a composite endpoint combines efficacy and safety components, and sometimes a similar effect across outcomes is assumed but does not exist. Perhaps the best and most recent example is the Perioperative Ischemic Evaluation Study trial,16 which identified a significant reduction in perioperative MI with β-blocker therapy but at a cost of excess death and stroke. There was also more hypotension and bradycardia in the β-blocker group.
It is unlikely that a common definition of perioperative composite endpoints can be achieved by independent investigators, but we believe that some consensus is needed so that different studies can be compared or, when undertaking meta-analysis, their results pooled. Outcome after surgery and anesthesia is sometimes defined by pain score, functional status, or satisfaction, using scales or instruments as endpoints. Some of these have never been validated in the perioperative setting, and their relationship with true morbid events, such as dementia, stroke, and long-term disability, is unclear.
The decision to stop a trial early because of apparent benefit or harm needs to consider a range of issues, including the statistical uncertainty inherent in a reduced sample size.24 This matters even more if the decision is based on the monitoring of a composite endpoint, particularly if a relatively minor yet frequent outcome dominates the composite.22 Such an approach may lead to an overestimation of benefit and underestimation of risk. Critical review of the frequency of each of the individual components of the composite endpoint, and whether the treatment effect seems to be consistent for each of these, is strongly recommended.13–15,19,20 If large variations exist between components, then the value of the composite endpoint is diminished. The same process should be undertaken at the end of the trial when considering the final results.
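One simple way to carry out such a component-by-component review is to tabulate the relative risk for each component alongside its event counts. The sketch below does this with a conventional log-scale 95% confidence interval; the counts are invented for illustration, the names `relative_risk` and `directions_differ` are ours, and a real analysis would use the trial's prespecified statistical methods.

```python
from math import exp, log, sqrt

# Invented counts: (events in treatment arm, events in control arm),
# with 1,000 patients per arm. Not data from any real trial.
N_TREATMENT = 1000
N_CONTROL = 1000
components = {
    "angina": (60, 100),
    "mi": (20, 24),
    "stroke": (12, 8),
    "death": (15, 10),
}


def relative_risk(events_t, n_t, events_c, n_c, z=1.96):
    """Relative risk (treatment vs control) with an approximate 95% CI
    computed on the log scale. Requires nonzero event counts."""
    rr = (events_t / n_t) / (events_c / n_c)
    se = sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return rr, exp(log(rr) - z * se), exp(log(rr) + z * se)


results = {
    name: relative_risk(events_t, N_TREATMENT, events_c, N_CONTROL)
    for name, (events_t, events_c) in components.items()
}

for name, (rr, lower, upper) in results.items():
    events_t, events_c = components[name]
    print(f"{name:<7} {events_t:>4} vs {events_c:>4}  "
          f"RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")

# Flag the situation where point estimates point in opposite directions.
directions_differ = (any(rr < 1 for rr, _, _ in results.values())
                     and any(rr > 1 for rr, _, _ in results.values()))
print("Component effects differ in direction:", directions_differ)
```

In this invented example the most frequent component suggests benefit while the rarest and most serious components suggest harm, with wide intervals for the latter. This is the pattern in which a composite result is least trustworthy.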
Composite endpoints are useful in that they provide an overall summary of effect, which may be readily appreciated by clinicians and their patients. When correctly constructed, they enhance comprehension, study power, and precision, and these should lead to earlier identification of real improvements in care. But poorly constructed composites, or insufficient consideration of the rates of each of the individual endpoints making up the composite, can be misleading. Better appreciation of these issues will lead to improved clinical research, its interpretation, and implementation of evidence-based care.