The authors published a pharmacokinetic- pharmacodynamic model for two drugs based on response surface methodology. Because of the complexity of the model, they performed a simulation study to answer two questions about use of the model: (1) which study design would be most satisfactory; and (2) how many patients would need to be studied to adequately describe an entire response surface.

Data were simulated using realistic variability for two hypothetical intravenous anesthetic drugs that interact synergistically and that could be given by computer-controlled infusion. Three trial designs were simulated, one that made a series of parallel slices of the response surface, one that crisscrossed the response surface, and one that made a series of radial slices across the surface. Series of 5, 10, 20, and 40 "subjects" were simulated. A pooled data approach was used to assess the ability of the various trial designs and numbers of subjects to adequately identify the interaction response surface and estimate the original response surface.

The crisscross design was shown to be the most robust in terms of its ability to both discriminate the correct order of the interaction term and to discriminate the original response surface using the least number of patients. Twenty subjects would be required to adequately define a surface using the crisscross study design, and 40 subjects would be required using the other trial designs.

The results showed that a number of trial designs would be viable, but a design that crossed the surface in a crisscross fashion would give the most robust result with the least patients.

TYPICAL intravenous anesthetic drugs such as hypnotics, opioids, and benzodiazepines combine synergistically to create the anesthetic state. 1–3The full concentration–response relationship for two drugs is a surface in which every combination of the two drugs (the independent X and Y variables, representing the concentrations of drugs A and B, respectively) produces a point on the surface, the height of which is the drug effect. If this surface can be mathematically described, then one can predict the drug effect for any two concentrations of the interacting drugs. The mathematical approach taken is described in the companion paper. 4The clinical trial methodology will be complex and will require the use of computer-controlled infusions, frequent arterial blood sampling, and continuous measures of effect, such as the arterial pressure or the electroencephalogram, to elicit the full pharmacokinetic–pharmacodynamic (PK-PD) model–based response surface for two drugs given simultaneously.

The purpose of this study was to develop a study method for efficiently estimating the response surface for two intravenous drugs that interact synergistically. The complexity of the model, which is based on a series of calculations and a series of nonlinear regression procedures to arrive at the final response surface, means that an analytical solution to the problem of defining appropriate trial design and sample size is not available. Accordingly, we developed a simulation model using realistic variability to analyze these problems numerically as a prelude to clinical trials. 5We answer two questions: (1) which study design would be most satisfactory; and (2) how many patients would need to be studied to adequately describe an entire response surface.

## Methods

### Simulation Model Design

To simulate how the pharmacodynamic response surface model would perform on actual patients, a stochastic model for eliciting a response surface was set up on a standard spreadsheet program (Excel 7.0; Microsoft Corp., Redmond, WA). Simulations of computer-controlled intravenous infusions of the drugs used standard formulae. 6The pharmacodynamic response surface was elicited using the method described below. The pharmacokinetic and pharmacodynamic data used were based on published models for alfentanil and midazolam. 7,8The likely interaction term for this combination was approximated from past clinical studies using benzodiazepine–opioid combinations. 2,9

### Two-drug Pharmacokinetic–Pharmacodynamic Model

The standard pharmacodynamic model for one drug was expanded to the case of two drugs acting simultaneously by the following approach. Consider each drug in units of its own C^{50}, *i.e.* , the effect site concentration associated with 50% of the peak effect, in the absence of other drugs. Thus, if C^{A}and C^{B}are the effect site concentrations of drug A and drug B, then, in normalized units, Ue^{A}and Ue^{B}can be defined as:

For the purposes of the simulations described here, we have assumed that there is synergy between the two drugs. In this case, the effect is modeled as:

Where θ is the ratio between the two drugs expressed as a single independent variable:

and C^{50}(θ) is the C^{50}for the fixed ratio of drug concentrations. The function described assumes that the effects of all possible combinations of drugs A and B are additive. C^{50}(θ) was described using a flexible function able to describe nonadditive interactions between drugs A and B.

where β^{0}, β^{1}, β^{2}, β^{3}, and β^{4}are the regression coefficients. When θ= 0 (drug A alone) and when θ= 1 (drug B alone), C^{50}(θ) = 1 by definition, and so β^{0}and β^{1}are necessarily 1. The other regression coefficients chosen for the simulation were β^{2}= 2, β^{3}= 0, and β^{4}= 0. Now the polynomial equation can be expressed as:

The resulting response surface and 50% effect isobologram in the concentration domain is shown in figure 1. The synergy using half the C^{50}concentration of each drug represents a 50% departure from additivity at the maximum inflection point of the isobologram. No changes in E^{max}or n were included in the simulation model for the sake of simplicity. A more complete mathematical discussion of the response surface method can be found in the companion paper. 4In interpreting the results, it is useful to note that a second-order interaction term (β^{2}) of 1.6 would represent a 40% departure from additivity at the midpoint of the 50% effect isobole, and a second-order interaction term of 2.4 would similarly represent a 60% departure from additivity.

### Trial Design

Three potential trial designs were studied using simulations of 5, 10, 20, and 40 patients. The number of patients studied was cumulative, so that the initial trial of five patients was part of the group of 10 patients, *etc* . These trial designs were selected because they were all capable of being performed. To simulate realistic data that would be elicited in individual patients, we added variability to each step in the model. The variability added to all parameters except the observed effects was normally distributed and heteroscedastic with a coefficient of variation of either 0.2 or 0.3 as indicated by previous clinical studies. 7,8,10,11This amount of variability is relevant for studies of groups of volunteers or patients chosen according to moderately restrictive criteria, but is less than would be expected from a heterogeneous patient population. Variability on the observed effects was homoscedastic with a coefficient of variation of 0.2. The method used for deriving the expected plasma concentrations and effects using this procedure is outlined in figure 2.

Because both the “slices” and “radial” approaches described below are intrinsically non–steady state experiments, we performed a series of pilot evaluations to determine the feasibility of identifying k^{e0}, the rate constant that determines equilibration between the plasma and the effect site. These pilot simulations, not further described herein, guided our choice of doses to permit identification of k^{e0}in those trials designs.

#### Trial Design 1:“Slices.”

In this study design, drug A was held at a variety of pseudo–steady state concentrations, and brief zero-order infusions of drug B were given up to the maximum effect (E^{max}), which for the purpose of the simulations was chosen to be 96–97% of true E^{max}. In our initial pilot tests, we found that the hysteresis loop between concentration and effect for drug B could only be reliably collapsed using a semiparametric method if the pseudo–steady state concentration of drug A was at or below its C^{50}so only that area of the surface up to 50% of E^{max}for drug A could be explored with this study design. Starting with the C^{50}of drug A, this concentration was progressively halved to obtain the target concentrations, which also included the use of a target concentration of zero, *i.e.* , drug B alone. Table 1shows the target drug concentrations, and figure 3shows the trajectories the concentration effect data took. In the case of the study of five patients, with reference to table 1, infusion regimens 1 and 4 were chosen for drug A, and infusion regimens 2, 3, and 5 were chosen for drug B.

#### Trial Design 2:“Crisscross.”

This study design was identical to the “slices” design except that in half of the patients, the concentration of drug A was held constant and the concentration of drug B varied, while in the other half of the patients, the concentration of drug A was varied and the concentration of drug B was held constant. Table 1shows the target drug concentrations, and figure 3shows the trajectories the concentration effect data took. Coverage of the response surface was more even with this design.

#### Trial Design 3:“Radial.”

In this study design, both drugs were given simultaneously by zero-order infusions. In addition to studying each drug alone, three radial slices of the surface were taken as follows: (1) using the same infusion rates, as a function of the population C^{50}, of the individual drugs; (2) taking these infusion rates, the rate for drug A was decreased by 50% and the infusion rate of drug B increased by 50%; and (3) the rate for drug B was decreased by 50% and the infusion rate of drug A increased by 50%.

Table 1shows the target drug concentrations, and figure 3shows the trajectories the concentration effect data took. The hysteresis loop describing the relation between concentration and effect was then collapsed twice, once in the plane of drug A and once in the plane of drug B. The two effect site concentrations were thencombined in the effect model for the drug combination. A disadvantage of this approach was that pilot studies indicated that when a radial slice using a drug ratio higher than 1:2 of Ue^{A}:Ue^{B}(drug A:B), the hysteresis loop could not be reliably collapsed in the plane of drug A. This is because the relatively small amount of drug A to be infused will only show a very small amount of hysteresis, which is overwhelmed by the within-patient variability between data points.

### Deriving an Interaction Surface from the Simulated Data

The response surface was derived for the “slices” trial design from the simulated data using the procedure outlined in figure 4. When zero-order infusions were given, the hysteresis curve between time, concentration, and resulting effect was collapsed to calculate the estimated effect site concentration using a semiparametric technique 12. Extended least squared error was chosen as the objective function to characterize the interaction surface 13,14:

where:

Y is the observed effect and Ŷ the predicted effect, V is the variance for a constant coefficient of variation model, and ς is the root mean squared error.

In each case, the objective function was minimized using the Solver function in the Excel spreadsheet program. Because each “patient” would only cover a small part of the entire surface, to keep the simulation mathematically tractable, the data were analyzed with a pooled data approach. 12The concentration–effect data from all patients were pooled, and all parameters necessary to specify the shape of the surface were estimated simultaneously. Starting estimates for the regression procedure were the population pharmacodynamic parameters from which the interaction surface was first derived, except for the interaction term, where the starting estimates were always zero. An additive model was always fitted first, and then interaction terms of increasingly higher order were progressively introduced.

### Analysis

#### Choice of Trial Design to Adequately Identify an Interaction Response Surface.

Each “study,” using one of the three trial designs and one group of either 5, 10, 20, or 40 patients, was analyzed individually. To determine which order of interaction term should be selected as giving the best fit for each study, two procedures were used. First, the difference in the sum of the extended least squared errors between models with increasingly complex interaction terms were referred to a chi-square distribution to determine whether a statistically significant difference existed between the models, as defined by a difference of greater than 3.84 between models. This equates to a probability of *P* < 0.05. This procedure gives a statistical interpretation to the difference between models and is dependent on the number of patients in the study. Second, to give a clinical interpretation to the fit of the various models, we measured the ability of the model to predict the observations from which it arose. 15To do this, we calculated the weighted residuals, defined as:

Then we calculated the median weighted residual (MDWR) as a measure of the bias of the fit:

and the median absolute weighted residual (MDAWR) as a measure of the accuracy of the fit:

We did not choose more complex models unless they resulted in at least a 1% better fit, as measured by MDAWR (1% was taken as MDAWR^{1}− MDAWR^{2}irrespective of the value of MDAWR). The 1% improvement threshold was set *a priori* as an estimate of the magnitude of improvement that would justify a more complex model. Median weighted residual and MDAWR were also used as measures of the ability of the models to describe the observations in each trial.

#### Accuracy of the Estimation of the Original Response Surface.

The goal of each simulated trial was to characterize a response surface that resembles the response surface shown in figure 1for the typical individual in the population. Every trial produced an estimation of the population response surface for the typical individual. To measure how well the surface estimated by each trial characterized the “true” response surfaceshown in figure 1, we calculated the response surface error (RSE) for each predicted response as follows:

In a manner analogous to how the median and absolute weighted residuals were calculated, we calculated the median response surface error (MDRSE) as:

and the median absolute response surface error (MDARSE) as:

Because we have introduced two measures of goodness of fit, the WR and the RSE, we will restate the distinction between them. The WR measures how well the observations of drug effect fit the estimated model of the response surface. Put another way, WR measures whether the estimated model described the observations. Of course, the estimated model may be completely wrong as far as the “true” response surface is concerned, but if it describes the observations well, the WR will be low. The RSE describes how closely the estimated model of the response surface approximated the “true” response surface shown in figure 1. In cases where there was an inconclusive result using these procedures, the resulting models were graphed and examined visually for whether the resulting model had captured the original shape of the surface.

## Results

### Choice of Trial Design to Adequately Identify the Interaction Response Surface

The ability of various trial designs to discriminate the correct interaction model are displayed in table 2. Using the criterion of a difference between MDAWR of less than 1%, it can be seen that the crisscross design had superior performance using the least number of patients. Even with up to 40 patients in the study, a higher-order model was sometimes accepted as fitting better by criteria of improvement in extended least squared error and MDAWR. The difference in extended least squared error was more likely to be significantly different with increasing numbers of patients and was therefore less useful for discriminating which order of interaction term to choose when compared with the MDAWR.

The ability of the various orders of interaction term to describe the observations when using each trial design with 20 patients is displayed in table 3. Performance of the second- and third-order models was similar. In cases in which the third-order model was chosen, visual inspection of the resulting response surfaces found these to be similar in shape to those for the second-order models. The radial trial design resulted in worse fits than the crisscross or slices trial design. The slices design occasionally resulted in skewed surfaces as a result of the lack of data over a portion of the surface that was of interest. In the case of the radial design similar skewing also occurred occasionally.

### Accuracy of Estimation of the Original Response Surface

The performance of the various trial designs in terms of their ability to recapture the original response surface and interaction term are displayed in figure 5. All models were capable of returning adequate estimates of the original surface when given adequate patient numbers. Although all trials could recapture the original surface on average, the range of results indicated that 20 patients would be adequate for the crisscross trial design, but that the other trial designs would require 40 patients to capture the original surface with similar accuracy. The performance figures for trials of 20 patients using each order of interaction term are listed in table 3.

Individual data for the median and worst fits for each trial design using 20 patients, in terms of their ability to recapture the original response surface, are shown in table 4. The resulting concentration–response surfaces for the worst fit for each trial design are shown in figure 6. Both second- and third-order fits were chosen using the criterion of greater than 1% change in MDAWR. There was virtually no loss in the ability of the third-order model to describe the true surface in terms of MDARSE and visual inspection of the surface when third-order models were chosen. All models were able to accurately capture the magnitude of the interaction term, but again, the crisscross design performed best with 20 patients.

## Discussion

In the current study, the crisscross design was shown to be the most robust in terms of its ability to both discriminate the correct order of the interaction term and to discriminate the original response surface using the least number of patients. This trial design may also have other advantages. It accords more closely with how drugs are used clinically, where concentrations of one drug at a time are usually altered in response to clinical circumstances. It may also be the most likely to detect model misspecification, as may occur as a result of significant order effects or concentration-dependent effects in the pharmacokinetic responses of a drug combination, such has been observed in some past studies of drug combinations using different study methodologies. 16An additional component could then be added to the model to take into account the influence of pharmacokinetic interactions between the two drugs should this be necessary. This trial design is also inherently symmetrical and would be better able to discriminate a skewed surface accurately. Another advantage is that, in comparison to the slices and radial designs, each patient intersects parts of the surface covered by other patients more frequently. This means that the effects of a few patients are less able to significantly skew the shape of the surface, as was occasionally observed using the slices or radial trial design, overcoming one of the problems of a pooled approach to data analysis.

Using the slices design, the inability to adequately explore the area above C^{50}of drug B was a disadvantage. Because this area of the response surface remains unexplored, there will always be an asymmetrical distribution of measured points across the surface, which could possibly lead to the wrong choice of interaction term. By not exploring a small portion of the surface, the result could also ignore a skew in the surface where one does exist. This skewing is more easily detected by visually examining the surface graphs rather than by looking at the parameter estimates. While this area could be covered by adding a few steady state points to the surface, this alters the methodology used to derive points on a part of the surface and may in itself introduce a new bias into the results. Because the surface we chose for the simulations was bilaterally symmetrical about a plane where Ue^{A}= Ue^{B}, the slices trial design may have performed better in this simulation study than expected if a skewed surface had been chosen.

The response surface model we used relies on the assumption that, at all concentration ratios for the two drugs, the dose–response relation is sigmoidal and that there are no time-dependent pharmacodynamic interactions between the two drugs. There is abundant evidence from previous studies of the influence of one drug on the dose–response curve of a second drug, or of two drugs being given simultaneously in the same dose ratio, to conclude that that this assumption is true in the circumstances in which the model would be used. The assumption that each concentration ratio of the two drugs creates a “new drug” with its own sigmoid-shaped concentration–effect curve is also the greatest strength of our approach as it improves our ability to fit the data to a surface and explore interactions in E^{max}and n as well as interactions in C^{50}. Possible interactions on these variables have not been explored in previous studies of drug interactions. If necessary, the possibility of time-dependent pharmacodynamic interactions could be explored using the crisscross trial design we have proposed.

An alternative approach to the ones studied would be to simply map points on the surface. This would be akin to how previous studies using isobolographic analysis explored solitary points on the surface, and it would be a simple task to develop a well-balanced study design. 14For a full PK-PD model, it would be necessary to wait at each point for effect site equilibration, which would take three to five plasma–effect site equilibration half-lives, approximately 15 min for alfentanil, but 45 min for midazolam. This would be very inefficient in terms of time taken to study each patient if, *e.g.* , 5–10 points were studied per patient. It would also not mimic how patients are given drugs clinically, where combinations of drugs are routinely used to achieve an adequate anesthetic state, but concentrations of only one drug are usually changed at any one time in response to clinical criteria.

A model is only useful if it will provide insights into the system under study that are not otherwise available. A robust model should yield accurate predictions about how the system will perform in unique circumstances. The PK-PD interaction between two drugs is complex, and there are an infinite number of possible combinations of two drugs that could be given. Response surface methodology has the ability to generate more efficient study designs for describing and understanding drug interactions in anesthesiology as indicated in this simulation study. An adequately mapped surface would allow the infusion of two drugs simultaneously while taking into account their interactions at the effect site. An adequately mapped surface would also allow optimization of doses of drug combinations for a given end point, such as greatest depth of sedation, quickest recovery, least hypotension. Previous studies of drug interactions have concentrated on ED^{50}effects, thus not allowing the design of anesthetic regimens when higher concentrations are required. 17The trial design could also be used to characterize the interactions between agonists and antagonists. Many of these concepts were also explored by Greco *et al.* 18

In conclusion, this simulation study investigated three potential trial designs and the number of patients required to elicit a full PK-PD response surface for a pair of drugs administered intravenously. The results showed that, given certain assumptions, a number of trial designs would be viable, but that a crisscross design would give the most robust result using the least number of patients. At least 20 patients would be required to adequately define a surface using the crisscross study design, and 40 patients would be required to achieve a similar degree of accuracy using the slices or radial trial designs. This result only pertains to a study that follows the trial design we used in this simulation analysis. Clearly, should the trial design or characteristics of the drugs be significantly different, it may be necessary to perform a similar simulation analysis to find an optimal trial design and number of patients to be studied before embarking on a clinical trial. An adequately mapped surface has the potential to provide useful insights into the PK-PD relations of drug combinations that are given routinely by anesthesiologists, who currently rely on a partial knowledge of these interactions, clinical experience, and intuition to guide them in safe drug-dosing practice. We plan to study the model prospectively in patients to explore these relations.

## References

^{50}of propofol for loss of consciousness and skin incision. A nesthesiology 1994; 81: 820–8