Many pharmacologic studies record data as binary yes-or-no variables, and analysis is performed using logistic regression. This study investigates the accuracy of estimation of the drug concentration associated with a 50% probability of drug effect (C50) and the term describing the steepness of the concentration-effect relation (gamma).
The authors developed a technique for simulating pharmacodynamic studies with binary yes-or-no responses. Simulations were conducted assuming either that each data point was derived from the same patient or that data were pooled from multiple patients in a population with log-normal distributions of C50 and gamma. Coefficients of variation were calculated. The authors also determined the percentage of simulations in which the 95% confidence intervals contained the true parameter value.
The coefficient of variation of parameter estimates decreased with increasing n and gamma. The 95% confidence intervals for C50 estimation contained the true parameter value in more than 90% of the simulations. However, the 95% confidence intervals of gamma did not contain the true value in a substantial number of simulations of data from multiple patients.
The coefficient of variation of parameter estimates may be as large as 40-50% for small studies (n < or = 20). The 95% confidence intervals of C50 almost always contain the true value, underscoring the need for always reporting confidence intervals. However, when data from multiple patients is naively pooled, the estimates of gamma may be biased, and the 95% confidence intervals may not contain the true value.
PHARMACODYNAMIC data are often recorded as binary variables (e.g. , whether the patient responds to command, whether the patient can maintain adequate spontaneous ventilation, whether the patient shows a hemodynamic or somatic response to surgical stimulus). In this situation, a common technique of data analysis is logistic regression, in which the probability (P ) of drug effect is evaluated as a function of the drug concentration in plasma or at the effect site (C), using the following equation:
where C50is the concentration at which the probability of drug effect is 50% and γ is a measure of the steepness of the concentration-effect curve.‡ Values of C50and γ are estimated by expressing the logarithm of the likelihood of the observed results (i.e., log likelihood) using and then maximizing this with respect to C50and γ. Logistic regression has been used for the analysis of the pharmacodynamics of inhaled and intravenous anesthetics, 1–11usually with a primary focus of determining values of C50and γ. However, any statistical technique should not only provide parameter estimates, but also measures of the accuracy of these estimates. Although typical statistical software programs provide confidence intervals for estimated parameters, because logistic regression is a nonlinear technique, these confidence intervals are only valid asymptotically as the number of data points (N) goes to infinity. 12In reality, most studies in the anesthesia literature involve relatively small numbers of data points. Therefore, it is often unclear how reliable parameter estimates are. The purpose of this study was to use simulations to analyze the relation between sample size and the accuracy of parameter estimates.
Although logistic regression commonly is used in anesthesiology research, the underlying statistical model is seldom described. A useful general reference for these types of statistical models is the textbook by Cox and Snell. 12This model assumes that anesthetic effect is measured by an underlying continuous variable, denoted y, which is related to the drug concentration by the following equation:
where x is a random variable that is assumed to have a logistic distribution. The logistic distribution is described by
where P (x) is the probability of a given value of x. This distribution has a mean of zero and a variance of π2/3). The model is applicable to binary yes-or-no data because we assume that a positive drug effect is observed only if y > 0 or equivalently if x > −(γ ln C −γ ln C50).
Thus, the probability of drug effect is as follows:
We attempted to illustrate the underlying model in figure 1. The solid line through solid diamonds depicts the deterministic part of the γ ln C −γ ln C50; the adjective “deterministic” refers to the fact that this part of the effect is determined solely by the drug concentration and the pharmacodynamic parameters C50and γ. This increases steadily as drug concentration (C) increases. The solid circles are random numbers conforming to a normal distribution; we used a normally distributed variable for convenience because of the ready availability of normally distributed random-number generators in standard statistical software packages. When the sum of the deterministic component (solid diamonds) and the random variables (solid circles) is greater than or equal to zero, the observed drug effect (solid triangles) is one, and if the sum is less than zero, the observed drug effect is zero.
To investigate the standard deviations of parameter estimates using logistic regression, simulations of typical anesthesia studies were performed. Simulations were performed for varying values of n (the number of data points) and γ, assuming that was the underlying model and C50= 100 (a valid assumption because the units of concentration are arbitrary). To begin each simulation, n data points were generated by randomly selecting n drug concentrations distributed uniformly on a logarithmic scale from 25 to 400 units using the random number generator of an Excel (Microsoft, Redmond, WA) spreadsheet. In a human study, this corresponds to the investigator assigning a drug dose to each patient enrolled in the study. At this point, if the concentration effect is totally deterministic (i.e., if γ is infinite), then a positive drug effect will be observed if the drug concentration (C) assigned to the data point exceeded C50. However, as noted previously, there is an element of randomness in the concentration–effect relation, embodied in the fact that γ is finite. To take into account this randomness, a uniformly distributed random variable from 0 to 1 was generated for each “patient,” again using the random-number generator of an Excel spreadsheet. If this number was less than Cγg/(C50γg+ Cγg) the simulated patient was assumed to have a positive drug effect (the response variable [R] was given a value of 1). Otherwise, it was assumed that R = 0. In this manner, responses are obtained for a range of concentrations, and spreadsheets consisting of columns of C (the drug concentration) and R (the response variable) are generated. For each simulation, the parameters (C50, γ) were estimated by maximum likelihood estimation. The logarithm of the likelihood of observed results (LL) was maximized as a function of C50and γ
(Piis given by for each value of C; R = 1 if a drug effect is observed and R = 0 otherwise; and i indexes data points). is the sum of the logarithms of the probabilities of independent outcomes, which is Piwhen Ri= 1 and 1 − Piwhen Ri= 0. The estimation procedure was implemented on the Excel spreadsheet, taking advantage of its Solver function. In some instances, the attempted maximization of the log likelihood failed with the Solver function returning a “#NUM” message, indicating it could not interpret the log likelihood as a number. In these cases, the simulated data set was discarded and a new set generated.
Two separate simulation experiments were conducted. In the first, simulations were performed for γ= 1.0, 1.5, 3.0, 4.5, 6.0, and 7.5, and it was assumed that C50= 100 (units are arbitrary) for every data point. Thus, in these simulations, interpatient variability was ignored by assuming that the data were derived from a single patient. Simulations were conducted for n = 10, 20, 30, 40, 50, 75, and 100 data points. Because any nonlinear regression technique necessitates the stipulation of initial parameter values (i.e., an initial guess of what the true value is), we repeated each simulation for C50initial = 40, 100 and 250, and γ initial = 0.6, 1.0, or 1.4 times the true value of γ. Each simulation was repeated 100 times, and the standard deviations of the estimates of C50and γ were calculated. We did not calculate confidence intervals, but we determined, for each simulation, whether the confidence intervals would include the true values of C50and γ. We did this by calculating the change in the logarithm of the likelihood, multiplied by a factor of 2 (2 ×δLL), which occurred when the true value of either C50or γ was substituted for the value determined by minimization of LL. This parameter (2 ×δLL) has a chi-square distribution with one degree of freedom, which enables the researcher to determine whether any specific value of either C50or γ is within the 95% confidence intervals. 12It should be noted that the confidence intervals for this type of data often are markedly asymmetric, and the parameter estimates do not follow a t distribution.
In the second set of simulation experiments, we assumed that we pooled single data points from multiple patients and that the C50and γ values had log-normal distributions (i.e. , P = PTVexp(η), where P denotes the parameter (C50, γ), TV denotes the typical value, and η has a normal distribution with a mean value of zero and an SD of 0.3. We assumed that C50TV = 100, and we considered γTVvalues of 1.0, 1.5, 3.0, 4.5, and 6.0. Simulations were again conducted for n = 10, 20, 30, 40, 50, 75, and 100 patients and for each value of simulations were repeated 100 times for starting values of C50initial = 40, 100 and 250, and γ initial = 0.6, 1.0, or 1.4 times the typical value of γ. We also determined whether CC50TV and γTVwere within the 95% confidence intervals, as described previously.
Our results are presented as the coefficient of variation (i.e., the SD of the parameter estimates normalized to the mean value of parameter estimates (C50SD/mean or γ SD/mean) and estimate confidence, which we define as the percentage of parameter estimates for which the 95% confidence intervals include the “true” value (%CI95).
Bias (i.e., the difference between the mean parameter estimate and the true value) is presented in table 1for simulations in which the initial parameter values were the true values. The bias of C50estimates generally was small (< 10%), unless n = 10 and γ was small (≤ 2), in which cases the bias was nearly 100%. The bias in the estimation of γ was less than 20% if the number of simulated data points (n) was more than 10, with the notable exception of simulations in which the parameters had log-normal distributions (see Discussion).
Figures 2 and 3present the coefficient of variation and estimate confidence of C50and γ estimates as a function of sample size when we simulate data from a single patient (fixed C50and γ). In these figures, the initial “guesses” of the parameters are the true values. As expected, the coefficient of variation decreases as the number of patients (n) increases. Additionally, the variability in the estimate of C50increases as γ decreases (fig. 2). For γ= 1.0, the coefficient of variation of the C50estimate is substantial (> 50%) unless the number of patients in the simulated study (n) exceeds 30. However, if γ is larger (> 6), the coefficient of variation of the C50estimate is approximately 10% if n ≥ 20. Equally important, the true value of C50lies within the 95% confidence intervals of the estimates in more than 90% of simulations. In contrast to C50estimates, the magnitude of the true value of γ has less influence on the variability of the estimates of γ, as is shown in figure 3. The coefficient of variation of γ estimates is somewhat greater than that of C50estimates and is approximately 60% for n = 20, decreasing only to approximately 40% for n = 100. However, the confidence of γ estimates is similar to that of C50estimates, with %CI95values well in excess of 90% in these simulations with no interpatient variability.
Figures 4 and 5present the coefficient of variation and estimate confidence for simulations in which C50and γ are different for different simulated patients. We assumed that the parameters have log-normal distributions. These simulations mirror the common situation in which data from multiple patients are pooled for analysis. Again, we present simulations in which the initial guesses of C50and γ are equal to C50TV and γTV. As in the case with no interpatient variability, the coefficient of variation of estimates of either C50or γ decrease as n increases. The magnitude of this variability is similar to the previous simulations, in which C50and γ were fixed. As before, the coefficient of variation of C50estimates decreases as γ increases. For C50, the confidence of the estimates are excellent, with the true value (100) within the 95% confidence intervals of the estimates in more than 90% of the simulations. However, it can be seen in figure 5that the fraction of simulations in which the true value of γ (γTV) was within the 95% confidence intervals of the estimate decreased to as low as 60% as n increased for values of γ more than or equal to 4.5.
Simulations were also conducted using starting values for C50and γ other than the true values, which were used for the simulations illustrated in figures 2–5. The results were similar to those described previously, with the exception that, for n = 10 or n = 20, use of an overestimate as the starting value of C50resulted in a markedly larger coefficient of variation in the C50estimate for small values of γ (≤ 3). Despite this increase in variability, %CI95remained more than 90%.
Evaluation of drug concentration–response curves is an important component of pharmacologic research. It has become customary to describe concentration–response curves with sigmoid functions characterized by two parameters: C50, the concentration of drug which results in 50% of maximal effect, and γ, a parameter that reflects the steepness of the curve. For many applications in anesthesiology research, the response is a binary all-or-none variable, and the maximal drug response or effect is unity. In this situation, estimation of C50and γ is usually referred to as logistic regression. Techniques of logistic regression are based on the principle of maximum likelihood and are inherently nonlinear. Any method of statistical analysis should provide parameter estimates and confidence intervals. In contrast to linear regression, the confidence intervals provided by most software packages for nonlinear regression are approximations that are only accurate asymptotically as the number of data points increases toward infinity. Because many applications of logistic regression in anesthesiology research involve 20–50 data points, the accuracy of C50and γ estimates are unclear. In this study, we used simulations (a well-known technique for analyzing statistical methodology, also known as Monte Carlo simulation) to investigate the relation between study population size and the accuracy of parameter estimates.
The primary measures we used to assess accuracy are the coefficient of variation and the fraction of simulations for which the 95% confidence intervals contained the true parameter value (%CI95). The coefficient of variation is the SD of the parameter estimate expressed as a percentage of the mean estimate. We did not graphically present bias (i.e., the difference between the mean estimate and the true value) because, in general, bias was small with the mean C50estimate within less than 10% of the true value and the mean estimate of γ within less than 20% of the true value. However, for simulations of small (n = 10) studies, we found a much larger bias for both parameters, although %CI95was still more than 90% when we simulated data from single patients. When we simulated pooled data from multiple patients (with log-normal distributions for C50and γ), there was a larger bias in γ estimates (up to 30%), even when n was large and %CI95was significantly smaller.
As expected, the variability of parameter estimates decreases (i.e., the coefficient of variation decreases) as the number of patients in the simulated study (n) increases. This is evident in figures 2–5. We did not anticipate that the variability of C50estimation would depend on the value of γ, improving as γ, the measure of the steepness of the concentration–response curve, increases. However, the basis for this observation is more evident with consideration of specific simulated data sets. Figure 6illustrates a simulation of 10 data points (from the same patient, so that interpatient variability is not an issue) with γ= 1. One could surmise that there is little information about either C50or γ in this data. Figure 7presents a larger data set (n = 100) for γ= 1. Comparing the two figures, one can see how C50may be estimated more reliably when n is larger. Figure 8presents simulated data for n = 100 and γ= 6. It is now clear that C50and γ both are estimated more reliably in this situation.
Statistical techniques should not only provide parameter estimates, but also measures of the accuracy of these estimates. The confidence intervals provided by typical statistical software programs for nonlinear techniques, such as logistic regression, are only valid asymptotically as the number of data points (n) approaches infinity. 12We investigated the accuracy of these asymptotic confidence intervals by determining the frequency with which the 95% confidence intervals contained the true value of the parameter. We used the change in log likelihood that occurred when the true parameter value was substituted for the maximum likelihood estimate as a means of assessing this frequency. When we simulated data from a single patient, we found that the 95% confidence intervals included the true parameter value in more than 90% of the simulations, even for small n (n = 10) and small γ, when the coefficient of variation of parameter estimates was large. This suggests that, even when there is large estimate variability (i.e. , even when the point estimate of a parameter may be significantly in error), the 95% confidence intervals will be large enough to include the true value, if interpatient variation is not an issue. When we considered interpatient variation and simulated pooled data from multiple patients, we again found that the 95% confidence intervals included the true value of C50in more than 90% of the simulations. Thus, analyzing data from a population of patients in a naive fashion does not appear to compromise the accuracy of C50estimates. However, the 95% confidence intervals of γ estimates were not as reliable. For larger γ (γ≥ 4.5), the fraction of simulations in which the 95% confidence intervals included the true value actually decreased to less than 90% as n increased. We believe this stems from the inability to distinguish between intrapatient and interpatient variability when data from multiple patients are pooled for analysis in a naive fashion (i.e. , without explicitly accounting for interpatient variability). In one sense, the parameter γ is a measure of intrapatient variability. If γ is large, the concentration–response curve is steep. If the concentration of drug (C) is slightly larger than C50, the probability of drug response is close to one, and if C is slightly less than C50, the probability of drug response is close to zero. The concentration range over which the probability of drug effect is intermediate, where there is significant intrapatient variability, is narrow. In contrast, if γ is small, the concentration–response curve is relatively flat, and the probability of drug response takes on intermediate values over a wider concentration range (i.e. , the region of significant intrapatient variability is larger). In figure 9, we illustrate how interpatient C50variability may be indistinguishable from intrapatient variability. This figure presents concentration–response curves for nine hypothetical patients, each with γ= 10, but with varying C50values. If one data point is collected from each patient and pooled for analysis, the resulting curve (dashed line) appears to have a much lower value of γ. We believe this is the basis for the failure of the 95% confidence intervals to include the true value of γ when we simulated data from multiple patients with log-normal distributions for C50and γ. This is consistent with our observation that the bias in γ estimation was increased in this case and that the mean γ estimate was less than the true value. It is somewhat surprising that the confidence of the estimate of γ decreased with increasing n; however, it should be noted that the confidence intervals are derived from the change in log likelihood, which results from substitution of other parameter values for the maximum likelihood estimates. The width of the confidence intervals decrease as n increases. Conversely, the bias in estimation of γ will not improve as n increases, because the steepness of the apparent concentration–effect curve will reflect the range of C50values; this is evident from figure 9. Consequently, as the confidence intervals become more narrow, the chance that they will include the true value of γ decreases.
We used simulations to investigate the accuracy of analyses of concentration–binary response relations using logistic regression. The implications of these simulations for pharmacodynamic investigations are as follows:
1. The accuracy of parameter estimates increases as n and γ increase. The coefficient of variation of C50in small studies (≤ 20) may be as high as 30–40%. It should be emphasized that estimates of either C50or γ are statistics. The distributions of these statistics are unknown; thus, the exact meaning of the coefficient of variation is unclear. Nevertheless, it seems reasonable to assume that point estimates of C50may be in error by 30–40% in a large number of studies of this size. The coefficient of variation of γ is larger, indicating even greater error in point estimates of γ.
2. The 95% confidence intervals of C50estimates contain the true value of C50in most of the simulations (> 90%), even in simulations with larger coefficients of variation and even when we simulated naively pooling data from multiple patients. Confidence intervals for C50, which are based on asymptotic statistical theory, appear to be reliable. Given the significant coefficient of variation that may occur in some studies, indicating the possibility of error in point estimates of C50, it is clear that confidence intervals should always be reported.
3. When we simulated pooling of data from multiple patients, we found that the 95% confidence intervals of the estimate of γ frequently did not include the true value of γ. The reason for this bias was discussed previously. Because of this, we believe that estimates of γ from studies in which data from multiple patients was naively pooled must be viewed with suspicion. In this type of analysis, intrapatient variability (embodied in the parameter γ) cannot be distinguished from interpatient variability. Accurate estimates of γ necessitate methods of analysis that take interpatient variability into account.