Potency of inhaled anesthetics (minimum alveolar concentration [MAC]) is typically studied in humans using an "up-down" approach in which the (quantal) response to skin incision is assessed only once for each individual, so that each individual's MAC is never determined. The authors examined the influence of interindividual variability and study design issues (e.g., the number of patients enrolled in a study) on the accuracy of MAC estimates.

The typical sequence of a MAC study was simulated. The authors varied and tested the impact of several factors: anesthetic concentration used to start a study; number of "crossovers" (successive patients having different responses to skin incision) to terminate a study; concentration increment between consecutive patients; interindividual variability; and "measurement error." For each factor, simulations were replicated 500 times, and the resulting estimates were summarized.

Starting an experiment below or above the "true" value led to slightly biased MAC estimates; in contrast, variability was underestimated with starting concentrations close to the true value. More than six crossovers improved MAC estimates minimally but increased variability estimates toward true values. A larger increment size affected MAC minimally and increased variability estimates toward true values. A larger interindividual variability led to more "outlier" estimates for MAC. Under many conditions, several of 500 replicates yielded MAC estimates that deviated more than 10% or even more than 25% from the "true" value.

Individual experiments may yield inaccurate MAC estimates. This inaccuracy is minimized as the number of crossovers increases; however, improvement diminishes as the number of crossovers exceeds six.

Most studies that report the potency of inhaled anesthetics (minimum alveolar concentration [MAC])¹ in humans use an “up–down” method² in which the presence or absence of movement in response to a surgical incision is assessed, and the target anesthetic concentration for each patient is based on the movement response of the previous patient.³,⁴ Because each individual is assessed only once, the actual transition point between presence and absence of movement for that individual is never determined. Instead, when two successive individuals given different anesthetic concentrations differ in their response, the midpoint of these anesthetic concentrations is recorded. For example, if a patient moves at an end-tidal concentration of 0.8%, and the next patient, given 0.9%, does not move, this change in response is termed a “crossover” with a midpoint concentration of 0.85%. When four (or more) crossovers have occurred, the experiment is typically terminated, and the midpoint values are averaged to determine MAC. In some instances, the SD of these midpoint values is reported as an estimate of interindividual variability in MAC.³

This up–down design for quantal (dichotomous) data differs from the design typically used for drugs that have a quantitative (*i.e.*, graded) response, for which a fixed design (typically a certain number of individuals studied at each of several predetermined doses) is usual. The up–down approach is typically adopted because it permits a small number of individuals to be studied.²

We question whether the up–down approach results in accurate estimates of MAC and its interpatient variability. Limitations in the data available from humans prevent accurate assessment of these issues in the absence of a large prospective study. However, using simulation, we can assess the impact of interindividual variability, experimental error, and other issues on the accuracy of MAC experiments conducted using the up–down approach.
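The crossover bookkeeping described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' S-PLUS code; the function name and the six-patient sequence are hypothetical, with the first two patients reproducing the 0.8%/0.9% example. It records a midpoint for every pair of consecutive patients whose responses differ and does not apply the one-crossover-per-patient restriction used in the simulations described in Methods.

```python
from statistics import mean, stdev

def crossover_midpoints(concentrations, moved):
    """Midpoint (%) of each pair of consecutive patients whose responses differ."""
    midpoints = []
    for i in range(1, len(concentrations)):
        if moved[i] != moved[i - 1]:
            midpoints.append((concentrations[i - 1] + concentrations[i]) / 2)
    return midpoints

# Hypothetical up-down sequence (end-tidal %, True = moved); step of 0.1%.
concentrations = [0.8, 0.9, 0.8, 0.9, 1.0, 0.9]
moved = [True, False, True, True, False, True]

mids = crossover_midpoints(concentrations, moved)
print("midpoints:", [round(m, 3) for m in mids])
print("MAC estimate:", round(mean(mids), 3), "SD:", round(stdev(mids), 3))
```

Here the four midpoints (0.85, 0.85, 0.95, and 0.95%) average 0.9%, and their SD is the quantity sometimes reported as interindividual variability.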

## Methods

We simulated (and replicated) MAC experiments using the following approach. First, we assumed that each patient has an individual value for MAC that is a function of the “typical value” for the population (“population MAC,” assumed in most of our simulations to be 1% inhaled anesthetic) and an SD (permitted to vary from one tenth to three tenths of the typical value). Second, we assumed that several factors (*e.g.*, machine measurement error and differences between end-tidal and effect site concentrations) influenced the “accuracy” of measured end-tidal values as a representation of effect site concentrations. (We use the term *effect site* rather than *brain* because the effect of inhaled anesthetics on spinal α motor neurons may contribute to the immobility observed in response to skin incision.⁵) Third, we assumed that in a small number of patients (those in whom individual MAC differed minimally from the actual anesthetic concentration), observations of movement would be imprecise,⁶ leading to incorrect assessment of movement *versus* nonmovement.

We recognized that several factors might cause the concentration reported by the monitoring device to differ from the alveolar concentration, and the alveolar concentration to differ from the effect site concentration. First, many anesthetic monitors report end-tidal concentrations with no more precision than 0.1%; thus, the concentration reported by the machine is a rounded representation of the actual “internal machine” reading. Second, machines that measure anesthetic concentration in airway gases have measurement errors.‡ Third, even at steady state, end-tidal concentrations differ to a small extent from arterial and effect site concentrations (*e.g.*, because of ventilation/perfusion [V̇/Q̇] mismatch). Assuming “reasonable” values (defined later) for these factors, we simulated the internal machine reading, the end-tidal concentration, and the effect site concentration based on a target end-tidal concentration for each individual.
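The three error sources just listed can be chained as in the sketch below. Their functional forms are not specified here, so they are assumptions: display rounding is modeled as a uniform offset of up to half a display step, and machine error and V̇/Q̇ mismatch as independent multiplicative normal errors whose SDs are fractions of the value (5% by default). The function and parameter names are hypothetical.

```python
import random

def simulate_concentrations(target, precision=0.1, machine_err=0.05,
                            vq_err=0.05, rng=None):
    """Hypothetical error chain from a displayed target end-tidal value (%)
    to (internal machine reading, true end-tidal, effect site) values.
    All error forms below are assumptions, not the published model."""
    if rng is None:
        rng = random.Random()
    # Display rounding: any internal reading within half a display step of
    # the target would be shown as the target value.
    internal = target + rng.uniform(-precision / 2, precision / 2)
    # Machine measurement error: multiplicative, normally distributed.
    end_tidal = internal * (1 + rng.gauss(0, machine_err))
    # End-tidal vs effect site difference (e.g., V/Q mismatch), also normal.
    effect_site = end_tidal * (1 + rng.gauss(0, vq_err))
    return internal, end_tidal, effect_site

rng = random.Random(2024)
print([round(x, 3) for x in simulate_concentrations(1.1, rng=rng)])
```

Setting `machine_err=0` and `vq_err=0` leaves only the rounding component, which is a convenient way to isolate each source of error.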

Each simulation followed the sequence of a MAC study (table 1). For the first simulated patient, we selected a starting value for the target concentration and, using the approach described above, simulated an effect site concentration. For that individual, we also simulated an individual value for MAC (“individual MAC”), selecting this value randomly from a normal distribution with a mean equal to the population MAC and the SD selected for that simulation. We then compared the effect site anesthetic concentration with individual MAC. If the effect site concentration exceeded individual MAC, we assumed that the patient would not respond (“no move”); conversely, if the effect site concentration was less than individual MAC, we assumed that the patient would move. For a patient in whom the difference between effect site concentration and individual MAC was small (*e.g.*, the ratio of these values was between 0.98 and 1.02), we assumed that the observation was ambiguous (“ambiguous response”) and randomly assigned that patient the value “move” or “no move.”
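The per-patient step can be written as two small functions. This sketch assumes that individual MAC is normal with an SD expressed as a fraction of population MAC, and that a ratio within 0.98–1.02 is resolved by a 50:50 random draw; the function names and the `band` parameter are hypothetical.

```python
import random

def draw_individual_mac(pop_mac=1.0, sd_fraction=0.1, rng=random):
    """Individual MAC (%) drawn from a normal distribution whose SD is a
    fraction of population MAC (0.1 to 0.3 in the simulations described)."""
    return rng.gauss(pop_mac, sd_fraction * pop_mac)

def movement_response(effect_site, individual_mac, band=0.02, rng=random):
    """Return (moved, ambiguous) for one simulated patient."""
    ratio = effect_site / individual_mac
    if 1 - band <= ratio <= 1 + band:
        # Ambiguous observation: randomly record "move" or "no move".
        return rng.random() < 0.5, True
    # Above individual MAC -> no move; below -> move.
    return effect_site < individual_mac, False

rng = random.Random(11)
mac_i = draw_individual_mac(rng=rng)
print(round(mac_i, 3), movement_response(1.0, mac_i, rng=rng))
```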

Once the response of the first patient was determined, a second patient was simulated. If the first patient “moved,” the target concentration for the second patient was increased by a predetermined increment (*e.g.*, 10% of the starting concentration). Conversely, if the first patient did not “move,” the target concentration for the second patient was decreased by the same increment. The target concentration for each subsequent patient was altered in the same manner (*i.e.*, by the same increment, which was defined relative to the starting concentration). If the movement response of a patient differed from that of the previous patient, those two patients constituted a crossover, and the midpoint between their target end-tidal concentrations was recorded. However, each patient was permitted to be involved in only a single crossover (see the footnote of table 1 for an example). The process of assessing the response of consecutive patients continued until 10 crossovers were attained (clinical studies usually terminate after four crossovers³,⁷).
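One simulated experiment can be sketched by combining these rules. This is a simplified, hypothetical Python version, not the authors' S-PLUS implementation: it omits measurement error and V̇/Q̇ mismatch (the effect site concentration is taken to equal the target), uses a fixed increment equal to a fraction of the starting concentration, and enforces the one-crossover-per-patient rule by refusing to pair a patient who already belongs to a crossover.

```python
import random

def run_up_down(start=1.3, pop_mac=1.0, sd_fraction=0.1, step_fraction=0.1,
                n_crossovers=10, band=0.02, rng=None):
    """Simulate one up-down MAC experiment (simplified, hypothetical sketch).

    Effect site concentration is taken to equal the target end-tidal value.
    Returns (midpoints, patients_at_crossover, n_ambiguous).
    """
    if rng is None:
        rng = random.Random()
    step = step_fraction * start      # increment defined by the starting value
    target = start
    prev_target, prev_moved, prev_used = None, None, False
    midpoints, patients_at, n_ambiguous, n = [], [], 0, 0
    while len(midpoints) < n_crossovers:
        n += 1
        individual_mac = rng.gauss(pop_mac, sd_fraction * pop_mac)
        if abs(target / individual_mac - 1) <= band:
            moved = rng.random() < 0.5    # ambiguous: assign randomly
            n_ambiguous += 1
        else:
            moved = target < individual_mac
        # Count a crossover only if the response changed AND the previous
        # patient is not already part of a crossover.
        crossover = (prev_moved is not None and moved != prev_moved
                     and not prev_used)
        if crossover:
            midpoints.append((prev_target + target) / 2)
            patients_at.append(n)
        prev_target, prev_moved, prev_used = target, moved, crossover
        target = target + step if moved else target - step
    return midpoints, patients_at, n_ambiguous

mids, at, n_amb = run_up_down(rng=random.Random(7))
print(f"MAC10 = {sum(mids) / len(mids):.3f}%, N10 = {at[-1]}, ambiguous = {n_amb}")
```

With `sd_fraction=0` and `band=0`, every individual MAC equals 1.0%; after three “no move” responses each further crossover needs exactly two new patients, which makes the one-crossover-per-patient rule easy to check by hand.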

Simulations were performed using the S-PLUS software package (version 5.0, 1998; MathSoft Inc., Seattle, WA) in a Solaris (UNIX) operating environment. For each of the following five issues, simulations were replicated 500 times, and the resulting values were averaged.

### Impact of Starting Concentration

We assumed that the optimal concentration to start a MAC experiment is a value close to the expected result. However, early in the clinical development of an inhaled anesthetic (particularly as new populations are being studied⁷), investigators may not know the expected result. We hypothesized that using a starting value that differed markedly from the true value would have minimal impact on the estimates of MAC but would increase the number of patients required to achieve a predetermined number of crossovers. We also speculated that individuals whose MAC differed markedly from the population value might produce outlying values for crossover midpoint concentrations, thereby affecting the estimate of MAC in certain instances. We simulated starting concentrations ranging from one half to twice the population MAC (*e.g.*, assuming that the typical value for MAC is 1%, we simulated starting target concentrations ranging from 0.5 to 2.0%).

### Impact of the Number of Crossovers

Dixon² suggested that this type of experiment be continued until at least four crossovers are reached. We hypothesized that continuing a MAC study to a larger number of crossovers might yield more accurate MAC estimates. Therefore, we compared simulations based on 2, 4, 6, 8, or 10 crossovers. This was accomplished by completing each simulation when 10 crossovers were attained and then analyzing the data either for all simulated patients or only through the first 2, 4, 6, or 8 crossovers.
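Truncating a single 10-crossover run to its first k crossovers amounts to the following. The midpoints and patient counts are hypothetical, and `patients_at[j]` is assumed to hold the number of patients enrolled when crossover j + 1 was attained.

```python
from statistics import mean, stdev

def summarize_first_k(midpoints, patients_at, ks=(2, 4, 6, 8, 10)):
    """Return {k: (MAC_k, SD_k, N_k)} from the first k crossovers of one run."""
    summary = {}
    for k in ks:
        if k <= len(midpoints):
            first_k = midpoints[:k]
            summary[k] = (mean(first_k), stdev(first_k), patients_at[k - 1])
    return summary

# Hypothetical 10-crossover run (midpoints in %, patients enrolled so far)
midpoints = [1.05, 0.95, 0.95, 1.05, 0.95, 0.85, 0.95, 1.05, 0.95, 0.95]
patients_at = [4, 6, 9, 11, 14, 16, 18, 21, 23, 25]

for k, (mac_k, sd_k, n_k) in summarize_first_k(midpoints, patients_at).items():
    print(f"MAC{k} = {mac_k:.3f}%  SD{k} = {sd_k:.3f}%  N{k} = {n_k}")
```

Because MAC2 through MAC10 come from the same simulated sequence, differences between them reflect only the additional crossovers, not a different set of patients.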

### Impact of Increment Size of Concentration Adjustments

In a clinical MAC experiment, a typical incremental change in the target concentration of inhaled anesthetic between two consecutive patients ranges from approximately 10%³ to approximately 15%⁴ of the expected MAC value§ (*e.g.*, 0.10% or 0.15% for an expected MAC of 1%, and 0.6% or 0.9% for an expected MAC of 6%). We hypothesized that a larger increment would decrease the number of patients required to achieve a predetermined number of crossovers but would increase variability in the estimate of MAC. We simulated increments as a fraction (either 10% or 20%) of the chosen starting concentration over a range of starting concentrations from one half to twice the expected MAC value. We further tested whether reducing the increment after the first crossover, either to half its initial value or to one tenth of the first midpoint value, would affect the accuracy of MAC estimates and the number of patients needed to achieve accurate results.
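The increment rules compared here can be expressed as one small function. The scheme labels `"fixed"`, `"half"`, and `"tenth"` are hypothetical names for the variants described; the arithmetic reproduces the examples above (10% of a 1% MAC is 0.10%, and 15% of a 6% MAC is 0.9%).

```python
def increment(start, step_fraction=0.1, first_midpoint=None, scheme="fixed"):
    """Concentration change (%) between consecutive patients.

    'fixed': step_fraction * starting concentration throughout
    'half' : after the first crossover, half of the initial increment
    'tenth': after the first crossover, one tenth of the first midpoint
    """
    step = step_fraction * start
    if first_midpoint is None or scheme == "fixed":
        return step
    if scheme == "half":
        return step / 2
    if scheme == "tenth":
        return first_midpoint / 10
    raise ValueError(f"unknown scheme: {scheme}")

print(increment(1.0), increment(6.0, 0.15))
print(increment(1.3, first_midpoint=0.975, scheme="half"),
      increment(1.3, first_midpoint=0.975, scheme="tenth"))
```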

### Impact of Interindividual Variability

If interindividual variability is small, successive individuals differ minimally in their individual MAC, and crossovers should occur in rapid succession once the target concentration is near the true MAC. In turn, the range of crossover midpoint concentrations should be small. In contrast, larger interindividual variability should result in a larger range of crossover midpoint concentrations and possibly more outlier (*i.e.*, flawed) estimates of MAC. We simulated interindividual variability ranging from 10 to 30% of the population MAC value.

### Impact of Measurement Error

At least three factors (digits displayed by the anesthetic monitor, machine imprecision, and V̇/Q̇ mismatch) create differences between the value reported by the anesthetic monitor and the effect site concentration. We simulated each component of these measurement errors. First, we assumed that the precision of reported end-tidal concentrations was either 0.1% (*e.g.*, a reported concentration of 1.1% could represent an internal machine reading of 1.050–1.149%) or 0.01%. Second, we set V̇/Q̇ mismatch and machine measurement error each to either 5% or 0.1% of scale (the smaller value minimized the influence of these factors on the estimate of MAC).
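The display-precision component can be illustrated with Python's `decimal` module. Rounding halves up is an assumption chosen so that internal readings of 1.050–1.149% display as 1.1%, matching the example above; real monitors may round differently, and the function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def displayed_value(internal_reading, precision="0.1"):
    """Monitor display (%) of an internal reading, rounding halves up (assumed)."""
    return Decimal(str(internal_reading)).quantize(Decimal(precision),
                                                  rounding=ROUND_HALF_UP)

for reading in (1.049, 1.05, 1.149, 1.15):
    print(reading, "->", displayed_value(reading), "|", displayed_value(reading, "0.01"))
```

With 0.01% precision the same readings are displayed almost unchanged, so the rounding component of measurement error nearly disappears.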

### Assessment of Simulations

Each simulation continued until 10 crossovers were obtained. The average of the first 2 midpoint values was termed MAC2; the averages of the first 4, 6, 8, and 10 midpoint values were termed MAC4, MAC6, MAC8, and MAC10, respectively. SDs of these crossover values (SD2, SD4, SD6, SD8, and SD10) were determined. The numbers of individuals needed to achieve 2, 4, 6, 8, and 10 crossovers were termed N2, N4, N6, N8, and N10, respectively, and the number of individuals with ambiguous responses was recorded (Ambig2, Ambig4, Ambig6, Ambig8, and Ambig10, based on 2–10 crossovers). We further assessed whether MAC2, MAC4, MAC6, MAC8, or MAC10 differed from the true value for MAC by more than 10, 15, 20, or 25%. Each simulation was replicated 500 times, and summary statistics for MAC, SD, Ambig, and N were prepared from these replicates. The maximum (MAX2, MAX4, MAX6, MAX8, and MAX10) and minimum (MIN2, MIN4, MIN6, MIN8, and MIN10) values of MAC with 2, 4, 6, 8, or 10 crossovers were determined from these 500 replicates. Each of the issues described above was examined over a variety of conditions; only representative examples are reported.
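The outlier summary across replicates reduces to counting relative deviations. In the sketch below, the five replicate estimates are hypothetical stand-ins for the 500 MAC4 (or MAC6, and so on) values that a full run would produce, and the function name is illustrative.

```python
def replicate_summary(estimates, true_mac=1.0, thresholds=(0.10, 0.15, 0.20, 0.25)):
    """Fraction of replicate MAC estimates whose relative deviation from the
    true value exceeds each threshold, plus the minimum and maximum estimate."""
    n = len(estimates)
    fractions = {t: sum(abs(e - true_mac) / true_mac > t for e in estimates) / n
                 for t in thresholds}
    return fractions, min(estimates), max(estimates)

estimates = [0.78, 0.95, 1.0, 1.12, 1.3]   # hypothetical replicate values (%)
fractions, lo, hi = replicate_summary(estimates)
print(fractions, "range:", lo, "to", hi)
```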

### Comparison of Our Simulations with Published Clinical Studies

These simulations make various assumptions about the magnitude of interindividual variability and measurement error. If these assumptions are flawed, our simulations would be invalid. Although some of our assumptions have not been tested, one overall check can be performed. We surveyed the published literature for MAC and found that the anesthetic for which MAC studies were conducted in the largest homogeneous group of humans was halothane (97 individuals in five studies).¹,⁸–¹¹ We used the reported end-tidal concentrations and movement responses of these individuals in two approaches. First, we assessed how many of these individuals moved at target concentrations greater than the MAC estimated in that study or did not move at concentrations less than that value. Second, we assessed how many of all the patients in these studies moved at end-tidal concentrations greater than 0.75% halothane or did not move at concentrations less than 0.75% halothane. The first approach was also applied to the reported data of two studies of sevoflurane MAC.⁴,¹² If there were no interindividual variability or measurement error, no individual would move at concentrations greater than MAC or fail to move at concentrations less than MAC; individuals who did so are therefore termed “misclassified.” The frequency of misclassified individuals thus reflects the combined magnitude of interindividual variability and measurement error.
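The misclassification criterion can be sketched as a simple count. The five records below are hypothetical, not data from the cited halothane studies; patients tested exactly at the cutoff are treated as correctly classified, an assumption since the text defines misclassification only for concentrations strictly above or below MAC.

```python
def misclassified_fraction(records, cutoff):
    """records: (end_tidal_percent, moved) pairs. A patient is misclassified
    if they moved above the cutoff or did not move below it."""
    wrong = sum((moved and conc > cutoff) or (not moved and conc < cutoff)
                for conc, moved in records)
    return wrong / len(records)

# Hypothetical records: (end-tidal halothane %, moved?)
records = [(0.70, True), (0.80, True), (0.70, False), (0.80, False), (0.75, True)]
print(f"{misclassified_fraction(records, 0.75):.0%} misclassified at 0.75%")
```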

## Results

### Variability in MAC Results

With 500 replicates (assumptions: population MAC = 1.0%; interindividual variability = 10% of population MAC; incremental changes for inhaled anesthetic = 20% of starting concentration; machine measurement error = 5%; V̇/Q̇ mismatch = 0.1%; starting concentration = 1.3% inhaled anesthetic), 23% of replicates yielded MAC estimates more than 10% different from the true value (fig. 1); 2.4% of replicates deviated more than 15% from the true value. The range of MAC values was 0.45%.

With different initial conditions (*e.g.*, incremental changes for inhaled anesthetic = 0.2% and interindividual variability = 30% of population MAC), MAC values varied from 0.65 to 1.35%. The magnitude of this range depended on the number of crossovers and the starting concentration (data not shown). With interindividual variability of MAC of 10% and minimal error, the average SD increased toward the true value (10%) with either a larger number of crossovers or larger starting concentrations (table 2). The average SD was typically 6–9% with either four or six crossovers and starting concentrations close to the true MAC. With a simulated interindividual variability of MAC of 20%, the average SD was approximately 11% with either four or six crossovers (table A, Web Enhancement; fig. 2).

### Impact of Starting Concentration

The average MAC estimate in 500 replicates was less accurate with small starting concentrations (table 2). Typically, starting concentrations larger than population MAC yielded average values for MAC slightly greater than population MAC, whereas starting concentrations less than population MAC yielded values for MAC less than population MAC (fig. 3 and table 2); this was more pronounced for the smallest starting concentrations. As the starting concentration approached population MAC, the number of individuals needed to attain a desired number of crossovers decreased. With an increment size of one tenth of the starting concentration between consecutive patients, this decrease was approximately two individuals per 0.2% change in the starting concentration (based on a true MAC value of 1.0%).

### Impact of Number of Crossovers

Increasing the number of crossovers from two to six improved MAC estimates (figs. 2, 4, and 5), whereas further increasing the number of crossovers to 10 had less effect on the average MAC and on the SD of MAC estimates (table 2; table A, Web Enhancement). The range of MAC values was smaller with six, compared with four, crossovers (*i.e.*, there were fewer extreme [outlier] values). Increasing the number of crossovers from 2 to 10 required 2–3 more patients for each additional crossover under many different simulated conditions (*e.g.*, different starting concentrations; table 2).

### Impact of Increment Size

Increasing the increment size from one tenth to two tenths of the starting concentration slightly improved the average value of MAC estimates but increased the range of MAC estimates; variability (SD) increased and was closer to the true value (table B, Web Enhancement; fig. 4). The number of individuals needed to attain the target number of crossovers also decreased. The average number of ambiguous decisions was typically small. Reducing the increment after the first crossover, either to half its initial value or to one tenth of the first midpoint value, did not improve MAC estimates (data not shown).

### Influence of Interindividual Variability

As interindividual variability increased from 10 to 30% of the population MAC, the range of MAC estimates increased (fig. 5), whereas the number of ambiguous decisions decreased (table C, Web Enhancement). With interindividual variability of MAC of 10% and a starting concentration of 1.3% (table 3), 4% of 500 replicates yielded MAC estimates that differed more than 10% from the true value, and 1% yielded estimates that differed more than 15%. With interindividual variability of MAC of 30%, approximately one third of replicates yielded MAC estimates more than 10% different from the true value. With a starting concentration of 1.3%, one fifth of replicates differed more than 15% from the true value, one tenth differed more than 20%, and one twentieth differed more than 25%.

### Effect of Measurement Error and V̇/Q̇ Mismatch

With measurement error and V̇/Q̇ mismatch of 5% each *versus* 0.1% each, the percentage of MAC estimates more than 10, 15, 20, or 25% different from the true value typically doubled (table 4). A starting concentration that deviated more from the true MAC value resulted in more outliers.

### Comparing Individual Patient Response with the Population MAC

#### Published Data for Halothane.

Estimates for halothane MAC in humans ranged from 0.73 to 0.765%. Of the 97 patients, 21 (22%) were misclassified based on their movement response, their reported target concentration, and the MAC value reported in that manuscript. Applying a single cutoff value for MAC of 0.75% for all five studies, 17% of patients were misclassified.

#### Published Data for Sevoflurane.

Five (25%) of 20 patients and five (22%) of 23 patients in two different studies were misclassified by these criteria.

#### Our Simulations.

Assuming a population MAC of 1.0%, incremental changes for inhaled anesthetic of 0.1%, a machine error level of 5%, V̇/Q̇ mismatch of 5%, and a starting concentration of 1.3% inhaled anesthetic, the percentage of patients misclassified was 16% with interindividual variability of 5% of population MAC, 17% with interindividual variability of 10% (table 1), 34% with interindividual variability of 20%, and 36% with interindividual variability of 30%.

## Discussion

To determine the reliability of estimates for MAC in humans, we simulated a typical MAC experiment and explored a number of issues that may affect these estimates. We also examined the impact of these issues on the number of patients that need to be enrolled to complete an experiment. We found that certain issues impacted the results of the study to a large degree, whereas other factors had less impact.

First, we learned that under many conditions, one or more of 500 replicates yielded estimates for MAC that deviated more than 10% or even more than 25% from the true value. In that an investigator conducts only a single experiment, this suggests that chance alone may produce inaccurate estimates of MAC. If this occurred in the first experiment with a particular anesthetic in a particular population, there would be no previously published values to suggest an error, and the results might be accepted without criticism. However, if this occurred in the second or a later experiment with that anesthetic in the same population, we speculate that the new results might be assumed to be flawed (and never submitted for publication), even if the previous, rather than the new, results were actually the flawed ones. Alternatively, the results of both studies might be published, leaving the reader with conflicting conclusions as to the true value. For example, Inomata *et al.*¹³ reported that sevoflurane MAC in children aged 1–9 yr was 2.03%. In contrast, Lerman *et al.*¹⁴ reported that sevoflurane MAC in children aged 1–12 yr was approximately 2.5%. In that Lerman *et al.*¹⁴ obtained similar estimates of MAC for subgroups aged 1–3, 3–5, and 5–12 yr, the difference in MAC between studies is unlikely to be a function of the slightly different age groups. One explanation for these different values for sevoflurane MAC is that the populations differ (Japanese *vs.* Americans). However, we propose that the populations may not differ; instead, experimental error and interindividual variability may account for these findings. Marked differences in MAC were also reported for sevoflurane in healthy adults: values were 1.58,¹⁵ 1.71,¹² and 2.05%.⁴ Similarly, our findings may explain why some investigators demonstrate an effect of adjuvant drugs on anesthetic potency (*e.g.*, Forbes *et al.*⁹ reported that pancuronium decreased halothane MAC), whereas others do not (*e.g.*, Fahey *et al.*¹⁶ found no effect of pancuronium on halothane MAC).

Second, we learned that the SD determined from the midpoint concentrations underestimates true interindividual variability. A similar observation was suggested by de Jong and Eger¹⁷ in 1975; however, they presented no data to support this claim. The magnitude of the underestimate decreases as the number of crossovers increases, as the starting concentration deviates further from the true value, and as the increment size increases (compare table C [Web Enhancement] with table 2). We offer the following explanation for this phenomenon. Consider a patient with an individual MAC value (1.45%) that differs markedly from the population MAC (1.0%). If the experiment starts at a high target concentration (*e.g.*, 1.5%), testing of other patients (in whom individual MAC is close to 1.0%) yields successive “no move” responses. If this patient is tested at a target concentration of 1.4%, the patient will move, contributing a crossover value that differs markedly from the population MAC. In contrast, if the starting concentration for the experiment is close to the population value, successive responses will toggle around the population value. When this “aberrant” patient is tested at a target concentration of 1.0%, the “move” response provides no insight as to whether this patient’s individual MAC is 1.1% or 1.45% (the actual value). Therefore, a strategy in which MAC is approached from extreme values provides additional information about variability within the population. Note that a design in which the increment size is coupled to the starting concentration results in a larger increment with a larger starting concentration. In that a larger increment size and a starting concentration that differs from the typical value both improve the estimate of variability in MAC, the optimal strategy to quantify variability in MAC appears to be to start with an initial concentration that exceeds the expected value.

Third, we learned that, on average, four crossovers yield nearly the same average MAC as a larger number of crossovers. However, with four crossovers, there are more outlier estimates for MAC than with six or more crossovers (table A, Web Enhancement). In that the investigator can never be sure that the result of a particular experiment is not an outlier, this suggests that a larger number of crossovers might be useful to prevent spurious results. Terminating a study after six or more crossovers requires additional resources; however, our simulations suggest that only 2.0–2.5 additional patients would be needed to attain each additional crossover. The benefit of increasing the number of crossovers beyond six seems to be small.

We also learned that the starting concentration may affect the results of the experiment. First, if the starting concentration deviates markedly from the population MAC, a larger sample size is needed to attain the desired number of crossovers. Second, a starting concentration markedly different from the true MAC value yields slightly biased estimates for MAC, on average, and a larger incidence of outlying estimates in individual experiments.

Finally, a larger increment size of concentration changes between consecutive individuals decreased the number of individuals needed to attain the target crossovers and increased variability of MAC estimates toward the true value.

Several issues of our study design warrant comment. First is our assumption that the response in a MAC experiment can be ambiguous. Antognini *et al.*⁶ repeatedly assessed the movement response in rats at different end-tidal anesthetic concentrations. Some animals moved at end-tidal concentrations of 110% of their individual MAC or did not move at concentrations of 90% of their individual MAC. These findings suggest that there is intraindividual variability in MAC. We assumed that an ambiguous response occurred over only a small range (98–102% of individual MAC). However, the findings of Antognini *et al.*⁶ suggest that the range might be larger. If so, the potential for inaccurate MAC estimates is even greater than we simulated.

The second issue of study design concerns our many assumptions, several of which can be challenged. If our simulations introduced markedly more interindividual variability or error than exists in nature, then markedly more simulated patients than patients in published studies would be misclassified. For example, interindividual variability may be markedly smaller than the values we tested. If this were true and if the various components of measurement error were trivial, results from MAC studies should show a sharp delineation between movers and nonmovers; *i.e.*, there should be a single end-tidal concentration above which no patient moves and below which all patients move. The compiled results for halothane indicate that a single cutoff value of 0.75% misclassifies approximately 17% of patients; using the MAC estimate from each study results in 22% of patients being misclassified. Similarly, in two sevoflurane MAC studies, 22–25% of patients were misclassified based on single cutoff values.⁴,¹² Allowing for 5% measurement error, 5% V̇/Q̇ mismatch, a 0.1% precision in machine measurement (*i.e.*, measurements reported to a tenth of a percent of anesthetic concentration), and 10% interindividual variability yielded a percentage of misclassified patients (17%) similar to published values. This suggests that the combined error and variability used in our simulations is no larger in magnitude than that which exists in nature.

Our simulations assume that each individual’s response to a stimulus is assessed only once. This is the approach typically used in MAC studies in humans. In contrast, a small number of studies in humans and many in animals bracket MAC for each subject. Different statistical issues may apply to studies performed using that design.

These simulations assumed a typical value of MAC of 1%. Although this value is close to that of halothane MAC, our results should apply to all anesthetics, regardless of their typical value of MAC. We note that one component of our simulations, measurement error (quantified as a percentage of the true value), may be smaller for anesthetics with larger MAC values, such as desflurane. The simulations also assumed normal or log-normal distributions. It is possible that certain quantities (*e.g.*, interindividual variability in MAC) are not normally distributed. However, in the absence of information suggesting such distributions, we are unable to evaluate their possible implications.

We replicated each simulation 500 times. Initial simulations (not reported) were performed with only 100 replicates. When these simulations were repeated, results sometimes varied, an expected finding with this small sample size. With the larger sample size (500 replicates), repeated simulations yielded similar results. However, to verify that 500 replicates were sufficient to minimize error, we replicated one set of simulations 25,000 times. The results of that markedly larger simulation were quite similar to those of the simulations with 500 replicates. Thus, we report analyses performed with 500 replicates.
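Why 100 replicates gave unstable results can be made concrete with the binomial standard error of a simulated proportion, SE = √(p(1 − p)/n). This is a standard Monte Carlo error calculation added for illustration, not one reported by the authors. Using p = 0.23 (the fraction of replicates deviating more than 10% under the conditions of fig. 1) purely as an example, the SE is about 4.2 percentage points with 100 replicates, about 1.9 with 500, and about 0.3 with 25,000.

```python
from math import sqrt

def monte_carlo_se(p, n_replicates):
    """Binomial standard error of a proportion p estimated from n replicates."""
    return sqrt(p * (1 - p) / n_replicates)

for n in (100, 500, 25_000):
    print(n, round(monte_carlo_se(0.23, n), 4))
```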

Finally, although most studies of anesthetic potency focus on the estimate of MAC rather than on the estimate of interindividual variability in MAC, we focus on both. Clinicians presumably dose anesthetics so that few, if any, of their patients move in response to surgical incision. This is accomplished by dosing to prevent movement in 95% (ED₉₅) or 99% of patients rather than administering a dose at which 50% of patients will move (ED₅₀). Determining ED₉₅ requires knowledge of both ED₅₀ and its variability.
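If individual MAC is assumed to be normally distributed (one of the distributional assumptions of these simulations), ED₉₅ follows from ED₅₀ and the SD as ED₅₀ + 1.645 × SD. The sketch below uses Python's `statistics.NormalDist`; the input values are illustrative. It also shows why an underestimated SD matters: with ED₅₀ = 1.0%, an SD of 0.10% gives ED₉₅ ≈ 1.16%, whereas an SD of 0.07% gives ≈ 1.12%.

```python
from statistics import NormalDist

def effective_dose(ed50, sd, fraction=0.95):
    """Concentration (%) at which `fraction` of patients do not move,
    assuming individual MAC ~ Normal(ed50, sd)."""
    return NormalDist(mu=ed50, sigma=sd).inv_cdf(fraction)

for sd in (0.07, 0.10):
    print(f"SD {sd:.2f}%: ED95 = {effective_dose(1.0, sd):.2f}%, "
          f"ED99 = {effective_dose(1.0, sd, 0.99):.2f}%")
```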

In summary, we simulated the results of a MAC experiment in humans using an up–down approach. Allowing for reasonable estimates of interindividual variability and error, the value for MAC was estimated accurately, on average, with 500 replicates. However, certain estimates of MAC in individual simulations differed by 25% or more from the true value. In that investigators typically perform MAC studies only once, our findings suggest that the up–down approach for analysis of quantal data may yield incorrect estimates. Our findings suggest that aiming for six or more crossovers, rather than four, decreases the likelihood of reporting an inaccurate estimate and incurs minimal additional costs.

The authors thank Edmond I. Eger II, M.D. (Professor of Anesthesia, Department of Anesthesia, University of California San Francisco, San Francisco, CA), for valuable criticism of their study design.