Variability in surgical procedure times increases the cost of healthcare delivery by increasing both the underutilization and overutilization of expensive surgical resources. To reduce variability in surgical procedure times, we must identify and study its sources.
Our data set consisted of all surgeries performed over a 7-yr period at a large teaching hospital, resulting in 46,322 surgical cases. To study factors associated with variability in surgical procedure times, data mining techniques were used to segment and focus the data so that the analyses would be both technically and intellectually feasible. The data were subdivided into 40 representative segments of manageable size and variability based on headers adopted from the Current Procedural Terminology classification. Each data segment was then analyzed using a main-effects linear model to identify and quantify specific sources of variability in surgical procedure times.
The single most important source of variability in surgical procedure times was surgeon effect. Type of anesthesia, age, gender, and American Society of Anesthesiologists risk class were additional sources of variability. Intrinsic case-specific variability, unexplained by any of the preceding factors, was found to be highest for shorter surgeries relative to longer procedures. Variability in procedure times among surgeons was a multiplicative function (proportionate to time) of surgical time and total procedure time, such that as procedure times increased, variability in surgeons' surgical time increased proportionately.
Surgeon-specific variability should be considered when building scheduling heuristics for longer surgeries. Results concerning variability in surgical procedure times due to factors such as type of anesthesia, age, gender, and American Society of Anesthesiologists risk class may be extrapolated to scheduling in other institutions, although specifics on individual surgeons may not. This research identifies factors associated with variability in surgical procedure times, knowledge of which may ultimately be used to improve surgical scheduling and operating room utilization.
In an era of cost-constrained health care, it is of utmost economic importance for medical institutions to effectively schedule and efficiently use expensive surgical resources. Unexplained variability in procedure times complicates surgical scheduling and reduces operational efficiency. If variability in procedure times could be controlled or better predicted, the cost of surgeries could be reduced through improved scheduling of surgical resources. To reduce the impact of variability in surgical procedure times, we need to model and thus predict surgical procedure times more accurately. The purpose of this article is to lay a foundation for developing statistical models that identify and estimate the effects of major sources of variability in surgical procedure times.
Modeling of variability in surgical procedure times has been of interest for at least 35 yr.1 Various researchers have suggested statistical models for surgical procedure times, including normal models,2 lognormal models,3,4 and three-parameter lognormal models.5 However, it is known6 that when populations are heterogeneous mixtures, improvement occurs by identifying homogeneous subpopulations and modeling within these. A standard statistical procedure useful in this regard is linear modeling, which permits the identification of factors that cause nonhomogeneity in the population. By estimating the effects of the significant factors, improved models can be developed with better fits to the data, thereby permitting improved prediction of surgical procedure times.
We studied surgical variability and differences among surgeons at a large teaching hospital and also searched for demographic factors that might explain additional variability in the surgical procedure times. Our ultimate concern was describing variability and identifying the associated factors that might be predicted, controlled, or altered to improve scheduling and reduce the cost of surgical services.
We evaluated computerized records7 of all surgical cases performed at a large teaching hospital over a 7-yr period from 1989 to 1995. Use of anonymous patient records was approved by the human subjects review committee of the institution that collected the data. Variables included in the record were surgical time (ST; defined as the time from incision to closure of the surgical wound), total procedure time (TT; defined as the time from entry into the operating suite until emergence from anesthesia), patient age and gender, surgeon, anesthesiologist, American Society of Anesthesiologists (ASA) risk class (a scale for grading systemic illness defined by the ASA), type of anesthesia, ICD-9 codes, and Current Procedural Terminology (CPT) codes.8 We classified the type of anesthesia administered into four categories: general, local, monitored, and regional.
Of 60,643 total records, 779 were omitted from analysis because of incomplete data on the ST or TT variables, leaving 59,864. Surgeries included between one and three CPTs. There were 46,322 patients with only one CPT code, 10,740 with exactly two different CPT codes, and 2,802 patients with three CPT codes. To reduce potential confounding factors, we chose to analyze those procedures with only a single CPT code.
We reduced the data that we used so that the analysis would be both technically and intellectually feasible. In doing so, we attempted to maintain the breadth of surgical experience, all of the various surgical subsets and surgical subspecialties, and the physiologic diversity of our data, while at the same time minimizing the potential for biasing our results. As in any data mining (knowledge extraction from databases) project, one needs to decide which data to extract in light of the analytic approaches available and to document the process by which the extraction is carried out.9 We documented our extraction process by describing the successively smaller interconnected databases that we used and summarized each in figure 1. The resultant data set was large and diverse.
We referred to the initial 46,322 cases, each described by only a single CPT, as the initial database. This database consisted of 3,096 different surgical procedures performed by 268 different surgeons and 151 different anesthesiologists in many combinations. The initial database was segmented into 20 categories based on the 20 primary headings of the CPT classification. The definitions of these categories, a description of the three most common CPT codes in each category, and the range of codes for each category are listed in table 1.
There were imbalances in the 20 categories of the initial database with respect to the number of CPTs and total ST and TT. These imbalances were imposed by anatomy, physiology, and technology that dictate the number of discrete surgical procedures encompassed by each of the primary CPT categories. To illustrate these imbalances and to characterize the surgical experience at this institution by the type and quantity of surgeries performed, we described ST and TT segmented by each CPT category (tables 2 and 3).
In theory, to achieve our primary goal of understanding the various factors affecting ST and TT, one might like to use a large factorial statistical model using the initial database and taking into account all 3,096 CPT codes and 268 surgeons. This is beyond the limitations of current software and hardware,10 so that appropriate subsets of the initial database must be examined to use available techniques. In pursuing a data mining approach that would allow us to extract representative information, we chose to maintain diversity of physiology and surgical experience yet focus on CPT codes that contained sufficient data so that standard statistical modeling would be appropriate.
We examined the three most numerous CPT codes occurring within each of the 20 categories, so that, at most, 60 CPT codes would be analyzed. To ensure that each surgeon had sufficient numbers of cases to estimate surgeon effect, we retained from these 60 possibilities only those CPT codes with two or more surgeons, each having 10 or more cases for that CPT code. Thus, if a CPT code had fewer than two surgeons with 10 or more surgeries each, we eliminated that CPT code from further consideration. The resultant database was called the representative database and contained 13,196 case records among 40 remaining CPT codes. We believed this database represented a cross-section of the surgical experience of the institution, and we therefore describe it in detail. For the representative database, table 4 and Web tables 1–3 provide tabulations by CPT code of type and number of procedures, age, type of anesthesia, gender, ASA risk class, surgeon, and anesthesiologist. Table 5 describes ST and TT by CPT code for the representative database.
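The selection rule described above can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the function name `select_cpt_codes`, the tuple layout `(category, cpt_code, surgeon_id)`, and the toy data are hypothetical and are not taken from the study's database. Ties between equally frequent CPT codes are broken arbitrarily here, which the text does not specify.

```python
from collections import Counter, defaultdict


def select_cpt_codes(cases, top_n=3, min_surgeons=2, min_cases=10):
    """Pick CPT codes following the rule in the text.

    cases: iterable of (category, cpt_code, surgeon_id) tuples
           (a hypothetical layout chosen for this sketch).
    """
    cases = list(cases)

    # Count cases per CPT code within each primary category.
    per_category = defaultdict(Counter)
    for category, cpt, _surgeon in cases:
        per_category[category][cpt] += 1

    # Keep the top_n most numerous CPT codes in each category.
    candidates = set()
    for counts in per_category.values():
        for cpt, _n in counts.most_common(top_n):
            candidates.add(cpt)

    # Count cases per surgeon within each candidate CPT code.
    per_cpt_surgeon = defaultdict(Counter)
    for _category, cpt, surgeon in cases:
        if cpt in candidates:
            per_cpt_surgeon[cpt][surgeon] += 1

    # Retain codes with at least min_surgeons surgeons who each have
    # min_cases or more cases for that code.
    selected = set()
    for cpt, counts in per_cpt_surgeon.items():
        qualified = sum(1 for n in counts.values() if n >= min_cases)
        if qualified >= min_surgeons:
            selected.add(cpt)
    return selected


# Toy data: code "101" is dropped because only one surgeon reaches 10 cases.
toy_cases = (
    [("A", "100", "s1")] * 10 + [("A", "100", "s2")] * 10
    + [("A", "101", "s1")] * 15 + [("A", "101", "s2")] * 3
    + [("B", "200", "s3")] * 12 + [("B", "200", "s4")] * 11
)
print(sorted(select_cpt_codes(toy_cases)))
```

With 20 categories and `top_n=3`, at most 60 codes can enter the second filter, which matches the upper bound stated in the text.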
Because the analytic methods we used (as described in Statistical Analysis) require adequate sample sizes within the surgeon category and no missing values, we used a subset of the representative database in our analyses. This subset, drawn from the representative database, retained only surgeons who performed 10 or more surgeries each; a total of 1,024 cases were thereby deleted.
A further reduction in cases was required due to analysis of variance (ANOVA) modeling because one factor, ASA risk class, was not recorded whenever surgery was performed with local anesthesia (i.e., without an anesthesiologist present). Because all cases of local anesthesia had missing ASA values, eliminating all missing values would have eliminated local anesthesia from our analyses. From initial analyses, we knew that type of anesthesia was a more important determinant of surgical duration than ASA risk class, and we wanted to retain it. To decide when to keep or delete ASA missing values, we adopted an arbitrary “10% rule.” When local anesthesia comprised less than 10% of a CPT’s anesthesia types, we deleted local cases and retained ASA in our models. When local anesthesia comprised greater than 10% of cases, we retained all cases and omitted ASA as a factor in the corresponding ANOVA models. An additional 47 cases were deleted using the 10% rule, which we adopted to maintain the number and diversity of cases in our analyses. After omission of these 1,071 cases, we referred to the 12,125 cases remaining among 40 CPT codes as the analytic database. This database was compared with the representative database to ensure that they were similar.
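A minimal sketch of the 10% rule for a single CPT code follows. The function name, the dictionary keys `anesthesia` and `asa`, and the toy records are assumptions made for illustration. The text does not say what happens when local anesthesia is exactly 10% of cases; this sketch treats that boundary as "retain all cases", which is also an assumption.

```python
def apply_ten_percent_rule(cases, threshold=0.10):
    """Apply the 10% rule to the cases of ONE CPT code.

    cases: list of dicts with keys 'anesthesia' and 'asa'
           (hypothetical field names for this sketch).
    Returns (cases_to_analyze, include_asa_factor).
    """
    n_local = sum(1 for c in cases if c["anesthesia"] == "local")
    local_share = n_local / len(cases) if cases else 0.0

    if local_share < threshold:
        # Few local cases: drop them and keep ASA risk class in the model.
        kept = [c for c in cases if c["anesthesia"] != "local"]
        return kept, True

    # Many local cases (or exactly at the threshold, by assumption):
    # keep every case and omit ASA risk class from the model.
    return list(cases), False


# Toy example: 1 local case out of 20 (5%) is dropped, ASA is retained.
toy = [{"anesthesia": "local", "asa": None}] + [{"anesthesia": "general", "asa": 2}] * 19
kept, include_asa = apply_ten_percent_rule(toy)
print(len(kept), include_asa)
```

The rule trades a small loss of cases in codes where local anesthesia is rare against losing the ASA factor in codes where it is common.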
For the CPT codes in the representative database, we obtained various descriptive statistics and comparisons. Then, within each CPT code and based on the analytic database, we fit a five-factor main-effects linear model to each of ST and TT, where surgeon, anesthesia type, ASA risk class, and gender were treated as categoric variables, and age was treated as a continuous variable. In addition, we examined a six-factor ANOVA with anesthesiologist added to the preceding five factors. In certain cases, as when there was no variation in gender or in anesthesia for a CPT code, the nonvarying factor was not retained in the model. Because of the relatively large number of independent variables, it was not feasible to examine interaction effects. Because of the exploratory nature of our data analyses, no formal corrections for multiple comparisons were used. Because of the previous indication of lognormality of ST and TT,5,11,12 we conducted the ANOVAs using LnST and LnTT (Ln = the natural logarithm). Other investigators in related research have also used log procedure duration as a response variable.13
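A five-factor main-effects model of the kind described here can be written as follows. The notation is an assumption introduced for illustration: the symbols, the index functions s(n), a(n), r(n), and g(n) that map case n to its surgeon, anesthesia type, ASA risk class, and gender, and the age slope β are not taken from the source. The response is shown on the log scale used for the ANOVAs, with Y standing for either ST or TT.

```latex
\ln Y_n \;=\; \mu
  + \mathrm{Surgeon}_{s(n)}
  + \mathrm{Anesthesia}_{a(n)}
  + \mathrm{ASA}_{r(n)}
  + \mathrm{Gender}_{g(n)}
  + \beta \,\mathrm{Age}_n
  + \varepsilon_n
```

No product terms appear because interaction effects were not examined. For the six-factor variant, an additional categoric term for anesthesiologist would be added in the same way.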
Within each CPT, the significance level used for testing each factor was P = 0.05. For certain tabulations, we report the number of CPTs providing statistically significant results for certain factors. As an intuitive statistical yardstick in this regard, if there were no factor effects, we would have expected on the average only 2 of the 40 CPT analyses (5%) to falsely yield significant results (table 6).
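The yardstick of 2 expected false positives is simply the number of tests multiplied by the significance level. The sketch below reproduces that arithmetic and, as an additional illustration, computes a binomial tail probability. The tail calculation assumes the 40 per-CPT tests are independent, which the source does not claim (the same surgeons and patients' characteristics may recur across codes). The function names are hypothetical.

```python
from math import comb


def expected_false_positives(n_tests, alpha):
    """Expected number of falsely significant results when no effect exists."""
    return n_tests * alpha


def prob_at_least(k, n_tests, alpha):
    """P(X >= k) for X ~ Binomial(n_tests, alpha).

    Assumes independent tests; this is an illustrative assumption only.
    """
    return sum(
        comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k, n_tests + 1)
    )


print(expected_false_positives(40, 0.05))  # 40 tests at the 0.05 level
print(prob_at_least(1, 40, 0.05))          # chance of one or more false positives
```

Under these assumptions, a factor that is significant in far more than 2 of 40 codes is unlikely to reflect chance alone.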
To estimate the effect of surgeon within CPT, we used the adjusted means based on LnST for surgeons,14 sometimes exponentiating this value to obtain an estimate of the median surgeon time in actual minutes. A similar approach was used to obtain adjusted means for type of anesthesia. Various graphical representations of these adjusted values were developed.
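Why exponentiating a mean of logged times estimates a median rather than a mean can be seen in a small example. If Ln(T) is symmetric about its mean, as under a lognormal model, then exp(mean of Ln T) sits at the median of T. The sketch below uses the raw mean of logs for one group of durations; it is a simplified stand-in for the model-adjusted means used in the study, and the function name and the durations are hypothetical.

```python
import math
import statistics


def back_transformed_center(durations_min):
    """Exponentiate the mean of the logged durations (the geometric mean)."""
    logs = [math.log(t) for t in durations_min]
    return math.exp(statistics.fmean(logs))


# Toy durations in minutes, symmetric on the log scale around ln(60).
durations = [30, 60, 120]
print(back_transformed_center(durations))  # close to the median of the data
print(statistics.median(durations))
print(statistics.fmean(durations))         # arithmetic mean is pulled upward
```

The arithmetic mean exceeds the back-transformed value whenever durations are right-skewed, which is why the back-transformed adjusted means are read as medians on the minute scale.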
Detailed Description of Databases
The initial database was segmented into 20 categories by the primary headers of the CPT classification. These 20 categories, the number of cases in each, the descriptors for the three most common CPT codes in each, and the nominal range of CPT codes available to each category are reported in table 1.
There were 8,834,789 min of total operating-room usage (TT) in the initial database; 69% of this (or 6,109,704 min) was used for the surgical procedure (ST) alone, i.e., with anesthesia and positioning times omitted. There were 1,424,197 min of initial anesthesia time, 639,240 min of preparation and surgical positioning time, and 661,648 min of anesthesia emergence and transport time. Anesthesia induction and emergence time together consumed 24% of TT. For the categorized CPT codes, procedure durations for ST ranged from a minimum of 1 min to a maximum of 1,130 min (18.8 h), whereas median durations for each category ranged from 33 to 195 min. Procedure durations for TT ranged from a minimum of 5 min to a maximum of 1,320 min (22 h), whereas median durations for each category ranged from 70 to 275 min.
The relative importance of the 20 surgical subspecialty categories to the institution is illustrated by the detailed description of ST and TT in tables 2 and 3. Among categories, cardiovascular surgery stood apart, using the greatest quantities (25% and 24% of ST and TT, respectively) of operating-suite time. Four categories (cardiovascular, musculoskeletal, neurosurgery, and general surgery procedures) together accounted for approximately two thirds of total utilization (65% and 66% of ST and TT, respectively). Among the 268 surgeons, the number of cases per surgeon ranged from 1 to 1,495; among anesthesiologists, the number of cases ranged from 1 to 2,503.
Although the derivation of the representative database was chosen to foster analytical clarity by vastly reducing the number of CPT codes under consideration, it preserved both physiologic and surgical subspecialty representation and maintained a sizable fraction of patient procedures.
As indicated, only 40 of the 60 conceptually possible CPT codes in the initial database appear in the representative database. This database resulted in 13,196 case records, 193 surgeons, and 124 anesthesiologists. There were 2,316,472 min of total operating-room utilization (TT) in the representative database; 71% of this (or 1,633,817 min) was used for the surgical procedure (ST) alone, i.e., with anesthesia and positioning times omitted. There were 368,220 min of initial anesthesia time, 142,967 min of preparation and surgical positioning time, and 171,468 min of anesthesia emergence and transport time. Roughly speaking, the representative database reflects the initial database in utilization characteristics but accounts for only 26% of its total minutes of operating-suite utilization. The representative database was segmented by CPT code and tabulated with respect to type of anesthesia, age, ASA risk class, gender, surgeon, and anesthesiologist (table 4, Web tables 1–3). The relative proportion of TT devoted to the various component times is essentially the same in both databases.
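The component minutes reported for the two databases can be checked and compared with a short calculation. The minute totals below are the figures given in the text; the function name and the short labels `induction`, `positioning`, and `emergence` (for initial anesthesia time, preparation and positioning time, and emergence and transport time) are shorthand introduced for this sketch.

```python
def component_shares(components):
    """Return (total minutes, {component: percentage of total})."""
    total = sum(components.values())
    shares = {name: 100.0 * minutes / total for name, minutes in components.items()}
    return total, shares


# Minutes as reported in the text for each database.
initial = {
    "ST": 6_109_704,
    "induction": 1_424_197,
    "positioning": 639_240,
    "emergence": 661_648,
}
representative = {
    "ST": 1_633_817,
    "induction": 368_220,
    "positioning": 142_967,
    "emergence": 171_468,
}

initial_tt, initial_pct = component_shares(initial)
rep_tt, rep_pct = component_shares(representative)

for name in initial:
    print(f"{name:12s} initial {initial_pct[name]:5.1f}%   representative {rep_pct[name]:5.1f}%")
print(f"representative TT as a share of initial TT: {100.0 * rep_tt / initial_tt:.0f}%")
```

The components sum to the reported TT in both databases, and the percentage breakdowns differ by at most about one percentage point, consistent with the statement that the proportions are essentially the same.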
The mean age of all patients in the representative database was 50.6 ± 17.7 yr (mean ± SD; n = 13,196 cases). The gender and ASA risk scores of the patients for each procedure are summarized in Web table 2. Of 13,196 patients in the representative database, 51% were female and 49% were male. Of all surgical procedures, 46% were associated with general anesthesia, 34% with local, and 10% each with monitored or regional anesthesia (Web table 3).
In the representative database, one third of cases (34%) was listed as “local.” Of the 66% of remaining cases, i.e., those who received anesthesia services and were therefore assigned ASA risk scores, 21% had an ASA score of 1, 33% had a score of 2, 36% had a score of 3, 9% had a score of 4, and less than 1% had a score of 5.
In summary, the characteristics of surgical cases in the representative database are, in our experience, typical of contemporary surgical and anesthetic academic practice. Moreover, within each of the various CPT codes in the representative database, there was a diversity of patient demographics and medical, anesthetic, and surgical characteristics.
Descriptive comparisons of the representative and analytic databases indicated strong comparability. Thus, we believed that it was appropriate to assert that the ANOVA results we obtained in the next subsection can be expected to be inferences relevant to the representative database. In fact, the analytic database was technically necessary for certain analyses but was otherwise an undesired digression from the primary task, which was examination of the representative database. For the sake of brevity, tabulations of the analytic database are not reported.
Summary statistics for ST and TT for the 40 procedures in the representative database are reported in table 5. Procedure duration data for TT and ST (fig. 2 and Web fig. 1, respectively) were reported for the representative database using multiple boxplots in which procedures were ranked by median duration. The rank order for ST was similar to that for TT; however, among the shorter procedures, the addition of anesthesia time varied the order for some procedures. The range and variability of the data increased markedly with increased procedure duration. This relation is examined in further detail in the following section.
To evaluate the association of independent variables (age, type of anesthesia, ASA risk class, gender, surgeon, and anesthesiologist) with ST and TT, we used main-effects linear models based on the analytic database. For each CPT code, we tabulated the resultant P values corresponding to each independent variable. In order of importance among the 40 CPT codes, the main effects for ST were found to be statistically significant at the 0.05 level for surgeon (30 CPT codes; 75%), type of anesthesia (15 codes; 44%), gender (6 codes; 19%), age (7 codes; 17%), and ASA risk class (4 codes; 14%; table 6). The order and relative percentages were similar when the same analysis was performed on TT, except that ASA risk class moved up to fourth place in rank order of importance, replacing age, which moved down to fifth place.
From our linear models, we obtained adjusted means for LnST and LnTT for each of the surgeons for each CPT code. These adjusted means are the estimated mean effects for LnST and LnTT for each surgeon, with the effects caused by other independent variables adjusted or controlled for. For instance, for CPT code 52000 and the dependent variable ST, the adjusted means for the 12 surgeons were the estimated mean LnST for each surgeon for cystoscopy adjusted for the effects caused by that surgeon’s patients’ ages, ASA class, etc. The adjusted mean LnST for surgeon was plotted against the weighted overall mean LnST for each procedure in figure 3. The variability of the logged adjusted means appeared homogeneous with respect to procedure duration.
We subsequently plotted the unlogged adjusted means (medians) for each surgeon against the overall median ST and TT for each procedure (fig. 4). This described, on the natural time scale, the linear model results, which used logged data. The variability among surgeons increased proportionately with respect to procedure time, implying that the variability among surgeons behaves as a multiplicative function of procedure time.15
The mean square error of the linear models decreased with the log of procedure duration (fig. 5), indicating that the case-specific pure error variability measured in the log scale decreased as surgical procedure time increased. On the other hand, the F ratio for surgeon effect for each CPT code, when plotted against average surgical duration (fig. 6), indicated a relatively homogeneous surgeon effect, no matter what the mean log surgical duration was. Thus, surgeon effect remained essentially the same, even for longer-duration CPTs, where case-specific error was smaller. This can be interpreted as meaning that logging of the ST effectively removed variability caused by the surgeon effect. If we accept this interpretation, then the majority of variability in surgical procedure times can be thought of as related to surgeon work rate effect (different surgeons work at different rates, and thus they differ from one another more and more as time passes).
Unlogged adjusted means (medians) for anesthesia effect based on the linear model (which included surgeon effect) were plotted against median procedure time (fig. 7). General or regional anesthesia was often used for longer procedures, whereas local anesthesia was used often for shorter procedures. Compared with figure 4, there was no increased variability in anesthesia-adjusted median ST, which, importantly, indicated that the anesthesia effect did not vary as a multiplicative function of surgical duration.
When anesthesiologist was added to the ANOVA model as a sixth independent variable, there were insufficient data to complete the ANOVA for 15 of the previous 40 procedures for both ST and TT. The main-effect anesthesiologist was significantly (P < 0.05) associated with ST for only 2 of 25 codes (8%) and with TT for only 3 of 25 codes (12%). Inspection of the main-effect surgeon with and without the addition of anesthesiologist revealed that the addition of anesthesiologist as a factor did not substantially affect the results for surgeon for either ST or TT. For this reason, we chose to report the results from the five-factor ANOVA (without anesthesiologists) with 40 procedures, rather than the six-factor ANOVA (with anesthesiologists) with 15 fewer procedures.
Using a data mining approach, we examined a large and diverse surgical data set by segmenting it into 40 representative subsets based on headers from the CPT classification. Each subset was analyzed using a main-effects linear model to identify and quantify specific sources of variability in surgical procedure times. The most important source of variability in surgical procedure times was surgeon work rate effect, which increased proportionate to procedure duration. Type of anesthesia, gender, age, and ASA risk class were, in order of importance, additional sources of variability in surgical procedure times. Case-specific relative variability was high for short cases, probably related to time penalties inherent in repeated turnovers of shorter surgeries.
Our analyses indicated that variability among surgeons may be a result of their working at relatively constant but differing rates, independent of procedure duration. The practical consequence of this work rate effect is pronounced absolute variability among surgeons for surgeries of long duration. As an example, if a surgeon’s work rate is 10% apart from the average of his peers, then after 60 min the surgeon diverges by only 6 min. By contrast, if the surgeon works 10% apart from his peers for 10 h, the surgeon diverges by a full 60 min. We have demonstrated that this important work rate phenomenon exists at a single institution, but we believe it is generalizable to other institutions. Obviously, however, specifics for individual surgeons must be established independently at each institution.
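The arithmetic of the work rate example, and its connection to the log scale used in the models, can be sketched as follows. The function name is hypothetical, and the 10% rate difference and the 60-min and 10-h durations are the illustrative values from the example, not estimates from the data.

```python
import math


def divergence_minutes(peer_duration_min, rate_difference=0.10):
    """Absolute gap (minutes) from peers for a fixed relative rate difference."""
    return peer_duration_min * rate_difference


for peer in (60, 600):  # 60 min and 10 h
    gap = divergence_minutes(peer)
    # On the log scale the gap depends only on the rate difference,
    # not on how long the procedure is.
    log_gap = math.log(peer + gap) - math.log(peer)
    print(f"peer {peer:4d} min: gap {gap:5.1f} min, log-scale gap {log_gap:.4f}")
```

The absolute gap grows tenfold between the two durations while the log-scale gap stays the same, which is the sense in which a multiplicative surgeon effect appears homogeneous after logging.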
Trends in surgical variability associated with type of anesthesia, age, gender, and ASA risk class are not proportionate to procedure duration. Trends in these factors may also be more easily extrapolated to surgical scheduling in other institutions. Anesthesiologist as a factor had little effect on the variability of ST and TT; however, we expect it would have much more effect on the variability of anesthesia time, a dependent variable not specifically addressed in this study.
It is interesting that type of anesthesia is a factor associated with variability in both ST and TT. It is not surprising that TT is affected by type of anesthesia, because TT contains anesthesia induction and emergence times. However, it is surprising that type of anesthesia is strongly associated with ST, because ST (in theory) contains no component of anesthesia induction, anesthesia emergence, or positioning times within it. The authors believe the observed significant anesthesia factor effect is probably related to a patient selection phenomenon, whereby healthy patients, with straightforward procedures, are favored for local or monitored anesthesia.16 Because this study is retrospective, however, we cannot establish causality and only note the association.
Intrinsic case-specific variability is higher for short surgeries than for longer surgeries. The coefficient of variation is highest for short procedures, although the absolute variability is expected to be highest for longer procedures. We conjecture that case-specific absolute variability does not improve with repetition of shorter procedures because start-up delays for surgery, although not supposed to be “measured” by ST, may nonetheless affect it. Thus, when shorter procedures are repeated throughout the day, unmeasured variability may be added to the procedures by either changing patients (laboratory work, history, pathology may differ) or by changing personnel (scrub nurses, circulators, and house staff physicians may differ among cases).
One might presume that longer surgeries should have higher intrinsic case-specific variability. That this does not occur may be a result of the fact that with longer procedures, the surgical team may learn to adapt to one another and as a consequence may become more efficient for longer procedures. In addition, surgeons may be able to adjust ST for longer procedures in ways they are unable to accommodate for shorter procedures, saving time by abbreviating elements of the procedure. We believe that these trends in intrinsic case-specific variability should also extrapolate to other institutions.
The variability in surgical procedure times induced by surgeon work rate effect is of such absolute magnitude that it swamps the effects of other variables in surgeries of longer duration (fig. 2 and Web fig. 1). This has important implications in an era of managed care because increased variability in surgical schedules increases the costs of both underutilization and overutilization.17,18 It suggests two strategies that might be used to control costs. One clear strategy for an institution is to schedule surgeries by taking into account the specific surgeon, a particularly important factor for longer surgeries. An additional strategy would be to schedule the longest case first (which we have observed to be the most variable case),4,19,20 because doing so allows the surgical schedule to be dynamically altered by the addition or deletion of shorter subsequent surgeries in such a way as to minimize the costs to the institution of underutilization and overutilization.
A key factor determining the effectiveness of advanced scheduling and allocation systems is the method used to make accurate estimates of surgical procedure times.21 Although guidelines exist to estimate procedure times,3,22 surgeons work at different rates, and few procedures are of standard length. Surgeons’ and schedulers’ estimates and historical averages have all been tried, but few attempts have been made to validate the accuracy of these estimates.23,24 Most schedules presently require adjustments made from memory by an experienced booking clerk. When the clerk is absent, variability increases, and the schedules are of inferior quality.20,25,26 Our results demonstrate that surgeon-specific databases that describe factors affecting the variability of surgeries are needed to improve scheduling27; however, suitable databases are currently largely unavailable.
The 20 CPT categories from which we chose data represent a full spectrum of adult surgical physiology and subspecialties. CPT codes are important because they are the most widely accepted nomenclature for reporting of physician procedures and services under government and private health insurance programs.28 In our experience, surgeons typically specialize in surgeries contained within a single CPT primary category, and the uniform language is useful for administrative management, claims processing, and a basis for local, regional, and national utilization comparisons. The CPT classification was a useful means to subdivide our data without sacrificing diversity. Nonetheless, our data were not representative of the incidence of particular surgeries among the population because of the necessity of arriving at our analytic database.
To better estimate surgeon effects, we reduced the diversity of our data (both in number of different procedures and surgeons) by eliminating surgeons with minimal experience from our analysis. However, we believe that the variability among surgeon work rates that we detected is probably similar to that in the population. For none of the other main effect variables did we insist that there be a minimum number of cases in each categorization of that variable. This may have reduced slightly our ability to detect statistically the effects of these other variables. As a result, the percentages of the significant variables identified in table 6 might have been somewhat higher had we insisted on minimum sample sizes for these variables as well.
Our data are the experience of a single academic healthcare institution and therefore contain biases inherent in the hospital’s own subspecialization, as well as in the teaching of residents. In addition, procedures with distinct CPT codes are sometimes similar, for example, three- and four-vessel coronary artery bypass grafts, cystoscopy with and without biopsy, etc. Thus, our insistence on choosing the three most frequent procedures in each category also reduces the diversity of our sample subsets to some extent. However, this compromise is necessary to obtain sufficient cases for us to explore differences among surgeons for each CPT chosen.
Although the surgeon work rate phenomenon we observed may be generalizable to other institutions, surgeon specifics will not be. Type of anesthesia, age, gender, and ASA risk class are additional factors that affect variability in surgical procedure times, and this knowledge may be extrapolated to other institutions. Case-specific variability unexplained by the preceding factors is high for short procedures relative to longer procedures and may be an unmeasured time penalty inherent in the repeated turnovers of shorter surgeries. The evidence suggests that surgeon-specific procedure times and knowledge of the sources of surgical variability are needed to improve modeling and thus surgical scheduling. Poor scheduling leads to suboptimal use of the surgical suites. Surgical suites are costly functional areas within hospitals and must be scheduled efficiently for the financial health of the institution as a whole.
The authors thank Dr. Gerard Bashein for assistance with the manuscript.