Better predictions of each case's duration would reduce operating room labor costs and patient waiting times. A barrier to using historical case duration data to predict the duration of future cases is the absence for some cases of previous data for the same scheduled procedure from the same facility. The authors examined sample size requirements for pooling case duration data from several facilities to create a 90% chance of having case duration data for almost all procedures.

Four academic medical centers provided data, totaling 200,401 cases classified by the scheduled Current Procedural Terminology codes.

The 12% of cases in which procedures occurred once or twice accounted for 79% of procedures or combinations of procedures. When a procedure was being performed for the first time at a facility, that same procedure had been performed previously at least once at one or more of the other three facilities only 13-25% of the time. More than 1 million cases would be needed to have a 90% chance of having at least 3 cases for each procedure observed in the original 200,401 cases. However, with N = 200,401 cases in our initial data set, we observed less than one third of the estimated total number of possible procedures.

The lack of historical case duration data for scheduled procedures is an important cause of inaccuracy in predicting case durations. However, millions of cases probably would be required to provide historical case duration data for almost all procedures.

Data about the duration of surgical cases are needed to predict the time required for future cases. If case durations could be predicted more accurately, decision-support software could help to reduce operating room (OR) overtime labor costs [1–3], patients’ waiting time on the day of surgery [3–5,6], the number of days patients must wait for surgery [7], and the time patients wait in their surgeons’ afternoon clinics [8]. Generally [4], the strategy for predicting case durations uses historical data classified by surgeon, type of anesthetic, and scheduled procedure [1–3,7–9]. If surgeon-specific data are not available, case duration can be estimated from the type of anesthetic and scheduled procedure [9]. If information about the anesthetic is not available, historical data classified by the scheduled procedure alone can be used [9,10]. If none of this information is available, case duration data could be obtained from another facility. In this study, we address how many facilities would need to pool data for that approach to be useful.

Ideally, decision-support software built into the OR information system would help the OR manager to make better decisions. However, efforts to improve OR management by using decision support are hampered by the occurrence of cases involving uncommon procedures with little or no historical data. Importantly, by “case” and “procedure,” we mean the following. When a patient enters and then leaves an OR, one *case* is performed. A case can involve one or more *procedures*. For approximately half of all surgical cases [11,12], more than one procedure is performed. *In this article, we use “procedure” to mean one or a combination of procedures performed in a case.* For example, two procedures that are often performed during the same case are phacoemulsification and aspiration of cataract and insertion of intraocular lens. The combination of these two procedures would count as one procedure in this article.

We performed the initial work that identified the absence, for a surprisingly large percentage of cases, of previous data for certain procedures [11]. At an academic medical center, examples of uncommon procedures with limited historical data included anorectal myomectomy, excision of mandibular abscess, coccygectomy, partial ostectomy of the sternum, and intratemporal decompression of the facial nerve [11]. These are typical examples of uncommon procedures: they do not have easily identified analogs, and they are so uncommon that little or no case duration data are available for them.

We were interested in such procedures. The National Survey of Ambulatory Surgery showed that uncommon procedures occur nationally [12]. Twenty percent of outpatient surgery cases performed in the United States are of a procedure that is performed 1,000 times or fewer per year in the United States [12]. Thirty-six percent of outpatient surgery cases may be of a procedure that is performed less than once per facility per year [12].

Procedures that are performed infrequently are important to OR decision support because they have a disproportionately large negative effect on OR efficiency. These uncommon procedures account for much of the dichotomy between scientific success in using historical data to predict case duration [1–3,7–9] and OR management frustration with cases that run late [1]. For example, when using case duration data to reduce OR overtime labor costs and provide suitable breaks for OR nurses and anesthesia providers, the objective is to predict the time required to complete a series of consecutive elective cases [1,4,5]. If as few as 15% of cases have limited historical data (half the national figure [12]) and each OR averages three cases per day, then by random chance more than one third of ORs would include at least one case with limited historical data each day [1,11]. One late-running case can adversely affect the entire day's schedule.
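As a worked check of the "more than one third" figure, assume (for illustration only) that each of an OR's three daily cases independently has a 15% chance of being a procedure with limited historical data:

$$
P(\text{at least one such case}) = 1 - (1 - 0.15)^{3} = 1 - 0.614 \approx 0.386 > \tfrac{1}{3}.
$$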

An important step to improve OR efficiency through decision support is to compensate for the cases that consist of uncommon procedures. This does not mean just determining (or guessing) an expected (average) duration for such procedures (*e.g.*, by asking the surgeon). OR decision support relies on the tails of the probability distributions (*e.g.*, the shortest and longest times the case could take) [4,7,8]. A logical approach would be to pool case duration data among facilities, thereby increasing the sample size of historical cases sufficiently to have data even for the uncommon procedures.

For example, suppose a surgeon schedules an anorectal myomectomy at a hospital at which the procedure has not been recently performed. Data for this procedure's duration could be obtained from other hospitals. The resulting statistical analysis would involve calculating a naïve pooled estimate of the historical case durations, or using mixed-effects modeling to compensate for heterogeneity of case duration among facilities. Whatever statistical method is used, the strategy requires that if a procedure is performed infrequently or not at all at one facility, it must be performed at another facility.

In this study, we determine the necessary sample size for pooling case duration data among facilities to have at least three historical case durations for each procedure. If a relatively small sample size is adequate (*e.g.*, 50,000 cases), a surgical facility could partner with several other facilities within its healthcare system to obtain case duration data. However, if the required sample size is large (*e.g.*, 5 million cases), pooled perioperative data would need to come from scores of facilities because many cases would be of procedures that are very uncommon. To answer these questions, we used detailed surgical procedure data from four academic medical centers and less detailed data from a national survey.

## Methods

Four academic medical centers provided data about the scheduled procedures for all cases performed at their main and ambulatory surgery facilities during specified time periods (table 1, fig. 1). Two facilities had 4 yr of data available in computerized format, one facility had 3.25 yr, and the fourth facility had 1.3 yr of data. This provided 200,401 cases (table 1).

Cases were classified by their scheduled Current Procedural Terminology (CPT) codes [13] and the presence or absence of an anesthesia provider [9]. If a procedure was designated by more than one CPT code, that combination of codes was considered to characterize a unique procedure. Combinations were considered the same regardless of the order in which the procedures were listed. CPT codes were adjusted to follow January 1, 1999 values. In this article, a procedure is defined as a unique combination of scheduled procedures with or without an anesthesia provider. Repeating the example from the Introduction, two procedures that are often performed during the same case are phacoemulsification and aspiration of cataract and insertion of intraocular lens. The combination of these two procedures performed with topical anesthesia would count as one procedure in this article. The combination of those two procedures with monitored anesthesia care would count as a different procedure in this article. We observed 26,829 different procedures (table 1).
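The classification rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function names and the input layout are assumptions, and the example combines the two CPT codes mentioned in this article only to demonstrate order independence, not because the combination is clinically meaningful.

```python
from collections import Counter

def procedure_key(cpt_codes, anesthesia_provider):
    """Return an order-independent key for one case's scheduled procedure."""
    # frozenset ignores the order in which codes were listed
    # (it also ignores repeated codes, which is a simplification).
    return (frozenset(cpt_codes), bool(anesthesia_provider))

def count_cases_per_procedure(cases):
    """cases: iterable of (cpt_codes, anesthesia_provider) pairs."""
    return Counter(procedure_key(codes, provider) for codes, provider in cases)

# Hypothetical cases, for illustration only.
cases = [
    (["33512"], True),
    (["33512"], False),            # same code, no anesthesia provider
    (["33512", "33513"], True),
    (["33513", "33512"], True),    # same combination, listed in another order
]
counts = count_cases_per_procedure(cases)
print(len(counts), "distinct procedures among", len(cases), "cases")
```

A `frozenset` is used so that a combination hashes identically however it was entered; a sorted tuple would work equally well and would preserve repeated codes.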

The “observed” number of cases for each procedure was determined from the data from the four medical centers. We calculated the percentage of cases that were of uncommon procedures, defined as one to three historical cases (see two paragraphs below). We also calculated the percentage of procedures that had few previous cases. We calculated standard errors for each of these percentages.

To further explore the effect of pooling data among facilities, we split the data into two sets. The most recent 1,500 cases from each facility were considered to be “new” cases. All preceding cases were used as “historical” data. We calculated the percentages of procedures among the new cases that were not in the historical data.
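A minimal sketch of this split, assuming each facility's cases are available as a list of procedure keys in chronological order. The function names and the toy data are hypothetical.

```python
def split_new_and_historical(procedures_in_time_order, n_new=1500):
    """Treat the most recent n_new cases as 'new' and the rest as 'historical'."""
    if n_new <= 0:
        raise ValueError("n_new must be positive")
    historical = procedures_in_time_order[:-n_new]
    new = procedures_in_time_order[-n_new:]
    return historical, new

def percent_new_procedures_unseen(historical, new):
    """Percentage of distinct procedures among new cases absent from historical data."""
    new_procedures = set(new)
    if not new_procedures:
        return 0.0
    unseen = new_procedures - set(historical)
    return 100.0 * len(unseen) / len(new_procedures)

# Toy example with letters standing in for procedures and n_new=3.
toy_cases = ["A", "B", "A", "C", "A", "D", "E"]
historical, new = split_new_and_historical(toy_cases, n_new=3)
pct_unseen = percent_new_procedures_unseen(historical, new)
print(f"{pct_unseen:.1f}% of new procedures had no historical cases")
```

To examine pooling, the same function could be called with `historical` replaced by the combined historical cases of the other facilities.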

We hope that pooling the data will decrease the number of procedures with limited data. However, two concerns can limit the usefulness of the pooling of data. First, some procedures may still only appear in the database once or twice. This is important because at least three historical cases of the same procedure are needed to provide good predictive accuracy [11]. Second, some procedures may be so uncommon that they do not appear in the database at all. We evaluated these two concerns, using the methods described in the next two paragraphs.

First, it is desirable to have a minimum of three instances of a given procedure in the database. However, some uncommon procedures appear only once or twice. We calculated the minimum number of additional cases needed to create a 90% chance of getting one or two *more* cases of each of the uncommon procedures. This power analysis would provide at least three cases for each procedure that appears in the database.
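The flavor of this calculation can be sketched for a single procedure. The sketch below assumes that each future case is independently of the given procedure with a fixed probability `p`, estimated here from the observed frequency. That binomial model, the function names, and the per-procedure (rather than joint) target are assumptions made for illustration and may differ from the analysis actually performed.

```python
import math

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    below = sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k))
    return 1.0 - below

def min_additional_cases(more_needed, p, target=0.90):
    """Smallest n such that n further cases yield >= more_needed cases of the
    procedure with probability >= target."""
    lo = hi = more_needed
    while prob_at_least(more_needed, hi, p) < target:
        hi *= 2                      # grow the upper bound until it suffices
    while lo < hi:                   # binary search; the probability rises with n
        mid = (lo + hi) // 2
        if prob_at_least(more_needed, mid, p) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo

n_observed = 200_401
# A singleton needs two more cases; a doubleton needs one more.
n_for_singleton = min_additional_cases(2, 1 / n_observed)
n_for_doubleton = min_additional_cases(1, 2 / n_observed)
print(n_for_singleton, n_for_doubleton)
```

Requiring the 90% chance to hold for every uncommon procedure simultaneously, rather than for one procedure at a time, would call for more cases than this single-procedure figure.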

Second, we addressed those procedures that do not appear in the database. The true number of different procedures will always exceed the observed number [15]. Increasing the sample size will decrease the underestimation of observed *versus* actual procedures. Statistical methods can decrease the effect of sampling error on the estimate of the number of different procedures. Let $F_1$ equal the number of procedures with only one observed case (*i.e.*, singletons), let $F_2$ equal the number of procedures with two observed cases (*i.e.*, doubletons), and so forth. $S_{\mathrm{obs}}$ is the total number of procedures observed. $S_{\mathrm{abund}}$ is the number of “abundant” procedures, defined, based on simulation of combinatorial statistics [16], as those with 11 or more cases. The remaining procedures are considered in these statistical methods to be “rare,” meaning that $S_{\mathrm{obs}} = S_{\mathrm{abund}} + S_{\mathrm{rare}}$. Then, a conservative (lower) value can be derived for the total number of different procedures:

The corresponding variance of this estimator equals

This method is conservative because its calculation is performed without estimating the heterogeneity of the frequency of occurrence ($F_k$) of different procedures. The squared coefficient of variation of the $F_k$, denoted $\gamma^2$, can be estimated by taking the maximum of 0 and

where $N_{\mathrm{rare}}$, the number of cases that are of a rare procedure, equals

The resulting more sophisticated estimator for the true number of different procedures in the population equals

Both of these estimators can underestimate the true number of different procedures when the sample sizes are sufficiently small that more than 30% of the different procedures are not observed [19,20]. Therefore, we used graphical methods to evaluate how many more procedures surgeons perform than we observed.
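Estimators of this kind can be sketched as follows. The code assumes that the conservative estimator is Chao's lower bound (often called Chao1) and that the more sophisticated one is the abundance-based coverage estimator (ACE), both standard in the species-richness literature and consistent with the quantities defined above (singletons, doubletons, a "rare" cutoff of 10 cases, and a squared coefficient of variation). That identification is an assumption, as are the function names, so the exact formulas used in the study may differ in detail.

```python
from collections import Counter

def chao1(case_counts):
    """Chao lower bound: S_obs + F1^2 / (2 * F2)."""
    f = Counter(case_counts)
    f1, f2 = f[1], f[2]
    if f2 == 0:
        raise ValueError("estimator undefined without doubletons")
    return len(case_counts) + f1**2 / (2 * f2)

def chao1_variance(case_counts):
    """Variance of the Chao lower bound, with r = F1 / F2."""
    f = Counter(case_counts)
    if f[2] == 0:
        raise ValueError("variance undefined without doubletons")
    r = f[1] / f[2]
    return f[2] * (0.25 * r**4 + r**3 + 0.5 * r**2)

def ace(case_counts, rare_max=10):
    """Abundance-based coverage estimator; 'rare' means <= rare_max cases."""
    f = Counter(case_counts)
    s_abund = sum(1 for c in case_counts if c > rare_max)
    s_rare = len(case_counts) - s_abund
    n_rare = sum(k * f[k] for k in range(1, rare_max + 1))
    if n_rare == 0:
        return float(s_abund)        # no rare procedures, nothing to correct
    coverage = 1 - f[1] / n_rare     # estimated sample coverage of rare group
    if coverage == 0:
        raise ValueError("estimator undefined when all rare cases are singletons")
    spread = sum(k * (k - 1) * f[k] for k in range(1, rare_max + 1))
    gamma_sq = max(0.0, (s_rare / coverage) * spread / (n_rare * (n_rare - 1)) - 1)
    return s_abund + s_rare / coverage + (f[1] / coverage) * gamma_sq

# Toy data: number of observed cases for each of eight procedures.
toy_counts = [1, 1, 1, 1, 2, 2, 3, 12]
print(chao1(toy_counts), chao1_variance(toy_counts), ace(toy_counts))
```

Both functions take one entry per observed procedure, giving the number of cases of that procedure, which is the same input needed for the singleton and doubleton percentages.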

We pooled data among the four medical centers to create a histogram of the number of cases of each procedure, plotted on a logarithmic scale (fig. 2). Such plots with logarithmic scales typically yield normal distributions [1,19,21]. If our sample size was too small to observe many of the rare procedures, the plot would resemble a bell-shaped curve with an opaque card covering the left side of the curve with the most uncommon procedures [21]. The relatively common procedures, on the right tail of the histogram, would be detected.
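A histogram of this kind can be sketched by grouping procedures into bins that double in width. The base-2 binning and the function name are assumptions chosen for illustration; the bin widths used for the published figure are not specified here.

```python
from collections import Counter

def log2_histogram(cases_per_procedure):
    """Map bin b -> number of procedures with 2**b <= cases < 2**(b + 1)."""
    # For a positive integer c, c.bit_length() - 1 equals floor(log2(c)) exactly.
    return Counter(c.bit_length() - 1 for c in cases_per_procedure)

# Toy data: observed number of cases for each of nine procedures.
toy_cases_per_procedure = [1, 1, 1, 2, 3, 4, 7, 8, 100]
bins = log2_histogram(toy_cases_per_procedure)
for b in sorted(bins):
    print(f"{2**b:>4}-{2**(b + 1) - 1:<4} cases: {'#' * bins[b]}")
```

If the sample is too small, the tallest bar sits in the first bin (procedures seen once) and the rising left side of the bell shape is missing, which is the "opaque card" pattern described above.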

We used an additional data set to assure that our expectation was correct for the shape of the underlying statistical distribution of the number of cases of each procedure (fig. 3). We used data from the United States National Center for Health Statistics’ 1994 to 1996 National Survey of Ambulatory Surgery. We recently reported details and demographics of this survey [12,22]. We used Excel Visual Basic 6.0 (Microsoft, Redmond, WA) to analyze the raw data for the sample of 228,332 completed surgical cases with an anesthesia provider. The 24,084 different procedures were classified by up to six ICD-9-CM (International Classification of Disease) codes.

The survey used probability sampling so that nationally representative results could be obtained without surveying every ambulatory surgery case in the United States [4,12]. The National Center for Health Statistics assigned each case a weight using statistical methods that considered the probability of selecting the case's facility, the probability of selecting the case among all cases at the case's facility, and the response rates of facilities and locations within facilities. For example, some cases had weights of 10 (*i.e.*, represented 10 outpatient cases nationally), and others had weights of 20,660 (*i.e.*, represented 20,660 outpatient cases nationally). We created a histogram showing nationally representative results by calculating, for each observed procedure, the sum of the weights of all cases of the procedure and then dividing by the sum of the weights of all cases. We were not able to apply the two nongraphical statistical methods to these data because the survey's use of probability sampling violated the assumptions of these statistical methods.
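The weighting step described above amounts to a weighted relative frequency for each procedure. The sketch below is illustrative only: the function name, the input layout, and the toy procedures are assumptions, although the weights 10 and 20,660 are the examples quoted in the text.

```python
from collections import defaultdict

def national_shares(weighted_cases):
    """weighted_cases: iterable of (procedure, weight) pairs.
    Returns procedure -> fraction of all weighted (national) cases."""
    totals = defaultdict(float)
    for procedure, weight in weighted_cases:
        totals[procedure] += weight
    grand_total = sum(totals.values())
    return {procedure: w / grand_total for procedure, w in totals.items()}

# Toy sample: letters stand in for procedures.
toy_sample = [("A", 10), ("A", 20_660), ("B", 500), ("C", 30)]
shares = national_shares(toy_sample)
for procedure, share in sorted(shares.items()):
    print(f"{procedure}: {share:.4%} of national cases")
```

Taking the logarithm of each share and binning the results would then give a histogram on a logarithmic scale.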

## Results

In this article, a procedure is defined as a unique combination of scheduled procedures with or without an anesthesia provider. Table 1 shows that after pooling 200,401 cases from the four academic medical centers, 11.9% of the cases were of a procedure that occurred only once (singletons) or twice (doubletons). Figure 1 shows this graphically.

The 12% of the 200,401 cases that were singletons or doubletons accounted for 79% of the procedures (table 1). This was larger than the percentages of singletons or doubletons for any one facility (table 1). That is, the sample size of 200,401 cases was sufficiently small that pooling the data among the four facilities did not decrease the incidence of rare procedures. For the first instance of a procedure being performed at a facility, 13–25% of the time, that procedure had been performed previously at least once at one or more of the other three facilities (table 2, last row). More than 1 million cases would be needed to have a 90% chance of having at least 3 cases for each procedure observed in the original 200,401 cases (table 3, third row of numbers).

A histogram of the frequency of each procedure from the National Survey of Ambulatory Surgery was bell shaped when plotted on a logarithmic scale (fig. 3). This curve is consistent with the expectations described in the Methods [19,21]. Procedures with a moderate number of cases were more numerous than procedures that had a small or exceedingly large number of cases. This contrasted with the pooled cases from the four facilities, in which more than half of the procedures were scheduled only once (fig. 2). As explained in the Methods, these graphical and inferential methods suggest that many procedures that surgeons perform were not observed in the 200,401 cases (*i.e.*, the sample size of 200,401 cases was too small to provide sufficient data).

We calculated a lower bound on the number of procedures that surgeons perform. The estimate included not only the procedures we observed, but also those that were sufficiently rare that we did not observe them (table 4). The conservative (lower) estimate was 93,340 procedures. The more sophisticated estimator for the estimated total number of different procedures was 96,497. Therefore, from 200,401 cases, we observed fewer than one third of the estimated total number of procedures (table 1).
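As a worked check of the "fewer than one third" statement, using the 26,829 procedures observed (Methods) and the larger of the two estimates:

$$
\frac{26{,}829}{96{,}497} \approx 0.278 < \tfrac{1}{3}.
$$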

Furthermore, 96,497 procedures is an underestimate of the true number of different procedures. As exemplified by the findings for each facility (table 4), these estimators underestimate the true number of different procedures when the sample sizes are sufficiently small that more than 30% of the different procedures are not observed [19,20]. The graphical and inferential results both suggest that this is true for the pooled data from the four facilities.

## Discussion

### Implications for Predicting Case Durations

We previously showed that more than one third of outpatient surgery cases may be of a procedure performed less than once per facility per year [12]. In this article, we report pooling case duration data among four facilities to try to increase the number of previous cases for each scheduled procedure. However, we failed to decrease the percentage of procedures that were performed only once or twice. To obtain data for almost all procedures, more than 1 million historical cases are needed (tables 3 and 4).

Our results provide insight into appropriate pooling of procedure-specific surgical data among facilities. Pooling such data may be important because data about surgical case duration are needed to minimize OR overtime labor costs, patients’ waiting times on the day of surgery, and patients’ waiting times in surgeons’ afternoon clinics [1–5,7,8]. However, we showed in this study that informal arrangements for pooling data among several small- to moderate-sized facilities within a healthcare system are unlikely to be sufficient to improve the accuracy of case duration predictions. Instead, to increase the accuracy by simply pooling case duration data, larger databases will be needed.

We focused on the sample size required to obtain at least three historical cases for each procedure. When estimating the average duration of a case, three historical cases is a sufficiently large number that variability in case duration is affected more by intrinsic variability in case duration for the procedure than by uncertainty in the value of the true mean [11]. However, when predicting the shortest or longest times needed to complete a case (*e.g.*, for calculating optimal patient arrival times [4] or scheduling breaks between cases [6], respectively), more cases are needed before uncertainty in the parameter estimates, which arises from the small sample size, has a smaller effect than intrinsic variability in case duration [4]. As such, our focus on obtaining three previous cases had the deliberate effect of underestimating the necessary size of a case duration database.

### Inferential and Graphical Methods of Estimating the Number of Different Procedures

We know that surgeons schedule at least 26,829 different CPT codes and combinations of CPT codes because we observed that many. The true number of different procedures must be higher. To estimate the true diversity of procedures, we used statistical methods developed for other scientific fields [15–21]. The less and more sophisticated methods estimated 90,368 and 96,497 different procedures, respectively (table 4). We do not know which one is closer to being correct, nor do we think that it matters. The point is that despite the large effort required to acquire 200,401 cases, we did not come close to observing all of the procedures.

We used the graphical method to address the same question: Was our sample size of 200,401 even close to being large enough to detect all surgical procedures? Clearly it was not, because if it had been, the histogram of the number of procedures with a certain number of observed cases (fig. 2) would have more closely resembled its true statistical distribution (fig. 3) [21].

### Alternative Strategies to Pooling Data among Facilities

The OR manager can take some steps to decrease the effect that uncommon procedures have on OR efficiency. We previously showed that when choosing the sequence of elective cases performed by the same surgeon in the same OR on the same day, the OR manager's primary nonmedical criterion could be to avoid limitations in equipment or personnel that would result in staffed but unused OR time [7]. This approach serves not only to minimize hospital and anesthesia costs, but also to maximize surgeon and patient convenience [7,23]. When there are no such restrictions, the criterion for sequencing a series of elective cases by the same surgeon on the same day could be to minimize the time patients wait at the surgical suite on the day of surgery [4,7]. Scheduling cases with common procedures before a case of an uncommon procedure will decrease the expected difference between scheduled and actual case start times. This will benefit both patients and surgeons.

### Limitations

A limitation to our work is that we may have pooled data from the “wrong” four medical centers. Hypothetically, out of the thousands of different surgical facilities worldwide, we may have chosen the four facilities with the highest incidence of rare procedures. However, previous results from the National Survey of Ambulatory Surgery suggest that this did not happen [12].

Another limitation to our work is that we considered combinations of procedures to be new, unique procedures. We did this because we are not aware of algorithms that can accurately predict the time needed for a case that is a combination of procedures from the case duration of each of the procedures separately. If such a methodology could be developed, the sample sizes that we calculated could be decreased. However, the absence of current literature describing this approach does not reflect a lack of research effort, but the failure, to date, of that work.

The relatively high frequency of very uncommon procedures is not a “trick” of the CPT system. Procedures with easily identifiable analogs are generally not uncommon. For example, coronary artery bypass with three venous grafts (CPT code 33512) is not uncommon. Coronary artery bypass with four venous grafts (CPT 33513) is also not uncommon. These two procedures differ only in the last digit of the CPT code. Both procedures are less common than if the CPT system combined them in one code. Nevertheless, neither is so uncommon as to be absent. Otherwise, the CPT system would not distinguish between them.

## Conclusions

The lack of historical case duration data for scheduled surgical procedures is an important cause of inaccuracy in predicting case durations. For the strategy of simply pooling case duration data among facilities to provide data for almost all procedures, databases with millions of cases probably will be needed. Therefore, many hospitals would need to pool data if this strategy were used.