Lower prediction bounds (e.g., for fasting), upper prediction bounds (e.g., to schedule delays between sequential surgeons), comparisons of operating room (OR) times (e.g., when sequencing cases among ORs), and quantification of case uncertainty (e.g., for sequencing a surgeon's list of cases) can be done accurately for combinations of surgeon and scheduled procedure(s) by using historic OR times. The authors propose that when there are few or no historic data, the predictive distribution of the OR time of a future case be centered at the scheduled OR time, and its proportional uncertainty be based on that of other surgeons and procedures. When there are a moderate or large number of historic data, the historic data alone are used in the prediction. When there are a small number of historic data, a weighted combination is used.

This Bayesian method was tested with all 65,661 cases from a hospital.

Bayesian prediction bounds were accurate to within 2% (e.g., the 5% lower bounds exceeded 4.9% of the actual OR times). The predicted probability of one case taking longer than another was estimated to within 0.7%. When sequencing a surgeon's list of cases to reduce patient waiting past scheduled start times, both the scheduled OR time and the variability in historic OR times should be used together when assessing which cases should be done first.

The authors validated a practical way to calculate prediction bounds and compare the OR times of all cases, even those with few or no historic data for the surgeon and the scheduled procedure(s).

Operating room (OR) operational decisions on the day before and on the day of surgery rely on the uncertainty in the estimates of OR times, particularly those decisions that affect patient and surgeon waiting times.

For example, the decision as to when a patient should stop drinking and be ready on the day of surgery relies on the shortest likely time(s) for the preceding case(s) in the patient’s OR.1,2 A lower prediction bound for the OR time of a case is the value below which the OR time of the next randomly selected case of the same type will fall at the specified rate. There is a 5% chance that the OR time of a case will be shorter than its 5% lower prediction bound.1,2

For example, suppose that two surgeons are scheduled in the same OR on the same day, and the OR workday is not filled. To reduce the expected tardiness of the start of the second surgeon, a delay can be scheduled between their cases, reducing the expected waiting time of the second surgeon if the first surgeon finishes late. An appropriate delay depends on the longest time the cases are likely to take (*e.g.*, their 90% upper prediction bounds).1,3

For example, a surgeon is doing the first case of the day in OR 1, and a different surgeon is doing the first case in OR 2. Several add-on cases have been submitted for the day, the longest of which needs the microscope being used by the surgeon in OR 1. There are no other cases in OR 2. OR 2 is scheduled to finish before any other OR and has staffing planned for the same hours. To reduce expected overutilized OR time, the longest add-on case (*i.e.*, the one needing the microscope) should be the add-on case performed in OR 2. That decision is good provided the case in OR 1 finishes before the first case of the day in OR 2. The probability of that happening depends not only on the expected OR times of the cases, but also on the uncertainty of the estimates.1,4

For example, consider the sequencing of a surgeon’s list of cases in the same OR on the same day. To reduce the mean tardiness of cases, cases with short, predictable OR times should be performed before cases that are longer or have large uncertainty in OR times.1,5 For example, a brief mediastinoscopy should be performed before an esophagectomy. The unpredictability of the preceding case in an OR can be quantified by its probability of finishing more than 1 h late.

For these four and other common probabilistic problems, when there are historic data available for the surgeon and the procedure(s) scheduled, the previously developed statistical methods are very accurate. The percentages of cases exceeded by lower and upper prediction bounds match those specified to within 1%. For example, if the 90% upper prediction bound for the OR time of a newly scheduled case is 3.0 h, the actual probability of that case having an OR time of 3.0 h or less is likely somewhere between 89% and 91%. The calculated probability of one case lasting longer than another case matches the actual probability to within 1%. For example, if the calculated probability that one case will take longer than another case is 67%, the actual probability is likely somewhere between 66% and 68%.

However, these methods cannot be applied to the approximately 22% of cases of a procedure(s) that the surgeon has not scheduled at least twice before. This limitation is commonplace. Among outpatient cases in the United States with an anesthesia provider, 20% were of a procedure(s) performed four or fewer times per workday *nationwide*, and 36% of cases were of a procedure(s) likely performed, on average, less often than once per facility per year. At private hospitals, there can be more than 5,600 surgeon preference cards (*i.e.*, at least 5,600 combinations of surgeon and procedure(s)). At academic hospitals, there can be more than 13,000 surgeon preference cards. More than 75% of procedure(s) can be performed just once or twice annually. More than half of cases can be of a procedure(s) scheduled by the surgeon fewer than three times per year.

The 22% of cases with few or no data disproportionately affect decision making under uncertainty. These cases tend to be distributed randomly among the cases at a facility. If any one such case is part of a series of cases, then the time to complete that series cannot be estimated using the previously described methods.2,3,4,6 The more cases performed per OR per day, the less useful the existing methods are.

Previous efforts to address this challenge of cases with few or no historic data focused on using data classified by procedure(s) alone or by procedure(s) and type of anesthesia.2,12,13 Despite some encouraging results,2,12,13 these methods have problems in implementation.

First, pooling procedure(s) among surgeons or facilities provides only modest gains, because the problem is generally not that a surgeon schedules a procedure(s) usually performed by another surgeon at the facility, but rather that many procedure(s) are rare.10,14 There are thousands of Current Procedural Terminology and International Classification of Diseases procedure codes, and surgeons continually introduce new technologies.

Second, performing analyses with data both pooled and not pooled by surgeon can result in inconsistent decisions, making instructions for clerks confusing. For example, consider a mean of 3.0 h and 90% upper prediction bound of 5.5 h for 11 cases scheduled by 5 surgeons *versus* a mean of 2.0 h and 90% upper prediction bound of 4.0 h for 2 cases of the surgeon scheduling the new case. Qualitative instructions cannot reconcile such results.

Third, studies consistently show that the surgeon is a strong predictor of OR time, second only to the procedure(s). Some surgeons are consistently slower or faster than others. Therefore, recommendations that exclude classification by surgeon rightly have reduced face validity.

In this article, we propose and validate a solution: Bayesian statistical methods. When there are no historic data, the predictive distribution of the OR time of a future case is centered at its scheduled OR time. In addition, its proportional uncertainty is based on that of other surgeons and procedures. When there are a moderate or large number of historic data, those data alone are used in the prediction. When there are a small number of historic data, a weighted combination of the scheduled OR time and historic data is used.

## Methods

### Data Set Used

The data set included the OR times of all 65,661 cases performed between January 1, 1996, and December 31, 1999, at an academic hospital’s tertiary surgical suite and ambulatory surgery center. Procedures were as defined by the Current Procedural Terminology code(s). There were 19,838 different combinations of surgeon, scheduled procedure(s), and presence or absence of an anesthesia provider. We used data analyzed previously so that results can be compared with those of earlier studies.2,3,4,6,10 Scheduled (*vs.* actual) procedure code(s) were used, because for a future case for which a prediction bound is being calculated, only the scheduled procedure(s) would be known.

### Lower and Upper Prediction Bounds

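Let the random variable $X^k$ refer to the natural logarithms of OR times classified by the $k$th combination of surgeon, scheduled procedure(s), and anesthetic, $k = 1, 2, \ldots, p$. For brevity, we henceforth refer to each of the $p$ combinations as “surgeon and procedure(s).” The $n^k$ previously observed (historic) OR times in hours for the $k$th combination of surgeon and procedure(s) are $\exp(x^{k1}), \exp(x^{k2}), \ldots, \exp(x^{kn^k})$. The sample mean of the $n^k$ historic data $x^{k1}, x^{k2}, \ldots, x^{kn^k}$ equals $\bar{x}^k$, and the sample variance equals $(\hat\sigma^k)^2$. The logarithm of the scheduled OR time in hours for the next case is $\mathit{xs}^{*k}$. The objective is to predict the OR time of the next case, $\exp(x^{*k})$. We use * to represent the next case throughout the article.

From equations 8 and 17 in the Appendix, the 5% lower Bayesian prediction bound equals

$$
\exp\left(\mu^{*k} + t_{0.05,\,2\alpha^{*k}}\sqrt{\frac{\beta^{*k}\,(\tau + n^k + 1)}{\alpha^{*k}\,(\tau + n^k)}}\right), \qquad (1)
$$

where

$$
\mu^{*k} = \frac{\tau\,\mathit{xs}^{*k} + n^k\,\bar{x}^k}{\tau + n^k}, \qquad (2)
$$

$$
\alpha^{*k} = \alpha + \frac{n^k}{2}, \qquad (3)
$$

$$
\beta^{*k} = \beta + \frac{(n^k - 1)(\hat\sigma^k)^2}{2} + \frac{\tau\,n^k\,(\bar{x}^k - \mathit{xs}^{*k})^2}{2\,(\tau + n^k)}, \qquad (4)
$$

with $(n^k - 1)(\hat\sigma^k)^2 = \sum_i (x^{ki} - \bar{x}^k)^2$ taken as 0 when $n^k \le 1$, and $t_{0.05,\,2\alpha^{*k}}$ is the 5th percentile of the $t$ distribution with $2\alpha^{*k}$ degrees of freedom (*e.g.*, $t_{0.05,\,2\alpha^{*k}} = -1.65$ for large $\alpha^{*k}$). The 90% upper bound is calculated with $t_{0.90,\,2\alpha^{*k}}$ in place of $t_{0.05,\,2\alpha^{*k}}$.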
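Equations 1 through 4 need only arithmetic, a square root, an exponential, and one *t* percentile with non-integer degrees of freedom. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The function names are hypothetical, and the defaults are the prior values reported in the Appendix (α = 2.32, β = 0.142, τ = 8.68). The *t* percentile is found by Simpson's-rule integration of the *t* density plus bisection; this choice serves only to avoid third-party libraries.

```python
import math

# Prior values reported in the Appendix for the studied hospital.
ALPHA, BETA, TAU = 2.32, 0.142, 8.68


def t_pdf(x, df):
    """Density of Student's t distribution; df need not be an integer."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """CDF by Simpson's rule on [0, |x|], using symmetry of the density about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_ppf(p, df):
    """Percentile of the t distribution, found by bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def posterior(xs, n, xbar=0.0, s2=0.0, alpha=ALPHA, beta=BETA, tau=TAU):
    """Equations 2-4. xs is ln(scheduled hours); xbar and s2 are the mean and
    sample variance of the n historic log OR times (unused when n == 0)."""
    mu = (tau * xs + n * xbar) / (tau + n)
    a = alpha + n / 2
    b = (beta + 0.5 * max(n - 1, 0) * s2
         + tau * n * (xbar - xs) ** 2 / (2 * (tau + n)))
    return mu, a, b


def prediction_bound(p, scheduled_hours, n=0, xbar=0.0, s2=0.0,
                     alpha=ALPHA, beta=BETA, tau=TAU):
    """Equation 1 with percentile p: 0.05 for the lower bound, 0.90 for the upper."""
    mu, a, b = posterior(math.log(scheduled_hours), n, xbar, s2, alpha, beta, tau)
    scale = math.sqrt(b * (tau + n + 1) / (a * (tau + n)))
    return math.exp(mu + t_ppf(p, 2 * a) * scale)


# With no historic data, the bounds are fixed proportions of the scheduled time.
print(prediction_bound(0.90, 2.5) / 2.5, prediction_bound(0.05, 2.5) / 2.5)
```

With no historic data, the two printed ratios do not depend on the scheduled time. This is why a fixed-percentage heuristic can work for cases without data. Passing `n`, `xbar`, and `s2` from the historic log OR times individualizes the bounds.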

The Bayesian setup allows us to estimate prediction bounds (equation 1) for cases with no or very few historic data. Furthermore, the setup shows how to optimally combine prior information with available historic data. The values of α, β, and τ are obtained (see Appendix) by using all historic data, without regard to the combination of surgeon and procedure(s). The prior estimates of α and β are revised as data on $n^k$ OR times become available for the $k$th surgeon and procedure(s). Equations 3 and 4 show how the sample variability revises the prior estimates of α and β, which affect the variance component in the predictive distribution (*i.e.*, the square root term in equation 1). In addition, the parameter τ determines how the prior value (the logarithm of the scheduled OR time) $\mathit{xs}^{*k}$ and the historic mean $\bar{x}^k$ are combined. Equation 2 shows that $\mathit{xs}^{*k}$ has a substantial influence for $n^k < \tau$ but that its influence diminishes with larger sample sizes. For large sample sizes, the Bayesian bound of equation 1 converges to the result previously reported and given in equation 8 of the Appendix.
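Two claims in this paragraph can be checked numerically. The hypothetical Python helpers below use the prior values from the Appendix. They show that the weight on the historic mean in equation 2, $n^k/(\tau + n^k)$, reaches one half at $n^k = \tau$. They also show that for large $n^k$ the squared scale in equation 1 approaches the classical $(\hat\sigma^k)^2(1 + 1/n^k)$ of equation 8. The example assumes a sample variance of 0.08 and a historic mean equal to the prior.

```python
# Prior values reported in the Appendix.
ALPHA, BETA, TAU = 2.32, 0.142, 8.68


def historic_weight(n, tau=TAU):
    """Weight on the historic mean (versus the scheduled time) in equation 2."""
    return n / (tau + n)


def bayes_scale_sq(n, s2, xbar_minus_xs=0.0, alpha=ALPHA, beta=BETA, tau=TAU):
    """Squared scale of the posterior predictive t distribution (equations 1, 3, 4)."""
    a = alpha + n / 2
    b = (beta + 0.5 * max(n - 1, 0) * s2
         + tau * n * xbar_minus_xs ** 2 / (2 * (tau + n)))
    return b * (tau + n + 1) / (a * (tau + n))


def classical_scale_sq(n, s2):
    """Squared scale of the classical bound (equation 8): s2 * (1 + 1/n), n >= 2."""
    return s2 * (1 + 1 / n)


# The weight passes 0.5 at n = tau; the variance ratio approaches 1 as n grows.
for n in (2, 9, 30, 300, 3000):
    ratio = bayes_scale_sq(n, 0.08) / classical_scale_sq(n, 0.08)
    print(n, round(historic_weight(n), 3), round(ratio, 3))
```

The degrees of freedom converge as well, since $2\alpha^{*k} = 2\alpha + n^k$ grows like the classical $n^k - 1$.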

### Testing for the First New Case Taking Longer Than the Second New Case

The probability that one case will last longer than another is estimated using equations 18, 19, and 20 in the Appendix.

### Probability That a New Case Will Finish More Than 1 h Late

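From the posterior predictive distribution in the Appendix, the probability that a new case will finish more than $L$ hours late equals

$$
\Pr\left[\exp(X^{*k}) > \exp(\mathit{xs}^{*k}) + L\right] = 1 - F_{t(2\alpha^{*k})}\!\left(\frac{\ln\left[\exp(\mathit{xs}^{*k}) + L\right] - \mu^{*k}}{\sqrt{\beta^{*k}(\tau + n^k + 1)/[\alpha^{*k}(\tau + n^k)]}}\right), \qquad (5)
$$

where $F_{t(2\alpha^{*k})}$ refers to the cumulative distribution function of the $t$ distribution with $2\alpha^{*k}$ degrees of freedom. For testing, we use $L$ = 1 h.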
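Equation 5 can be evaluated with the same ingredients as equation 1. The Python sketch below assumes the prior values reported in the Appendix and uses hypothetical function names. Its *t* distribution function is computed by Simpson's rule purely to stay within the standard library, not because the article does so.

```python
import math

# Prior values reported in the Appendix.
ALPHA, BETA, TAU = 2.32, 0.142, 8.68


def t_cdf(x, df, steps=1000):
    """Student t CDF (non-integer df allowed) by Simpson's rule on [0, |x|]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def prob_late(scheduled_hours, late_hours=1.0, n=0, xbar=0.0, s2=0.0,
              alpha=ALPHA, beta=BETA, tau=TAU):
    """Equation 5: Pr[OR time > scheduled time + late_hours]."""
    xs = math.log(scheduled_hours)
    mu = (tau * xs + n * xbar) / (tau + n)                      # equation 2
    a = alpha + n / 2                                           # equation 3
    b = (beta + 0.5 * max(n - 1, 0) * s2                        # equation 4
         + tau * n * (xbar - xs) ** 2 / (2 * (tau + n)))
    scale = math.sqrt(b * (tau + n + 1) / (a * (tau + n)))
    z = (math.log(scheduled_hours + late_hours) - mu) / scale
    return 1.0 - t_cdf(z, 2 * a)


# No historic data: longer scheduled cases are more likely to finish > 1 h late.
for hours in (1.0, 2.5, 5.0):
    print(hours, round(prob_late(hours), 3))
```

For a 2.5-h scheduled case with no historic data, the sketch gives roughly 13%, in line with the prediction reported in the Results. The probability rises with the scheduled OR time.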

### Mean Operating Room Time

From the posterior distributions in the Appendix and equations 2, 3, and 4, the point estimate of the mean OR time can be estimated as

The mean is relevant economically for scheduling, because it is proportional to the total OR time.

### Testing the Accuracy and Usefulness of the Estimates and Predictions

To test the validity of lower prediction bounds from equation 1, several hundred thousand samples were taken with replacement from the population of 65,661 cases. For each sampled case, the following six steps were followed:

1. The logarithm of the scheduled OR time ($\mathit{xs}^{*k}$) was calculated.
2. The case’s combination of surgeon and procedure(s) (*i.e.*, $k$) was determined.
3. All other cases of that same combination of surgeon and procedure(s) were determined and used as historic data. Therefore, $n^k$ equaled the number of cases in the data set of the $k$th combination minus the one case selected at random.
4. From the first and third steps, $\mu^{*k}$, $\alpha^{*k}$, and $\beta^{*k}$ were determined from equations 2, 3, and 4, respectively.
5. Using the posterior parameters from the fourth step, the prediction bound was estimated by applying equation 1.
6. If the actual OR time of the selected case was less than the value of the lower prediction bound from the fifth step, an indicator value was set equal to 1. Otherwise, the indicator value was set equal to 0.
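The third step removes the sampled case from its own historic data. This does not require the raw data if running totals are stored for each combination, as in the Appendix's lookup table. Below is a minimal Python sketch, with hypothetical function names, of that leave-one-out update and of the sixth step's indicator.

```python
import math


def leave_one_out(n, sum_x, sum_x2, x_removed):
    """Historic n, mean, and variance of log OR times after removing one case.

    n, sum_x, and sum_x2 are running totals for one combination of surgeon and
    procedure(s); x_removed is the natural log of the sampled case's OR time.
    The mean is None when no case remains; the variance is None with < 2 cases.
    """
    m = n - 1
    if m == 0:
        return 0, None, None
    s1 = sum_x - x_removed
    s2 = sum_x2 - x_removed ** 2
    mean = s1 / m
    var = (s2 - m * mean ** 2) / (m - 1) if m >= 2 else None
    return m, mean, var


def below_lower_bound(actual_hours, lower_bound_hours):
    """Sixth step: 1 if the actual OR time was less than the lower bound, else 0."""
    return 1 if actual_hours < lower_bound_hours else 0


# Hypothetical totals for one combination with four historic cases.
logs = [math.log(h) for h in (1.5, 2.0, 2.25, 3.0)]
print(leave_one_out(len(logs), sum(logs), sum(v * v for v in logs), logs[1]))
```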

The relative frequency of 1s among the indicator values from the sixth step was determined. Samples were drawn with replacement until the proportion of cases exceeded by the prediction bounds had been estimated to within 0.12% (95% confidence interval). The process was repeated for the upper prediction bounds. The process was also repeated after limiting the data to the 44,120 cases not used to estimate α, β, and τ in equations 2, 3, and 4.
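The number of resamples such a stopping rule implies can be estimated in advance. This Python sketch assumes a normal-approximation (Wald) interval for a proportion. That interval choice is our assumption, as is whether 0.12% refers to the half-width or the full width; neither detail is given above, so both readings are printed.

```python
import math
from statistics import NormalDist


def samples_needed(p, half_width, confidence=0.95):
    """Samples for a Wald interval of the given half-width around proportion p."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z * z * p * (1 - p) / half_width ** 2)


# 0.12% read as the half-width, then as the full width (half-width 0.06%).
for p in (0.05, 0.10):
    print(p, samples_needed(p, 0.0012), samples_needed(p, 0.0006))
```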

To evaluate the accuracy of comparing two future OR times using equations 18, 19, and 20, two cases were selected at random with replacement from the 65,661 cases. For the case with the longer scheduled OR time, the probability of its taking longer than the case with the shorter scheduled OR time was estimated. Whether the first case was truly longer than the second case was recorded as 1 for yes and 0 for no. The process was repeated 2 million times. The results were stratified by the estimated probability, rounded to the nearest 0.05. Within each bin of width 0.05, the average estimated probability was compared with the observed proportion of simulations in which the first case was truly longer than the second case.
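The binning comparison can be sketched in Python as follows. The function name is hypothetical. Each input pair holds the estimated probability that the first case is longer and the observed outcome (1 or 0).

```python
from collections import defaultdict


def calibration_table(pairs, width=0.05):
    """Group (estimated probability, outcome) pairs into bins of the given width.

    Returns {bin center: (mean estimated probability, observed proportion, count)}.
    Python's round() breaks exact ties to the even multiple of width.
    """
    bins = defaultdict(list)
    for prob, outcome in pairs:
        center = round(round(prob / width) * width, 10)  # nearest multiple of width
        bins[center].append((prob, outcome))
    table = {}
    for center, items in sorted(bins.items()):
        mean_prob = sum(p for p, _ in items) / len(items)
        observed = sum(o for _, o in items) / len(items)
        table[center] = (mean_prob, observed, len(items))
    return table


pairs = [(0.66, 1), (0.67, 0), (0.68, 1), (0.91, 1)]
for center, row in calibration_table(pairs).items():
    print(center, row)
```

Good calibration shows up as each bin's mean estimated probability being close to its observed proportion.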

To compare the mean absolute errors of scheduling a new case using either the Bayesian estimate of the mean OR time or the scheduled OR time, samples were drawn with replacement from the 65,661 cases. Sampling continued until the mean difference had been estimated to within 0.1 min.

## Results

### Lower and Upper Prediction Bounds

The Bayesian prediction bounds were accurate to within 2%. The 90% upper bounds were exceeded by 9.7% of the actual OR times (*i.e.*, were conservative by 0.3%). The 5% lower bounds exceeded 4.9% of the actual OR times. Excluding the 33% of cases used to estimate the parameters (*i.e.*, studying only combinations of surgeon and procedure(s) with fewer than 29 historic cases), the 90% upper bounds were exceeded by 10.2% of the actual OR times, and the 5% lower bounds exceeded 5.5%.

A heuristic for upper and lower prediction bounds is to add or subtract half of the scheduled OR time. The heuristic was reasonable for the studied hospital because, in the absence of historic data for the combination of surgeon and procedure(s), the Bayesian 90% upper prediction bound equaled 149% of the scheduled OR time and the 5% lower prediction bound equaled 57% of the scheduled OR time. Use of this heuristic would reduce patient and surgeon waiting compared with using the scheduled OR time itself plus or minus a fixed value (*e.g.*, 1.5 h). Nevertheless, it would be a poor choice to skip the historic data and simply take a percentage of the scheduled OR time. The variance differs among combinations of surgeon and procedure(s) (see Appendix). Using the same percentages for all combinations results in an accurate overall coverage rate, but in rates that are too high or too low for many individual combinations. A histogram of the ratio of the 90% upper Bayesian prediction bound to the scheduled OR time, across all cases, shows this result.

### Testing for the First New Case Taking Longer Than the Second New Case

The Bayesian method was used to compare the OR times of 100% of the pairwise combinations of cases. The predicted probability of one case being longer than another was accurate to within 0.7%. This means, for example, that if the predicted probability is 67%, the actual probability is likely also close to 67%.

### Probability That a New Case Will Finish More Than 1 h Late

A heuristic to sequence a surgeon’s list of cases to reduce waiting past scheduled start times (tardiness) is to schedule the shortest case(s) first.1,5 Theoretical justification for use of this heuristic for ORs has not previously been described. When historic data are absent or ignored, $\mu^{*k} = \mathit{xs}^{*k}$, $\alpha^{*k} = \alpha$, and $\beta^{*k} = \beta$ in equations 2, 3, and 4. Equation 5 then shows that each increase in the scheduled OR time results in an increase in the probability that the case will finish more than 1 h late, because $\ln[\exp(\mathit{xs}^{*k}) + L] - \mathit{xs}^{*k} = \ln[1 + L/\exp(\mathit{xs}^{*k})]$ decreases as the scheduled OR time increases. The relation is also shown graphically. The equations apply for any other interval desired (*e.g.*, a case finishing 15 min late).

The most common scheduled OR time in the studied data was 2.5 h. All such cases were considered. Historic data were ignored (*i.e.*, $n^k$ was set equal to 0, and our estimated values of α and β were used). The predicted percentage of cases finishing more than 1 h late was 13%. The actual percentage was 15%.

When historic data are available, they can be used to make a Bayesian prediction of the probability of a case finishing more than 1 h late. Because different combinations of surgeon and procedure(s) have different variances, cases with the same scheduled OR time can have a wide range of Bayesian predicted probabilities of finishing more than 1 h late, depending on their historic data. Actual proportions accurately track these predicted probabilities, as does the management endpoint of tardiness (0 if the case finishes early; otherwise, the minutes it finishes late).

The implication is that for sequencing surgical cases, historic OR times should be used when they are available, in addition to the scheduled OR times.

### Mean Operating Room Time

The above results apply to decision making under uncertainty. When scheduling an OR case based on the efficiency of use of OR time, the mean OR time is relevant.1,17,18

The Bayesian estimate of the mean OR time and the scheduled OR time were each compared with the actual OR time. With respect to bias, the Bayesian method underestimated the actual OR time by an average of 3.2 min, whereas the scheduled OR time underestimated it by an average of 8.6 min. With respect to precision, the mean absolute error of the Bayesian estimate was 3.0 min less than that of the scheduled OR time. Although the Bayesian method was significantly (*P* < 0.0001) more accurate on both criteria, the differences were too small to be of practical advantage when focusing on the mean OR time.

## Discussion

We proposed and validated a practical way to calculate prediction bounds and compare the OR times of cases, even when there are few or no historic data for the surgeon and the scheduled procedure(s). As summarized in the introduction, the latter situation arises frequently in both tertiary surgical suites and outpatient surgery centers. Cases without data have a disproportionately large impact on OR management decision making, because any one case without data previously meant that no data-driven recommendation could be obtained. The Bayesian method can be used for (1) deciding when patients should be ready on the day of surgery, (2) adding or filling holes in the OR schedule, (3) sequencing of cases when faced with a constraining resource (*e.g.*, the surgeon or an expensive piece of equipment), and (4) sequencing a surgeon’s list of cases.

We also applied our Bayesian formulation to study how and when historic data can improve OR management decision making. When historic data are available, they should be used in combination with (not in lieu of) the scheduled OR time. The results show that this is not because the historic data substantially improve the estimate for the average OR time. Rather, historic data provide value in estimating the proportional variation in OR time.

### Benefits

The benefit of the Bayesian method lies in reducing physician and patient waiting, probably not in directly reducing overutilized OR time. The OR management decisions aimed at reducing expected overutilized OR time include scheduling cases before the day of surgery, scheduling add-on cases, and releasing allocated OR time.1,17,18 Provided OR staffing and allocations have been made based on finding the optimal balance of underutilized and overutilized OR time, such decisions can then reasonably be made using the expected mean OR time of each case.1,17,18 When there are at least two historic cases of the same combination of surgeon and procedure(s), increasing the accuracy of the prediction of the OR time of a new case, *versus* using the mean of the OR times of the historic cases, has been shown to produce a negligible incremental reduction in overutilized OR time. The reason is that decisions to reduce overutilized OR time are affected much more by the day-to-day variability in the total hours of cases than by variability in OR time prediction. People generally work late because of extra cases, not because of underestimation of the time to complete each case, because such underestimation rarely changes whether and when the case is performed. In the current article, we show that the scheduled OR time alone is nearly as good a predictor of the expected mean OR time of a new case as is the Bayesian method. Therefore, the incremental reduction in overutilized OR time from further improvements in predicting OR times is likely negligible.

Several features of the Bayesian method can be important for implementation. First, because the scheduled OR time affects *whether* a case is scheduled into an OR allocation on a specific day,1,18 some surgeons resist any reduction in their autonomy in choosing the value. Such organizational resistance to change does not affect the Bayesian method, because the scheduled OR time alone can still be used for that purpose. Second, the premise of relying on expert judgment when there are few data *versus* predominantly on historic data when there are many data has face validity to hospital audiences without statistical training. Third, an information system using the Bayesian method can always provide an estimate or decision recommendation, eliminating the need to educate scores of clerks, physicians, and nurses about how to interpret the absence of one. Fourth, equations 2, 3, and 4 show that the information system need not use the raw data for calculation, only the sample size, mean, and variance (*i.e.*, one record for each combination of surgeon and procedure(s)). Finally, we speculate about the advantage of combining the Bayesian method with new real-time methods of collecting OR time data (*e.g.*, from vital signs). An information system can now function autonomously, providing recommendations and progressively increasing its usefulness as its historic data accumulate.

### Limitations

The Bayesian method relies on the logarithms of the scheduled OR times not being consistent underestimates of the logarithms of the actual OR times. At some facilities, surgeons and schedulers may systematically underestimate OR times to get cases onto the OR schedule. Without adjustment, the Bayesian method may then be inaccurate, and the argument of the preceding paragraph will be incorrect. Bias can be monitored and incorporated into the equations in the Methods simply by changing each occurrence of $\mathit{xs}^{*k}$ to $(\mathit{xs}^{*k} + \Delta)$, where Δ is the overall or specialty-specific proportional bias. Alternatively, feedback can be provided immediately when the case is scheduled if the Bayesian estimate of the expected mean OR time differs substantially from the scheduled OR time.
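Below is a Python sketch of monitoring and correcting bias. It assumes Δ is estimated as the mean log ratio of actual to scheduled OR time over recently completed cases of a specialty. That estimator and the function names are our assumptions; the text above only specifies replacing $\mathit{xs}^{*k}$ by $(\mathit{xs}^{*k} + \Delta)$.

```python
import math


def log_bias(actual_hours, scheduled_hours):
    """Hypothetical estimate of Δ: mean of ln(actual) - ln(scheduled)."""
    diffs = [math.log(a) - math.log(s) for a, s in zip(actual_hours, scheduled_hours)]
    return sum(diffs) / len(diffs)


def adjusted_xs(scheduled_hours, delta):
    """Bias-corrected prior location, xs*k + Δ, used in place of xs*k."""
    return math.log(scheduled_hours) + delta


# Completed cases that each ran 10% longer than scheduled give Δ = ln(1.1).
delta = log_bias([2.2, 3.3, 1.1], [2.0, 3.0, 1.0])
print(round(delta, 4), round(math.exp(adjusted_xs(2.5, delta)), 3))
```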

Our work is limited in providing one Bayesian method, not necessarily the best one. We only considered statistical methods that require, in practice, no mathematical calculation other than some arithmetic and exponentials. That way, the method (equations 1–20) can be implemented as an SQL database query and/or run from a Web page. The latter is what we have been doing in practice. Bayesian methods with different distributional assumptions generally have computational solutions requiring numerical integration, which we have found challenging for widespread hospital implementation.
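Equations 2 through 4 use only arithmetic on one record per combination, so they can be evaluated inside a database query. The sketch below uses Python's built-in `sqlite3` module as a stand-in for a hospital database; that substitution is an assumption, and the table and column names are hypothetical. It uses the identity $(n^k - 1)(\hat\sigma^k)^2 = \sum x^2 - (\sum x)^2/n^k$. The square root, exponential, and *t* percentile of equation 1 are left to application code.

```python
import math
import sqlite3


def build_totals(conn, cases):
    """cases: iterable of (combination id, OR time in hours)."""
    conn.execute("CREATE TABLE cases (combo TEXT, x REAL)")  # x = ln(OR hours)
    conn.executemany("INSERT INTO cases VALUES (?, ?)",
                     ((combo, math.log(hours)) for combo, hours in cases))
    # One record per combination: n, sum of log times, sum of squared log times.
    conn.execute("""CREATE TABLE totals AS
                    SELECT combo, COUNT(*) AS n, SUM(x) AS s1, SUM(x * x) AS s2
                    FROM cases GROUP BY combo""")


# Equations 2, 3, and 4 in SQL, using (n - 1) * variance = s2 - s1 * s1 / n.
POSTERIOR_SQL = """
SELECT (:tau * :xs + s1) / (:tau + n),
       :alpha + n / 2.0,
       :beta + 0.5 * (s2 - s1 * s1 / n)
             + :tau * n * (s1 / n - :xs) * (s1 / n - :xs) / (2.0 * (:tau + n))
FROM totals WHERE combo = :combo
"""


def posterior_from_db(conn, combo, scheduled_hours, alpha=2.32, beta=0.142, tau=8.68):
    """Return (mu*, alpha*, beta*); with no stored history, the prior values."""
    xs = math.log(scheduled_hours)
    params = {"combo": combo, "xs": xs, "alpha": alpha, "beta": beta, "tau": tau}
    row = conn.execute(POSTERIOR_SQL, params).fetchone()
    return tuple(row) if row is not None else (xs, alpha, beta)


conn = sqlite3.connect(":memory:")
build_totals(conn, [("A", 2.0), ("A", 3.0), ("A", 2.5), ("B", 1.0)])
print(posterior_from_db(conn, "A", 2.5), posterior_from_db(conn, "Z", 2.0))
```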

#### Appendix

##### Previous Studies Based on Assumption of Log Normal Distribution

Assume that $X^k$, the natural logarithm of OR time, follows a normal distribution:

$$
X^k \sim N\!\left(M^k, (\varsigma^k)^2\right), \qquad (7)
$$

where $M^k$ is the unknown mean and $(\varsigma^k)^2$ is the unknown variance. This assumption has been validated previously.13,22 For example, for a surgeon’s 105 strabismus surgery cases with one muscle, the natural logarithms of the OR times are consistent with a normal distribution (chi-square test of normality, *P* = 0.71; Lilliefors test of normality, *P* = 0.69), as is the corresponding probability plot. Figures 5 and 6 of Dexter and Traub show data for laparoscopic cholecystectomy, including how the skewness of these distributions varies.

From the assumption of equation 7, the 5% lower prediction bound for the OR time of a new case, that is, for $\exp(X^{*k})$, can be obtained accurately1,2 by taking

$$
\exp\left(\bar{x}^k + t_{0.05,\,n^k - 1}\,\hat\sigma^k\sqrt{1 + \frac{1}{n^k}}\right), \qquad (8)
$$

where $\bar{x}^k$ and $(\hat\sigma^k)^2$ are the sample estimates from the $n^k$ historic data. The 90% upper bound is calculated by substituting $t_{0.90,\,n^k - 1}$ into equation 8 (*e.g.*, $t_{0.90,\,n^k - 1}$ = 1.28 for large $n^k$).1,3,6 If $n^k$ < 2, the prediction bounds cannot be calculated, because $(\hat\sigma^k)^2$ cannot be estimated from 0 or 1 cases. Furthermore, although the prediction bounds are valid for small $n^k$,2,3,6 they are often so wide as to be of little use.

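From the assumption of equation 7, the comparison of the OR times of two new cases is addressed by the Behrens-Fisher problem (*i.e.*, Student *t* test with unequal variances).1,4,23 We refer to the first new case with superscript 1 and the second new case with superscript 2, whichever combinations of surgeons and procedure(s) they are from. To test for the first new case being longer than the second new case, that is, for $\exp(X^{*1}) \ge \exp(X^{*2})$, let

$$
d = \bar{x}^1 - \bar{x}^2 \qquad (9)
$$

and

$$
s^2 = (\hat\sigma^1)^2\,\frac{n^1 + 1}{n^1} + (\hat\sigma^2)^2\,\frac{n^2 + 1}{n^2}. \qquad (10)
$$

Then,

$$
\Pr\left[\exp(X^{*1}) \ge \exp(X^{*2})\right] \approx \Pr\left[t(df) \le \frac{d}{s}\right], \qquad (11)
$$

where

$$
df = \frac{s^4}{\dfrac{(\hat\sigma^1)^4 (n^1 + 1)^2}{(n^1)^2 (n^1 - 1)} + \dfrac{(\hat\sigma^2)^4 (n^2 + 1)^2}{(n^2)^2 (n^2 - 1)}}. \qquad (12)
$$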

Because the values of *df* are not integers, we calculate the probabilities of the *t* distribution from its relation with the Incomplete Beta Function. Press *et al.* provide the computer code. The primary limitation of equations 9–12 is that calculating $(\hat\sigma^1)^2$ and $(\hat\sigma^2)^2$ requires $n^1 \ge 2$ and $n^2 \ge 2$.
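The Incomplete Beta Function route can be sketched in Python as follows. The continued-fraction evaluation follows the general modified-Lentz approach described by Press *et al.*, but it is written here from scratch and is not their code. The Welch-type comparison assumes prediction variances $(\hat\sigma^k)^2(n^k + 1)/n^k$ on the log scale with Welch-Satterthwaite degrees of freedom. All function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_cdf(t, df):
    """Student t CDF for non-integer df: Pr(|T| > |t|) = I_{df/(df+t^2)}(df/2, 1/2)."""
    tail = 0.5 * betai(df / 2.0, 0.5, df / (df + t * t))
    return 1.0 - tail if t >= 0 else tail


def prob_first_longer(xbar1, s2_1, n1, xbar2, s2_2, n2):
    """Welch-type Pr[new case 1 longer than new case 2] from log-scale data."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each combination needs at least 2 historic cases")
    v1 = s2_1 * (n1 + 1) / n1
    v2 = s2_2 * (n2 + 1) / n2
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_cdf((xbar1 - xbar2) / math.sqrt(v1 + v2), df)


print(round(prob_first_longer(1.2, 0.08, 10, 1.0, 0.05, 6), 3))
```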

##### Two Additional Assumptions for Inference and Prediction in the Absence of Historic Data

Suppose that no surgeon- and procedure(s)-specific historic OR time data are available. To predict the OR time of a new case and to calculate the prediction bounds, we make two assumptions, one about the prior distribution of $(\varsigma^k)^2$ and the other about $M^k$.

Assume that $(\varsigma^k)^2$ follows an inverse gamma distribution, with density proportional to $[(\varsigma^k)^2]^{-(\alpha + 1)}\exp[-\beta/(\varsigma^k)^2]$:

$$
(\varsigma^k)^2 \sim \text{Inverse Gamma}(\alpha, \beta), \qquad (13)
$$

where α and β are unknown parameters common to all combinations of surgeons and procedure(s) at the facility. Following Strum *et al.*13,22 in their investigations of the statistical distributions of OR times, we considered those combinations of surgeon and procedure(s) that were performed a moderate to large number of times ($n^k \ge 30$). There were 302 such combinations of surgeon and procedure(s), corresponding to 21,541 cases. A probability plot of their sample variances shows that the assumption of the inverse gamma distribution is reasonable (chi-square test, *P* = 0.51; Kolmogorov-Smirnov test, *P* = 0.88).

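Assume that the conditional prior distribution of $M^k$ given $\varsigma^k$ is normal with prior mean $\mu^k$ and variance $(\varsigma^k)^2/\tau$:

$$
M^k \mid \varsigma^k \sim N\!\left(\mu^k, \frac{(\varsigma^k)^2}{\tau}\right). \qquad (14)
$$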

The validity of equation 14 is considered in the next section. The parameter τ represents the ratio of the variance of an individual observation to the variance of the mean, so it acts as a number of prior cases. If τ were small (*e.g.*, τ = 1 historic case), our prior information about $M^k$ would be vague, making even a few historic data relatively influential.

A consequence of equations 7 and 14 together is that the conditional prior distribution of $X^k$ given $\varsigma^k$ is normal with prior mean $\mu^k$ and variance $(\varsigma^k)^2 + (\varsigma^k)^2/\tau$. Combining terms, the variance equals $(\varsigma^k)^2(\tau + 1)/\tau$. Bringing in the inverse gamma distribution for $(\varsigma^k)^2$ from equation 13, the prior predictive distribution of the logarithm of the OR time of the next case, $X^{*k}$, is a scaled Student $t$ distribution:

$$
X^{*k} \sim \mu^k + t(2\alpha)\sqrt{\frac{\beta(\tau + 1)}{\alpha\tau}}. \qquad (15)
$$

##### Prior Values for Use in Inference and Prediction in the Absence of Historic Data

We obtained the prior values for α and β from the sample variances of the 302 combinations described above. Using the method of moments, the parameters of the inverse gamma distribution were α = 2.32 and β = 0.142.
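Below is a Python sketch of moment matching for the inverse gamma distribution. It assumes the parametrization with mean β/(α − 1) and variance (mean)²/(α − 2), and it assumes the moments are those of the sample variances $(\hat\sigma^k)^2$ of the 302 combinations. This is one standard version, not necessarily the authors' exact calculation. The function names are hypothetical.

```python
def inverse_gamma_moments(mean, var):
    """Method-of-moments (alpha, beta) for an inverse gamma distribution.

    Uses mean = beta / (alpha - 1) and var = mean**2 / (alpha - 2), alpha > 2.
    """
    alpha = 2.0 + mean ** 2 / var
    beta = mean * (alpha - 1.0)
    return alpha, beta


def inverse_gamma_mean_var(alpha, beta):
    """Mean and variance of an inverse gamma distribution (alpha > 2)."""
    mean = beta / (alpha - 1.0)
    return mean, mean ** 2 / (alpha - 2.0)


m, v = inverse_gamma_mean_var(2.32, 0.142)
print(round(m, 4), round(v, 4), inverse_gamma_moments(m, v))
```

Under this parametrization, the reported α = 2.32 and β = 0.142 imply a prior mean of about 0.108 for the variance of the log OR times.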

We considered the logarithm of the scheduled OR time of the new case, $\mathit{xs}^{*k}$, to be the prior value for $\mu^k$, and thus the prior prediction of $x^{*k}$. A histogram of the prediction errors, $(x^{*k} - \mathit{xs}^{*k})$, for the 18,381 cases for which $n^k \ge 2$ is symmetric around zero. This symmetric distribution confirms that $\mathit{xs}^{*k}$ is a good prior value, because it provides an unbiased prediction of the logarithm of the actual OR time in hours. This implies that $\exp(\mathit{xs}^{*k})$ provides an unbiased prediction of the 50th percentile of $\exp(X^{*k})$, which is slightly less than the expected (mean) value of $\exp(X^{*k})$. We would therefore expect the scheduled OR time to slightly, but significantly, underestimate the actual OR time. This was the finding reported in the final paragraph of the Results. In those analyses, the expected (mean) value averaged the 52nd percentile of $\exp(X^{*k})$, with lower and upper quartiles of 51% and 52%, respectively.

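We estimated the prior value for τ from the variance of the prediction errors. Using the 21,541 cases for which we estimated α and β above, $\mathrm{Var}(x^{*k} - \mathit{xs}^{*k}) = 0.120$. The variance of a $t$ distribution with 2α degrees of freedom equals $2\alpha/(2\alpha - 2)$. From the result of equation 15,

$$
0.120 = \frac{\beta(\tau + 1)}{\alpha\tau}\cdot\frac{2\alpha}{2\alpha - 2} = \frac{\beta(\tau + 1)}{(\alpha - 1)\tau}, \quad\text{so}\quad \tau = \frac{\beta}{0.120(\alpha - 1) - \beta}.
$$

Substitution of the prior values α = 2.32 and β = 0.142 from two paragraphs above gives τ ≈ 8.68 cases.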
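The derivation of τ can be reproduced in a few lines of Python under the parametrization above (prior predictive scale squared β(τ + 1)/(ατ), with 2α degrees of freedom). The function names are hypothetical.

```python
def tau_from_error_variance(var_err, alpha, beta):
    """Solve var_err = beta * (tau + 1) / ((alpha - 1) * tau) for tau.

    var_err is the variance of ln(actual) - ln(scheduled); the right side is the
    variance of the prior predictive scaled t distribution with 2 * alpha df.
    """
    denom = (alpha - 1.0) * var_err - beta
    if denom <= 0:
        raise ValueError("error variance too small for these alpha and beta")
    return beta / denom


def prior_predictive_variance(alpha, beta, tau):
    """Variance of mu + t(2 * alpha) * sqrt(beta * (tau + 1) / (alpha * tau))."""
    scale_sq = beta * (tau + 1) / (alpha * tau)
    return scale_sq * (2 * alpha) / (2 * alpha - 2)


tau = tau_from_error_variance(0.120, 2.32, 0.142)
print(round(tau, 2), round(prior_predictive_variance(2.32, 0.142, tau), 3))
```

With the rounded inputs shown, the result is about 8.66 cases rather than 8.68. This illustrates how sensitive τ is to rounding of the error variance. The sensitivity analysis below suggests that such differences matter little for the bounds.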

The assumption in equation 14 was validated by studying the distribution of $(\bar{x}^k - \mathit{xs}^{*k})$, with $\bar{x}^k$ and $\mathit{xs}^{*k}$ taking the places of $M^k$ and $\mu^k$, respectively. When the sample variances $(\hat\sigma^k)^2$ were grouped into intervals of width 0.05, the interval $0.05 \le (\hat\sigma^k)^2 < 0.10$ included the most cases (10,562), as well as many (156) different values of $k$. A normal probability plot for this interval shows that the functional (normal) form of equation 14 is reasonable. Following Strum *et al.*, the undulation around the straight line in that plot was the expected consequence of our data’s $\mathit{xs}^{*k}$ being in 15-min intervals, resulting in $(\bar{x}^k - \mathit{xs}^{*k})$ taking a small set of values.

##### Updating the Estimates and Predictions Using Historic Data

Our prior distributions in equations 13 and 14 have the advantage that, for any sample size and for any values of the observations sampled from the model of equation 7, the posterior distributions of $(\varsigma^k)^2$ and $M^k$ belong to the same families as the priors. The Bayesian literature refers to such prior distributions as conjugate priors. Specifically, the posterior distribution of $(\varsigma^k)^2$ is again an inverse gamma distribution. The posterior of $M^k$ and the posterior predictive distribution of $X^{*k}$ are scaled $t$ distributions, with revised parameters that update the prior values with the sample information:

$$
X^{*k} \sim \mu^{*k} + t(2\alpha^{*k})\sqrt{\frac{\beta^{*k}(\tau + n^k + 1)}{\alpha^{*k}(\tau + n^k)}}, \qquad (17)
$$

with parameters given in equations 2, 3, and 4.

From the posterior predictive distribution of equation 17, equations 9–12 are modified to provide the Bayesian comparison of the OR times of two cases (equations 18–20).
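The probability can also be estimated by simulation, as an alternative to equations 18–20 (this is not the method used in the article). The approach samples from the two posterior predictive scaled *t* distributions of equation 17. The Python sketch below makes that substitution. The function names are hypothetical, and the parameters $(\mu^{*k}, \alpha^{*k}, \beta^{*k})$ come from equations 2–4. Comparing log OR times is equivalent to comparing OR times because the exponential is increasing.

```python
import math
import random


def draw_predictive(rng, mu, a, b, tau, n):
    """One draw of X*k from the scaled t posterior predictive distribution."""
    df = 2 * a
    scale = math.sqrt(b * (tau + n + 1) / (a * (tau + n)))
    chi2 = rng.gammavariate(df / 2, 2.0)  # chi-square variate with df degrees of freedom
    return mu + scale * rng.gauss(0.0, 1.0) / math.sqrt(chi2 / df)


def prob_first_longer_mc(post1, post2, draws=20000, seed=1):
    """Monte Carlo estimate of Pr[X*1 > X*2]; each post is (mu*, alpha*, beta*, tau, n)."""
    rng = random.Random(seed)
    wins = sum(draw_predictive(rng, *post1) > draw_predictive(rng, *post2)
               for _ in range(draws))
    return wins / draws


# Two cases with no historic data (prior values from the Appendix),
# the first scheduled for 3.5 h and the second for 2.5 h.
first = (math.log(3.5), 2.32, 0.142, 8.68, 0)
second = (math.log(2.5), 2.32, 0.142, 8.68, 0)
print(prob_first_longer_mc(first, second))
```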

##### Assessment of Sensitivity of Estimates to Prior Values of α, β, and τ

Using only cases from 1996 and 1997, there were 124 combinations of surgeon and procedure(s) with $n^k \ge 30$, *versus* 302 combinations with all the data. The estimates were α = 1.70 and β = 0.089, *versus* α = 2.32 and β = 0.142 with all the data. From these 7,405 cases, τ = 14.9 cases, *versus* τ = 8.68 cases from all 21,541 cases. Applying the α, β, and τ from 1996 and 1997 to all of the data, the 90% upper bounds were exceeded by 10.0% of cases, and the 5% lower bounds exceeded 5.1% of cases. Therefore, the Bayesian method was insensitive to the choice of α, β, and τ. Furthermore, when τ = 1.49 cases (an order of magnitude smaller) was used, the results were the same to within 0.2%.

In the next set of calculations, we assumed that the prior values are updated annually (*i.e.*, 1996–1997 prior values were applied to 1998 data, and 1996–1998 prior values were applied to 1999 data). We assumed that the lookup table with the running totals of $n^k$, $\sum_i x^{ki}$, and $\sum_i (x^{ki})^2$ for each combination of surgeon and procedure(s), for use in equations 2, 3, and 4, is updated nightly. For example, to estimate prediction bounds for cases performed on Tuesday, August 4, 1998, the running totals used were for January 1, 1996, through August 3, 1998. Then, the calculated 90% upper bounds were exceeded by 8.0% ± 0.1% of cases (n = 32,930), and the 5% lower bounds exceeded 6.9% ± 0.1% of cases. Repeating the analysis using τ = 1.49 cases, the results were the same to within 0.2%. Therefore, improving accuracy would require modifying the assumption of equation 13 (a common α and β) for those combinations of surgeon and procedure(s) with little to no data, not for those combinations with small to large numbers of historic cases. The combinations with little to no data are precisely those for which α and β cannot be estimated directly.