Abstract
Daily scheduling of surgical operations is a complicated and recurrent problem in the health care optimization literature. In this study, we present an often overlooked approach to this problem that incorporates a rolling and overlapping planning horizon. The basis of our modeling approach is a Markov decision process in which patients are scheduled to a date and room on a daily basis. Acknowledging that both the state and action space are only partially observable, we solve our model using a simulation-based method, where actions are derived from a heuristic search procedure.
We test the effect of this modeling approach on the resulting hospital costs and on the number of patients outsourced to avoid violating capacity constraints. Using data from a Danish hospital, we find a distinct improvement in performance compared with a policy that resembles a manual planner. Further analysis shows that substantial improvements can also be attained with other simple policies.
Introduction
Surgical procedures are a key element of hospital activity and involve many different clinical specializations, from organ to orthopedic surgery.
According to statistics from the Organisation for Economic Co-operation and Development (OECD) [14], the number of surgical procedures is increasing relative to the population size across several countries. Australia, Canada, Finland, and the UK have experienced an increase of roughly 50% over the past 20 years. For Austria and Denmark, the increase is roughly 100%, and in Portugal, the increase is roughly 350% over the same period.
It is therefore unsurprising that the use of resources in this part of the hospital has received substantial attention in the Danish health care sector. In September 2015, the Danish Ministry of Health [22] published a report on the overall status of the public health care sector, showing that waiting time for surgery had been increasing for a quarter of the investigated departments in the period from 2011 to 2014. Other governmental reports suggest a lack of resource utilization for Operating Theatres (OTs) as well. In March 2015, the National Audit Office of Denmark [21] published a report on the use of staff resources, based on four departments in orthopedic surgery, concluding that in the majority of cases full utilization of staff working hours is not ensured.
On the other hand, ensuring an efficient utilization of surgical resources is a complex task. To attain an efficient use of both staff and equipment resources, decisions on multiple organizational levels have to be considered [15, 18]. These range from long-horizon planning problems, such as deciding on the overall required capacity, to day-to-day scheduling (and rescheduling) of patients.
From interviews, we found that manual planners have a time-consuming and complicated task at hand. In our case hospital, planners have to ensure that the equipment in each room is compatible with the surgical procedure. Simultaneously, planners must ensure that patients are treated by the same surgeon who examined them and that waiting times do not violate a hard upper limit. Adding overtime costs and capacity efficiency to these considerations yields an impractical, if not impossible, task for the manual planner.
Our objective in this study is to provide hospital planners with a decision tool capable of optimizing the scheduling of patients for operation while respecting the constraints that are relevant to the hospital. In our case, we consider that patients are scheduled on a day-to-day basis and require that a rolling and overlapping planning horizon is taken into account. Thus, the decisions made on each day have to be anticipative.
Our methodological approach is a simulation-based Markov Decision Process (MDP) that minimizes the long-term costs of overtime and of setting up operating rooms.
In Section 2, we describe the scheduling problem in detail. Next, in Section 3, we present our simulation-based MDP and describe how an allocation of patients can be derived using this approach. In Section 4, we apply our approach to data from a Danish hospital and assess our MDP implementation by comparing it with other scheduling methods. Finally, we present our conclusions and suggestions for future work in Section 5.
Literature Review
The problem of OT planning is a recurrent topic that has been covered by a substantial number of papers. Several surveys exist on the subject, among the most recent those by Cardoen et al. [5], Guerriero and Guido [15], May et al. [18], and lately Samudra et al. [25], who identified 137 journal papers on OT planning published between 2004 and 2014.
With respect to the organizational decision levels, Guerriero and Guido [15] find that the studies can be classified into three categories: strategic (long-term decisions), tactical (medium-term decisions), and operational (short-term decisions). May et al. [18] add three further decision levels: very long-term, very short-term, and contemporaneous. Decisions on the very long term concern the layout of physical resources [32], such as the construction of operating rooms. Long-term decisions relate to patient flow patterns and the assignment of overall capacity to surgical groups [3, 29]. Medium-term decisions involve defining the master surgical schedule, where the clinical specializations are assigned to specific rooms and time windows [31]. On the short term, patient procedures are assigned to a specific time and room on a day-to-day basis, and on the very short-term and contemporaneous levels, last-minute scheduling and rescheduling are conducted [1, 4, 9,10,11,12, 17, 20, 23, 27].
Focusing on the short-term operational level of OT planning, we have found a range of different approaches and problem structures. Studies range from completely deterministic “offline” problems [4, 6, 12, 30, 33] to those incorporating uncertainty features such as random procedure times [1, 9, 17] and disruptions caused by emergency demand [11, 17]. Surprisingly, we encountered only two studies on the allocation of patients in which stochastic future arrivals were accounted for [23, 34], even though Samudra et al. [25] show that more than half of the papers on OT planning incorporate some form of stochasticity.
The specific modeling approaches to short-term OT planning range from mathematical programming and heuristics [1, 4, 9, 11, 12, 30, 33] to discrete-event and Monte Carlo simulation [10, 27], and further to mixtures of these [17]. For the purely deterministic cases, Xiang et al. [33] and Van Huele and Vanhoucke [30] combine the surgical scheduling problem with a staff rostering problem. Xiang et al. [33] develop a modified Ant Colony Optimization algorithm and test the model using both data from the literature and real data from a Chinese hospital. Van Huele and Vanhoucke [30] approach the problem using Mixed Integer Linear Programming (MILP) based on the most frequent objectives and constraints from the literature. In Fei et al. [12] and Cardoen et al. [4], the focus is more on the scheduling and sequencing of the surgical procedures. Fei et al. [12] use a two-phase approach in which patients are first assigned a date using a column-generation-based heuristic and subsequently sequenced using a hybrid genetic algorithm. Cardoen et al. [4] focus on the sequencing of procedures and develop MILP models that lead to either exact or heuristic solutions.
For the studies that incorporate uncertainty, Batun et al. [1] and Lamiri et al. [17] use Stochastic Programming (SP) to minimize the total cost of scheduling patients over a planning horizon. Specifically, Batun et al. [1] develop a two-stage stochastic MILP and investigate the impact of parallel surgery processing and of pooling operating rooms. Relatedly, Lamiri et al. [17] develop an SP model, as well as a method combining Monte Carlo simulation and a MIP model, to schedule elective patients within a specific planning horizon and emergency patients on the same day of arrival.
Methods based on MILP modeling can in some cases become too inefficient, as found by Erdem et al. [11], who develop a MILP model and a Genetic Algorithm (GA) to reschedule elective patients upon the arrival of emergency patients. For the MILP model, Erdem et al. [11] find that a commercial solver is sufficient only for a limited “light” case and therefore develop a GA to find near-optimal solutions for the more complex cases. In addition, Denton et al. [9] focus on heuristic methods for sequencing patients in operating rooms and find that a simple sequencing rule can be used to optimize both waiting time and overtime costs.
As the above shows, random procedure durations and the impact of emergency arrivals draw considerable attention, but there is limited focus on overlapping planning horizons originating from uncertain future arrivals. One exception is Range et al. [23], who schedule elective patients based on a MILP model and solve the problem with column generation. Future arrivals are accounted for by measuring the expected number of future patients who cannot be scheduled for surgery.
Another study that accounts for uncertain future arrivals is Zhang et al. [34]. Similar to the approach by Fei et al., they use a model that consists of two phases. In the first phase, future decisions are incorporated by selecting unscheduled patients with an MDP that minimizes the expected longterm costs. In the second phase, the selected patients are finally assigned to the respective surgical blocks with an SP model.
In this study, we also consider the problem of allocating patients to a day and room as a sequential decision problem with overlapping planning horizons. We assume that surgical operations can begin at any time within the opening hours of the operating room and even stretch into overtime. Our model is a heuristic approach that accounts for random interarrival times and procedure durations, and it is based on a simulation-based MDP that derives an allocation of patients in a single phase by minimizing the combined long-term costs of overtime and of setting up operating rooms.
Problem Description
Finding a good schedule involves multiple objectives for the hospital planner. Long waiting times and overcrowding of wards may decrease both subjective and objective care quality, as found by Hansagi et al. [16] and McMillan et al. [19]. To make things even more difficult, planners in our case hospital are required to keep patient waiting times within a hard upper limit, while insufficient operating room capacity leads to expensive overtime costs. Outsourcing patients to other units is one way of avoiding these problems but comes with qualitative and logistical costs.
In this study, we consider a hospital where a planner schedules surgical operations on a daily basis. The hospital treats both elective and emergency patients in a range of different clinical specializations, but operating rooms are reserved for each patient type and for each respective area of specialization.
In contrast to other studies, such as Erdem et al. [11], we assume that emergency resources are rarely insufficient, so that we may limit our scope to the scheduling of elective patients only. We further focus on a single clinical specialization and assume that resource sharing with other specializations is negligible.
Patients arrive with continuous random interarrival times, but we assume that the hospital planner can postpone allocating them until the end of the day. Procedure requests from elective patients occur only on regular workdays; hence, the hospital planner has to make a maximum of five decisions a week.
We assume that all patients have an upper limit on waiting time from the moment the surgical operation is requested. Thus, due to uncertainty in procedure duration and interarrival times, capacity may in some instances be insufficient so that patients have to be outsourced to an internal or external treatment unit.
An overview of the scheduling problem is depicted in Fig. 1.
For the remainder of this paper, we refer to scheduled surgical operations as procedures and to unscheduled procedures as requests. Days on which the number of requests is positive, such that the hospital planner must decide on an allocation of these, are referred to as allocation epochs.
Constraints and Dynamics of the Problem
Our aim is to derive a cost-optimized schedule for all requests that have arrived during the day, repeating this process for all future days. When a decision is made, the hospital planner considers a discrete planning horizon of total length, \(H \in \mathbb {N}\), such that from the end of the current allocation epoch, \(t \in \mathbb {Z}\), all days that are considered in the scheduling problem are \(t+1,t+2,\dots ,t+H-1,t+H\).
Let \(X \in \mathbb {N}_{0}\) be a random variable defining the total number of requests received on an arbitrary day. Then, for all days where X > 0, a scheduling problem has to be solved with a planning horizon that has been “rolled” accordingly. Let \(\delta \in \mathbb {Z}_{>t}\) define the allocation epoch subsequent to t. Further, let Ω_{i}, where \(\lvert {\Omega }_{i} \rvert = H\), define the specific set of days contained in the planning horizon of an allocation epoch, i. Hence, if δ < t + H, then Ω_{t} ∩ Ω_{δ} ≠ ∅, as illustrated in Fig. 2.
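To make the rolling-horizon mechanics concrete, the following sketch (with made-up values of t, δ, and H, not taken from the case hospital) computes the overlap Ω_{t} ∩ Ω_{δ} of two consecutive planning horizons:

```python
def horizon(t, H):
    """Omega_t: the set of days {t+1, ..., t+H} considered at allocation epoch t."""
    return set(range(t + 1, t + H + 1))

H = 5
omega_t = horizon(0, H)      # epoch t = 0 covers days {1, ..., 5}
omega_delta = horizon(3, H)  # next epoch delta = 3 covers days {4, ..., 8}

# delta < t + H, so the two horizons overlap
print(sorted(omega_t & omega_delta))  # [4, 5]
```

Whenever δ ≥ t + H, the intersection is empty and the two scheduling problems decouple.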
Let R define a finite set of operating rooms available to the hospital. The planner then has to make a decision involving both the finite, discrete planning horizon and the operating room resources in R. The feasibility of scheduling a procedure in a specific room, r ∈ R, depends on a predefined surgical schedule as well as other constraints, which are presented in Section 2.1.1. Furthermore, any allocation may induce a cost from setting up the room or from procedures stretching into overtime. Our assumptions related to these costs are presented in Section 2.1.2.
Constraints
Constraints relevant to the scheduling problem range from the availability of predefined capacity to less tangible factors, such as staff preferences. Below, we present each of these constraints separately.

1.
Number of rooms. The hospital planner can choose to allocate a request to an unopened room, provided that this does not violate an upper limit on open rooms. Let y_{kl} ∈ {0, 1} be 1 if room k ∈ R is used on day l ∈ T, where T = {t + 1, ⋯ , t + H} is the set of workdays within the current planning horizon, and 0 otherwise. Further, let \(c_{l} \in \mathbb {N}\) define the maximum number of rooms that may be opened on day l ∈ T. We assume that c_{l} is weekly cyclic, such that c_{l} = c_{l+ 5}. An allocation to a room k ∈ R on day l ∈ T is then allowed only if, subsequently, \({\sum }_{k \in R} y_{kl} \leq c_{l}\).

2.
Equipment. Not every procedure can be allocated to every room, even if c_{l} is not violated. To account for potential equipment requirements, as well as other preferences that may exist, each procedure type i ∈ P, where P is the set of all procedures that may occur, is constrained to a subset, U_{i}, of the available rooms, such that \(U_{i} \subseteq R\).

3.
Physicians. When a request is received by the hospital planner, a specific physician has already been assigned to conduct the procedure. We assume that physician rosters are not flexible, so that requests can only be allocated to days on which the physicians are expected to be available at the hospital. Let J define the set of scenarios (or patterns) of days on which physicians will be available. As T is always a finite set, so is J. We assume that a request is randomly assigned to a specific pattern j ∈ J with known probability.

4.
Opening hours. Lastly, all operating rooms have a prespecified time interval within which they are expected to be open. Let \(Y_{i} \in \mathbb {R}_{>0}\) be a random variable with known distribution that defines the duration of a procedure of type i ∈ P. Furthermore, let \(r_{ikl} \in \mathbb {N}_{0}\) define the number of procedures of type i ∈ P that are allocated to room k ∈ R on day l ∈ T. Then, an allocation to a room k ∈ R on day l ∈ T is only allowed if there exists at least one sequence such that all procedures are expected to start within the opening hours. That is, \({\sum }_{i \in P \setminus \{\alpha \}} (r_{ikl} \cdot E[Y_{i}]) + ({\sum }_{i \in P} r_{ikl} - 1) \cdot m < w_{k}\), where \(m \in \mathbb {R}_{>0}\) is a fixed buffer time, \(w_{k} \in \mathbb {R}_{>0}\) is the time capacity of room k ∈ R, and α is the allocated procedure with the longest expected duration for that room and day.
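As an illustration of the opening-hours condition in constraint 4, the sketch below implements the stated inequality directly: the type α with the longest expected duration is excluded from the duration sum, since only the start of the last procedure must fall within the room's hours. All durations, buffer times, and capacities are hypothetical values, not the hospital's.

```python
def starts_within_hours(counts, expected_dur, m, w_k):
    """counts[i]: procedures of type i allocated to the room on the day;
    expected_dur[i]: E[Y_i]; m: fixed buffer time; w_k: time capacity."""
    n = sum(counts.values())
    if n == 0:
        return True  # an empty room is trivially feasible
    # alpha: the allocated type with the longest expected duration
    alpha = max((i for i, c in counts.items() if c > 0),
                key=lambda i: expected_dur[i])
    # sum expected durations over all types except alpha, as in the
    # stated condition; only alpha's start must fit within w_k
    total = sum(c * expected_dur[i] for i, c in counts.items() if i != alpha)
    return total + (n - 1) * m < w_k

dur = {"A": 3.0, "B": 1.5}  # expected durations in hours (made up)
print(starts_within_hours({"A": 1, "B": 2}, dur, m=0.25, w_k=8.0))  # True
print(starts_within_hours({"A": 1, "B": 4}, dur, m=0.25, w_k=6.5))  # False
```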
We assume that all allocations are final, such that each procedure is locked to both room and date. Further, as none of the above constraints may be violated, and the occurrence of requests is independent of the current schedule, we allow requests to be outsourced to yield a feasible solution with a maximum number of allocations. In this regard, we assume that allocating all current and future requests is always preferred over outsourcing any of them.
Costs
In composing a suitable schedule, the hospital planner has to consider a number of implications related to each respective solution. We found, from interviews as well as from other studies [27], that some hospitals assess their performance by the utilization of time capacity for each operating room. Such a measure is convenient with respect to day-to-day monitoring and obtaining sufficient data, but it does not provide an immediate relation between setting up new rooms and the risk of procedures stretching into overtime. For this reason, we evaluate the implications of a specific schedule as a sum of “penalties.” We refer to these as costs, as we mainly relate them to direct costs such as overtime, cleaning, and setting up equipment. We have categorized these costs into the two groups presented below:

1.
Setup. To account for the logistical costs related to equipment and staff preparation, we assume that by opening a room, the hospital incurs a fixed setup cost. That is, the setup cost is induced only when the first procedure is allocated to the room and otherwise does not depend on the utilization of time capacity.

2.
Overtime. As mentioned earlier, all procedures are subject to a random duration and are thus at risk of stretching into overtime. If this happens, we assume the hospital always pays a supplement to the staff, independent of the type of procedure. In addition, some discontent may arise among the staff, leading to more errors and a decrease in treatment quality. As a result, the total penalty related to overtime can be a nonlinear, increasing function of the duration of overtime.
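As a toy illustration of such a nonlinear penalty, consider a hypothetical cost function with a per-hour staff supplement plus a quadratic “discontent” term; the functional form and coefficients are invented for the example and are not the hospital's actual rates:

```python
def overtime_cost(delta, supplement=100.0, discontent=25.0):
    """Illustrative nondecreasing overtime penalty f(delta) for delta >= 0 hours:
    a linear staff supplement plus a quadratic discontent term."""
    if delta <= 0:
        return 0.0
    return supplement * delta + discontent * delta ** 2

print(overtime_cost(0.0))  # 0.0
print(overtime_cost(2.0))  # 300.0
```

Any nondecreasing f(δ) with f(0) = 0 fits the formulation in the text equally well.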
Modeling and Solution Approach
In this section, we present the approach we use to minimize the long-term expected costs of scheduling requests for operation. Our modeling approach is based on a Markov Decision Process (MDP) framework, for which, due to the problem size, we propose a simulation-based “online” solution method.
In Section 3.1, we present the specific structure of our modeling approach along with an exact solution method from standard theory. Next, in Section 3.2, we present our solution approach, which is based on a simulation-based rollout method resulting in a heuristic policy.
A Markov Decision Process
Now, recall that we consider a finite set of procedures, P = {Procedure A, Procedure B,⋯ }. Any procedure, i ∈ P, is to be conducted within a fixed planning horizon, \(H \in \mathbb {N}\), such that the set of future workdays in the planning horizon is T = {t + 1,t + 2, ⋯ , t + H}, where \(t \in \mathbb {Z}\) is the day from which the planning horizon is observed. Further, let R = {Room A, Room B,⋯ } define the total set of available operating rooms, and let \(r_{ikl} \in \mathbb {N}_{0}\) define the number of procedures of type i ∈ P scheduled on future day l ∈ T in room k ∈ R.
In addition, we consider the finite set of all patterns in which physicians can be available within the planning horizon, J = {Pattern A, Pattern B,⋯ }. Together with P, these availability patterns make up all of the attributes of any request that may occur. In other words, for any current day, let \(p_{ij} \in \mathbb {N}_{0}\) specify the number of requests of type i ∈ P that are constrained by pattern j ∈ J. Lastly, let W = {Monday, Tuesday, ⋯ , Friday} define the set of weekdays on which procedures can be allocated, and d ∈ W. Based on the above definitions, we introduce an MDP with state definition,
\(s = [p_{ij},\, r_{ikl},\, d],\)

divided into three parts: (1) the number and attributes of all current requests, p_{ij}; (2) the amount of each procedure scheduled to a future day and room, r_{ikl}; and (3) the current weekday, d. Notice that d can be redundant, depending on the structure of the problem in a given case. If the constraints on room capacity, c_{l}, and the availability patterns, J, can be generalized such that they are independent of the weekday of l ∈ T, then the state definition can be reduced to s = [p_{ij},r_{ikl}].
Furthermore, the reader should notice that the value of p_{ij} is generated by a purely stochastic process, whereas the transition into a state with any value of r_{ikl} is always deterministic in terms of the decision by the planner. Now, let λ_{i} define the stationary occurrence rate of requests of type i ∈ P, and let \(X_{ij} \in \mathbb {N}_{0}\) be a random variable defining the number of requests of type i ∈ P constrained by pattern j ∈ J. Then the requests, X_{ij}, are generated according to a multivariate Poisson process with parameters λ_{ij} = λ_{i}ξ_{ij} ∀i,j ∈ P,J. Here, \(\xi _{ij} \in (0,1]\) is the probability that a request of type i ∈ P is constrained to pattern j ∈ J; hence, \({\sum }_{j \in J} \xi _{ij} = 1 \text {\quad } \forall i \in P\). The assumption that requests are generated by a Poisson process was found adequate by Spratt et al. [26].
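The request-generating process can be sketched as follows, with illustrative rates λ_i and pattern probabilities ξ_ij (any Poisson sampler would do; Knuth's multiplication method is used here because Python's `random` module has no built-in Poisson sampler):

```python
import math
import random

def sample_poisson(rng, mean):
    """Sample a Poisson random variable via Knuth's multiplication method."""
    threshold = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def sample_requests(rng, rates, pattern_probs):
    """Return p[i][j] ~ Poisson(lambda_i * xi_ij) for all types and patterns."""
    return {i: {j: sample_poisson(rng, rates[i] * xi)
                for j, xi in pattern_probs[i].items()}
            for i in rates}

rng = random.Random(42)
rates = {"ProcedureA": 1.5, "ProcedureB": 0.4}           # lambda_i per workday
pattern_probs = {"ProcedureA": {"PatternA": 0.7, "PatternB": 0.3},
                 "ProcedureB": {"PatternA": 1.0}}        # xi_ij sums to 1 per i
print(sample_requests(rng, rates, pattern_probs))
```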
In the following, we present how this modeling approach relates to the action space and transitions of the MDP.
Actions and Transitions
From one day to the next, the MDP changes from a current state s ∈ S to a new state s^{∗}∈ S. This transition occurs at a fixed time interval. For each transition, an action has to be chosen from the action space, A_{s}, available at each decision epoch, that is, at the end of every day, when the planner must decide on an allocation of the requests. Let π define a policy such that for any s ∈ S, π(s) = a, where a ∈ A_{s}. Thus, for any arbitrary policy π ∈ Π, where Π is the set of all policies, the MDP evolves as a Markov chain in discrete time.
Let a be a vector of the elements \(a_{ijkl} \in \mathbb {N}_{0}\), defining the number of requests of type i ∈ P constrained by pattern j ∈ J that are allocated to room k ∈ R on future day l ∈ T. To account for the outsourcing of requests, we further extend a with the elements \(q_{ij} \in \mathbb {N}_{0}\), defining the number of requests of type i ∈ P with pattern j ∈ J that are outsourced. Thus, a has a total of \(\lvert P \times J \times R \times T \rvert + \lvert P \times J \rvert \) elements. The size of A_{s} depends, however, on the values of r_{ikl} in the state s, which are limited by the constraints presented in Section 2.1.1. A_{s} contains every feasible value of a; hence, \(1 \leq \lvert A_{s} \rvert \leq (\lvert R \times T \rvert )^{{\sum }_{i,j \in P,J}p_{ij}}\).
Notice that \({\sum }_{k,l \in R,T} a_{ijkl} + q_{ij} = p_{ij} \text {\quad } \forall i,j \in P,J\), and, as the planning horizon is rolling, \({r}_{ikl}^{s} + {\sum }_{j \in J} a_{ijkl} = {r}_{ik,l-1}^{s^{*}} \text {\quad } \forall i,k,l \in P,R,T\setminus \{ t+1 \}\), where \({r}_{ikl}^{s}\) and \({r}_{ikl}^{s^{*}}\) are the schedules of the current state s ∈ S and the subsequent state s^{∗}∈ S, respectively. Moreover, notice that for l = t + H, all rooms are freed such that r_{ikl} = 0 ∀i,k ∈ P,R. However, as procedures are constrained to specific rooms, the only feasible solution may in some cases be to outsource all current requests. If for some decision epoch the number of requests \({\sum }_{i,j \in P,J} p_{ij} = 0\), then the only action is to let \({\sum }_{i,j,k,l \in P,J,R,T} a_{ijkl} + q_{ij} = 0\), in which case the MDP merely transitions into the next state, resulting in \(r_{ikl}^{s} = {r}_{ik,l-1}^{s^{*}} \text {\quad } \forall i,k,l \in P,R,T\setminus \{ t+1 \}\).
Lastly, the transition probability, \(\mathbf {p}_{\mathbf {a}}^{ss^{*}}\), of changing from s ∈ S to a subsequent s^{∗}∈ S by choosing a ∈ A_{s} is merely \(\mathbf {p}_{\mathbf {a}}^{ss^{*}} = Prob\{X_{11} = p_{11},X_{12} = p_{12},\cdots ,X_{\lvert P \rvert \lvert J \rvert } = p_{\lvert P \rvert \lvert J \rvert } \}\) if \(r_{ikl}^{s} + {\sum }_{j \in J} a_{ijkl} = r_{ik,l-1}^{s^{*}} \text {\quad } \forall i,k,l \in P,R,T\setminus \{ t+1 \}\); otherwise, \(\mathbf {p}_{\mathbf {a}}^{ss^{*}} = 0\).
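The deterministic part of the transition, i.e., r*_{ik,l−1} = r_{ikl} + Σ_j a_{ijkl} with the last day t + H entering the horizon empty, can be sketched as below. Indexing is 0-based over the horizon days, the allocation a[i][k][l] is assumed to be already summed over the patterns j, and all names and numbers are illustrative:

```python
def roll_schedule(r, a):
    """r[i][k] is a list over horizon days l = 0..H-1 (day t+1+l);
    a[i][k][l] is the number of newly allocated type-i procedures."""
    new_r = {}
    for i in r:
        new_r[i] = {}
        for k in r[i]:
            occupied = [r_il + a_il for r_il, a_il in zip(r[i][k], a[i][k])]
            # day t+1 leaves the horizon; a fresh, empty day t+H enters it
            new_r[i][k] = occupied[1:] + [0]
    return new_r

r = {"A": {"Room1": [1, 0, 2]}}  # current schedule over a 3-day horizon
a = {"A": {"Room1": [0, 1, 0]}}  # one new type-A procedure on the second day
print(roll_schedule(r, a))  # {'A': {'Room1': [1, 2, 0]}}
```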
Cost Function
In the previous section, we introduced the policy π ∈ Π, where Π is the set of all possible policies for the MDP. Furthermore, recall that for any policy the MDP evolves as a discrete-time Markov chain. Let \(V_{\infty }^{\pi }(s)\) define the expected long-term costs induced by this Markov chain, starting at state s ∈ S, under the policy π ∈ Π. That is,

\({V}_{\infty }^{\pi }(s) = E \big [ {\sum }_{t=0}^{\infty } \gamma ^{t} C(s_{t},\pi (s_{t})) \mid s_{0} = s \big ], \quad (2)\)
where C(s_{t},π_{t}(s_{t})) is the cost induced by taking action π(s_{t}) in state s_{t} at time t, \(\gamma \in (0,1)\) is a discount factor, and t = 0 is an arbitrary point in time. We define the optimal policy, π^{∗}, as the policy that obtains \({V}_{\infty }^{\pi ^{*}}(s) = \min \limits _{\pi \in {\Pi }} {V}_{\infty }^{\pi }(s) \text {\quad } \forall s \in S\); thus, an essential element in minimizing the expected long-term costs is how each action is penalized through the cost function, C(s_{t},π_{t}(s_{t})) = C(s,a). The reader should notice that the optimal myopic solution to the scheduling problem is included in the set Π, and thus \({V}_{\infty }^{\pi ^{*}}(s) \leq {V}_{\infty }^{\pi ^{\eta }}(s)\), where π^{η} is the policy for which \(\pi ^{\eta }(s) = \arg \min \limits _{\mathbf {a} \in A_{s}} E [ C(s,\mathbf {a})] \text {\quad } \forall s \in S\).
Now recall that we consider two different types of costs:

1.
A fixed setup cost, \(\kappa \in \mathbb {R}_{>0}\), is induced whenever a procedure is scheduled to a previously unopened room; that is, whenever \({\sum }_{i \in P} r_{ikl} = 0\) and \({\sum }_{i,j \in P,J} a_{ijkl} > 0\) for some k ∈ R and l ∈ T in the current state, s.

2.
An overtime cost that accounts for procedures stretching into overtime in any room k ∈ R. Let the total capacity utilization of a room be defined by \(\tau \in \mathbb {R}_{\geq 0}\), and let f(δ) define the overtime cost for an overtime of size \(\delta \in \mathbb {R}_{\geq 0}\), where δ is the amount of time by which τ exceeds the capacity, w_{k}, of a room k ∈ R. Now, let p_{k}(τ) define the probability density function for a capacity utilization of amount τ in room k ∈ R. We then penalize an action according to the total expected overtime penalty, \({\sum }_{k \in R} o_{k}\), for the subsequent day, l = t + 1, where o_{k} is defined in (3),

\(o_{k} = {\int }_{w_{k}}^{\infty } f(\tau - w_{k}) \, p_{k}(\tau ) \, d\tau . \quad (3)\)

Notice that this formulation generalizes to any continuous distribution, p_{k}(τ), with τ ≥ 0, and any overtime-cost function, f(δ), with δ ≥ 0.
To ensure that actions are penalized for outsourcing requests, we further introduce a large penalty, \(\phi \in \mathbb {R}_{>0}\) for every outsourced request. Finally, the resulting cost function is presented in (4), where \(y_{kl}^{s}\) and \(y_{kl}^{s^{*}}\) are 1 if a room k ∈ R is scheduled for use on day l ∈ T in the current state s ∈ S or subsequent state s^{∗}∈ S, respectively; and otherwise 0.
A Heuristic Approach
In our case, the size of a single state, and especially of the state space, S, can be very large even for small problem instances. Assuming a rather limited case where physicians are always available such that \(\lvert J \rvert = 1\), procedures are constrained to only one room, and c_{l} = c_{l+ 1} ∀l ∈ T, leading to s = [p_{i},r_{il}], there are a total of \(\lvert P \rvert + \lvert P \times T \rvert \) elements in each state. That is, for a case with merely \(\lvert P \rvert = 10\) different procedures and a planning horizon of \(\lvert T \rvert = 20\) days, a single state comprises 210 elements. Additionally, by assuming a maximum number, n, of requests per type i ∈ P and a capacity limit, m, of procedures per day, the state space would have a total size of \(\lvert S \rvert = (n+1)^{\lvert P \rvert } \cdot \Big (\frac {1}{\lvert P \rvert !} {\prod }_{i=1}^{\lvert P \rvert } (m+i) \Big )^{\lvert T \rvert - 1}\) states, for which a direct implementation of an exact algorithm would be a computational challenge. Furthermore, in the worst case, the action space attains a size of \(\lvert A_{s} \rvert = (\lvert R \times T \rvert )^{{\sum }_{i,j \in P,J}p_{ij}}\). For these reasons, solving the MDP analytically would be completely intractable.
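The state-space bound above can be evaluated exactly with integer arithmetic; note that \(\frac{1}{\lvert P \rvert !} {\prod}_{i=1}^{\lvert P \rvert}(m+i)\) equals the binomial coefficient \(\binom{m+\lvert P \rvert}{\lvert P \rvert}\). The values of n and m below are illustrative:

```python
from math import comb

def state_space_size(P, T, n, m):
    """|S| = (n+1)^|P| * C(m+|P|, |P|)^(|T|-1) for the reduced state s = [p_i, r_il]."""
    return (n + 1) ** P * comb(m + P, P) ** (T - 1)

# even a modest case yields an astronomical number of states
size = state_space_size(P=10, T=20, n=5, m=3)
print(f"{size:.3e}")
```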
Instead of deriving an optimal action π^{∗}(s) for each of the states s ∈ S, we therefore rely on deriving a good action for the current state only, using a simulation-based method. Our approach is based on the rollout algorithm proposed by Bertsekas and Castañon [2], later extended to parallel rollout by Chang et al. [8].
Consider some arbitrary allocation epoch, t, in which the requests p_{ij} are scheduled. These requests are constrained by the occupation of the procedures already in the schedule, r_{ikl}, and, for any policy, induce the long-term cost \({V}_{\infty }^{\pi } (s)\). Now consider an optimal policy, π ∈ Π, derived for a finite model horizon, \(H^{\prime }\). Then, as \(H^{\prime } \rightarrow \infty \), the policy \(\pi \rightarrow \pi ^{*}\) for the infinite case. The cost of such a policy would then be

\({V}_{H^{\prime }}^{\pi }(s) = E \big [ {\sum }_{t=0}^{H^{\prime }} \gamma ^{t} C(s_{t},\pi (s_{t})) \big ], \quad (5)\)
quite similar to (2). We assume for the remainder of this paper that γ^{t} = 1 for \(t = 0,1,\dots ,H^{\prime }\). From the definition in (5), we note that the decision a hospital planner has to take from the current state s should be derived from the sum of, first, the currently known cost C(s,a) and, second, the expected long-term cost of a sequence of future actions. Thus, we let a rollout policy, \(\pi ^{\prime }\), be defined as the result of a sequence of actions derived under (6),

\(\pi ^{\prime }(s) = \arg \min \limits _{\mathbf {a} \in A_{s}} \big ( C(s,\mathbf {a}) + E[\tilde {V}_{H^{\prime }-1}^{\pi }(f(s,\mathbf {a},\omega ))] \big ), \quad (6)\)
where \(E[\tilde {V}_{H^{\prime }-1}^{\pi }(\cdot )]\) approximates (5), and \(\tilde {V}_{H^{\prime }-1}^{\pi }(\cdot )\) represents the total cost of a path of decisions over the horizon t = 1 to \(H^{\prime }\). Further, we let the subsequent state relative to s be defined as s^{∗} = f(s,a,ω), that is, the combined result of the current state, s, the action, a, and a random disturbance of the system, ω.
Similar to Bertsekas and Castañon [2], we fix the disturbances, ω, to a finite set of values, limiting our scope to a sample of the potential subsequent states. That is, we randomly sample N disturbances and then evaluate \(\tilde {V}_{H^{\prime }-1}^{\pi }(f(s,\mathbf {a},\omega ^{j}))\) for \(j = 1,2,\dots ,N\), yielding the N paths illustrated in Fig. 3. Thus, for the decision of choosing an action under the rollout policy, \(\pi ^{\prime }\), (6) changes to

\(\pi ^{\prime }(s) = \arg \min \limits _{\mathbf {a} \in A_{s}} \big ( C(s,\mathbf {a}) + \frac {1}{N} {\sum }_{j=1}^{N} \tilde {V}_{H^{\prime }-1}^{\pi }(f(s,\mathbf {a},\omega ^{j})) \big ). \quad (7)\)
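The rollout selection described above can be sketched as follows. The problem-specific pieces (`cost`, `transition`, `base_policy`, `actions`) are placeholders for C(s, a), f(s, a, ω), the base policy π, and the action set A_s; the toy usage at the bottom is purely illustrative, not the paper's case:

```python
import random

def rollout_action(s, actions, cost, transition, base_policy, horizon, N, seed=0):
    """Pick the action minimizing C(s,a) plus the average simulated cost of
    following base_policy over N fixed disturbance paths."""
    rng = random.Random(seed)
    # fix the disturbances so all actions are compared on identical paths
    paths = [[rng.random() for _ in range(horizon)] for _ in range(N)]
    best_a, best_v = None, float("inf")
    for a in actions(s):
        total = 0.0
        for path in paths:
            acc = cost(s, a)                   # known immediate cost C(s, a)
            state = transition(s, a, path[0])  # s* = f(s, a, omega^j)
            for t in range(1, horizon):        # follow the base policy onward
                a_t = base_policy(state)
                acc += cost(state, a_t)
                state = transition(state, a_t, path[t])
            total += acc
        if total / N < best_v:
            best_a, best_v = a, total / N
    return best_a, best_v

# Toy usage: two actions with immediate costs 0 and 1, a do-nothing base policy.
best = rollout_action(s=0, actions=lambda s: [0, 1], cost=lambda s, a: a,
                      transition=lambda s, a, w: s, base_policy=lambda s: 0,
                      horizon=3, N=4)
print(best)  # (0, 0.0)
```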
Notice that, in practice, ω^{j} is sampled using pseudorandom numbers that are then converted into the requests, p_{ij}, of each subsequent state.
How to evaluate \(\tilde {V}_{H^{\prime }-1}^{\pi }(f(s,\mathbf {a},\omega ))\) is presented in Section 3.2.1. Moreover, the expression in (7) requires a full enumeration of the state-dependent action space A_{s}. As mentioned previously, the size of A_{s} can be quite intractable, and we therefore require a robust search procedure to reduce the computational requirements. We present this procedure in Section 3.2.2.
Simulation-Based Value Evaluation
Let Λ define a nonempty finite set of policies that all perform well for the hospital scheduling problem. By choosing the one policy that performs best in the current state, s, we allow for a rollout policy that continually adapts to the system. This is the basis of parallel rollout [8]. A related approach is to choose an action based on the current weighted average performance of the policies in Λ, which is the method we employ in this study. We base our approach on the Simulated Annealing Multiplicative Weights (SAMW) algorithm proposed by Chang et al., 2007 [7]. Let ϕ(π) define the weighting of policy π ∈Λ, such that \({\sum }_{\pi \in {\Lambda }} \phi (\pi ) = 1\). The aim of the SAMW algorithm is then to concentrate the weighting on the policies in Λ that currently (i.e., relative to s) perform best.
Let ϕ^{i}(π) define the weight of policy π at iteration i. Then,
where \(\tilde {V}_{i}^{\pi }\) corresponds to \(\tilde {V}_{H^{\prime }-1}^{\pi }(f(s,\mathbf {a},\omega ^{j}))\) at iteration i for any of the disturbances ω^{j}. In addition, we have that π ∈Λ and \({\upbeta }_{i} \in \mathbb {R}_{>1}\) is a “cooling” parameter that decreases as a function of the iteration i. Furthermore, Z^{i} is a normalization parameter,
Now, we let \({{\omega }_{1}^{j}}, {{\omega }_{2}^{j}}, \dots , {\omega }_{H^{\prime }}^{j}\), where \(\omega ^{j} = {{\omega }_{1}^{j}}\), define a path of random disturbances such that we get \(\tilde {V}_{i}^{\pi } = {\sum }_{t=1}^{H^{\prime }} C(s_{t},\pi (s_{t}))\), where \(s_{t} = f(s_{t-1},\pi (s_{t-1}),{\omega _{t}^{j}})\), s_{0} is the current state s, and π(s_{0}) is the current action a. In each iteration, we generate a new range of disturbances (except for \({{\omega }_{1}^{j}}\)) and calculate \(\tilde {V}_{i}^{\pi } \text {\quad } \forall \pi \in {\Lambda }\).
Letting \(\mathcal {T}\) define a fixed number of iterations, we get the sample mean estimate \(\psi (\pi ) = \frac {1}{\mathcal {T}} {\sum }_{i=1}^{\mathcal {T}} \tilde{V}_{i}^{\pi }\) for each policy π ∈Λ, which finally yields the approximation,
We use (10) to derive the last term of our rollout expression in (7). That is, (10) is used for each of the subsequent states illustrated in Fig. 3. The overall structure of the SAMW algorithm is presented in Algorithm 1, where we predefine \(\mathcal {T}\) experimentally to ensure a limited runtime of the algorithm. Moreover, notice that all disturbances, \({\omega _{t}^{j}}\), can be generated prior to the running of Algorithm 1, as will be elaborated in Section 3.2.2.
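A minimal sketch of one SAMW weight update follows, assuming sampled costs are scaled to [0, 1] and the exponent is negated so that weight concentrates on low-cost policies; the algorithm in [7] is stated for maximization, so the sign convention here is an assumption for our cost-minimization setting.

```python
def samw_update(weights, costs, beta):
    """One multiplicative-weights step: policies with lower sampled cost
    gain weight.  Costs are scaled to [0, 1] before exponentiation; the
    negative exponent is an assumption for the cost-minimization setting."""
    c_max = max(costs.values()) or 1.0
    raw = {p: weights[p] * beta ** (-costs[p] / c_max) for p in weights}
    z = sum(raw.values())  # normalization parameter Z^i
    return {p: v / z for p, v in raw.items()}
```

Starting from uniform weights, repeated updates with fresh sampled costs concentrate the weighting on the currently best policies in Λ.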
The Search Procedure
Our approach for deriving an action, a, from the current action space, A_{s}, is based on a Greedy Randomized Adaptive Search Procedure (GRASP) [24]. That is, we conduct an iterative search consisting of two levels: (1) a greedy randomized solution, followed by (2) a local search procedure. We use this approach due to the combinatorial and greedy cost structure of the problem, which ensures that any immediate greedy allocation of requests will result in a low-cost solution.
GRASP is generally known to perform well in various scheduling problems, as shown in the bibliography by Festa and Resende [13], and has previously been employed to solve a surgical scheduling problem by Cartes and Medina [6] where the proposed model performed adequately compared with the optimal solution. A generalized structure of the GRASP heuristic is presented in Algorithm 2.
For the greedy randomized solution, we generate a candidate list by enumerating all feasible allocations for each of the current requests, p_{ij}. Next, each of these allocations is ranked according to its apparent lowest cost increase. We then restrict this list to the \(\alpha \in \mathbb {N}\) allocations with the highest rank, and finally pick an allocation at random for insertion in the schedule, r_{ikl}. This process is conducted recursively until all requests have been allocated to the schedule.
For the ranking of each candidate allocation, we conserve runtime for the later local search procedure by only considering the current cost function, C(s,a). Recall from (4) that the cost induced at every state comprises, firstly, a fixed setup cost; secondly, an overtime cost; and lastly, a penalty for outsourcing requests. Thus, for an allocation to a room k ∈ R on a day l ∈ T, we evaluate a candidate on the difference,
where Δ^{i} is the increase in cost if the i th allocation is conducted for \(i = 1,\dots ,{\sum }_{i,j \in P,J} p_{ij}\). Further, \({o}_{kl}^{i} \in \mathbb {R}_{\geq 0}\) and \({y}_{kl}^{i} \in \{0,1\}\) are the overtime cost and open-room indicator for the room k ∈ R and day l ∈ T for which the request is allocated, similar to (4). Notice that costs never decrease on allocation, so \(o^{i}_{kl} \geq {o}_{kl}^{i-1}\) and \({y}_{kl}^{i} \geq {y}_{kl}^{i-1}\). In addition, q ∈ {0, 1} indicates if the request is outsourced; and κ and ϕ are the fixed setup cost and outsource penalty, respectively. Lastly, \({o}_{kl}^{0}\) and \({y}_{kl}^{0}\) are inherited directly from the current state, s.
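The greedy randomized construction described above can be sketched as follows; `feasible_slots` and `delta_cost` are placeholders for the feasibility check and the cost difference Δ^{i} in (11), not part of our implementation.

```python
import random

def greedy_randomized_schedule(requests, feasible_slots, delta_cost, alpha=3, seed=0):
    """Greedy randomized construction (GRASP level 1): rank every feasible
    (request, room-day) allocation by its cost increase, restrict to the
    alpha best, pick one at random, and repeat until all requests are placed."""
    rng = random.Random(seed)
    schedule = []
    pending = list(requests)
    while pending:
        candidates = [(delta_cost(schedule, req, slot), req, slot)
                      for req in pending
                      for slot in feasible_slots(req, schedule)]
        candidates.sort(key=lambda c: c[0])             # lowest cost increase first
        _, req, slot = rng.choice(candidates[:alpha])   # restricted candidate list
        schedule.append((req, slot))
        pending.remove(req)
    return schedule
```

With alpha = 1, the procedure reduces to a purely greedy construction; larger alpha injects the randomization that diversifies restarts.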
Afterwards, the local search procedure improves the solution created on the greedy randomized level by intensifying the search. This is the only time in the search that the value function, \(\tilde {V}_{H^{\prime }-1}^{\pi }\), is taken into account. We base this level on a first-best hill climber using the evaluation function,
based on the expression in (7). Our implementation of GRASP for the problem of searching for a suitable action a ∈ A_{s} is presented in Algorithm 3.
Here, we construct the neighborhood, \(\mathcal {N}\), from an enumeration of every feasible single move of a procedure to a new room or day along with all feasible swaps between two procedures. We terminate the local search procedure by using an upper bound on evaluations without improvement, or if the entire neighborhood has been evaluated.
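As an illustration, the move-and-swap neighborhood can be enumerated as follows; a schedule is represented here simply as a list of (procedure, room-day slot) pairs, and feasibility checks are omitted.

```python
def neighborhood(schedule, slots):
    """Enumerate the local-search neighborhood: every single move of a
    scheduled procedure to another room-day slot, plus every swap of the
    slots of two procedures."""
    moves = []
    for idx, (proc, slot) in enumerate(schedule):
        for new_slot in slots:
            if new_slot != slot:
                moved = list(schedule)
                moved[idx] = (proc, new_slot)
                moves.append(moved)
    for i in range(len(schedule)):
        for j in range(i + 1, len(schedule)):
            swapped = list(schedule)
            (p_i, s_i), (p_j, s_j) = swapped[i], swapped[j]
            swapped[i], swapped[j] = (p_i, s_j), (p_j, s_i)
            moves.append(swapped)
    return moves
```

A first-best hill climber would scan this list, accept the first neighbor that improves (12), and restart the scan from the new incumbent.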
To make all candidate actions a comparable, the disturbances required for the SAMW algorithm, as well as for (12), are generated during the initialization of the algorithm. Furthermore, we reuse the \(\mathcal {T}\) sequences, \({{\omega }_{2}^{j}},{{\omega }_{3}^{j}},\dots ,{\omega }_{H^{\prime }}^{j}\), between each sample path j. So, accounting for the model horizon, \(H^{\prime }\), the number of iterations in the SAMW algorithm, \(\mathcal {T}\), and the N subsequent states, we require a total of \(N + \mathcal {T} \cdot (H^{\prime }-1)\) randomly generated disturbances for the execution of Algorithm 3.
Implementation and Results
In this section, we demonstrate our simulation-based MDP based on data from a Danish hospital. We use data on patient arrivals and ward resources to estimate the occurrence of requests, procedure durations, and room availability.
In Section 4.1, we present the hospital case along with a number of assumptions related to our model implementation. Next, in Section 4.2, we present the parameter tuning, followed by Section 4.3 where our approach is compared with a range of myopic policies using simulation.
Case and Data Description
For the investigated hospital, requests occur according to \(\lvert P \rvert = 288\) different types. The occurrence process is further assumed to be Poisson, in accordance with Spratt et al., 2019 [26], and stationary with known parameters. Each request is subject to an availability pattern, for which we assume that every successive period of 5 days has at most 1 day where the designated physician is unavailable. In addition, all patterns occur with equal probability. Furthermore, the procedure duration is random, but with known mean and variance.
Data for the ten most frequent types of requests, accounting for 52% of the total occurrence rate, are presented in Table 1.
We assume that all requests have to be allocated. However, if the hospital does not have sufficient capacity within the current planning horizon, then a minimum number of requests is allowed to be outsourced. The fixed planning horizon is set to H = 20 days, within which the capacity on the number of open rooms depends on the weekday, as shown in Table 2. In total, the hospital has three different rooms at its disposal, for which the opening hours result in a total time capacity of w_{k} = 7.5 h.
The planner further has to account for equipment compatibility between procedure types and rooms. The compatibility between procedures and rooms for the ten most frequent types is presented in Table 3, where 1 indicates that the procedure is compatible with the room; otherwise, the indicator is 0. Between all allocated procedures, we assume a fixed buffer time of m = 0.5 h.
Model Implementation
In accordance with hospital data studied by Spratt et al. [26] and Strum et al. [28], we assume the capacity utilization of room k ∈ R is distributed according to a lognormal distribution. This distribution has probability density function
where \(\gamma _{k} = \ln ({{\mu }_{k}^{2}} / \sqrt { {\sigma }_{k}^{2} + {{\mu }_{k}^{2}} } )\), \(\varrho _{k} = \sqrt {\ln ({\sigma }_{k}^{2} / {{\mu }_{k}^{2}} + 1 )}\), μ_{k} is the sum of the expected durations for all procedures allocated to room k ∈ R, and \({{\sigma }_{k}^{2}}\) is the corresponding sum of their variances. In addition, we use a polynomial function to evaluate the cost of performing procedures in overtime δ, assuming that f(0) = 0. That is,
In practice, the parameters b_{1} and b_{2} would be adjusted to attain the desired slope and the desired relation between the paid overtime costs and the more intangible costs of stretching the procedure duration into overtime. Later, we will assess how adjusting these parameters affects the performance of our model.
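For illustration, the conversion from (μ_{k}, σ_{k}²) to the log-scale parameters of (13), together with a Monte Carlo estimate of a room's expected overtime, can be sketched as follows. The 7.5-h capacity follows Section 4.1; the sampling-based overtime estimate is merely illustrative and is not the evaluation used in the paper.

```python
import math
import random

def lognormal_params(mu, var):
    """Convert the mean and variance of total room utilization into the
    log-scale parameters (gamma_k, rho_k) of the lognormal pdf in (13)."""
    gamma = math.log(mu ** 2 / math.sqrt(var + mu ** 2))
    rho = math.sqrt(math.log(var / mu ** 2 + 1))
    return gamma, rho

def expected_overtime(mu, var, capacity=7.5, n=100_000, seed=1):
    """Monte Carlo estimate of E[max(U - capacity, 0)] for a room whose
    utilization U follows the lognormal distribution above."""
    gamma, rho = lognormal_params(mu, var)
    rng = random.Random(seed)
    total = sum(max(rng.lognormvariate(gamma, rho) - capacity, 0.0)
                for _ in range(n))
    return total / n
```

A quick consistency check is that exp(gamma + rho²/2) recovers the original mean μ.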
For the SAMW algorithm, we employ two base policies in the set Λ. These have been chosen to account for the uncertainties in the resulting costs while maintaining a reasonably fast evaluation time. We will refer to these base policies as:

1.
The Anticipative Increased Cost Policy (AIP)

2.
The Anticipative Weighted Cost Policy (AWP)
In both policies, the current requests, p_{ij}, are allocated to the schedule, r_{ikl}, according to their expected duration, E[Y_{i}], in ascending order. One request at a time, each policy evaluates all feasible room-day pairs, k,l ∈ R,T, within the planning horizon and allocates the request based on the lowest anticipative cost. The latter is estimated differently in each of the policies.

1.
The AIP estimates the increased cost similar to (11), but the difference, \(o^{i}_{kl} - o^{i-1}_{kl}\), accounts for the future procedures that have not yet appeared in the schedule. Specifically, the total capacity utilization is estimated from μ_{kl} and σ_{kl} (cf. the distribution in (13)), where each parameter is a sum over the already allocated procedures and an estimate of the future procedures. Thus, prior to allocating the request, the AIP assumes that
$$ \mu_{kl} = \eta_{l} \cdot E[Y_{G}] + \sum\limits_{i \in P} (r_{ikl} \cdot E[Y_{i}]) + (\sum\limits_{i \in P} (r_{ikl}) - 1) \cdot m $$ (15)
and
$$ {\sigma}_{kl}^{2} = \eta_{l} \cdot Var(Y_{G}) + \sum\limits_{i \in P} (r_{ikl} \cdot Var(Y_{i})) $$ (16)
for each feasible room-day pair, k,l ∈ R,T, where \(E[Y_{G}] = {\sum }_{i \in P} E[Y_{i}] \cdot \lambda _{i}/\lambda _{G}\) is the global weighted average duration, \(Var(Y_{G}) = {\sum }_{i \in P} Var(Y_{i}) \cdot \lambda _{i}/\lambda _{G}\) is the global weighted average variance, and \(\lambda _{G} = {\sum }_{i \in P} \lambda _{i}\) is the global rate of occurrence. Lastly, \(\eta _{l} \in \mathbb {R}_{>0}\) estimates the additional number of requests that day l ∈ T will be subject to in the future. Thus,
$$ \eta_{l} = \sum\limits_{x = t+1}^{l} (\lambda_{G} / (H \cdot \lvert R \rvert - d_{x}) ) $$ (17)
for l ≥ t + 1; otherwise, η_{l} = 0. Further, \(d_{x} \in \mathbb {N}_{0}\) is the number of room-day pairs that are closed (due to capacity depletion) within the horizon relative to day \(x \in \mathbb {N}_{t+1\leq x \leq l}\).

2.
For the AWP policy, each allocation again depends on the requests that have not yet appeared in the schedule. However, the AWP is based on the notion that uncertainty should be employed as a “weight” rather than an estimate of the potential overtime costs. Consider the difference \({o}_{kl}^{i} - {o}_{kl}^{i-1}\) from (11). This time, \({o}_{kl}^{i-1}\) is evaluated by merely summing over the known procedures in r_{ikl}, whereas the resulting overtime cost, \(o^{i}_{kl}\), is based on (15)–(17). However, we change (17) to \(\eta _{l} = \nu \cdot {\sum }_{x = t+1}^{l} (\lambda _{G} / (H \cdot \lvert R \rvert - d_{x}) )\), where \(\nu \in \mathbb {R}_{>0}\) determines the “weight” of these uncertain requests and is determined experimentally.
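A sketch of the anticipative quantities in (15)–(17) is given below; the `closed` mapping stands in for d_{x}, and all names are illustrative placeholders.

```python
def global_duration_stats(rates, means, variances):
    """Occurrence-weighted global mean E[Y_G] and variance Var(Y_G)
    over all procedure types (cf. (15)-(16))."""
    lam_g = sum(rates)  # global rate of occurrence
    e_yg = sum(r * m for r, m in zip(rates, means)) / lam_g
    var_yg = sum(r * v for r, v in zip(rates, variances)) / lam_g
    return e_yg, var_yg, lam_g

def eta(l, t, lam_g, horizon, n_rooms, closed):
    """Expected number of additional future requests hitting day l, per (17);
    closed[x] is the number of depleted room-day pairs relative to day x."""
    if l < t + 1:
        return 0.0
    return sum(lam_g / (horizon * n_rooms - closed[x]) for x in range(t + 1, l + 1))
```

The AIP plugs eta directly into (15)–(16), whereas the AWP multiplies it by the experimentally tuned weight ν.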
Adjusting the Parameters
The MDP model parameters were assessed and adjusted by applying the model to a simulation framework. That is, we simulated the arrival of requests and their resulting utilization of capacity in the system by generating pseudo-random numbers. In this simulation, we have assumed that requests occur according to a Poisson process, and that the total capacity utilization of any room is distributed according to the lognormal distribution defined by (13).
We randomly generated three different sets of seeds covering a simulation period of 565 days and then replicated each run of the simulation on each respective set twice. A total of 365 days were used to burn in the simulation, for which we used the AIP policy to save runtime, leaving 200 days to assess the model performance of the MDP. On each respective day, the MDP was given a 20-min time limit (cf. Algorithm 3). This limit was chosen to ensure that results resemble a practical setting.
Tests were conducted using three different levels for each respective parameter. The parameters subject to testing, and their levels, are presented in Table 4. The number of sampled paths, N, and the SAMW iterations, \(\mathcal {T}\), were tested with interaction, resulting in a total of (3 × 3 + 3 + 3) × 2 × 3 = 90 simulations. The remaining parameters were adjusted during preliminary testing of the model.
For the cooling schedule, β_{i}, we tested both a fixed cooling parameter, such that β_{i} remained constant for all \(\mathcal {T}\) iterations, and an exponentially decreasing function, \({\upbeta }_{i} = \upbeta (i) = 1 + C(\epsilon ,\mathcal {T})^{(i-1)}\), where \(C(\epsilon ,\mathcal {T}) = e^{-\frac {\ln (1+\epsilon )}{\mathcal {T}}}\), and 𝜖 defines the final cooling value after \(\mathcal {T}\) iterations.
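The exponentially decreasing schedule can be sketched as follows; note that the negative exponent is our reading of the garbled published formula, chosen so that β(1) = 2 and β_{i} decreases toward 1 as required.

```python
import math

def beta_schedule(eps, n_iter):
    """Exponentially decreasing cooling schedule: beta(1) = 2 and beta(i)
    decreases toward 1 over n_iter iterations.  The negative exponent is a
    reconstruction; larger eps gives a faster decrease."""
    c = math.exp(-math.log(1 + eps) / n_iter)
    return [1 + c ** (i - 1) for i in range(1, n_iter + 1)]
```

This reproduces the behavior reported below: a larger ε makes β_{i} decrease faster.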
Lastly, for the overtime function in (14), we used a setting with b_{1} = 10 and b_{2} = 4, and a fixed setup cost of κ = 100. The penalty for outsourcing requests was set to ϕ = 1 ⋅ 10^{6}.
We measured the performance of each setting on the cost cumulated over the 200-day simulation period, comprising the overtime and setup costs and the penalty from outsourcing requests. The parameters were then compared in a dot plot and on their respective correlations with the cumulated cost. Interestingly, the cooling schedule proved more effective when held constant at β_{i} = 2, with performance decreasing as 𝜖 increases, hence when β_{i} decreases at a faster rate. The cooling schedule had a distinct effect on the performance, whereas the remaining parameters were more inconclusive. The effect of the number of SAMW iterations, \(\mathcal {T}\), was almost negligible with respect to the cumulated cost, whereas the effects of the number of sampled paths, N, and the model horizon, \(H^{\prime }\), depended more on the specific set of seeds for the simulation.
As regards the value of ν in the AWP, we employed a hill climber heuristic where the average performance was recursively evaluated over ten different sets of seeds until convergence. This resulted in a final weight of ν = 16.969.
Numerical Experiments
In this section, we apply our MDP model based on the results from the parameter tests and compare the performance with a range of different policies. These include a policy that resembles the behavior of a “manual” planner, which we will refer to as the Manual Policy (MP). Next, we compare the MDP performance with a more advanced heuristic search procedure.
The MP is based on the following assumptions:

1.
The expected duration of each procedure is known to the planner.

2.
The planner is familiar with procedure variability, but neither the exact distribution nor the spread is known. For this reason, a fraction of the available capacity is used as a buffer such that a new procedure is not allowed to start within this time interval. We will test two different buffer levels in our numerical experiments. To ensure the planner can utilize the remaining capacity, we allow the buffer to be violated by a maximum of 10% of the total capacity.

3.
The exact costs are unknown to the planner. For this reason, the planner will try to utilize the setup cost of a new room as much as possible. Firstly, the requests are sorted in ascending order, similar to the policies in Λ, and then allocated in sequence to the room-day pair that results in the least amount of excess capacity. If there are no feasible allocations for the room-day pairs already in use, the planner allocates the request to the latest unused room-day pair such that this new room will be subject to as many future requests as possible.
Our experiments were conducted using simulations similar to the tests in Section 4.2. Thus, a period of 365 days was used to burn in the simulation, and 200 days to assess the performance of the model. However, simulations were extended to eight different sets of seeds and replicated five times on each set. Besides testing the model on a range of different seeds, we varied the parameters in the overtime function (14) on four different levels, presented in Table 5. Later, we will refer to the overtime-cost settings using the conventions presented in this table. Again, the MDP runtime was fixed at 20 min for each day over the entire length of the simulation.
Our experiments in this section include the MDP, MP, and the policies in Λ. For the MP, we decided to employ a capacity buffer of 20% which resembles the fraction that is most commonly used by our case hospital. Also, to test if higher room utilization leads to a better performance, we included tests with a 10% buffer.
In order to compare the performance across the different combinations of seeds (and thereby the behavior of the generated requests) and overtime costs, we standardized the cumulated cost, including the penalty for outsourcing, by employing the conversion
where y_{ijk} is the resulting cost of simulation run \(k \in \mathcal {K}_{ij}\) using seed set i and overtime-cost setting j. Thus, with the five replications of the MDP and single runs of the MP (with both 10% and 20% capacity buffer) and the policies in Λ, \(\lvert \mathcal {K}_{ij} \rvert \) comprises 9 runs for each combination of i and j.
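One plausible reading of this standardization, consistent with the worst-performing run in each (seed set, overtime setting) cell scoring exactly 1.000, is division by the cell maximum; (18) itself is not reproduced above, so this form is an assumption.

```python
def standardize(costs):
    """Scale the cost of each run by the maximum cost observed in the same
    (seed set, overtime setting) cell, so the worst run scores 1.000.
    Division by the cell maximum is an assumption about (18)."""
    out = {}
    for (i, j), runs in costs.items():
        top = max(runs.values())
        out[(i, j)] = {k: y / top for k, y in runs.items()}
    return out
```

Under this scaling, a deterministic policy that is always the most expensive in its cell yields an average of 1.000 with zero standard deviation, matching the pattern reported for the MP below.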
The results are presented in Table 6, showing the performance of each model and overtime-cost setting as both the average and the standard deviation of the standardized cost. Furthermore, average runtimes of each model are presented in Table 7.
Table 6 shows a distinct difference between the MDP and the remaining policies, measured on both the average and the standard deviation of the performance. The difference is especially distinct between the MDP and the MP, regardless of the capacity buffer. Notice that the MP with 20% capacity buffer yields an average of 1.000 and a standard deviation of 0.000 for the first three overtime settings because this policy resulted in the highest cost across all eight sets of seeds.
Interestingly, the benefit of using the MDP increases as a function of the overtime cost. Simultaneously, the difference between the policies decreases, resulting in nearly indistinguishable performance at the highest overtime-cost level. Otherwise, the MP performs substantially better with a 10% than with a 20% capacity buffer. Still, the anticipative policies in Λ yield substantially lower costs than both MP settings, and the AWP results in both lower average and lower standard deviation cost than all the remaining policies, except at the highest level. The relative difference between the average performance of the MDP and the AWP is quite large, but given the runtimes in Table 7, which show that the AWP derives a decision in about 3 ms, the latter might be a suitable choice in many settings.
We emphasize that the performance of advanced approaches (such as the MDP) is only relevant if a computational setup can be introduced into the hospital operations; not requiring such a setup is an obvious advantage of a manual approach. However, if this is possible, then we should consider how another well-known scheduling method compares with the MDP performance. This is the purpose of the following section.
Further Validation
In this section, we compare our MDP with a GRASP heuristic with a myopic cost structure. That is, we reused the basic algorithmic structure presented in Algorithm 2, but without the anticipative costs. Instead, during the local search, we evaluate the solution on the sum of the expected overtime cost and the fixed setup cost over the entire planning horizon. Thus,
We chose to compare our MDP with the myopic GRASP due to its performance in other scheduling problems [13, 24], and because the method can be applied without excessive computer programming, which is an advantage to hospitals converting from manual to computational planning.
Just as in our previous experiments, we simulated the performance of the GRASP heuristic with 20 min of runtime and replicated each run five times on each combination of seeds and overtimecost setting.
The result of the simulations is presented in Table 8, showing the average and standard deviation of the performance for each model and overtime-cost setting. The average performance is further depicted in Fig. 4. Again, the models are compared on their standardized cost according to (18), but recalculated over only the two models compared in this section.
Table 8 shows that the performance of the two models is much closer than in our previous experiments. In fact, the GRASP outperforms the MDP in both average and standard deviation cost when the overtime cost is set to “Low.” This corresponds to the parameters b_{1} = 10 and b_{2} = 4, which is the setting the MDP was adjusted for. However, as the cost of stretching procedures into overtime increases, so does the MDP performance, resulting in lower average (cf. Fig. 4) and standard deviation cost for the remaining overtime settings.
We should further emphasize that in the cheapest setting, the overtime cost does not exceed the cost of opening a new room until about 3 h into overtime, which is longer than the expected duration of most of the occurring requests in our data. This may not apply to many real hospital settings. In addition, these experiments were conducted over a reasonably short simulation of 200 days; hence, if the improvement from exploiting the rolling and overlapping nature of the problem is small, then such an advantage might only show over much longer periods of simulation.
Conclusion
Aiming to apply and test a new approach to the problem of scheduling OTs, we developed a simulation-based MDP. The advantage of this type of modeling approach is that a sequence of decision problems is taken into account, which is often disregarded in OT planning.
Specifically, our approach derives a heuristic policy by evaluating a number of potential future scheduling paths from the currently observed state. This process is based on a predefined set of base policies and employs an algorithm known as Simulated Annealing Multiplicative Weights (SAMW) [7]. We further consider that the state-dependent action space is intractable, and for this reason, we derive an action with a Greedy Randomized Adaptive Search Procedure (GRASP).
In order to validate our approach, we conducted a number of numerical experiments based on simulation and compared our results with both simple and more advanced myopic scheduling methods. Firstly, we compared against a policy that resembles a manual planner. This indicated that a distinct improvement can be attained if our model is employed rather than scheduling requests manually. Furthermore, we found that a substantial improvement can be attained by employing a policy that accounts for future requests by weighting their contribution to the overtime costs. We refer to this as the Anticipative Weighted Cost Policy (AWP). In addition, we found that a GRASP disregarding the rolling horizon performs only slightly worse than our MDP, and in fact better when the cost of stretching into overtime is sufficiently low.
Future Work
In future work, more extensive numerical experiments should be considered. The difference in performance between the simulation-based MDP and the myopic GRASP should be investigated by extending the period over which the simulation is conducted, employing more levels of the overtime-cost setting, and employing more effective base policies in Λ. Additionally, simpler policies should be investigated to benefit the hospital cases where requests have to be allocated within a short time (e.g., below a few seconds).
References
 1.
Batun S, Denton BT, Huschka TR, Schaefer AJ (2011) Operating room pooling and parallel surgery processing under uncertainty. INFORMS J Comput 23(2):220–237
 2.
Bertsekas DP, Castañon DA (1999) Rollout algorithms for stochastic scheduling problems. J Heuristics 5(1):89–108
 3.
Blake JT, Carter MW (1997) Surgical process scheduling: a structured review. J Soc Health Sys 5(3):17–30
 4.
Cardoen B, Demeulemeester E, Beliën J (2009) Optimizing a multiple objective surgical case sequencing problem. Int J Prod Econ 119(2):354–366
 5.
Cardoen B, Demeulemeester E, Beliën J (2010) Operating room planning and scheduling: a literature review. Eur J Oper Res 201(3):921–932
 6.
Cartes Rubilar I, Medina Duran R (2016) A GRASP algorithm for the elective surgeries scheduling problem in a Chilean public hospital. IEEE Lat Am T 14(5):2333–2338
 7.
Chang HS, Fu MC, Hu J, Marcus SI (2007) An asymptotically efficient simulation-based algorithm for finite horizon stochastic dynamic programming. IEEE T Auto Contr 52(1):89–94
 8.
Chang HS, Givan R, Chong EKP (2004) Parallel rollout for online solution of partially observable Markov decision processes. Discrete Event Dynamic Systems: Theory and Applications 14(3):309–341
 9.
Denton B, Viapiano J, Vogl A (2007) Optimization of surgery sequencing and scheduling decisions under uncertainty. Health Care Manag Sci 10(1):13–24
 10.
Denton BT, Rahman AS, Nelson H, Bailey AC (2006) Simulation of a multiple operating room surgical suite, pp 414–424
 11.
Erdem E, Qu X, Shi J (2012) Rescheduling of elective patients upon the arrival of emergency patients. Decis Support Syst 54(1):551–563
 12.
Fei H, Meskens N, Chu C (2010) A planning and scheduling problem for an operating theatre using an open scheduling strategy. Comput Ind Eng 58(2):221–230
 13.
Festa P, Resende MGC (2002) GRASP: an annotated bibliography. Oper Res/Comput Sci Interfaces Ser 15:325–367
 14.
Organisation for Economic Co-operation and Development (OECD). OECD.Stat: health care utilisation. Available at https://stats.oecd.org/Index.aspx?DataSetCode=HEALTH_STAT#
 15.
Guerriero F, Guido R (2011) Operational research in the management of the operating theatre: a survey. Health Care Manag Sci 14(1):89–114
 16.
Hansagi H, Carlsson B, Brismar B (1992) The urgency of care need and patient satisfaction at a hospital emergency department. Health Care Manag Rev 17(2):71–75
 17.
Lamiri M, Xie X, Dolgui A, Grimaud F (2008) A stochastic model for operating room planning with elective and emergency demand for surgery. Eur J Oper Res 185(3):1026–1037
 18.
May JH, Spangler WE, Strum DP, Vargas LG (2011) The surgical scheduling problem: current research and future opportunities. Prod Oper Manag 20(3):392–405
 19.
McMillan JR, Younger MS, De Wine LC (1986) Satisfaction with hospital emergency department as a function of patient triage. Health Care Manag Rev 11(3):21–27
 20.
Min D, Yih Y (2010) Scheduling elective surgery under uncertainty and downstream capacity constraints. Eur J Oper Res 206(3):642–652
 21.
National Audit Office of Denmark (2015) Beretning til statsrevisorerne om hospitalernes brug af personaleresurser. Available at http://rigsrevisionen.dk/publikationer/2015/102014/
 22.
Ministry of Health (2015) Status paa sundhedsomraadet. Available at http://www.sum.dk/Aktuelt/Publikationer/Statuspaasundhedsomraadetsept2015.aspx
 23.
Range TM, Kozlowski D, Petersen NC. Dynamic job assignment: a column generation approach with an application to surgery allocation. Discussion Papers on Business and Economics
 24.
Resende MGC, Ribeiro CC (2014) GRASP: greedy randomized adaptive search procedures. In: Search methodologies: introductory tutorials in optimization and decision support techniques, 2nd edn, pp 287–312
 25.
Samudra M, Van Riet C, Demeulemeester E, Cardoen B, Vansteenkiste N, Rademakers FE (2016) Scheduling operating rooms: achievements, challenges and pitfalls. J Sched 19(5):493–525
 26.
Spratt B, Kozan E, Sinnott M (2019) Analysis of uncertainty in the surgical department: durations, requests and cancellations. Aust Health Rev 43(6):706–711
 27.
Steins K, Persson F, Holmer M (2010) Increasing utilization in a hospital operating department using simulation modeling. Simulation 86(8-9):463–480
 28.
Strum DP, May JH, Vargas LG (2000) Modeling the uncertainty of surgical procedure times: comparison of lognormal and normal models. Anesthesiology 92(4):1160–1167
 29.
Tancrez JS, Roland B, Cordier JP, Riane F (2009) How stochasticity and emergencies disrupt the surgical schedule. Studies Comput Intell 189:221–239
 30.
Van Huele C, Vanhoucke M (2014) Analysis of the integration of the physician rostering problem and the surgery scheduling problem. J Med Syst 38(6):43
 31.
Van Oostrum JM, Van Houdenhoven M, Hurink JL, Hans EW, Wullink G, Kazemier G (2008) A master surgical scheduling approach for cyclic scheduling in operating room departments. OR Spectrum 30(2):355–374
 32.
Vanberkel PT, Blake JT (2007) A comprehensive simulation for wait time reduction and capacity planning applied in general surgery. Health Care Manag Sci 10(4):373–385
 33.
Xiang W, Yin J, Lim G (2015) A short-term operating room surgery scheduling problem integrating multiple nurses roster constraints. Artif Intell Med 63(2):91–106
 34.
Zhang J, Dridi M, El Moudni A (2019) A two-level optimization model for elective surgery scheduling with downstream capacity constraints. Eur J Oper Res 276(2):602–613
Acknowledgments
We would like to thank the Department of Production, Research and Innovation for providing us with essential data and information on the operations of the Danish hospital operating theatres. In addition, we would like to thank Assistant Prof. Charlotte Vilhelmsen for providing us with insight into the literature on scheduling operating theatres and Professor Bo Friis Nielsen for statistical advice.
Funding
This research was financially supported by the Danish governmental organization Region Sjælland.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Cite this article
Andersen, A.R., Stidsen, T.J.R. & Reinhardt, L.B. Simulation-Based Rolling Horizon Scheduling for Operating Theatres. SN Oper. Res. Forum 1, 9 (2020). https://doi.org/10.1007/s4306902000096
Keywords
 Patient scheduling
 Stochastic optimization
 Decision processes
 Heuristics