1 Introduction

According to the 2020 World Cancer Report (Wild et al. 2020), the global cancer burden is projected to exceed 27 million new cancer cases per year by 2040, a 1.5-fold increase over the most recent estimate of 18.1 million cases in 2018. This immense rise, together with the general pressure to rationalize health care expenses, requires health care facilities such as radiotherapy centers to manage machines and resources more efficiently (Tancrez et al. 2013). Radiation therapy, or radiotherapy for short, is a commonly used treatment for patients diagnosed with cancer (in addition to or instead of surgery and chemotherapy) with the goal of killing tumorous cells while sparing surrounding healthy tissue. In general, radiotherapy treatment appointments are planned a few days or weeks in advance, and emergency patients who need to receive treatment immediately are rare. Nevertheless, real-world data reveal high uncertainty in treatment durations for radiotherapy appointments, even though medical physicists are able to accurately estimate the planned irradiation duration during the intense treatment planning process. Uncertainty is a key challenge in any appointment scheduling process (Gupta and Denton 2008), but the underlying uncertainty in radiotherapy treatment durations has not yet been considered in the radiotherapy appointment scheduling literature (see Sect. 2).

As has been shown in previous studies (e.g., Kreitz et al. 2016), waiting time is one of the main factors influencing patient satisfaction. Waiting time to the first treatment appointment (i.e., the start of the recurring treatment process) has been addressed as a crucial objective in several academic papers on radiotherapy appointment scheduling, but daily waiting time between the planned treatment start and the actual time when treatment is performed has not been addressed so far in this area. Since many patients are treated consecutively in a treatment facility, a delay of one single patient typically affects the starting times of several successive patients and can even cause waiting time for all upcoming patients on a given day. Practitioners tend to focus solely on the optimization of resource usage, due to the high fixed costs associated with machines and staff, and therefore prefer tight schedules. The variability in the actual treatment durations, however, may then lead to resource conflicts and hence delays, an effect that is aggravated the tighter the schedule is planned (Gupta and Denton 2008). This suggests explicitly considering waiting times as well when developing a planning method.

Thus, in order to enhance patient satisfaction and wellbeing, we propose an optimization model that takes both patient waiting time and resource usage into account and increases robustness during the execution of the schedule by considering the stochasticity of the activity durations. We show that for the highly constrained and stochastic problem of radiotherapy appointment scheduling in special ion beam facilities where only one beam resource is available, it is beneficial to insert activity time buffers. Our proposed buffer concept is based on quantiles of probability distributions and can thus be applied to any distribution of treatment durations.

The decision to be optimized in our approach consists of three components: we determine a treatment plan specifying the days on which each patient should receive treatments, a priority list of patients for determining the daily schedules, and a value for the buffer parameter which controls the size of the time buffers. From these three components, we compute a baseline schedule including buffer times. During actual execution of the baseline schedule, modifications of starting times may become necessary in view of the stochastic nature of durations. To gradually compute the modified schedule from the baseline schedule while the true activity durations are revealed one after the other in the course of a current day, we use a dynamic procedure which we call “reactive procedure”. The treatment plan, the patient priority list and the buffer parameter are (heuristically) optimized by a Genetic Algorithm (GA) in the planning phase based on the current data on patients, their treatment requirements and their preferred time windows for treatment. As the objective function in our optimization problem is an expectation of a complex random expression, we need a way to evaluate its value and use the Sample Average Approximation (SAA) technique for this purpose. This essentially results in applying the principle of Monte Carlo simulation.

This article is organized as follows: Sect. 2 reviews related work on radiotherapy scheduling, stochastic appointment scheduling, and general strategies to deal with uncertainty. Section 3 presents a formulation of a general optimization problem of which the stochastic radiotherapy appointment scheduling problem addressed in our application is a special case. Section 4 is dedicated to the proposed solution methodology: we specify the structure of the overall optimization approach, describe the buffer concept, present the reactive procedure, and explain the GA as well as the used SAA technique. In Sect. 5 we analyze data on appointment durations gathered from an ion beam facility in Austria, and fit distributions to the different activity categories. The results of our intensive computational tests are given in Sect. 6. Finally, Sect. 7 concludes and proposes some possible directions for further research.

2 Related work

Radiotherapy Appointment Scheduling The specific problem of scheduling radiotherapy appointments has been addressed in the literature since 2006 (Petrovic et al. 2006), and variants of the problem have been modeled mathematically and solved using heuristics such as greedy randomized adaptive search (Petrovic and Leite-Rocha 2008) or GAs (Petrovic et al. 2009; Petrovic and Castro 2011). Vieira et al. (2016) provide a broad overview of both radiotherapy treatment scheduling and resource planning. Maschler et al. (2017a, 2017b) and Vogl et al. (2018a, 2018b) address specific deterministic, long-term scheduling problems that arise in ion-beam facilities in which multiple rooms are supplied by only one beam resource. They build deterministic baseline schedules by applying different metaheuristic search techniques. However, they completely neglect patient waiting time as a major factor of schedule quality and focus entirely on deterministic resource usage optimization. In the present work, we deal with the same setting, but treat it in a short-term, stochastic environment.

Uncertainty in relation to radiotherapy scheduling is addressed by two papers: Sauré et al. (2012) identify effective policies for allocating demand to still unknown patients using a Markov decision process in an effort to minimize the time that patients must wait before the treatment starts. Legrain et al. (2015) also address uncertainty related to the arrival of patients to radiotherapy facilities and develop a hybrid online stochastic optimization algorithm to tackle the stochasticity. However, to the best of our knowledge, none of the existing studies on radiotherapy scheduling considers stochastic activity durations, which we investigate in this paper.

Stochastic Appointment Scheduling Problems Stochasticity in appointment scheduling problems can be of a twofold nature: First, the patients to be treated may not be known in advance and instead get successively revealed during the planning horizon. For example, emergency patients need to be immediately treated, and planned patients do not always show up for their appointments. Second, the treatment duration may be subject to uncertainty (Gupta and Denton 2008). We concentrate on this latter aspect. Ahmadi-Javid et al. (2017) review optimization studies that consider random appointment durations (or “service times”). For the special problem of operating room scheduling, Cardoen et al. (2010) consider studies that incorporate uncertain procedure durations. A more recent review by Samudra et al. (2016) also includes a section dedicated to uncertainty.

Belien and Demeulemeester (2004) propose models for building robust cyclic surgery schedules when the procedure duration is stochastic. Robustness also plays an important role in the work by Hans et al. (2008), who use advanced optimization techniques combined with historical data on surgery durations to improve capacity utilization. Denton et al. (2007) conclude that sequencing patients according to the expected variance of their activity duration achieves the best results when the scheduling involves a single server. However, following this strategy might not be beneficial in highly constrained settings. In our case, the preferred time windows would render such a simple strategy ineffective, because it would produce many constraint violations. Kaandorp and Koole (2007) propose a local search procedure to optimize a weighted average of patient waiting time and doctor idle time. They assume the activity duration to be exponentially distributed. Their objective function is similar to ours, yet our problem is much more constrained, involving aspects like predefined patient treatment patterns and preferred time windows.

Koeleman and Koole (2012) show that different service time distributions lead to different optimal baseline schedules when a local search algorithm is used and emergency arrivals are included into consideration. Begen et al. (2012) propose a sampling-based approach to address the problem of discrete random appointment durations, which produces a near-optimal solution with high probability in polynomial time. Erdogan and Denton (2013) present two stochastic models considering uncertain durations, as well as no-shows. Tancrez et al. (2013) take uncertainty in operating room planning into account on a more strategic decision-making level and present a Markov process which allows one to evaluate the impact of stochasticity on various performance measures. The approach of Kemper et al. (2014) to schedule patients for any convex loss function and any service time distribution entails that customers should be scheduled in non-decreasing order of their scale parameter. Finally, Berg et al. (2014) propose three solution methods to address the two-stage stochastic problem of scheduling patient appointments on a single stochastic server, extending work by Denton et al. (2007).

Methodological Approaches to Address Stochasticity in Scheduling Problems Many approaches deal with uncertainty in project and appointment scheduling, as summarized by Herroelen and Leus (2005) in their overview of literature pertaining to reactive, stochastic, fuzzy, and proactive scheduling approaches. They mention the stochastic resource-constrained project scheduling problem (SRCPSP) as a paradigmatic problem formulation for scheduling under uncertainty and outline the most common strategy to tackle the SRCPSP, namely to apply a dynamic scheduling policy that makes decisions at certain points in time, such as the termination of an activity, based on the current state of the project. (We adopt the concept of a scheduling policy in the form of our reactive procedure.) What remains to be done when following this general approach is the optimization of parameters or input variables for the scheduling policy. Considering the stochasticity of essential variables, this leads to a stochastic optimization problem.

Van De Vonder et al. (2005) thoroughly analyze the impact of buffers in project management, and this research group has also considered various combinations of proactive and reactive approaches (e.g., Davari and Demeulemeester 2017; Demeulemeester et al. 2008; Van De Vonder et al. 2006, 2007).

The classical approaches to stochastic combinatorial optimization problems are based on two- or multi-stage mathematical programming, or on dynamic programming. However, for many real-world problems of this type, these methods are impractical due to problem size and limited running time budgets. Metaheuristics then offer good alternatives for solving problems marked by uncertainty. A survey of metaheuristics for solving stochastic combinatorial optimization problems (SCOPs) given by Bianchi et al. (2009) lists three possible ways to compute objective functions for SCOPs: (1) if closed-form expressions for expected values are available, compute the objective function exactly; (2) if closed-form expressions are not available or their repeated evaluation is too time consuming, use ad hoc, fast approximations; and (3) if the problem is too complex in terms of dependencies, estimate the objective function by simulation. We consider the latter approach, because we address a highly constrained problem with extensive probabilistic dependencies. This variant is also known as Sample Average Approximation and was applied successfully to SCOPs by Kleywegt et al. (2002) as well as Mancilla and Storer (2012). Finally, Juan et al. (2015) review simheuristics—typically an extension of metaheuristics by the integration of simulation in the optimization procedure—and identify multiple subcategories of this term, depending on how much time is spent on simulation and optimization. Their general scheme for solving SCOPs includes a fast simulation process and few replications to estimate the quality of the solution during the optimization (Bianchi et al. 2009).

The contribution of the present paper is threefold: First, we address stochastic activity durations in radiotherapy appointment scheduling (or related applications involving a core resource, preparation and exiting activities, and assigned facilities) by a stochastic optimization approach. We not only maximize resource usage when building schedules, but also strive to minimize patient waiting time. Second, we present a buffer concept relying on quantiles of activity duration distributions. Third, we propose a reactive procedure that mimics the decision process of the human decision maker during the execution of a baseline schedule, and embed it into the stochastic optimization framework. For the numerical solution of the optimization model, we use a Genetic Algorithm, based on objective function evaluations obtained by Sample Average Approximation (and thus Monte Carlo simulation).

3 Problem statement

This section contains a thorough description of the problem at hand. We shall first describe the problem in a more general form to show its fairly broad range of applicability. Then, we will specify the particular features of the model for our case study, the radiotherapy appointment scheduling case described in detail in Sect. 5. The symbols and abbreviations used throughout this article are listed in Tables 8 and 9 in the Appendix.

3.1 The general model

We assume a medical unit consisting of L treatment rooms and one core resource; the latter can be personnel, a medical device, or any other shared resource. Treatments for a given set of patients are to be scheduled on certain days of a planning period consisting of D days in total. The set of patients is denoted by \({{\mathcal {P}}}\).

Patient \(p \in {{\mathcal {P}}}\) needs \(N_p\) treatments \((1 \le N_p \le D)\) during the planning period, each treatment on a different day. For each \(p \in {{\mathcal {P}}}\), the number \(N_p\) is assumed as given in advance. Moreover, we assume that for each patient \(p \in {{\mathcal {P}}}\), a set of feasible treatment patterns is given. A treatment pattern is a vector \(x = (x_1,\ldots ,x_D)\) with \(x_d \in \{0,1\}\) \((d=1,\ldots ,D)\), where \(x_d = 1\) if the patient is treated on day d and 0 otherwise. The set of feasible treatment patterns x for \(p \in {{\mathcal {P}}}\) is denoted by \({{\mathcal {X}}}_p \subseteq \{0,1\}^D\). For each \(x \in {{\mathcal {X}}}_p\), the binary vector x contains the same number \(N_p\) of bits with value 1.

A treatment plan is a matrix \(X = (x_{pd})_{p \in {{\mathcal {P}}}, d=1,\ldots ,D}\), defining the treatment pattern for each patient. The treatment plan X is called feasible if \(x_{p} \in {{\mathcal {X}}}_p\) for each patient \(p \in {{\mathcal {P}}}\), where \(x_{p} = (x_{p1},\ldots ,x_{pD})\). Let \({{\mathcal {X}}}\) denote the set of all feasible treatment plans.
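To make the pattern and plan notation concrete, the following sketch checks the feasibility of a treatment plan. The dictionary-based representation, the function name, and the example patterns are purely illustrative; they are not part of the paper's formal model.

```python
# A treatment pattern is modeled as a length-D tuple of 0/1 values; a treatment
# plan maps each patient to one pattern. A plan X is feasible iff every chosen
# pattern lies in that patient's feasible set X_p. (Illustrative representation.)
def is_feasible_plan(plan, feasible_patterns):
    """plan: dict patient -> pattern; feasible_patterns: dict patient -> set of patterns."""
    return all(plan[p] in feasible_patterns[p] for p in plan)

feasible_patterns = {
    "p1": {(1, 1, 1, 0, 0)},                   # fixed pattern, no flexibility
    "p2": {(0, 0, 1, 1, 1), (1, 1, 1, 1, 1)},  # two admissible patterns
}
plan = {"p1": (1, 1, 1, 0, 0), "p2": (0, 0, 1, 1, 1)}
print(is_feasible_plan(plan, feasible_patterns))  # True
```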

The L treatment rooms are heterogeneous (allowing for different kinds of treatments) and each patient \(p \in {{\mathcal {P}}}\) is assumed to be assigned already in advance to a specific treatment room, based on medical considerations. This assignment does not change during the period of D days. A basic feature of our model is that we suppose each treatment to consist of three phases: (a) a preparation phase, (b) a core treatment phase during which the core resource is needed in an exclusive way (it cannot be used for another treatment during this time), and (c) an exiting phase. While going through these three phases, the patient does not leave the room.

In addition to the decisions on the days when to treat each patient, appointment times for the start of the treatments have to be determined and communicated to the patients. As far as possible, these times should respect the preferences of the patients, which are given by a ready time \(r_p\) (earliest appointment time) and a due time \(\tau _p\) (latest appointment time) for each patient \(p \in {{\mathcal {P}}}\).

For each patient p, each treatment phase \(i=1,2,3\), and each day \(d = 1,\ldots ,D\), the activity duration \(\varTheta _{pid}\) describes the time needed for carrying out the corresponding treatment activity (pid). Taking the sources of uncertainty discussed in the next subsection into account, we assume that the activity duration \(\varTheta _{pid}\) is not precisely known in advance. Instead, we model the variables \(\varTheta _{pid}\) as random variables and assume that their joint distribution \({{\mathcal {D}}}\) can be estimated. This is in contrast to a simpler version of the model investigated by Vogl et al. (2018b), where deterministic activity durations are assumed. Note that although the realizations of the random variables modeling the activity durations are day-specific, they follow the same day-independent distributions.

As is typical of the difference between deterministic and stochastic scheduling, completely different solution methods are needed for the stochastic case. The main difficulty in our situation is that starting times of activities can no longer be determined in advance, since they have to depend on the realizations of the random variables \(\varTheta _{pid}\). As a consequence, a “solution” to the problem can no longer be described by a treatment plan plus a static vector of starting times for treatment activities. Instead, a solution consists of a scheme which we shall call a “design”, from which the actual starting times are generated in two steps (one taking place before the planning period, the other during the execution of the treatments). Parts of this design are the treatment plan and a priority list determining in which sequence patients are inserted into the schedule to be constructed. Priority lists can in fact be seen as priority policies, as discussed extensively by Möhring and Stork (2000). A priority policy is used at each decision point during scheduling to decide which job (or patient, in our case) to choose from a set of eligible ones. It is commonly defined as a linear ordering of the entities to be scheduled. In the context of the problem at hand, it is simply given as a permutation of patient indices.

Moreover, when determining appointment times, we do not base them on expected activity durations, but include buffer times. The sizes of the buffers are a matter of optimization as well. We shall control them by a single parameter \(\beta \) called buffer parameter; larger values of \(\beta \) entail larger buffer sizes. The precise way how \(\beta \) influences the buffer sizes can be chosen in a problem-specific way. We propose a special dependence between buffer parameter and buffer sizes for our concrete application below.
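The paper defers the precise buffer rule to Sect. 4; one natural quantile-based rule, consistent with the buffer concept announced above, is to plan each activity with the \(\beta\)-quantile of its empirical duration distribution rather than its mean. The following sketch is only an assumed illustration of that idea, not the paper's definitive rule.

```python
# Hypothetical quantile buffer: given observed duration samples for an activity
# category, the planned duration is the (lower) empirical beta-quantile. Larger
# beta inserts larger buffers; beta around 0.5 roughly recovers the median.
def planned_duration(samples, beta):
    """Empirical beta-quantile of observed durations (illustrative buffer rule)."""
    xs = sorted(samples)
    idx = min(int(beta * len(xs)), len(xs) - 1)  # clamp to the largest sample
    return xs[idx]

print(planned_duration([8, 9, 10, 12, 20], 0.8))  # 20
```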

From treatment plan, priority list and buffer parameter, a “baseline schedule” is computed. During the execution, actual starting times are generated “on-the-fly” using the baseline schedule and the current realizations of the random activity durations, which turns the baseline schedule into an actual schedule.

Let us now describe the approach outlined above in more formal terms. Formally, a design is a triple \(Z = (X,\pi ,\beta )\), where (1) \(X \in {{\mathcal {X}}}\) is a feasible treatment plan, (2) the permutation \(\pi \in \varPi \) is a priority list of patients (\(\varPi \) denotes the set of all permutations of patient indices), and (3) \(\beta \in [0,1]\) is the buffer parameter.

Our general framework requires the specification of two procedures \(\mathtt{sched}\) (“schedule construction”) and \(\mathtt{react}\) (“reactive procedure”) for the particular application under consideration. We shall specify them in Sects. 4.1 and 4.2 for the case of our radiotherapy application. The general strategy of the approach is the following:

First of all, from a given design \(Z=(X,\pi ,\beta )\), a baseline schedule \(S_{bas}\) is derived by the application of the schedule-construction procedure \(\mathtt{sched}\):

$$\begin{aligned} S_{bas} = \mathtt{sched}(Z) = \mathtt{sched}(X,\pi ,\beta ). \end{aligned}$$

This is carried out before the beginning of the planning period. The baseline schedule \(S_{bas}\) contains “planned” starting times for all activities, where buffer times have already been included. The baseline schedule also defines the appointment times announced to the patients.

In view of the randomness of the variables \(\varTheta _{pid}\), it is not sure whether \(S_{bas}\) can be executed as it is, considering that certain buffer times may be exceeded. This makes the above-mentioned modification during the execution of the schedule necessary. We carry it out by the application of the reactive procedure \(\mathtt{react}\), which takes \(S_{bas}\) and the durations \(\varTheta _{pid}\), as they are gradually revealed during the planning period, and determines from this input the actual schedule \(S(\omega )\). Therein, \(\omega \) stands for the influence of randomness. Thus, with \(\varTheta = \varTheta (\omega )\) representing the collection of the random activity durations \(\varTheta _{pid}\), the actual schedule results from the design by

$$\begin{aligned} S(\omega ) = \mathtt{react}(S_{bas},\varTheta (\omega )) = \mathtt{react}(\mathtt{sched}(X,\pi ,\beta ),\varTheta (\omega )). \end{aligned}$$
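The two-step composition above can be sketched as follows. Both procedure bodies are placeholders, since the real `sched` and `react` are specified only in Sects. 4.1 and 4.2; the sketch merely shows how a design and one realization of the durations flow through the pipeline.

```python
# Stand-in schedule construction: builds a "baseline schedule" offline from the
# design (X, pi, beta). The real procedure inserts buffer times here.
def sched(X, pi, beta):
    return {"plan": X, "priority": pi, "buffer": beta}

# Stand-in reactive procedure: maps the baseline and one scenario of realized
# durations to an actual schedule S(omega). The real procedure shifts starting
# times on the fly as durations are revealed.
def react(baseline, realized_durations):
    return {"baseline": baseline, "durations": realized_durations}

design = ({"p1": (1, 1, 1, 0, 0)}, ["p1"], 0.8)       # (X, pi, beta)
S = react(sched(*design), {("p1", 2, 1): 14.0})        # durations keyed (p, i, d)
```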

Finally, for obtaining an optimization problem, an objective function has to be defined. Suppose that a cost function F is given that evaluates each actual schedule S by a cost value F(S). In our context, the evaluation by F includes both the economic usage of the core resource (the overall time span the core resource is needed should be minimized by means of elimination of idle times) and the waiting times of the patients (which should be minimized as well). We shall specify the cost function for our concrete application in precise formal terms below.

Since the realized schedule \(S(\omega )\) depends on the influence of randomness, the cost function value \(F(S(\omega ))\) is a random variable as well. We take its expected value \(\mathbf{E}[F(S(\omega ))]\) as our objective function. In this way, we get an evaluation of any chosen design \(Z=(X,\pi ,\beta )\) by a function f:

$$\begin{aligned} f(Z) = \mathbf{E}[ F(S(\omega )) ] = \mathbf{E}[ F(\mathtt{react}(\mathtt{sched}(X,\pi ,\beta ), \, \varTheta (\omega )))]. \end{aligned}$$
(1)

The aim of our optimization approach is to find a design that will produce minimal expected cost during execution. This produces the following stochastic optimization problem:

$$\begin{aligned} \min \,&\mathbf{E}\left[ F\left( \mathtt{react}(\mathtt{sched}(X,\pi ,\beta ), \, \varTheta (\omega ))\right) \right] \end{aligned}$$
(2)
$$\begin{aligned} \text{ s.t. } \,&X \in {{\mathcal {X}}} \end{aligned}$$
(3)
$$\begin{aligned}&\pi \in \varPi \end{aligned}$$
(4)
$$\begin{aligned}&0 \le \beta \le 1 \end{aligned}$$
(5)

Obviously, this is a mixed-integer stochastic optimization problem. Note that although \(\mathtt{react}\) is a dynamic procedure, the problem above is no longer a dynamic optimization problem, since we fix the procedure \(\mathtt{react}\) in advance. Note further that while \(\beta \) appears as a decision variable similar to X and \(\pi \) in formulation (2)–(5), it is in fact a hyperparameter of our proposed approach to solve the minimization problem given by Eq. (2) and is thus determined and fixed before the search for an optimal treatment plan and priority list takes place.
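Since the expectation in (2) has no closed form, it is estimated by Sample Average Approximation: draw K scenarios of the durations, apply the reactive procedure to each, and average the resulting costs. The sketch below shows this estimator with the problem-specific components (`sched`, `react`, `cost`, `sample_durations`) passed in as stand-in callables.

```python
# Sample Average Approximation of f(Z) = E[F(react(sched(X, pi, beta), Theta))].
# All four callables are placeholders for the problem-specific procedures.
def saa_estimate(design, K, sample_durations, sched, react, cost):
    baseline = sched(*design)                  # baseline schedule, built once per design
    total = 0.0
    for _ in range(K):
        theta = sample_durations()             # one scenario of all activity durations
        total += cost(react(baseline, theta))  # cost of the realized schedule
    return total / K                           # Monte Carlo estimate of the expectation
```

In the overall approach, the GA calls this estimator once per candidate design, so K trades off evaluation accuracy against optimization time.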

3.2 Application: radiotherapy scheduling

In our application study, we deal with appointment scheduling in an ion beam facility, in which one particle beam device serves multiple treatment rooms. This device is our core resource; it represents the main bottleneck of the scheduling problem. In our specific setting, we schedule patients in \(L = 3\) treatment rooms, one with a vertical, one with a horizontal and one with a 90 degree flexible beam angle. The particle beam in ion therapy consists of either protons or carbon ions, and switching between those particle types demands a set-up time of a few minutes (in our case about 3 min). The beam is accelerated to two-thirds the speed of light in a linear accelerator, followed by multiple circulations through a synchrotron. As soon as the beam has reached its designated speed, it gets dispatched to one of the available treatment rooms and the patient waiting inside the room is treated. The beam device can only serve one room at a time.

This facility layout is related to the one described by Vogl et al. (2018b), who solve the long-term deterministic appointment scheduling problem, striving to maximize overall resource usage. This long-term optimization, however, does not consider uncertainty in appointment durations. Therefore, we go beyond that existing work by the stochastic optimization approach described above. The considerably increased computational complexity of this approach makes it necessary that we do not solve the long-term problem as a whole, but rather decompose it into short-term planning problems for \(D = 5\) days each, corresponding to a week from Monday to Friday.

During the five days of our planning horizon, each patient \(p \in \mathcal {P}\) must attend a predefined number \(N_p\) of irradiation treatments in a predefined treatment room. The constraints defining feasible treatment patterns are based on medical considerations and were provided by the ion beam facility:

(a) Patients finishing treatment in the given week have between 2 and 5 irradiation treatments left (the initial treatment plan over the long-term planning horizon forbids a final week with a single remaining treatment). They receive a treatment on each day, starting on Monday, until the remaining number of treatments is reached. Hence, there is no flexibility in the treatment pattern for these patients.

In our notation, this means that for a patient p finishing treatment in the given week, \({{\mathcal {X}}}_p\) is one of the four singleton sets \(\{(1,1,0,0,0)\}\), \(\{(1,1,1,0,0)\}\), \(\{(1,1,1,1,0)\}\) or \(\{(1,1,1,1,1)\}\).

(b) Patients starting their treatment in the given week need to receive between 3 and 5 treatments. The idea is to let those patients commence their treatments as late as possible, to ensure a transition to the next week without breaks. Consequently, the first treatment day is Wednesday, Tuesday, or Monday, for 3, 4 and 5 treatments, respectively.

Thus, for such a patient, \({{\mathcal {X}}}_p\) is one of the three singleton sets \(\{(0,0,1,1,1)\}\), \(\{(0,1,1,1,1)\}\), or \(\{(1,1,1,1,1)\}\).

(c) All other patients have either 4 or 5 treatments scheduled in the current week.

If patient p gets 5 treatments, then \({{\mathcal {X}}}_p\) is the singleton \(\{(1,1,1,1,1)\}\). If patient p gets 4 treatments, an earliest weekday d(p) for the treatment break is pre-defined. Hence, for such a patient, \({{\mathcal {X}}}_p\) is the set

$$\begin{aligned} \{(x_1,\ldots ,x_5) \in \{0,1\}^5 \, | \, \sum _{d=1}^5 x_d = 4 \quad \text{ and } \quad x_d = 1 \text{ for } d < d(p)\}. \end{aligned}$$
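The set defined above can be enumerated directly: with four treatments, exactly one day is treatment-free, and that day must be no earlier than d(p). The following sketch does this for case (c); the function name and the 1-based `break_day` argument (the paper's d(p)) are illustrative.

```python
# Enumerate X_p for case (c): length-5 patterns with four 1s, where x_d = 1 for
# all days d < break_day (the pre-defined earliest weekday d(p) for the break).
def patterns_case_c(break_day, D=5):
    result = []
    for zero_day in range(break_day, D + 1):  # the single treatment-free day
        x = tuple(0 if d == zero_day else 1 for d in range(1, D + 1))
        result.append(x)
    return result

print(patterns_case_c(3))  # [(1, 1, 0, 1, 1), (1, 1, 1, 0, 1), (1, 1, 1, 1, 0)]
```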

A daily treatment (DT) consists of three sub-activities: in-room preparation of the patient, irradiation, and post-irradiation exiting. All these three sub-activities occupy the predefined treatment room; in addition, the irradiation also blocks the beam for the exclusive use of the current patient p.

There are multiple reasons that might cause disruptions to the daily executed baseline schedule, making the radiotherapy appointment durations highly stochastic: (1) The in-room preparation might take considerably longer than expected if the positioning of the patient (i.e., the verification that the patient lies on the treatment bench correctly) fails and a second attempt is necessary. Furthermore, patients are sometimes less mobile and need additional help when entering the room. (2) If a patient moves during the irradiation or needs a break, the irradiation needs to be either interrupted or even aborted. An interruption leads to an extension of the planned activity duration; aborting the irradiation leads to a shorter than planned activity. (3) Machine error or room unavailability might cause an activity to take longer than expected. For example, a room might need to be cleaned after a patient exit, which delays the start of the next patient’s treatment in the same room.

Hence, we consider the actual activity durations of the preparation \((i=1)\), irradiation \((i=2)\) and exiting \((i=3)\) activities for each patient p and each day d as random variables \(\varTheta _{pid}\), in accordance with the general model introduced in the previous subsection. The distribution \({{\mathcal {D}}}\) of these random variables will be estimated in Sect. 5 for our specific application instance from data on patients that have been treated throughout one complete year.

The cost function F(S) is composed of three terms that are weighted by parameters \(\lambda _1\), \(\lambda _2\) and \(\lambda _3\). All three components of the objective function are measured in minutes and need to be minimized. The components are: (i) The actual beam active time for each day d, denoted by \(\varphi _{d}\), which is defined by the finishing time of day d’s last activity on the beam resource. This component measures the economic use of the beam resource. (ii) The time window violations, denoted by \(\gamma _{pd}\), which are caused by treatment appointments that are scheduled past their due time (latest time) \(\tau _p\) on day d and are measured by the amount of excess of the appointment times over the times \(\tau _p\). (iii) The actual waiting time of all patients \(p \in \mathcal {P}\).

The patient waiting time consists of two parts: the pre-preparation waiting time, denoted by \(\delta _{pd}\), which reflects any delay in the start of the preparation activity, and the pre-beam waiting time, denoted by \(\rho _{pd}\), which gives the time span between the completion of the in-room preparation activity and the actual start of the irradiation activity.

Thus, in total,

$$\begin{aligned} F(S) = \lambda _1 \cdot \sum _{d=1}^D \varphi _d + \lambda _2 \cdot \sum _{d=1}^D \sum _{p=1}^n \gamma _{pd} + \lambda _3 \cdot \sum _{d=1}^D \sum _{p=1}^n (\delta _{pd} + \rho _{pd}), \end{aligned}$$
(6)

where patients \(p \in {{\mathcal {P}}}\) have been indexed by \(p=1,\ldots ,n=|{{\mathcal {P}}}|\). Recall that the cost function is not applied to the baseline schedule \(S_{bas}\), but to the schedule \(S = S(\omega )\) produced by the reactive procedure while processing the actual realizations of the random activity durations. In particular, the variables \(\varphi _d\), \(\delta _{pd}\) and \(\rho _{pd}\) occurring in (6) are random variables, as they depend on the random durations; the overall cost is therefore a random variable as well. Its expected value gives the objective function according to Eq. (1).
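Eq. (6) can be transcribed directly, assuming the realized schedule has been summarized into per-day beam active times \(\varphi_d\) and per-patient, per-day matrices for \(\gamma_{pd}\), \(\delta_{pd}\) and \(\rho_{pd}\) (all in minutes); the list-of-lists layout is an assumed representation.

```python
# Cost function F(S) of Eq. (6): phi is a list indexed by day; gamma, delta and
# rho are lists of per-patient rows, each row indexed by day. All values in minutes.
def cost(phi, gamma, delta, rho, lam1, lam2, lam3):
    beam = sum(phi)                                  # total beam active time
    violations = sum(sum(row) for row in gamma)      # time-window violations
    waiting = (sum(sum(row) for row in delta)        # pre-preparation waiting
               + sum(sum(row) for row in rho))       # pre-beam waiting
    return lam1 * beam + lam2 * violations + lam3 * waiting
```

In the SAA evaluation, this function is applied to each simulated realization \(S(\omega)\), and the resulting values are averaged.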

The two types of waiting times, resulting from deviations of the actual activity durations from the planned ones, are visualized in Fig. 1. Patient P1’s irradiation treatment took longer than expected, which delayed P2’s irradiation activity. However, when P2 started the in-room preparation, the delay of P1 was not yet foreseeable, so P2 had to wait for the start of the irradiation activity. In addition, P1’s exit took longer than planned. Therefore, P4 could not enter room 1 on time, leading to a delayed start of P4’s preparation activity. Note that P3’s preparation started later than expected as well: P3 was supposed to start preparation by the time P1 finished the irradiation, so the delayed finish of P1’s irradiation directly postponed P3’s preparation start. Section 4.2 details different reactions to deviations from the baseline schedule.

Fig. 1

Exemplary Planned Baseline Schedule and Actually Executed Schedule (white: preparation times; dark gray: irradiation times; medium gray: exiting times; light gray: pre-beam waiting times)

4 Methodology

Let us now turn to the question of how the generic model architecture from Sect. 3 can be treated numerically. For this purpose, five components have to be specified: (1) the concept for computing buffer sizes from the buffer parameter \(\beta \), (2) the procedure \(\mathtt{sched}\), computing a baseline schedule \(S_{bas}\) from a given design Z, (3) the reactive procedure \(\mathtt{react}\), determining “on the fly” the actual schedule \(S(\omega )\) from the baseline schedule \(S_{bas}\) and the current realizations of the activity durations \(\varTheta _{pid}\), (4) the way the expected value in the objective function of (2) is evaluated, and (5) the algorithm for performing the optimization (or heuristic optimization, if the problem cannot be solved to optimality).

From a chronological perspective, the following steps are taken:

  1. At the beginning of a week, a design has to be determined that remains fixed throughout that week. The true activity durations are of course not known in advance, but the underlying distributions are given as an input.

  2. From day to day, actual decisions on the starting times have to be derived from the design and the actual observations. The realizations of the random variables representing the activity durations are revealed successively over each day and processed by the reactive procedure \(\mathtt{react}\), turning a baseline schedule into an actual schedule.

4.1 Buffer concept and schedule generation procedure

To create robust baseline schedules, we include time buffers (Van De Vonder et al. 2005) in the planned activity durations. The planned activity duration for activity (pid) including the time buffer, denoted by \(t_{pid}\), is governed by a global buffer parameter \(\beta \in [0,1]\) and is defined as the \(\beta \)-quantile of the marginal distribution \({{\mathcal {D}}}_{pid}\) of the corresponding activity duration \(\varTheta _{pid}\). In other words, \(t_{pid}\) is that value for which \(P(\varTheta _{pid} \le t_{pid}) = \beta \). Since \(\beta = 0.5\) gives the median of the corresponding activity duration, only values \(\beta \ge 0.5\) make sense in view of the aim of increasing robustness. We determine the buffer sizes through a quantile because the parameter \(\beta \) can be applied to different distributions without the need to rescale its value.
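To make the quantile rule concrete, the following sketch computes \(t_{pid}\) for an exponential stand-in distribution, whose \(\beta \)-quantile has a closed form. The paper fits Burr and Dagum distributions (Sect. 5); the exponential is used here only because its quantile function is simple:

```python
import math

def planned_duration_exponential(mean, beta):
    """beta-quantile of an exponential duration with the given mean:
    the smallest t with P(T <= t) = beta, used as planned duration t_pid."""
    if not 0.5 <= beta < 1.0:
        raise ValueError("only beta in [0.5, 1) is meaningful for buffering")
    return -mean * math.log(1.0 - beta)
```

With beta = 0.5 this returns the median; raising beta lengthens the planned duration and thus the implicit buffer.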

The schedule generation procedure \(\mathtt{sched}\) takes a design, i.e., a treatment plan X, a patient priority list \(\pi \), and a buffer parameter \(\beta \), and computes from this input a baseline schedule \(S_{bas}\). A concrete example for X and \(\pi \) is shown in Fig. 2.

Fig. 2

Treatment plan X and patient priority list \(\pi \) for the radiotherapy application case, \(n=15\) patients

Algorithm 1

Algorithm 1 shows the pseudocode of the schedule generation procedure \(\mathtt{sched}\). The planned starting times of the activities, \({\bar{s}}_{pid}\), are determined according to a strategy first suggested by Vogl et al. (2018a). For each day, treatments of patients are inserted sequentially into the schedule. The sequence of insertions for a specific day is given by the global patient sequence \(\pi \), except that any patients who do not have a planned treatment on that particular day are removed from the daily list. The three appointment phases (in-room preparation, irradiation, and exiting) need to be scheduled without idle time when constructing the baseline schedule. That is, we can fix the starting time of the preparation activity and deduce the other starting times from that value. The earliest time when the preparation activity for a patient p can start is the ready time \(r_p\) of the patient-specific time window \([r_p, \tau _p]\). We check whether \(r_p\) is a feasible starting time across all required resources over all activity phases. If yes, we fix the starting time and block the resources accordingly. If not, we increment the planned starting time until we find a feasible insertion position. If the final starting time for patient p is larger than the corresponding due time \(\tau _p\), we record a penalty \(\gamma _{pd}\) in the objective function.

This approach differs from pure chronological scheduling in that we can fill “holes” in the schedule, as was proven to be beneficial by Vogl et al. (2018a) in a deterministic and static setting. Holes might occur if a patient with a later ready time appears early in the patient list, or if two patients are assigned successively to the same treatment room, creating idle time for the beam resource. In this second scenario, we might schedule another patient who requires a different treatment room in the interim and thereby minimize beam idle time.
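The core of this insertion strategy can be sketched as follows for a single day: each patient's search for a feasible start begins at the ready time \(r_p\) rather than at the end of the current schedule, which is exactly what permits hole-filling. The one-minute time grid, the durations, and the data layout are simplifying assumptions:

```python
def free(busy, start, end):
    """True if the half-open interval [start, end) overlaps no busy interval."""
    return all(end <= s or start >= e for s, e in busy)

def insert_patient(rooms, beam, room_id, ready, d_prep, d_irr, d_exit):
    """Insert the three back-to-back phases of one patient: the assigned room
    is blocked for all phases, the shared beam only during irradiation.
    Returns the planned start of the preparation activity."""
    t = ready
    while True:
        prep_end = t + d_prep
        irr_end = prep_end + d_irr
        exit_end = irr_end + d_exit
        if free(rooms[room_id], t, exit_end) and free(beam, prep_end, irr_end):
            rooms[room_id].append((t, exit_end))
            beam.append((prep_end, irr_end))
            return t
        t += 1   # shift in one-minute steps until a feasible position is found
```

For instance, if a first patient occupies the beam from minute 10 to 30, a second patient in another room with a 5-minute preparation is shifted until its irradiation can start at minute 30.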

4.2 The reactive procedure

As already anticipated in Sect. 3.1, the procedure \(\mathtt{react}\) determines an actual schedule \(S(\omega )\) based on a baseline schedule \(S_{bas}\) and random activity durations. In this context, it is assumed that the random variables \(\varTheta _{pid}\), corresponding to the activity durations, are not realized up front, but rather revealed successively during the execution of the reactive procedure. By adopting this scheme, the procedure mimics a human planner, who also has to deal with longer (or shorter) patient preparation, irradiation or exit times as they occur. However, this is not the only reason for the specific design of the procedure. The ion beam facility’s managers required it to be flexible and generic enough to incorporate time-of-day-dependent probability distributions and/or correlations between treatment times of subsequent patients. In essence, the reactive procedure not only provides the “true” durations and thus the starting and completion times of activities, but also allows for a preponement of activities in case the sampled durations of preceding activities are smaller than projected in the baseline schedule. A detailed description of the reactive procedure is given in Appendix D.
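A drastically simplified, single-resource sketch of such a reaction is the classical right-shift: keep the baseline order, start each irradiation no earlier than planned and no earlier than the beam becomes free, and substitute the realized durations. The full procedure (Appendix D) additionally handles rooms and prepones activities when predecessors finish early; none of that is shown here:

```python
def right_shift(planned_starts, realized_durations):
    """Right-shift reaction on one resource: returns actual starts and the
    induced delays (a stand-in for the pre-beam waiting times rho)."""
    beam_free = 0.0
    actual, waiting = [], []
    for s_plan, dur in zip(planned_starts, realized_durations):
        s_act = max(s_plan, beam_free)  # never before plan or resource release
        actual.append(s_act)
        waiting.append(s_act - s_plan)
        beam_free = s_act + dur
    return actual, waiting
```

For planned starts 0, 20, 40 and realized durations 25, 15, 10, the second patient is delayed by 5 minutes while the third starts on time.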

4.3 Solution evaluation

In view of the complexity of the functions \(\mathtt{sched}\) and \(\mathtt{react}\), it seems hopeless to look for an analytic representation of the objective function of (2) as a function of the design \((X,\pi ,\beta )\). Therefore, we resort to Monte Carlo simulation in order to obtain a sufficiently precise approximation of the expected value occurring in (2). The simulation method approximates an expected value with respect to a distribution \({{\mathcal {D}}}\) by an average over a sample of randomly selected realizations drawn from \({{\mathcal {D}}}\) or from a related distribution. If the distribution is not changed during sampling (as it would be, e.g., in the so-called importance sampling method), all realizations receive identical weights in the average; this is the option we implemented.

Obviously, the larger the sample size, the better the accuracy of the approximation. We shall work with different sample sizes, depending on the needed precision of the estimate (for details, see Sect. 5).

The formula for the estimation of the objective function value of a given design \((X,\pi ,\beta )\) is shown in Eq. (7) below. A number H of sets of realizations of random variables, i.e., activity durations \(\varTheta _{pid}\), \(p \in {{\mathcal {P}}}\), \(i=1,2,3\), \(d =1,\ldots , D\), are generated i.i.d. from distribution \({{\mathcal {D}}}\). Note that activity index \(i=1\) corresponds to the in-room preparation, \(i=2\) to the irradiation and \(i=3\) to the post-irradiation exiting. For each set h, specified by concrete realizations \(\varTheta _{pid}^h\) of the activity durations, we apply the reactive procedure \(\mathtt{react}\) to the baseline schedule \(\mathtt{sched}(X,\pi ,\beta )\) and the durations \(\varTheta _{pid}^h\), and compute the three terms in the objective function Eq. (6). This produces the following SAA estimate:

$$\begin{aligned} {\bar{F}} = \frac{1}{H} \bigg (\sum _{h=1}^{H} \Big [\lambda _1 \cdot \sum _{d=1}^{D} \varphi _{d}^h + \lambda _2 \cdot \sum _{d=1}^{D} \sum _{p=1}^{n} \gamma _{pd} + \lambda _3 \cdot \sum _{d=1}^{D} \sum _{p=1}^{n} (\delta _{pd}^h + \rho _{pd}^h)\Big ]\bigg ). \end{aligned}$$
(7)

Therein, \(\varphi _{d}^h\) denotes the actual beam finishing time in the hth realization, and \(\delta _{pd}^h\) and \(\rho _{pd}^h\) are the corresponding actual waiting times. Note that the time window violations, \(\gamma _{pd}\), can already be calculated directly from the baseline schedule and do therefore not depend on the sampling procedure.
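The SAA scheme of Eq. (7) can be sketched as follows, with the schedule model collapsed to a toy single-resource right-shift reaction and the cost to makespan plus total waiting; the sampler interface and all names are assumptions:

```python
import random

def saa_estimate(planned_starts, duration_sampler, H, seed=0):
    """Average the realized cost over H i.i.d. duration scenarios."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(H):
        durations = [duration_sampler(rng) for _ in planned_starts]
        free, waiting = 0.0, 0.0
        for s_plan, dur in zip(planned_starts, durations):
            s_act = max(s_plan, free)    # right-shift reaction
            waiting += s_act - s_plan
            free = s_act + dur
        total += free + waiting          # makespan + total waiting time
    return total / H
```

With a degenerate sampler that always returns 10, every scenario yields the same cost, so the average equals that cost regardless of H.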

Equation (7) produces an unbiased estimate of the objective function value. We abbreviate the described evaluation strategy by STO. A disadvantage is that evaluation strategy STO is computationally expensive already for a medium-sized number of realizations. Therefore, we also investigated two faster ways of getting estimates (though not unbiased ones anymore) of the true objective function value:

  1. A deterministic approach (DET) to the problem approximates the actual beam active time by the deterministic beam active time of the baseline schedule \(S_{bas}\). Potential waiting times of patients are not taken into consideration at all. Consequently, this approach systematically underestimates the true objective function value. On the other hand, as the buffer increases, waiting times will in general diminish, possibly making the deterministic approach more competitive.

  2. A quasi-deterministic variant that we call the “waiting time estimation strategy” (WTE) again approximates the actual beam active time by the deterministic beam active time \({\hat{\varphi }}_d\) of the baseline schedule. However, WTE does not neglect waiting times, but estimates the actual waiting time by leveraging the observed correlation between idle time on the beam resource (excluding set-up times due to particle type switches) and actual patient waiting time. Using this correlation, we can estimate patient waiting time by fitting a linear regression of the form

    $$\begin{aligned} (\text {waiting time}) = A + B\times (\text {baseline schedule idle time}). \end{aligned}$$
    (8)

    The data for this regression is gathered during the heuristic optimization by taking specific intermediate schedules (created by the heuristic) and evaluating the sample averages of their waiting times. The regression is updated regularly during the execution of the algorithm.
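A closed-form ordinary least squares fit of Eq. (8), as one might implement it for WTE; the data pairs would come from STO evaluations of intermediate schedules, which is not reproduced here:

```python
def fit_line(idle, waiting):
    """OLS fit of waiting = A + B * idle over observed (idle, waiting) pairs."""
    n = len(idle)
    mx, my = sum(idle) / n, sum(waiting) / n
    sxx = sum((x - mx) ** 2 for x in idle)
    sxy = sum((x - mx) * (y - my) for x, y in zip(idle, waiting))
    b = sxy / sxx
    return my - b * mx, b   # intercept A, slope B

def predict_waiting(a, b, idle_time):
    """Estimated waiting time for a baseline schedule with the given idle time."""
    return a + b * idle_time
```

Once fitted, predicting the waiting time of a candidate schedule costs only one multiplication, which is what makes WTE so much cheaper than STO.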

4.4 The (Meta-)heuristic solution approach

We apply a GA metaheuristic, as implemented successfully for various radiotherapy scheduling problems (Petrovic et al. 2009; Petrovic and Castro 2011; Vogl et al. 2018a, b), all of which deal with deterministic variants of the problem presented here. We modify the GA variant published in Vogl et al. (2018b), which relies on the offspring selection GA introduced by Affenzeller and Wagner (2004), such that it operates on the problem-specific solution encoding of a “design”, as presented in Sect. 3.1. Note that the patient priority list as a part of a design is held constant across all days of the planning horizon. On the one hand, allowing for individual patient sequences per day would increase the problem’s combinatorics dramatically, leading to excessive running times of the GA. On the other hand, in view of the desired adherence to time windows, synchronizing the patient sequences on successive days is at least not unreasonable.

The pseudocode of the used variant of the GA is shown in Appendix B of this paper. The offspring selection process of the GA favors individuals that outperform the fitness of at least one of their parents, where the fitness of an individual is given in our radiotherapy application by Eq. (1), with F defined by Eq. (6) and approximated by Eq. (7). The number of reproductive steps to build the next generation of individuals is limited.

The designs forming the initial population within the GA are generated in a randomized greedy fashion. For each patient p we choose a random treatment pattern \(x_p\) from the set \({{\mathcal {X}}}_p\) of feasible treatment patterns. This produces a treatment plan X. The priority list \(\pi \) is either purely randomized or sorted according to the due times \(\tau _p\) of the patients, with slight random variations. The initial value of \(\beta \), finally, is chosen at random between 0.5 and 0.99.

Selection of individuals is done using the rank selection operator. Additionally, a fixed percentage of best individuals of the current generation survive, i.e., they are included in the next generation (“elitism”). The used crossover operators should preserve feasibility of the solutions generated from two parent solutions. Hence, we use the well-known position-based crossover for the patient priority list. The treatment plan is simply inherited by either parent #1 or parent #2 according to a random choice. The value of \(\beta \) is inherited from one of the parents in the same way. To create more diverse descendants, we use mutation operators applied to all three parts X, \(\pi \) and \(\beta \) of the design Z: (i) the treatment pattern \(x_p\) of a single random patient p is reset and newly generated, (ii) a randomly chosen patient shifts to a random new position in the priority list, and (iii) \(\beta \) is multiplied by a random number between 0.75 and 1.25.
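The list operators can be sketched as follows: the position-based crossover keeps a random subset of positions from parent 1 and fills the remaining positions with the other elements in parent 2's relative order, and the \(\beta \) mutation multiplies by a factor in [0.75, 1.25]. The clamping of \(\beta \) to [0.5, 0.99] is an assumption consistent with the initialization described above:

```python
import random

def position_based_crossover(p1, p2, rng):
    """Child keeps p1's elements at a random set of positions; the remaining
    positions are filled with the other elements in p2's relative order."""
    keep = {i for i in range(len(p1)) if rng.random() < 0.5}
    fixed = {p1[i] for i in keep}
    rest = iter(x for x in p2 if x not in fixed)
    return [p1[i] if i in keep else next(rest) for i in range(len(p1))]

def mutate_beta(beta, rng):
    """Multiply beta by a random factor in [0.75, 1.25], clamped to [0.5, 0.99]."""
    return min(0.99, max(0.5, beta * rng.uniform(0.75, 1.25)))
```

By construction, the crossover always yields a valid permutation of the patient set, so no repair step is needed.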

5 Case study and data analysis

This section presents the practical case we faced. To estimate the underlying probability distributions of durations of the three radiotherapy activities (preparation, irradiation, and exiting), we analyzed real-world data from 113 patients and 2270 irradiation appointments. The data was collected in a newly opened ion-beam therapy center in Wiener Neustadt, Austria, among patients treated in 2017.

The main problem was to find a good compromise between distribution models that reproduce the available past data too precisely, which would lead to over-fitting, and overly general distribution models, which would lead to under-fitting. We found that widely used distribution types for activity durations such as the lognormal distribution or the beta distribution yielded only a very poor fit to our data. That is why we searched for distribution types that better represent our data. We used the dedicated software tool EasyFit 5.6 for this purpose.

It was assumed that the preparation and exiting activities of all patients follow the same family of distribution functions. The best-fitting family of distribution functions for these two types of activities turned out to be that of Burr distributions, with the general probability density function

$$\begin{aligned} f(x) = \frac{a \, k}{b} \cdot \Big (\frac{x}{b}\Big )^{a - 1} \cdot \Big (1+\Big (\frac{x}{b}\Big )^a\Big )^{-(k+1)} \end{aligned}$$
(9)

and the concrete parameter values shown in Table 10 (see Appendix C). Here, b is a scale parameter, and a and k are shape parameters. The resulting density functions are shown in Fig. 5 of Appendix C.
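Eq. (9) is the density of the Burr (type XII) distribution; its distribution function \(F(x) = 1 - (1+(x/b)^a)^{-k}\) has a closed-form inverse, so durations can be drawn by inverse-transform sampling. The parameter values used below are placeholders, not the fitted values from Table 10:

```python
def burr_pdf(x, a, k, b):
    """Burr type XII density with shape parameters a, k and scale b (Eq. (9))."""
    z = (x / b) ** a
    return a * k * (x / b) ** (a - 1.0) / (b * (1.0 + z) ** (k + 1.0))

def burr_quantile(u, a, k, b):
    """Inverse of F(x) = 1 - (1 + (x/b)^a)^(-k); feed u ~ Uniform(0, 1)
    to sample a duration by the inverse-transform method."""
    return b * ((1.0 - u) ** (-1.0 / k) - 1.0) ** (1.0 / a)
```

Feeding uniform random numbers into `burr_quantile` yields duration samples with exactly the density of Eq. (9).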

For the irradiation activities, we conjectured that a different family of distribution functions might give a better fit, and that the concrete parameters of the pdf might heavily depend on patient characteristics. Therefore, we clustered the patients into four groups according to treatment complexity and planned activity durations, and applied EasyFit to each of these groups. It turned out that in this case, the Dagum distribution, also known as the inverse Burr distribution, achieves the best fit. Again, the parameter values can be found in Appendix C (Table 10). A graphical representation of the four density functions is shown in Fig. 3.

King (2017) provides details on the families of Burr and Dagum distribution functions. All distributions are asymmetric and right-skewed, reflecting the comparably large probability of outliers that exhibit considerably higher duration than the expected values.

Fig. 3

Distribution of Duration of Irradiation Activity for 4 Patient Groups

6 Results

This section provides results of extensive computational tests on randomly generated problem instances of varying sizes. In Sect. 6.1, we begin with a brief description of the randomly generated instances, optimization parameters, and environment used for the computational study. Then, Sect. 6.2 addresses the problem of determining the optimal buffer parameter \(\beta \), and Sect. 6.3 thoroughly compares the solution evaluation procedures from Sect. 4.3 on the different buffer parameter values \(\beta \) and objective function weights. Finally, Sect. 6.4 compares our GA approach to simple rules of thumb that might be used by a human planner.

6.1 Experimental setup of computational tests

To generate instances for the computational tests, we use data from MedAustron (see Sect. 5). We analyze test instances where patients are randomly assigned to the four groups described in Sect. 5 according to the following probabilities: with probabilities 43%, 29%, 22% and 6%, a patient is assigned to groups 1 to 4, respectively. A sequence-dependent set-up time of 3 min is considered if two patients with different beam types (protons or carbon ions) are scheduled sequentially on the beam resource. We randomly assign the beam type to patients, such that 50% receive proton therapy and the other 50% are irradiated with carbon ions. The distribution of patients among the three treatment rooms is assumed to be balanced, with a probability of 33% for each room.

For the required number of treatments and the corresponding treatment pattern, we distinguish nine patient classes. The associated treatment patterns (cf. Sect. 3) are shown in Table 1. We generated 16 random instances. Instances 1 through 8 select patients from the nine classes according to the first column of probabilities in Table 1, whereas instances 9 to 16 use the second column.

Table 1 Classes of Patients and Corresponding Treatment Patterns

The instance size varies from 30 to 100 patients. The ready times for the daily treatments and the time window length are chosen randomly, where the average length of the time window slightly increases with the ready time: the average time window length for the first half of the day is 276 min, and it increases to 360 min for the second half of the daily planning horizon. Furthermore, we assume that 20% of patients do not have time window preferences. We test multiple combinations of the objective function weights \(\lambda _1, \lambda _2, \lambda _3\) for the actual beam active time, the time window violations, and the actual patient waiting time, respectively, including the balanced case \(\lambda _1 = \lambda _2 = \lambda _3 = 1.0\) and two cases favoring exploitation of beam capacity over patient-centered waiting time and time window violations, namely, \(\lambda _1 = 1.0, \lambda _2 = \lambda _3 = 0.5\) and \(\lambda _1 = 1.0, \lambda _2 = \lambda _3 = 0.1\).

For the GA from Sect. 4.4, we use the following parameters which have proven to be beneficial in preliminary tests: the population size is 100, the mutation rate is 10%, and the elite segment is 1% of the population. We aim to build 70% of the new population from children who meet the defined success criterion, while the number of reproductive steps is limited to 500 (if we fail in building a new population within this limit, we fill up the population with random children).

All algorithms were implemented in C++, and the experiments were carried out on the Vienna Scientific Cluster (VSC3), whose compute nodes are each equipped with two Intel Xeon E5-2650v2 CPUs (2.6 GHz, 8 cores). The runs consisted of two phases, which will be described below. The CPU time for phase 1 was chosen as n/10 h, where n indicates the number of patients in the corresponding instance. The CPU time in phase 2 was 3600 s.

6.2 Phase 1: optimal buffer determination

As discussed in Sect. 4.1, the value \(\beta \) of the buffer parameter has an essential influence on the performance of the approach and is therefore part of the decision. In principle, \(\beta \) could be (heuristically) optimized by our GA together with the other components of the design \(Z = (X,\pi ,\beta )\) specifically for each new week, based on the actual instance parameters for the current week, such as the exact number of patients per group, the assigned treatment patterns, time windows, etc. However, since the distribution of patient characteristics does not essentially change from week to week, this procedure is not the most efficient one. In preliminary experiments, we observed that the optimal value of \(\beta \) turned out to be very stable across different instances generated according to the distribution described in Sect. 6.1, as long as the total number of patients and the weights in the objective function were kept fixed. Therefore, it saves computation time and does not deteriorate the results if the value of \(\beta \) is considered a strategic decision, made only once for a larger number of weeks in which no essential change in the distribution of patient characteristics is to be expected. The choices of X and \(\pi \) (and thus of the baseline schedule), in contrast, are operational decisions to be made at the beginning of each week for the current data. In the current subsection, we deal with the strategic decision on \(\beta \), and will turn to the operational decision in the next subsection.

The main advantage of considering the choice of \(\beta \) as a strategic decision is that a much larger runtime can be devoted to a decision that does not have to be repeated each week. We exploit this by creating larger samples of size 1000 during the evaluation of the objective function when applying the evaluation procedure STO from Sect. 4.3 in the context of the heuristic optimization of \(\beta \), thereby increasing the precision of the estimate. For the operational (i.e., weekly) planning, a smaller sample size will be used to keep computation times short. In these latter computational runs, the value of \(\beta \) is not varied anymore by the GA, but “frozen” to the pre-determined value from the strategic computation.

In the strategic run, we executed the GA with computation times of 3, 5, 7, and 10 h for four different instance sizes n, namely 30, 50, 70, and 100 patients. Table 2 summarizes the findings for the patient mix described in Sect. 6.1 for different values of n and the vector \((\lambda _1,\lambda _2,\lambda _3)\) of objective function weights indicated above (characterized by \(\lambda _3\)). It can be seen that for a given vector \((\lambda _1,\lambda _2,\lambda _3)\), only small differences over the instance sizes n result. For each line, we find clear outliers in both the minimum and maximum optimized buffer parameter. Yet the inter-quartile range is generally small, leading us to conclude that the optimal buffer sizes are approximately 0.66 to 0.70 for \(\lambda _3 = 0.1\), 0.78 to 0.80 for \(\lambda _3 = 0.5\), and 0.80 to 0.83 for \(\lambda _3 = 1.0\).

Table 2 Statistics of optimized buffer parameters \(\beta ^*\) for 16 random instances with patient mix probabilities \(\{43\%, 29\%, 22\%, 6\%\}\); 16 replications per instance

To investigate the dependence of the optimal buffer parameters on the patient mix, we also generated instances with artificial patient mixes. Again, each instance ran 16 times with different random seeds. Table 3 presents the average optimized buffer parameters \(\beta ^*\) for different patient mixes and instance sizes. The instances with patients from groups 3 (\(\{0\%, 0\%, 100\%, 0\%\}\)) and 4 (\(\{0\%, 0\%, 0\%, 100\%\}\))—which feature the largest variance and simultaneously the highest expected activity durations—result in the smallest optimal buffers. Especially for patient group 4, beam active time increases drastically with a higher buffer percentile. This effect gets smaller as \(\lambda _3\), the weight of waiting time in the objective function, increases and the importance of the beam active time simultaneously diminishes.

The optimized buffer parameters \(\beta ^*\) of the other arbitrarily chosen patient mixes only slightly deviate from the optimized buffers that result from the patient mix we observed in the real-world data sets.

Table 3 Optimal buffer parameters of artificial patient mixes, 16 replications per instance
Table 4 Average results of 16 randomly generated instances and 16 replications for each instance
Table 5 Results of the Wilcoxon–Mann–Whitney test on a pairwise comparison of methods STO, DET and WTE

6.3 Phase 2: schedule optimization

The second part of our computational study focuses on the comparison of the different approaches DET, STO and WTE, respectively, to approximate the true objective function, i.e., the expected costs (see Sect. 4.3). For STO, preliminary tests have shown that setting \(H=100\) during the intermediate evaluations in the GA yields a good compromise between evaluation accuracy and running time limits. In order to be able to compare the final results produced by the three approaches on safe grounds, we perform objective function evaluations based on 1,000,000 realizations of activity durations. Table 4 reports the average results for overall expected costs (i.e., the weighted objective), beam active duration, waiting times, and penalties for time window violations, for \(\lambda _1 = 1.0\), \(\lambda _2 = 0.5\) and \(\lambda _3 = 0.5\). Some general patterns can be observed.

First, the waiting time is the smallest for the stochastic optimization variant STO, as expected. DET and WTE may produce extremely large waiting times, especially when the buffer parameter \(\beta \) is small. The advantage of STO diminishes with an increasing buffer parameter \(\beta \), as might be expected. The larger the buffer parameter, the longer the planned activity duration, and the smaller the probability that a patient will take longer than the planned duration. Second, the beam active time increases with growing buffer parameter \(\beta \), yet the increase is surprisingly small. Third, the penalty term is almost negligible for most instances. However, for very large buffer parameter values, the total planned waiting time becomes so large that the time windows preferred by the patients cannot be respected anymore, leading to a significant penalty in the objective function. A buffer parameter of 0.90 or larger would only be optimal if the waiting time weight in the objective function were extremely high, which is not realistic in practice. Similar effects can be noticed when modifying the objective function weights, as shown in Appendix E for \(\lambda _2 = \lambda _3 = 0.1\) (see Table 11) and \(\lambda _2 = \lambda _3 = 1.0\) (see Table 12).

To assess the quality of the three approaches, we also performed statistical tests. The well-known Wilcoxon-Mann-Whitney test was applied to pairwise combinations of methods DET, STO and WTE. For each combination of \(\beta \) and \(\lambda _3\) and for each of 16 random instances, we analyzed 16 runs of the respective methods with different seeds. We chose a significance level of \(\alpha = 0.05\). Table 5 shows the results of the significance tests for 100 patients (\(\lambda _3 = 0.5\)). The values in the table indicate the number of instances in which one of the methods is significantly better, or where the tests reveal no significant difference among methods, respectively. The full table with results from the significance tests also for \(\lambda _3 = 0.1\) and \(\lambda _3 = 1.0\) is provided in Appendix E (Table 13).
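For reference, the statistic behind this test counts, over all pairs of observations from the two samples, how often the first sample's value is smaller, with ties counting one half; a minimal sketch is shown below (in practice one would use a library routine such as scipy.stats.mannwhitneyu, which also supplies p-values):

```python
def mann_whitney_u(x, y):
    """U statistic of sample x versus sample y (naive O(len(x)*len(y)) count)."""
    u = 0.0
    for a in x:
        for b in y:
            if a < b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u
```

The two directed statistics always sum to len(x) * len(y), which is a convenient sanity check.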

When comparing the behavior of the three solution approaches, we can identify different patterns for different regimes of the buffer parameter \(\beta \): (1) For small \(\beta \), the stochastic approach STO has a clear advantage, in that it considers the waiting time of the patients directly by applying Sample Average Approximation to the evaluation of each solution candidate in the population. This effect gets stronger with a larger waiting time weight in the objective function (see Table 4). (2) Using the optimized buffer parameter values (phase 1), the picture changes slightly, and all solution approaches lead to comparable results. A small advantage accrues to the WTE approach, which performs slightly better than DET on average, followed by STO. However, the differences among the three approaches are rarely statistically significant in this buffer size regime. (3) A higher than optimal \(\beta \) favors DET over other approaches (though not statistically significantly, as Table 5 shows).

Fig. 4

Average Results for 100 Patients from Table 12, Graphically

Although the stochastic STO approach usually does not provide superior results for operational (weekly) planning problems for optimized buffer sizes, relative to the two other approaches, it is required to solve the strategic problem of determining the optimal buffer parameter itself. For this purpose, it cannot be replaced by the other approaches.

Figure 4 shows the evolution of mean waiting times and beam active times, depending on the buffer parameter \(\beta \). Again, we see that the different solution approaches DET, STO, and WTE differ substantially in the waiting times for small \(\beta \). The larger the buffer parameter, the more similar the results.

6.4 Comparison to simple rule-based approaches

To better assess the results achieved by the GA, as discussed in Sect. 6.3, we compare them to simple rule-based approaches that could also be employed by a human planner. The first rule, referred to as the “latest starting time” (LST) rule, has already been described in Sect. 4.4 in the context of the GA’s initial population. It creates a patient sequence by simply sorting the patients in non-decreasing order of their latest starting times. In case the same room is occupied by two directly consecutive patients, the decoder accounts for this through its “hole-filling” strategy. The LST rule also exists in a randomized version, constructing the patient sequence in a stepwise manner. At each step, the patient for the next sequence position is randomly chosen from the list of patients with the ten smallest latest starting times among all remaining ones. To account for the beam-on time, it might be beneficial to schedule the patients in a sequence that causes the least idle time on the beam resource. Again, this is done in an iterative fashion, based on the patients’ earliest availabilities. We refer to this rule as the “earliest release time” (ERT) rule. A completely different approach is to rank the patients in non-decreasing order of their treatment time variance. The idea of this “least duration variance” (LDV) rule is to move patients with highly varying treatment durations to later times of the day. The patients’ time windows are ignored by this rule, however. Finally, we take a look at purely randomized patient sequences, to get a glimpse of the absolute baseline performance.
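The rules above reduce to simple sort keys on the patient data; the record fields used here are hypothetical names for illustration:

```python
import random

def lst_sequence(patients):
    """Latest starting time rule: non-decreasing latest starts."""
    return sorted(patients, key=lambda p: p["latest_start"])

def randomized_lst(patients, rng, k=10):
    """Randomized LST: pick each next patient among the k smallest remaining."""
    remaining, seq = list(patients), []
    while remaining:
        remaining.sort(key=lambda p: p["latest_start"])
        pick = rng.choice(remaining[:k])
        remaining.remove(pick)
        seq.append(pick)
    return seq

def ert_sequence(patients):
    """Earliest release time rule: non-decreasing ready times."""
    return sorted(patients, key=lambda p: p["ready"])

def ldv_sequence(patients):
    """Least duration variance rule: high-variance patients go last."""
    return sorted(patients, key=lambda p: p["duration_var"])
```

All four rules then rely on the same decoder (Sect. 4.1) to turn the sequence into a baseline schedule.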

Table 6 Performance of simple rule-based approaches for generating patient sequences (100 patients, \(\beta =0.8\))

Table 6 summarizes the results obtained by the rules described above and compares them to the best solution obtained by the GA under the same experimental conditions. The computational evaluation was conducted on a set of 16 instances with 100 patients (see Sect. 6.1), objective function weights \(\lambda _1=\lambda _2=\lambda _3=1.0\), a buffer parameter \(\beta =0.8\), and a sample size of 1,000,000. From an overall perspective, the two LST-based rules perform best among the simple rules. The other rules achieve exactly what they were designed for, with obvious side effects: LDV indeed achieves the lowest waiting time, but also entails the second highest time window violation penalties. A similar effect arises for ERT, which in turn leads to the lowest beam-on time.

6.5 Linkage to chance-constrained programming

From the facility’s management perspective, it might be desirable to guarantee, or at least advertise, a certain level of service or reliability. Still considering the patients’ waiting time as the most crucial factor, this could mean trying to keep it within a certain range. A threshold value or upper bound on this waiting time might reflect a “bearable” time span that patients are still willing to spend in the waiting area of the facility before getting too impatient. It is not always necessary to ensure strict adherence to that threshold value for all patients, especially under uncertainty. Rather, it is common to define a certain probability by which the threshold may be exceeded. This is usually covered by chance-constrained programming (CCP) (see, e.g., Geletu et al. 2013; Shylo et al. 2013; Deng et al. 2019). However, due to the complexity of the second-stage decisions, which involve non-linear decision making by the reactive procedure described in Sect. 4.2, we currently see no way of capturing this part of our approach in a classical, analytical stochastic programming formulation. Therefore, implementing chance constraints in the conventional sense also appears out of reach. Nevertheless, we try to cover the reliability aspect by adopting a slightly different approach: we modify the objective function such that a “stochastic threshold” can be considered.

Table 7 Average waiting time and threshold violation penalties for instances with 100 patients and various combinations of buffers and waiting time thresholds

We define a threshold value \({\overline{W}}\) for the average (total) waiting time of a patient and count the number of cases in which that threshold value is exceeded (across a sample of size H). The corresponding relative quantity can be interpreted as the probability of violating the mean waiting time upper bound constraint. To limit that probability to a predefined value, denoted by \(\alpha \), we modify the existing, sampling-based objective function F in the following fashion:

$$\begin{aligned}&{\tilde{F}} = \frac{1}{H} \bigg (\sum _{h=1}^{H} \Big [\lambda _1 \cdot \sum _{d=1}^{D} \varphi _{d}^h + \lambda _2 \cdot \sum _{d=1}^{D} \sum _{p=1}^{n} \gamma _{pd}\Big ] \bigg ) + \nonumber \\&\quad M \cdot \left\{ \frac{1}{H} \cdot \sum _{h = 1}^{H} {\mathbb {I}} \! \left( \frac{1}{D \cdot n} \sum _{d=1}^D \sum _{p=1}^n (\delta ^h_{pd} + \rho ^h_{pd}) \ge {\overline{W}} \right) - \alpha \right\} ^+, \end{aligned}$$
(10)

where \({\mathbb {I}}\) is an indicator function, yielding the value 1 if the expression passed as an argument evaluates to “true” and 0 otherwise. The parameter M is a penalty cost coefficient that is likewise provided as a constant. When embedding the overall objective function \({\tilde{F}}\) in our GA, we gradually increase M during the run of the GA to enforce the creation of “feasible” solutions more strongly. Given a particular value \(\alpha \), a solution is called feasible if the proportion of the sample violating the waiting time upper bound \({\overline{W}}\) stays below \(\alpha \). Table 7 shows an analysis of GA runs using objective function \({\tilde{F}}\) (Eqn. (10)). It is based on 16 instances with 100 patients (16 independent runs per instance), \(\alpha =0.01\), \(\beta \in \{0.6, 0.7, 0.8\}\), \(M=10000\), and waiting time thresholds (\({\overline{W}}\)) of 10 and 20 min, respectively.
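As an illustration, evaluating the penalized objective \({\tilde{F}}\) on a sample can be sketched as below. Array names, shapes, and default parameter values are our own placeholders, not the paper’s implementation; note that \(\gamma _{pd}\) carries no sample index \(h\) in Eqn. (10) and is therefore treated as deterministic here.

```python
import numpy as np

def penalized_objective(phi, gamma, delta, rho,
                        lam1=1.0, lam2=1.0,
                        W_bar=10.0, alpha=0.01, M=100.0):
    """Sketch of F-tilde from Eqn. (10).

    phi:   array of shape (H, D), resource penalties per sample and day
    gamma: array of shape (D, n), time window penalties (no sample index)
    delta, rho: arrays of shape (H, D, n), waiting time components
    """
    # First term: sample mean of the weighted penalty sums.
    base = lam1 * phi.sum(axis=1).mean() + lam2 * gamma.sum()
    # Second term: empirical probability that the mean waiting time
    # per patient-day exceeds W_bar, penalized only above alpha.
    mean_wait = (delta + rho).mean(axis=(1, 2))       # shape (H,)
    viol_prob = (mean_wait >= W_bar).mean()
    return base + M * max(viol_prob - alpha, 0.0)
```

The `max(..., 0.0)` realizes the positive-part operator \(\{\cdot \}^+\) of Eqn. (10), so the penalty vanishes as soon as the empirical violation probability drops below \(\alpha \).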

It can be observed that for \(\beta =0.6\) and \(\beta =0.7\) with \({\overline{W}}=10\), it is not possible to keep the probability of exceeding the threshold value below 1%. Being able to ensure average waiting times of less than 10 min with a probability of 99%, as is the case when setting \(\beta =0.8\), might be an appealing goal, also from a practical perspective. We believe this confirms once more the effectiveness of the buffer approach, even in this quite specific setting.

7 Conclusion

The present study confirms the importance of considering stochastic activity durations, well-known in the literature on appointment scheduling, for the case of radiotherapy scheduling. In this context, an issue complicating the scheduling process is the presence of pre-treatment and post-treatment activities, the durations of which are stochastic as well. In total, this gives rise to a stochastic decision process with complex dependencies resulting from the precedence constraints at the patients’ level on the one hand, and from the fact that the beam resource is shared by multiple competing treatment rooms on the other hand.

To account for possible variations in activity durations and to produce more robust baseline schedules, we rely on a buffer parameter describing the quantile of the fitted distribution to which the planned activity durations are to be enlarged. The determination of the optimal buffer size itself requires stochastic sampling, as the objective function value in our model (the mathematical expectation of a quantity resulting from a complex scheduling procedure) cannot be computed analytically. In an attempt to save computation time at least in the course of the (heuristic) optimization of the schedule on the operational level of weekly planning, we also investigate two alternatives to stochastic sampling, namely a deterministic variant that completely ignores waiting time, and a quasi-deterministic variant that estimates the expected waiting time of a schedule through linear regression. Our numerical results suggest that if the buffer size has already been adjusted optimally, then on the operational decision level, stochastic sampling can be replaced by the faster regression-based approach without loss of solution quality.
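The buffer idea of enlarging planned activity durations to the \(\beta \)-quantile of a fitted duration distribution can be sketched in a few lines. The normal distribution used here is purely illustrative; the paper fits distributions to real-world activity durations.

```python
from statistics import NormalDist

def planned_duration(mu, sigma, beta):
    """Buffered planned duration: the beta-quantile of the fitted
    distribution (a normal distribution with mean mu and standard
    deviation sigma serves as a stand-in here)."""
    return NormalDist(mu, sigma).inv_cdf(beta)
```

For \(\beta =0.5\) the planned duration equals the distribution’s median; larger values of \(\beta \) inflate the planned duration and thereby trade beam idle time against waiting time, which is exactly the trade-off tuned on the strategic level.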

Our approach might also be applicable to similar settings in other ion beam facilities, as long as only one beam resource is involved. The number of treatment rooms, on the other hand, can be arbitrary. The buffer concept in particular is quite generic, since it is independent of the fitted distribution(s).

We plan to continue studying more advanced techniques to estimate actual waiting times by identifying appropriate features of the baseline schedule. In particular, more sophisticated regression approaches from machine learning domains may be helpful. Solving the underlying problem to optimality using two-stage stochastic programming would be another interesting stream of research, which, however, would possibly require a more stylized reactive scheduling strategy to make the expected cost of the recourse action representable by explicit mathematical expressions.