Flexible Services and Manufacturing Journal, Volume 26, Issue 4, pp 466–489

Susceptibility of optimal train schedules to stochastic disturbances of process times


  • Rune Larsen
    • DTU Transport, Technical University of Denmark
    • Dipartimento di Ingegneria dell’Informazione e Scienze Matematiche, Università di Siena
  • Andrea D’Ariano
    • Dipartimento di Ingegneria, Università degli Studi Roma Tre
  • Francesco Corman
    • Centre for Industrial Management, Katholieke Universiteit Leuven
  • Dario Pacciarelli
    • Dipartimento di Ingegneria, Università degli Studi Roma Tre

DOI: 10.1007/s10696-013-9172-9

Cite this article as:
Larsen, R., Pranzo, M., D’Ariano, A. et al. Flex Serv Manuf J (2014) 26: 466. doi:10.1007/s10696-013-9172-9


This work focuses on the stochastic evaluation of train schedules computed by a microscopic scheduler of railway operations based on deterministic information. The research question is how sensitive various rescheduling algorithms are to variations in process times (running and dwell times). Indeed, the objective of railway traffic management is to reduce delay propagation and to increase the disturbance robustness of train schedules at a network scale. We present a quantitative study of traffic disturbances and their effects on the schedules computed by simple and advanced rescheduling algorithms. Computational results are based on a complex and densely occupied Dutch railway area; train delays are computed based on accepted statistical distributions, and dwell and running times of trains are subject to additional stochastic variations. The results obtained on this real case study show that an advanced branch and bound algorithm, on average, outperforms a First In First Out scheduling rule in both deterministic and stochastic traffic scenarios. However, the characteristics of the stochastic processes and the way a stochastic instance is handled turn out to have a serious impact on the scheduler performance.


Keywords: Railway traffic optimization · Delay minimization · Simulation · Uncertainty

1 Introduction

Railway services are normally managed on the basis of a timetable (detailed plan of operations) that has been planned in advance with suitable time margins to absorb minor delays.

When planning the traffic over a railway network, deterministic running times are usually assumed. Uncertainties are normally dealt with by inserting appropriate buffer times in the timetable to absorb delays. However, during operations, major disturbances cause traffic to deviate from the planned timetable and may thus compromise its feasibility. In such cases, real-time adjustments of train timings and orders are required to ensure compatibility with the actual traffic situation and to limit delay propagation. This task is currently performed manually by dispatchers. There is thus a need for advanced decision support tools to reduce the dispatchers’ workload and to satisfy the requirements of infrastructure managers, train operators and passengers (Biederbick and Suhl 2007). Recently, advanced models and algorithms have been developed to reschedule trains in complex railway areas with dense traffic and multiple delayed trains (see e.g. Cicerone et al. 2008; Corman et al. 2011a; Hansen and Pachl 2008; Kliewer and Suhl 2011; Lusby et al. 2011; Schachtebeck and Schöbel 2010).

In the rescheduling of railway operations, the need to find solutions quickly has directed most efforts towards advanced heuristic methods, which find good solutions with limited computational effort. This is in line with a general trend in scheduling and managing complex real-world systems under uncertainty, where solution procedures are based on fully deterministic information (Larsen and Pranzo 2012; Ouelhadj and Petrovic 2009). Such approaches simplify the problems by feeding the scheduling routines with deterministic and static data only, neglecting further uncertainties. However, most real-life scheduling problems, and especially railway traffic scheduling, are inherently dynamic and prone to uncertainty. A key question is therefore whether the quality of optimized solutions is partially or completely lost when dealing with uncertainty.

Assuming deterministic and static data may cause discrepancies between what the model expects to happen and what actually happens when the plan is applied in practice. In fact, the gap between the published off-line plan and the actual schedule put into operation may heavily affect the performance of the railway system and the comfort of passengers.

Research aiming to increase the robustness of railway systems generally addresses uncertainty in order to close the gap between planned and actual schedules, mostly along the following two directions or a combination of them:
  (1) computing schedules with some slack to recover from disturbances;
  (2) assessing the effects of uncertainty on a predefined schedule before implementation.


In (1), optimal schedules can be computed using detailed models and dedicated algorithms able to take the uncertainties and the dynamic nature of the problem into account. However, such methods require complex techniques such as robust optimization (Ben-Tal and Nemirovski 2002) or stochastic programming (Ruszczynski and Shapiro 2003), which generally lead to problems that are harder to solve. The approach in (2) is to use simple models that compute efficient schedules based on deterministic information, and then to evaluate the effects of uncertainty, e.g. by means of Monte Carlo simulation.

This enables assessing the effects of stochastic disturbances on a train schedule by running a simulation of the future evolution of the traffic flows in the railway network.

This paper follows approach (2). Our aim is to develop a tool able to assess the effects of unpredictable real-time disturbances of railway traffic on a train schedule, enabling a more informed evaluation of different rescheduling algorithms. When dealing with real-life problems, one should also account for the possibility that the system evolves in directions unknown at the moment of optimization and not considered in the schedule generated by the solver. This concern is especially relevant when comparing solutions obtained by simple dispatching rules with solutions obtained by sophisticated optimization methods, as more tightly packed schedules might be more susceptible to disruptions. This work studies the variability of optimized train schedules in the presence of uncertainty.

Unlike other approaches that focus on the analysis of worst-case scenarios for simple traffic patterns on a railway line (see e.g. D’Angelo et al. 2009; Meng and Zhou 2011; Shafia et al. 2012), we measure robustness as the average degradation when scheduling trains in a complex network with dense traffic, affected by numerous disturbances of process times. A train schedule is robust if it is able to absorb disturbances without causing excessive delays of trains with respect to their scheduled arrival and departure times.

We propose a simulation setup in which train schedules are computed by minimizing delays in the case of deterministically known perturbed operations. These train schedules are computed by a microscopic optimization-based train scheduler that works at the level of block signals, with process times computed to the second; this level of detail is required to assess the propagation of delay in the network. The resulting solutions are then evaluated under small stochastic variations of process times, which simulate errors in input data due to uncertainty in train tracking, or further unpredictable events. This is done within a detailed microsimulation tool that allows the effects of such disturbances to be quantified precisely.

The paper is organized as follows: Sect. 2 briefly reviews the literature on train scheduling and robustness analysis. Section 3 presents the proposed framework for robustness evaluation, while Sect. 4 describes the computational results obtained by evaluating the robustness of three rescheduling algorithms on a complex dispatching area of the Dutch railway network. Finally, conclusions and future research directions are highlighted in Sect. 5.

2 Literature on robust railway systems

A growing stream of scientific literature addresses the generation of robust timetables. However, there is still no clear consensus on the exact definition of robustness, as also discussed in Dewilde et al. (2011); several definitions have been proposed in the literature (see e.g. Carey 1999; D’Angelo et al. 2009; Fischetti and Monaci 2009; Goerigk and Schöbel 2010; Liebchen et al. 2009; Nielsen et al. 2007; Salido et al. 2008).

Among the papers dealing with robust timetabling problems we distinguish between approaches developing optimization-based timetables with some degree of disturbance robustness, i.e., point (1) of the Introduction, and papers dealing with the assessment of the robustness of a timetable by means of simulation tools, i.e., point (2) of the Introduction.

Liebchen et al. (2010) use an integer programming approach to generate a delay-resistant periodic timetable for a part of the German railway network, and subsequently evaluate it according to a delay management strategy. The problem of managing passenger connections consists of deciding which connections should be kept in the actual plan and which should be dropped (Corman et al. 2012; Ginkel and Schöbel 2007; Kliewer and Suhl 2011; Schachtebeck and Schöbel 2010).

Fischetti et al. (2009) propose a light robustness approach that takes the robustness of a solution into account while keeping computation times low. On a medium-sized network, the proposed approach improves CPU time by two orders of magnitude while maintaining almost the same solution quality.

Schöbel and Kratz (2009) present a bi-objective approach for generating robust and efficient timetables. Given a level of uncertainty, the problem of finding a Pareto optimal timetable is solved as efficiently as the original single objective problem. However, it is generally difficult to predict the impact of the level of stochasticity of railway systems.

Shafia et al. (2012) describe a mathematical model for train timetabling with some degree of robustness. As the authors themselves remark, the model makes strong assumptions on safety issues related to train movements in the network. Based on this model and on two methods for computing buffer times, the authors propose a branch and bound algorithm and a beam search-based heuristic for the computation of timetables. A real railway line is used for testing the algorithms, while fictitious data are assumed for the trains. The results show some improvements of the two dedicated algorithms over a commercial software package.

Vansteenwegen and Van Oudheusden (2006) focus on the development of a robust timetable. They propose a two-phase approach in which, first, ideal buffer times with respect to train connections are computed, and then linear programming (LP) is used to build a timetable minimizing waiting costs. Finally, a discrete event simulation is used to compare the original timetables with the new one. The authors report substantial improvements.

Simulation is a common approach to evaluating the robustness of train schedules, as in point (2) of the Introduction. We now comment on some railway applications of simulation tools to evaluate timetable robustness:

Carey and Carville (2000) compare the reliability of proposed schedules by simulating detailed train traffic in a busy complex station. The simulator uses a rescheduling algorithm to solve arising conflicts. Their approach can also be used to evaluate different reliability measures on a given timetable.

Salido et al. (2008) propose two analytical measures of timetable robustness and use a macroscopic simulator to compare the analytical robustness with the empirically observed robustness. They apply their approach to a case study with about 30 trains, comparing two timetables with different buffer times.

Takeuchi et al. (2007) propose a robustness index, measuring the passenger disutility introduced by a disturbance. They simulate the traffic on a line with 39 stations and 63 trains, to show the differences between two timetables by changing time margins in the timetable.

To the best of our knowledge, very little research has addressed the robustness of train schedules during the dispatching of railway operations with a microscopic level of modelling detail; most of the related research focuses on macroscopic approaches (see e.g. Cicerone et al. 2008; D’Angelo et al. 2009).

Regarding optimization algorithms, Meng and Zhou (2011) recently developed a stochastic programming approach with recourse for dispatching on a single-track line, which incorporates different probabilistic scenarios in a rolling horizon decision framework. The approach reschedules trains in a perturbed situation in order to minimize delays under different forecast scenarios. Experiments are carried out on a line in China with 18 stations and 50 daily trains.

The literature reviewed shows a lack of computational studies assessing the performance of train schedules computed by different rescheduling algorithms during real-time traffic disturbances. Train schedules can be obtained either with intelligent conflict detection and resolution methods, which solve train conflicts while predicting delay propagation over the whole network under study, or with local and myopic rescheduling rules. On the one hand, an optimization algorithm based on deterministic information may produce a more effective but fragile solution, i.e., a solution that looks good at the time it is computed but is highly sensitive to uncertain factors, and that might result in unattractive plans for infrastructure managers and train operating companies. On the other hand, a rule-based algorithm may produce a seemingly less optimized but more robust solution, i.e., one able to absorb disturbances without a relevant degradation in quality. In both cases, however, the impact of stochastic disturbances needs to be carefully evaluated before the schedule is implemented.

The main motivation of our study is thus to quantify the improvement, if any, that can be achieved in real time by using optimization-based rescheduling methods under stochastic disturbances.

3 Robustness evaluation framework

Figure 1 presents the flow of data in the framework developed to evaluate the quality of train schedules when dealing with uncertain information.
Fig. 1 Flow of information in the framework

The framework takes as input:
  • A railway instance. A deterministic railway instance contains a detailed description (at block section level) of the railway network under study, the set of trains running in the network over the considered time horizon, and the timetable they should follow.

  • Railway network status. The current status of the network, i.e., the position and speed of all trains at time \(t_0\), is assumed to be known. Based on this information, it is possible to generate the actual entrance delays in the network (also named source delays, see e.g. Schachtebeck and Schöbel 2010) that must be provided to the solver. The entrance delays are generated as deterministic values from the train positions and speeds; they represent the current status of the network and its expected evolution in the short term.

  • Stochasticity information. The simulator needs to know the probability functions of the durations of all operations subject to uncertainty: if an operation is not deterministic, its probability function must be known and specified. In our tests we considered uncertain durations for all running and dwell times.

All this input defines a stochastic instance. Since the train scheduling solver is assumed not to be able to tackle a stochastic train scheduling instance directly, the framework must transform it into a deterministic instance. This is done as follows: the probability function associated with each operation (running times and dwell times) is converted into a deterministic value by means of a sampling strategy. As a sampling strategy, one could consider taking the median of the probability function, its expected value, a given quantile, or the result of a more complex procedure.
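To make the notion of a sampling strategy concrete, the sketch below treats a strategy as any function that maps an operation's probability function to one deterministic duration. For simplicity it assumes every operation follows a uniform relative variation around a nominal duration (the model used for running times in Sect. 4.2); the names `UniformRelative`, `to_deterministic`, `median`, `expected` and `quantile` are hypothetical and are not part of ROMA or of the framework's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class UniformRelative:
    """Hypothetical helper: duration = nominal * (1 + v) with v ~ U[lo, hi]."""
    nominal: float
    lo: float
    hi: float

    def quantile(self, g: float) -> float:
        # Inverse CDF of the uniform distribution, scaled by the nominal duration.
        return self.nominal * (1.0 + self.lo + g * (self.hi - self.lo))

    def mean(self) -> float:
        return self.nominal * (1.0 + (self.lo + self.hi) / 2.0)


# A sampling strategy turns one probability function into one deterministic value.
Strategy = Callable[[UniformRelative], float]


def median(d: UniformRelative) -> float:
    return d.quantile(0.5)


def expected(d: UniformRelative) -> float:
    return d.mean()


def quantile(g: float) -> Strategy:
    # Fixed-quantile strategy (the family adopted in Sect. 4.3).
    return lambda d: d.quantile(g)


def to_deterministic(instance: Dict[str, UniformRelative],
                     strategy: Strategy) -> Dict[str, float]:
    """Map every stochastic operation to the deterministic duration given to the solver."""
    return {op: strategy(dist) for op, dist in instance.items()}
```

For an operation with a nominal duration of 100 s and a variation in [−10, +20 %], `quantile(0.0)` gives 90 s, `quantile(1.0)` gives 120 s, and the median and the expected value coincide at 105 s because the uniform distribution is symmetric.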

Next the train scheduler is invoked on the deterministic instance and it is expected to produce a deterministic solution to the problem. The scheduler uses a rescheduling algorithm in order to detect and solve potential conflicts between trains, and outputs a microscopic plan of operations, with a level of detail of single block sections and a precision of seconds.

As solver we use ROMA (D’Ariano et al. 2008; D’Ariano and Pranzo 2009), a computerized system that can assist train operators in their tasks. ROMA is able to estimate and control the future evolution of railway traffic by considering deterministic information on track occupation and safety constraints. This can prevent the decision maker from taking less effective decisions, such as decisions that cause a deadlock situation or unsatisfactory throughput. The resulting solution is a new plan of arrival and departure times that minimizes train delays with respect to the original timetable. ROMA also enables the optimization of railway traffic when the actual timetable is not conflict-free and/or deadlock-free, or during severe traffic disturbances, such as when emergency timetables are required and dispatchers need extensive support to solve train conflicts and to handle disruptions (e.g., unexpected blockage of some tracks). Furthermore, other solvers can be used, as long as they are able to handle a stochastic or deterministic train scheduling instance and provide a solution with the desired level of modeling detail.

After the solver has computed a new schedule, the framework replaces the deterministic process times in the deterministic solution with their original probability functions from the stochastic instance. We call this representation a stochastic solution [Fig. 2 (bottom)]; in other words, a stochastic solution is the solution computed by the solver with the stochasticity embedded in it. A set of Monte Carlo trials is then performed on the stochastic solution, creating a set of sample solutions (scenarios). This transformation is depicted in Fig. 2. The top plot represents a small railway network with three trains. The middle plot is a Gantt chart in which all operations are deterministic; it shows two trains progressing along a track and another train traversing part of it in the opposite direction. The bottom plot is a stochastic Gantt chart: instead of rectangular blocks representing operations, the height of each shape corresponds to the likelihood that the operation is in progress at that time.
Fig. 2 The layout of the network (top). A deterministic solution (middle) and the corresponding stochastic solution (bottom), for a small instance with three trains. Each train has a running time of 100 time units on each block section and is subject to [−10, +20 %] variability
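A minimal sketch of the Monte Carlo step, assuming that the train orders chosen by the solver stay fixed and only the process times are resampled, so that event times are longest paths in a directed acyclic graph. The example is a simplified two-train version of the setting in Fig. 2 (two trains following each other over blocks A and B, 100 time units per block, uniform variation in [−10, +20 %]); all node, operation and function names are hypothetical, and the real framework works on the full microscopic model rather than on this toy graph.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# (u, v, op): event v cannot occur before event u plus the duration of operation op.
# op is None for a pure precedence, e.g. a train-ordering decision fixed by the solver.
Edge = Tuple[str, str, Optional[str]]


def event_times(order: List[str], edges: List[Edge],
                durations: Dict[str, float]) -> Dict[str, float]:
    """Event times as longest paths; `order` must be a topological order of the DAG."""
    incoming: Dict[str, List[Tuple[str, Optional[str]]]] = {n: [] for n in order}
    for u, v, op in edges:
        incoming[v].append((u, op))
    t: Dict[str, float] = {}
    for v in order:
        t[v] = max((t[u] + (durations[op] if op is not None else 0.0)
                    for u, op in incoming[v]), default=0.0)
    return t


def monte_carlo(order: List[str], edges: List[Edge],
                sampler: Callable[[random.Random], Dict[str, float]],
                n_trials: int = 1000, seed: int = 0) -> List[Dict[str, float]]:
    """Evaluate one fixed solution under n_trials sampled scenarios."""
    rng = random.Random(seed)  # same seed -> same scenarios for every compared solution
    return [event_times(order, edges, sampler(rng)) for _ in range(n_trials)]


# Hypothetical toy solution: T1 precedes T2 on both blocks A and B.
OPS = ["T1_A", "T1_B", "T2_A", "T2_B"]
ORDER = ["T1@A", "T1@B", "T1@out", "T2@A", "T2@B", "T2@out"]
EDGES: List[Edge] = [
    ("T1@A", "T1@B", "T1_A"), ("T1@B", "T1@out", "T1_B"),
    ("T2@A", "T2@B", "T2_A"), ("T2@B", "T2@out", "T2_B"),
    ("T1@B", "T2@A", None),    # ordering decision: T2 enters A after T1 has left it
    ("T1@out", "T2@B", None),  # ordering decision: T2 enters B after T1 has left it
]


def sampler(rng: random.Random) -> Dict[str, float]:
    # 100 time units per block, relative variation uniform in [-10 %, +20 %]
    return {op: 100.0 * (1.0 + rng.uniform(-0.10, 0.20)) for op in OPS}
```

With all durations fixed at 100, `T2@out` is 300. Under sampled durations it equals max(T1_A + T1_B, T1_A + T2_A) + T2_B, whose expected value is 320 rather than the 315 obtained by plugging in the expected durations, because the maximum of random quantities is on average larger than the maximum of their means. Reusing the same seed gives every compared algorithm the same scenarios, as required in Sect. 4.3.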

4 Case study

This section first introduces the experimental setup and then presents the robustness evaluation of the solutions generated by different rescheduling algorithms. Robustness is measured in terms of a number of train delay indicators, with the aim of computing the average degradation of the solutions in the presence of disturbances of process times.

4.1 The railway network

The test case is the complex and densely occupied area of the Dutch railway network shown in Fig. 3.
Fig. 3 Map of the Utrecht area

We focus on the railway area around the central station of Utrecht, a major hub of the Dutch railway network used by about 200,000 passengers every day.

This station area is at the crossroads of the five main lines criss-crossing the Netherlands. Specifically, there are:
  • a double track line to the west, towards Rotterdam and The Hague, served by 12 trains per hour per direction;

  • a four-track line to the north-west, towards Amsterdam, served by 11 trains per hour per direction;

  • a four-track line to the north, towards Amersfoort, served by 12 trains per hour per direction;

  • a double track line to the east, in the direction of Arnhem and Germany, served by 6 trains per hour per direction;

  • and finally a double track line to the south, towards Den Bosch, served by 9 trains per hour per direction.

The area under consideration extends a few kilometres along each line, covering a diameter of about 20 km. The lines are interconnected by two complex interlocking areas, one on each side of the station, with a total of about 100 switches, and the station itself has 20 platform tracks. This network is interesting for our study on disturbance robustness because it features a high service frequency and heavily utilized station platforms and interlocking areas; thus, minor changes of process times may require trains to be rescheduled. More details on the microscopic representation of the railway network can be found in Corman et al. (2009).

4.2 Distribution of process times

We consider probability distributions for the running and dwell times as follows. Weibull distributions are used to characterize the variability of the relevant process times, as in Corman et al. (2011b) and Yuan (2006). An overview of papers studying train delay distributions, together with experimental findings supporting the choice of Weibull distributions, can be found in Yuan (2006) and Yuan and Hansen (2007).

This work uses the same dataset of events as Corman et al. (2011b), which contains about 33,000 records of arrival and departure events at Utrecht station, corresponding to one month of operations. When studying dwell times, we divided the events into on-peak and off-peak events. A standard Maximum Likelihood fitting procedure is used to derive the three parameters (shape, scale and shift) of each of the two Weibull distributions. The distribution of peak dwell times has a shape parameter κ = 1.7914, a scale parameter λ = 255 s and a shift of −16 s towards shorter durations. The off-peak dwell times have κ = 2.0824, λ = 261 s and a shift of +4 s.

Densities for the two probability distributions (on-peak and off-peak dwell times) are plotted in Fig. 4. As expected, during peak hours the larger number of passengers results in longer dwell times. Both probability functions are bounded from below by a minimum technical stopping time of 48 s, which results in a small spike in the probability density graph.
Fig. 4 Densities of dwell times for trains at stations. The spike at 48 s represents the enforced minimum. The height of the curve indicates how likely the dwell time is to take that value
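The dwell-time model described above can be sampled in a few lines. The sketch assumes that a shifted Weibull variate is simply clamped at the 48 s technical minimum, which is one way to produce the spike visible in Fig. 4; `DWELL_PARAMS`, `sample_dwell` and `clamped_share` are hypothetical names. Note that Python's `random.weibullvariate(alpha, beta)` takes the scale first and the shape second.

```python
import random

MIN_DWELL_S = 48.0  # minimum technical stopping time

# (shape kappa, scale lambda [s], shift [s]) from the Maximum Likelihood fit above
DWELL_PARAMS = {
    "peak": (1.7914, 255.0, -16.0),
    "off_peak": (2.0824, 261.0, 4.0),
}


def sample_dwell(rng: random.Random, period: str) -> float:
    """Draw one dwell time: shifted Weibull variate, clamped at the technical minimum."""
    kappa, lam, shift = DWELL_PARAMS[period]
    raw = shift + rng.weibullvariate(lam, kappa)  # weibullvariate(scale, shape)
    return max(raw, MIN_DWELL_S)  # clamped draws form the spike at 48 s


def clamped_share(period: str, n: int = 20000, seed: int = 1) -> float:
    """Fraction of draws that hit the 48 s minimum."""
    rng = random.Random(seed)
    return sum(sample_dwell(rng, period) == MIN_DWELL_S for _ in range(n)) / n
```

Under this assumption the mass of the spike is 1 − exp(−((48 − shift)/λ)^κ), roughly 8 % for the peak distribution and about 2.4 % for the off-peak one, and `clamped_share` should return values close to these.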

Another source of uncertainty is related to running times. This is modeled as a variation of the running time of every single operation (i.e., the passage of a train over a block section); the variation is normalized with respect to the planned duration and follows a uniform distribution.

The probability functions related to different trains and different block sections are uncorrelated in time and space. Unfortunately, little research has focused on assessing the variability of individual running times at a microscopic level; further research based on recorded data would be required for a comprehensive description of the statistical characteristics of running time variations.

We consider four levels of running time variation: the intervals of the uniform distributions are v = [−2.5, +5 %], [−5, +10 %], [−7.5, +15 %] and [−10, +20 %], respectively, i.e., there is a slight bias towards longer durations.
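Because the variations are uncorrelated across trains and block sections, a running-time scenario can be drawn block by block. The following is a minimal sketch with hypothetical names (`LEVELS`, `sample_route`, `mean_bias`); since each lower bound is half the upper bound in magnitude, the expected relative variation is one quarter of the upper bound, i.e. +1.25, +2.5, +3.75 and +5 %.

```python
import random
from typing import List, Tuple

# Relative variation intervals (lo, hi), as fractions of the planned running time
LEVELS: List[Tuple[float, float]] = [
    (-0.025, 0.05), (-0.05, 0.10), (-0.075, 0.15), (-0.10, 0.20),
]


def sample_route(rng: random.Random, planned: List[float],
                 level: Tuple[float, float]) -> List[float]:
    """One scenario: an independent uniform variation for each block section of a route."""
    lo, hi = level
    return [p * (1.0 + rng.uniform(lo, hi)) for p in planned]


def mean_bias(level: Tuple[float, float]) -> float:
    """Expected relative variation of a single running time."""
    lo, hi = level
    return (lo + hi) / 2.0
```

For example, at the [−10, +20 %] level a block section planned at 100 s takes between 90 s and 120 s, with a mean of 105 s.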

4.3 Timetable and deviations

In this study, we consider the 2008 timetable described in Kroon et al. (2009), with a microscopic level of detail for the computation of all process times. This timetable is very dense and, in the considered area, features 80 trains per hour. The traffic is mixed, divided approximately evenly between intercity and commuter trains, with a few long-distance high-speed trains. We consider two train scheduling instances, with traffic prediction horizons of 3,600 and 7,200 s. These two instances contain 1,962 and 3,924 stochastic process times, respectively, corresponding to running and dwell times subject to delays. For each of the two instances we assess 40 cases of entrance delays from Corman et al. (2011b), representing delays that have already occurred before the time window under consideration, or that are expected in the short term. Each train has a planned arrival time, defined by the timetable, at each scheduled stop, giving rise to different delay measures.

The entrance delays (source delays) are measured at the entrance of the dispatching area and are given as input to the scheduler. After a rescheduling algorithm has computed a train schedule, the positive difference between plan and realization is measured as total delay (or output delay) at relevant points of the network, i.e., at the scheduled stops and at the terminal points of the studied area. The total delay of a train at a relevant point (equivalent to a tardiness measure) can be further divided into two parts. The primary delay is the part of the output delay caused by the train's own entrance delay; it cannot be avoided even if the train runs at its maximum allowed speed without hindrance from other trains. The consecutive delay (also called secondary or knock-on delay) is the additional output delay generated by the dispatching measures taken in response to entrance delays, and it is caused by interactions with the other trains running in the network. In other words, the consecutive delay can be viewed as a domino effect of the entrance delays: e.g., a train delayed at its entrance into the network causes headway conflicts with other trains. Consecutive delays can be partially avoided through careful planning of operations (Corman et al. 2011a, b; Schachtebeck and Schöbel 2010; Suhl et al. 2001). Since primary delays are by definition unavoidable, we use consecutive delays as the performance measures to be minimized. In the computational experiments, we report the maximum and average consecutive delays. Schedule robustness is evaluated by analysing small stochastic variations of the running and dwell times of the trains in the network.
These variations can result from, e.g., measurement errors, coarse-grained train detection data (for instance based on block section occupancy rather than on a precise positioning system), additional unexpected disturbances, overcrowding or additional delays at station platforms. We generate 1,000 random instantiations of the associated probability functions and use them to evaluate schedule robustness; in other words, each stochastic solution is evaluated over 1,000 Monte Carlo trials (scenarios). To obtain a fair comparison, the solutions provided by different algorithms on the same instance are evaluated on exactly the same scenarios. Since the scheduling solvers are deterministic, all durations must be supplied as deterministic values rather than as probability functions. The proposed robustness evaluation framework therefore has a single parameter to be set: the sampling strategy, i.e., how a stochastic instance is transformed into a deterministic one. To generate a deterministic instance from a stochastic one, we extract from the probability distribution of each operation a single value, always the g-th quantile, where g is the only parameter of the sampling strategy. We used the following values of g:
$$ g \in \{0.0, 0.1, 0.2, 0.3, 0.4,0.5, 0.6, 0.7, 0.8, 0.9,1.0\} $$
ranging from g = 0.0 to g = 1.0. Setting g = 0 means that the scheduling solver considers an instance in which all times are set to the shortest duration allowed by the distribution of each operation. With g = 0.5 the sampling strategy provides the median times, and with g = 1 the maximum of each distribution is taken. Since the Weibull distributions are unbounded (see Sect. 4.2), we treat g ≥ 0.99 as g = 0.99 for them to avoid infinite durations. Note that for the distributions used, the expected value corresponds to g = 0.5 (uniform distributions) or g ≈ 0.56 (Weibull distributions).
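For the shifted Weibull dwell times the quantile-based strategy has a closed form. The sketch below caps g at 0.99 as described above and applies the 48 s minimum of Sect. 4.2; `weibull_dwell_quantile` and `g_at_mean` are hypothetical names. `g_at_mean` evaluates the Weibull CDF at its mean, F(mean) = 1 − exp(−Γ(1 + 1/κ)^κ), which does not depend on scale or shift; it ignores the small effect of the 48 s clamp on the mean, so it is only an approximate check of the g ≈ 0.56 figure.

```python
import math

G_VALUES = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
G_CAP_WEIBULL = 0.99  # the 1.0-quantile of an unbounded Weibull is infinite
MIN_DWELL_S = 48.0


def weibull_dwell_quantile(g: float, kappa: float, lam: float, shift: float) -> float:
    """g-th quantile of a shifted Weibull, capped at g = 0.99 and clamped at 48 s."""
    g = min(g, G_CAP_WEIBULL)
    x = shift + lam * (-math.log1p(-g)) ** (1.0 / kappa)  # inverse CDF
    return max(x, MIN_DWELL_S)


def g_at_mean(kappa: float) -> float:
    """Quantile level of the (unclamped) Weibull mean; independent of scale and shift."""
    return 1.0 - math.exp(-math.gamma(1.0 + 1.0 / kappa) ** kappa)


# Deterministic peak dwell times that would be fed to the solver for each g
peak_dwells = {g: weibull_dwell_quantile(g, 1.7914, 255.0, -16.0) for g in G_VALUES}
```

With these parameters g = 0 collapses every dwell time to the 48 s minimum, and both fitted distributions place their mean slightly above the median, consistent with the statement above.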

4.4 Performance indicators

Recall that in the train scheduling problem total delays (tardiness) can be divided into primary delays, which are unavoidable, and consecutive delays, which depend on the train scheduling decisions. In our problem formulation, both types of delays are non-negative by definition.

The traffic flows in a railway network can be represented as a temporal network with two types of events: the arrival of a train at a block section and the arrival of a train at a scheduled stop. Given an event j, its arrival time \(t^a_j\) is computed as the length of a longest path in the graph \(G = (N, F, S)\) representing the temporal network, where N is a set of nodes, each corresponding to an event, F is a set of fixed edges corresponding to the track occupation and safety constraints that must be satisfied in any solution, and S is a set of further edges corresponding to the train ordering decisions chosen by the rescheduling algorithm.

Given a planned arrival time \(t^p_j\) for event j, the tardiness of j is given by \(t^t_j = \max\{t^a_j - t^p_j, 0\}\).

The primary delay is computed on the temporal network without train ordering decisions, i.e., on the graph with no non-fixed edges selected, \(G = (N, F)\). Let \(l(0, j)\) be the length of the longest path from a start node 0, which represents the start time of traffic prediction \(t_0 = 0\), to node j, associated with event j. The primary delay is then \(t^u_j = \max\{l(0, j) - t^p_j, 0\}\).

The consecutive delay is computed on the temporal network with train ordering decisions, i.e., on the graph with all non-fixed edges selected, \(G = (N, F, S)\): \(t^c_j = \max\{t^t_j - t^u_j, 0\}\).
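The three delay definitions can be checked on a toy temporal network. The sketch computes longest paths from node 0 on \(G = (N, F)\) and on \(G = (N, F, S)\) and derives tardiness, primary and consecutive delays; the two-train instance, its arc weights and the helper names (`longest_paths`, `delays`) are hypothetical, and nodes are assumed to be listed in a topological order of F ∪ S.

```python
from typing import Dict, List, Tuple

Arc = Tuple[str, str, float]  # (from, to, minimum time lag)


def longest_paths(nodes: List[str], arcs: List[Arc], source: str = "0") -> Dict[str, float]:
    """l(source, j) for every node j of a DAG given in topological order."""
    dist = {n: float("-inf") for n in nodes}
    dist[source] = 0.0
    succ: Dict[str, List[Tuple[str, float]]] = {n: [] for n in nodes}
    for u, v, w in arcs:
        succ[u].append((v, w))
    for u in nodes:
        for v, w in succ[u]:
            dist[v] = max(dist[v], dist[u] + w)
    return dist


def delays(nodes: List[str], fixed: List[Arc], selected: List[Arc],
           planned: Dict[str, float]) -> Dict[str, Tuple[float, float, float]]:
    """Return {j: (tardiness, primary delay, consecutive delay)} for each planned event j."""
    l_f = longest_paths(nodes, fixed)              # G = (N, F): no ordering decisions
    l_fs = longest_paths(nodes, fixed + selected)  # G = (N, F, S): with ordering decisions
    out = {}
    for j, tp in planned.items():
        tt = max(l_fs[j] - tp, 0.0)  # tardiness t^t_j
        tu = max(l_f[j] - tp, 0.0)   # primary delay t^u_j
        out[j] = (tt, tu, max(tt - tu, 0.0))  # consecutive delay t^c_j
    return out


NODES = ["0", "T1_in", "T1_arr", "T2_in", "T2_arr"]
FIXED = [("0", "T1_in", 30.0),         # T1 enters 30 s late (entrance delay)
         ("T1_in", "T1_arr", 120.0),   # T1 running time
         ("0", "T2_in", 0.0),          # T2 enters on time
         ("T2_in", "T2_arr", 120.0)]   # T2 running time
SELECTED = [("T1_in", "T2_in", 60.0)]  # ordering decision: T1 first, 60 s headway
PLANNED = {"T1_arr": 120.0, "T2_arr": 150.0}
```

In this toy case T1's 30 s late entrance is entirely primary delay, while T2, although on time at its entrance, incurs 60 s of consecutive delay because of the ordering decision.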

In our computational experiments, we consider the following performance indicators, where \(\mathit{planned}\) is the set of events with a planned arrival time and \(|\mathit{planned}|\) is its cardinality.

The maximum consecutive delay:
$$ \max_{j\in \mathit{planned}} t^c_{j}; $$
the average consecutive delay:
$$ \frac{\sum_{j\in \mathit{planned}} t^c_{j}}{|\mathit{planned}|}; $$
the maximum tardiness:
$$ \max_{j\in \mathit{planned}} t^t_{j}; $$
the average tardiness:
$$ \frac{\sum_{j\in \mathit{planned}} t^t_{j}}{|\mathit{planned}|}. $$
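Given per-event delays, for instance from a longest-path computation on the temporal network, the four indicators reduce to simple aggregations. The sketch below also averages each indicator over Monte Carlo scenarios; averaging per indicator is our assumption about how scenario results are summarized, and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def indicators(consecutive: Dict[str, float], tardiness: Dict[str, float]) -> Dict[str, float]:
    """Maximum and average consecutive delay and tardiness over the planned events."""
    return {
        "max_consecutive": max(consecutive.values()),
        "avg_consecutive": sum(consecutive.values()) / len(consecutive),
        "max_tardiness": max(tardiness.values()),
        "avg_tardiness": sum(tardiness.values()) / len(tardiness),
    }


def average_over_scenarios(per_scenario: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each indicator over the evaluated scenarios."""
    return {k: mean(s[k] for s in per_scenario) for k in per_scenario[0]}
```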

4.5 Solver settings

The scheduler works with deterministic information. Three rescheduling algorithms are considered: the simple First In First Out (FIFO) dispatching rule, a greedy heuristic (AMCC) (Pranzo et al. 2003) and a Branch and Bound (B&B) algorithm (Corman et al. 2011a). The FIFO rule simply states that the train arriving first at a track junction passes first. The AMCC heuristic is based on the alternative graph formulation of the train scheduling problem (D’Ariano et al. 2008) and takes sequential decisions, trying to avoid the worst possible decision at each step. FIFO and AMCC need less than 2 s and 7 s, respectively, to solve the 2-h traffic prediction instances. B&B is an advanced branch and bound algorithm that minimizes the maximum consecutive delay; it uses the FIFO and AMCC solutions as initial solutions and the Jackson Preemptive Schedule as an effective lower bound. B&B is truncated after 120 s in order to guarantee reasonable computation times. The deterministic solution computed by each of the three algorithms is transformed into a stochastic solution using the predefined probability functions, as explained in Sect. 3, and then evaluated using 1,000 instantiations of the stochastic process times. In our tests, given a solution, the analysis of the 1,000 scenarios takes about 10 s for the largest instance and all objectives. This time is affected by the consecutive delay measures, which require an extra longest path computation to determine primary delays. The process is easily parallelized if faster computation is needed.
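As a toy illustration of the FIFO rule at a single junction (not of ROMA's network-wide implementation), trains can be served in order of arrival, each waiting until the previous one has cleared the junction. The occupation times, train labels and function names are hypothetical.

```python
from typing import Dict, List


def fifo_order(arrivals: Dict[str, float]) -> List[str]:
    """FIFO: trains pass the junction in order of their arrival time there."""
    return sorted(arrivals, key=arrivals.get)


def fifo_passing_times(arrivals: Dict[str, float],
                       occupation: Dict[str, float]) -> Dict[str, float]:
    """Time at which each train enters the junction under the FIFO order."""
    free_at = float("-inf")  # time at which the junction becomes free
    passing: Dict[str, float] = {}
    for train in fifo_order(arrivals):
        start = max(arrivals[train], free_at)
        passing[train] = start
        free_at = start + occupation[train]
    return passing
```

For example, if a commuter train reaches the junction at 90 s and an intercity train at 100 s, each occupying it for 60 s, the commuter train passes first and the intercity train waits until 150 s, regardless of the knock-on effects elsewhere, which is what makes the rule myopic.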

4.6 Robustness evaluation

Using two time horizons of traffic prediction, four sets of process time stochasticity, on-/off-peak behavior, 40 entrance delay cases and 10 sampling strategies, we have 6,400 runs per algorithm. With 1,000 scenarios plus a deterministic case, three algorithms (FIFO, B&B and the greedy heuristic) and four performance indicators, 76,876,800 objective values are evaluated. This was completed on 12 machines (Intel Core i5 CPU at 3.20 GHz), the last of which finished after 36 h.
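These counts follow from multiplying the experimental factors (time horizons, stochasticity sets, dwell time regimes, entrance delay cases, sampling strategies), then the 1,000 scenarios plus the deterministic case, the three algorithms and the four indicators:

$$ 2 \times 4 \times 2 \times 40 \times 10 = 6{,}400, \qquad 6{,}400 \times (1{,}000 + 1) \times 3 \times 4 = 76{,}876{,}800. $$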

The plots in Fig. 5 show the effect of the sampling strategy. The plots are averaged over all cases and over on-peak and off-peak dwell times. The right plots show the average and maximum consecutive delays obtained by the solvers on the deterministic problems, and the left plots show the result of applying these solutions to the 1,000 scenarios. The right plots show a gradual increase in objective values, which is expected since all process times (running and dwell times) increase as g grows. However, as the plots on the left show, this growth does not correctly represent what happens once random perturbations are added. If the solvers are optimistic (i.e., assume low values of g), the performance indicators observed when executing the schedule with random perturbations are greater than the solvers expected. Conversely, when high values of g are assumed, the objective function seen by the solvers is larger than what actually happens when the plan is executed with random durations.
Fig. 5

Average values of consecutive delays as a function of the sampling strategy (g). The two plots to the right are the solution quality produced by the solvers, and the two plots to the left are the quality after the Monte Carlo samplings are applied to the same solutions

The missing values in Fig. 5 indicate that the greedy heuristic failed to find a feasible solution for g ∈ {0.4, 0.7, 0.8, 0.9, 1.0}. It does, however, produce competitive results for g ∈ {0.5, 0.6}.

The spread between the rescheduling algorithms is in general limited, unless a large value of g is considered, representing a more conservative expectation of the process times. The closeness of the results obtained by different algorithms for the same value of g indicates that g is a stronger predictor of the quality of the stochastic solution than the choice of algorithm. Note that underestimating delays when solving the deterministic instances corresponds to low values of \(g\) (\(g \leq 0.33\dots\)) and results, on average, in worse solutions. Using the expected values of the probability distributions (corresponding to g = 0.5 for the uniformly distributed running times and g ≈ 0.56 for the Weibull-distributed dwell times) yields much better results (roughly 100 s less average consecutive delay), and if the delay distributions are unknown, overestimation is better than underestimation, as shown in Fig. 5.

We observe that assuming g = 1 leads to poor solutions. This is due to the presence of unbounded distributions, namely the Weibull distributions used to model dwell times. Specifically, assuming g = 1 (approximated by g = 0.99) corresponds to sampling extremely large dwell times, which overshadow the travel times and produce poor-quality solutions once stochasticity is applied.
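The observations above are consistent with reading g as a quantile level of each process time distribution. For a uniform range [−x, 2x] %, g = 1/3 returns the nominal time (delays ignored) and g = 0.5 returns the mean. For an unbounded Weibull distribution, the quantile diverges as g approaches 1. The Python sketch below encodes that reading. The reading itself is our assumption, and the function names and the Weibull shape and scale values are hypothetical.

```python
import math


def uniform_running_time(nominal, v, g):
    """g-quantile of a running time perturbed uniformly by v = (lo, hi) percent."""
    lo, hi = v
    return nominal * (1 + (lo + g * (hi - lo)) / 100)


def weibull_dwell_time(shape, scale, g):
    """g-quantile (inverse CDF) of a Weibull(shape, scale) dwell time."""
    if not 0 <= g < 1:
        raise ValueError("g must lie in [0, 1): the Weibull quantile is unbounded at g = 1")
    return scale * (-math.log(1 - g)) ** (1 / shape)


def weibull_mean_level(shape):
    """Level g at which the Weibull quantile equals the distribution mean."""
    return 1 - math.exp(-math.gamma(1 + 1 / shape) ** shape)


# With v = [-5, 10 %], g = 1/3 reproduces the nominal running time (delays
# ignored) and g = 0.5 gives the mean of the uniform perturbation.
print(uniform_running_time(100.0, (-5.0, 10.0), 1 / 3))
print(uniform_running_time(100.0, (-5.0, 10.0), 0.5))

# Hypothetical dwell time with shape 1.5 and scale 40 s: the level matching the
# mean depends on the shape, and the quantile keeps growing as g approaches 1,
# whereas the uniform running time is capped at +10 %.
print(weibull_mean_level(1.5))
for g in (0.5, 0.9, 0.99, 0.999):
    print(g, weibull_dwell_time(1.5, 40.0, g), uniform_running_time(100.0, (-5.0, 10.0), g))
```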

A value of g = 0.6 yields the best solutions for all three algorithms, and will thus be used in subsequent comparisons between the three algorithms.

Table 1 reports a quantitative comparison between B&B and FIFO for the deterministic (first two columns) and stochastic (last two columns) solutions, in terms of the four delay indicators (in seconds): average and maximum tardiness, and average and maximum consecutive delay. The greedy heuristic has been omitted since we obtained the same solutions as B&B for g = 0.6. The results are averages, reported separately for the off-peak and on-peak dwell time distributions. Overall, B&B slightly outperforms FIFO. However, in case of high stochasticity, B&B produces significantly better solutions than FIFO in terms of maximum consecutive delay.
Table 1

Average results on deterministic and stochastic instances for g = 0.6, split by off/on-peak, process time stochasticity and objective function


[Table 1 values not recoverable from the source. Rows are grouped by dwell time scenario (off-peak, on-peak), then by process time stochasticity v ∈ {[−2.5, 5 %], [−5, 10 %], [−7.5, 15 %], [−10, 20 %]}, then by objective function (Obj. func.): Avg. Tard., Max. Tard., Avg. Con., Max. Con.]

The greedy heuristic has been omitted since we obtained the same solutions as B&B for g = 0.6. The % column indicates by how many percent B&B outperformed FIFO on average.

When comparing deterministic and stochastic instances in Table 1, note that train delays are much larger when stochasticity is considered: on average 10 % larger for total delay, and up to ≈250 % larger for average consecutive delay. This reflects a high sensitivity of train movements to delay propagation, and justifies the use of microscopic models that compute delay propagation between subsequent trains precisely. The relative gaps between B&B and FIFO (% columns), for both average and maximum delays (consecutive delay as well as tardiness/total delay), remain comparable across the different stochasticity levels. B&B consistently generates better solutions than FIFO in terms of maximum and average consecutive delays, in both the deterministic and the stochastic cases.
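The source does not spell out how the % columns are computed. The Python sketch below assumes the % column is the relative reduction of the FIFO value achieved by B&B, \(100\,(z_{\mathrm{FIFO}} - z_{\mathrm{B\&B}})/z_{\mathrm{FIFO}}\) (the symbol \(z\) is introduced here for illustration). It also assumes that the deterministic-to-stochastic increases quoted above are relative to the deterministic value. Function names and numbers are illustrative, not values from Table 1.

```python
def relative_gap(fifo, bb):
    """Assumed % column: how much B&B improves on FIFO (positive = B&B better)."""
    if fifo == 0:
        raise ValueError("relative gap undefined when the FIFO value is zero")
    return 100.0 * (fifo - bb) / fifo


def stochastic_increase(deterministic, stochastic):
    """Assumed percentage increase of an indicator once stochasticity is applied."""
    if deterministic == 0:
        raise ValueError("increase undefined when the deterministic value is zero")
    return 100.0 * (stochastic - deterministic) / deterministic


# Illustrative numbers only.
print(relative_gap(120.0, 90.0))         # 25.0
print(stochastic_increase(40.0, 140.0))  # 250.0
```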

Surprisingly, the relative gap in average consecutive delay between B&B and FIFO increases for the stochasticity level [−7.5, 15 %] (and the other bold entries in the table) when stochasticity is added. Before running the tests, we had hypothesized that the tighter B&B schedule would make more delays propagate. We also observe that the maximum consecutive delays produced by B&B grow much more slowly than those produced by FIFO as the stochasticity v increases. The difference between the algorithms increases slightly with the amount of stochasticity v, due both to a greater expected value of the process times and to a greater chance of delay propagation caused by the larger variability.

The plots in Fig. 6 give further details on the statistical trends, aggregating over all values of g, all stochasticity levels v and both on-peak and off-peak dwell times. The greedy heuristic is again omitted from the plots, since it does not find a feasible solution in all cases, which would skew any density plot. The plots on the left report the distributions of the averaged average² tardiness and of the averaged average consecutive delay incurred by FIFO and B&B. Solutions computed by B&B are in general better, as observed in Table 1, which appears as a shift towards lower delays for both performance indicators. Another interesting feature of B&B is its reduced spread compared to FIFO, i.e., its distributions have a sharper peak. This means that B&B finds good solutions more consistently and suffers less from uncertainty. Similar trends arise for the average tardiness. Finally, the ripple pattern on the right tail of the figure is due to the aggregation of multiple values of g into a single plot. This highlights the role of the sampling strategy, which has a relatively large influence on the final solutions.

The plot on the right represents the densities of the averaged maximum tardiness and the averaged maximum consecutive delay. B&B outperforms FIFO on the maximum consecutive delay, as seen in Table 1, while having similar averaged maximum tardiness values. In general, the same patterns arise as for the average delays, but due to the strong non-linearity of the maximum delay as an indicator, the plots are more erratic and the trends are more difficult to see.
Fig. 6

Densities of the average delays and consecutive delays (left) and maximum delays and consecutive delays (right) for FIFO and B&B in seconds, for all g, for all v

Figure 7 shows the contribution of the case g = 0.6 to the left plot in Fig. 6, for all stochasticity levels v. The right tail decays cleanly towards zero, and the other trends discussed for Fig. 6 are easier to see. In this plot, the greedy heuristic and B&B are indistinguishable.
Fig. 7

Densities of average consecutive delay and average tardiness for FIFO, greedy and B&B when g = 0.6, for all stochastic levels v

Figure 8 illustrates the difference between FIFO and B&B over all Monte Carlo samples, in terms of average consecutive delay, for g = 0.6 and each of the four stochasticity levels v. The plots for the different stochasticity levels are all shifted to the right, i.e., the average difference is always larger than 0, which means that B&B always has a better average performance than FIFO. Specifically, B&B performs at least as well as FIFO in 77.47 % of the cases for v = [−2.5, 5 %], in 76.65 % of the cases for v = [−5, 10 %], in 73.97 % of the cases for v = [−7.5, 15 %] and in 82.23 % of the cases for v = [−10, 20 %]. To conclude, the gap between B&B and FIFO increases as the travel times become more uncertain. However, there are some samples for which FIFO provides better solutions than B&B.
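The percentages above can be derived from per-sample differences. The Python sketch below assumes the FIFO and B&B values are paired, i.e., both schedules are evaluated on the same sampled scenario, and it counts ties as cases where B&B is at least as good, matching the wording above. The name `compare_paired` and the numbers are hypothetical.

```python
def compare_paired(fifo, bb):
    """Share (%) of paired samples where B&B is at least as good as FIFO,
    and the mean difference FIFO - B&B (positive = B&B better on average)."""
    if len(fifo) != len(bb) or not fifo:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [f - b for f, b in zip(fifo, bb)]
    share = 100.0 * sum(1 for d in diffs if d >= 0) / len(diffs)
    return share, sum(diffs) / len(diffs)


# Average consecutive delays (s) of four made-up samples.
print(compare_paired([100.0, 80.0, 60.0, 90.0], [90.0, 85.0, 60.0, 70.0]))
# (75.0, 6.25)
```

The share and the mean difference can move independently. A level may show a lower share of B&B wins together with a larger mean advantage.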
Fig. 8

Difference between FIFO and B&B with respect to average consecutive delay when g = 0.6, for all v. The area of each shape to the left of the vertical line indicates the cases where FIFO provided better results than B&B after stochasticity is considered, and vice versa. The corresponding box plots are drawn inside each shape

Two scenarios were considered for dwell time durations: on-peak and off-peak. Figure 9 shows a shift in average solution quality: when dwell times increase in the peak hours (Fig. 4), the average consecutive delay increases significantly, by about 15 s on average. Thus, a relatively limited difference in the dwell time distributions may result in remarkable differences in output. This confirms the general idea that stations are the main bottleneck of railway systems, especially stations with multiple platforms and tracks such as the one considered in this study, which have many conflicting inbound and outbound routes and serve large numbers of passengers.
Fig. 9

Densities of average consecutive delay produced by B&B for g = 0.6, for all v. The steeper shape of the density plot for the off-peak case indicates that the average consecutive delay tends to be smaller, as expected

4.7 Needs for rescheduling

In practice, when delays exceed a given threshold, the human dispatcher may run the rescheduling algorithm to produce a new plan that better reflects the current network status. An algorithm able to produce robust schedules may therefore reduce the rescheduling frequency, which is desirable. The proposed robustness evaluation framework can be used to evaluate how often a given rescheduling threshold is exceeded, and hence how often the rescheduling process is triggered. To determine the percentage of cases in which rescheduling is required, we must first choose the threshold level. We used the average consecutive delay as the indicator, because it allows comparisons between instances with different primary delays. The tardiness measures, in contrast, are a poor basis for deciding whether rescheduling is needed, since the observed tardiness might be entirely unavoidable, i.e., caused by travel time delays.

Figure 10 plots the percentage of simulated outcomes requiring rescheduling against the chosen threshold on the average consecutive delay, for FIFO and B&B under low (v = [−2.5, 5 %]) and high (v = [−10, 20 %]) stochasticity, with g = 0.6. The plot shows that average consecutive delays below 50 s are rare, while values above 150 s seldom occur. In between, there is a consistent gap between B&B and FIFO, indicating that choosing an acceptable average consecutive delay in the interval [50, 150] s would force more frequent rescheduling for FIFO than for B&B.
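A curve like the one in Fig. 10 is an empirical exceedance curve over the simulated outcomes. The short Python sketch below assumes that rescheduling is triggered when an outcome's average consecutive delay strictly exceeds the threshold; the paper does not state whether the comparison is strict. Names and data are illustrative.

```python
from bisect import bisect_right


def rescheduling_share(outcomes, thresholds):
    """Percentage of outcomes whose average consecutive delay exceeds each threshold."""
    if not outcomes:
        raise ValueError("no outcomes to evaluate")
    ordered = sorted(outcomes)
    n = len(ordered)
    # bisect_right counts the outcomes <= threshold; the rest trigger rescheduling.
    return {th: 100.0 * (n - bisect_right(ordered, th)) / n for th in thresholds}


# Five made-up outcomes (s) and three candidate thresholds (s).
print(rescheduling_share([40.0, 60.0, 90.0, 120.0, 160.0], [50, 100, 150]))
# {50: 80.0, 100: 40.0, 150: 20.0}
```

The outcomes are sorted once, so each threshold query costs a single binary search. This helps when many thresholds are swept over 1,000 outcomes per case.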
Fig. 10

From this plot we can detect the need for rescheduling. If an acceptable level of average consecutive delay is defined, the resulting percentage of scenarios requiring rescheduling can be determined as a function of threshold values. These values are computed for g = 0.6

4.8 Analysing the influence of a single dwell or process time

Another quantity of interest is the influence of the variation of a single dwell operation on the overall schedule. To this aim, we identify the operation with the greatest potential impact on the average consecutive delay of all trains.

Longest path calculations on the graph are used to obtain the starting time of each operation and allow critical operations to be identified. By selecting an operation \(o_j\) to analyze, the process time of that operation can be changed in all the sample graphs³, and the resulting sets of objectives can be analyzed.

The operation with the greatest potential impact on the objective is identified by applying the sampling strategies g = 0.0 and g = 1.0 to it. For each of these two cases, the average consecutive delay is computed for every Monte Carlo sample and averaged over all samples. The difference between the two means is an estimate of the average impact of changing that operation's duration from its minimum to its maximum possible value.
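The procedure can be sketched in Python (3.9+ for `graphlib`). This is a simplified stand-in, not the authors' implementation. Average tardiness replaces the average consecutive delay as the objective. The caller supplies each operation's minimum (g = 0) and maximum (g = 1) duration. All longest paths are recomputed from scratch, rather than propagating changes only along paths rooted in the selected operation. The graph and numbers are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def start_times(durations, release, preds):
    """Longest-path earliest start times; preds[j] must end before j starts."""
    start = [0.0] * len(durations)
    for j in TopologicalSorter(preds).static_order():
        start[j] = max([release[j]] + [start[i] + durations[i] for i in preds[j]])
    return start


def avg_tardiness(durations, release, preds, planned):
    """Stand-in objective: mean tardiness of the planned events (seconds)."""
    start = start_times(durations, release, preds)
    return sum(max(0.0, start[j] - t) for j, t in planned.items()) / len(planned)


def most_influential(samples, low, high, release, preds, planned):
    """Operation whose duration change from low[j] (g = 0) to high[j] (g = 1)
    changes the sample-averaged objective the most; returns (j, impact)."""
    best = None
    for j in range(len(low)):
        impact = 0.0
        for durations in samples:
            lo_case, hi_case = list(durations), list(durations)
            lo_case[j], hi_case[j] = low[j], high[j]
            impact += (avg_tardiness(hi_case, release, preds, planned)
                       - avg_tardiness(lo_case, release, preds, planned))
        impact /= len(samples)
        if best is None or impact > best[1]:
            best = (j, impact)
    return best


# Hypothetical two-train example: op 3 of train B waits for op 1 of train A.
preds = {0: [], 1: [0], 2: [], 3: [2, 1]}
release = [0.0] * 4
planned = {1: 100.0, 3: 180.0}
samples = [[100.0, 100.0, 60.0, 60.0]]  # a single scenario, for illustration
low = [95.0, 95.0, 57.0, 57.0]
high = [110.0, 110.0, 66.0, 66.0]
print(most_influential(samples, low, high, release, preds, planned))  # (0, 12.5)
```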

Figure 11 shows the distribution of the average consecutive delay for different values of g applied to the operation with the greatest influence on the average consecutive delay. This is a dwell time operation with a high degree of uncertainty. In contrast, when the analysis is restricted to travel time operations, the corresponding plot shows little discernible difference between g = 0 and g = 1, even for v = [−10, +20 %]. We conclude that the variability of a single dwell time operation can have a larger influence on the robustness of train schedules than the corresponding variability of a travel time operation.
Fig. 11

Average consecutive delays sensitivity to changes in a single dwell time on a 3,600 s instance

5 Conclusions and further research

This paper proposed a framework for evaluating the robustness of a given solution to the train scheduling problem, where robustness under disturbances is the ability to handle small stochastic variations in the running and dwell times of trains in the network without increasing output delays. The robustness analysis has a very small computational cost, which makes it suitable for the real-time evaluation of alternative solutions.

In the computational results of Sect. 4, the behavior of three rescheduling algorithms was evaluated in deterministic and stochastic environments. B&B provides consistently better solutions than FIFO and AMCC in terms of consecutive delays and tardiness. Specifically, FIFO often yields the worst solutions, while AMCC performs well when it delivers a result but must be deemed too unstable to be used as a stand-alone algorithm.

The influence of the algorithm, the stochasticity and the sampling strategy was also assessed. The results show that the sampling strategy used to generate the deterministic instances has a significant effect on solution quality. Sampling strategies that ignore delays (i.e., g ≈ 0.33) performed significantly worse than strategies corresponding to the expected value of each distribution (g ≈ 0.5). Furthermore, it was shown that slightly overestimating the expected value of the distributions performs better than underestimating it. This observation is important when the exact distributions are unknown.

B&B was originally feared to be more sensitive to stochasticity than FIFO, since the schedule produced by B&B is tighter. We have shown, however, that B&B often outperforms FIFO even after stochasticity is taken into account.

Future research should be directed to:
  • Exploring more sophisticated sampling strategies, and further determining the statistical properties of the recorded distributions of process times.

  • Extending our investigation of disturbance robustness to the analysis of worst-case scenarios in complex and dense networks with multiple process disturbances.

  • Inserting the proposed evaluation framework in an iterative process for the production and assessment of schedule robustness.

  • Applying closed-loop rescheduling based on the real-time analysis of solution robustness.

  • Maintaining pools of candidate solutions, and guiding (as real-time events occur) the execution of schedules by the plan exhibiting the best performance indicators.


¹ Clearly, if the solver is able to deal with a stochastic instance, there is no need to remove stochastic data.


² When two aggregators are specified, the inner one aggregates over the operations in the solution.


³ By evaluating the estimates for \(o_j\) in increasing order, the longest path calculations can be reduced to propagating changes along the paths rooted in \(o_j\) in the temporal networks.



Rune Larsen wishes to acknowledge financial support from the Villum Foundation under grant VKR09b-014.

Copyright information

© Springer Science+Business Media New York 2013