Real-time dispatching observations
Transportation planning within the Dutch railway system is a highly dynamic, multifaceted process. Within this context, train dispatchers coordinate and manage the (conflicting) demands placed on track use and integrate multiple sources of information to make the trade-off decisions and take the actions necessary (e.g., re-routing, re-ordering and re-timing of trains, tracks and signals) to maintain performance, regain control and mitigate potential threats. Especially in uncertain, time-pressured and variable traffic situations, in which train dispatchers are pushed toward the limits of their regular operating (base) capacity and the adaptive capacity of the system is challenged (e.g., Woods et al. 2014), handling the situational demands proves to be a cognitively complex task. It is in those instances that resilient strategies and behaviors are required and that the boundary conditions of adaptive capacity, as well as localization of those boundaries, might be exposed (Woods and Cook 2006; Dekker 2011). For this reason, the observations and descriptions of resilient behavior focused on high-pressure situations.
In the next section, one of the observed high-pressure situations will be delineated. This illustrative case provides insight into the concrete manifestations of resilient dispatcher performance, as well as subsequent vulnerabilities, and serves as a baseline measure for the (operational) resilience conditions currently present within the organization.
Example of a high-pressure situation: ‘the hooligan case’
On 2-4-2014, a major disruptive event unfolded when soccer hooligans ignited fireworks and smoke bombs on a rail station platform and the mobile police unit was forced to intervene. As is standard procedure in such high-pressure situations, the emergency workplace was put into operation (as a means of reinforcement to handle performance variability, minimize timetable disruptions and mitigate the rapidly increasing delays). In these situations, the dispatcher responsible for the rail trajectory in which the disruption occurs focuses on the direct (short-term) actions involved in handling the disruption (i.e., quick responsive action to train and timetable delays directly resulting from the disturbance), while the ‘emergency’ dispatcher assists by taking over verbal communication with other actors (i.e., telephone calls from train drivers) and (long-term) planning activities. As is the case in almost all high-pressure situations, the corridor team was unable to integrate and develop implications for this specific situation based on the full set of information held by all actors involved (Woolley et al. 2008). This was because no direct line of communication could be established between the rail control center and the commander, or other members, of the mobile police unit. Therefore, the corridor team initially chose to handle delayed trains in the order in which notifications came into the system. This (short-term) method of prioritizing proved to be inadequate and even counterproductive in the long term due to escalating knock-on delays for connecting trains (i.e., working at cross-purposes). The post manager noticed this approximately 15 min after the incident occurred and directly stressed the importance of developing and implementing an action plan to properly deal with the situation. To fill the information gap, rail dispatchers used the live camera feed from the station platforms (Fig. 3).
By monitoring the police actions on scene, rail dispatchers were able to enhance their overall situation awareness. Concurrently, the internal communication channels/structure were optimized. Two corridor team members gathered behind the workstation of the rail dispatcher responsible for the rail section where the disturbance took place in order to listen in on communication and look at the monitor displays to gain insight into train movement and the overall rail situation on the surrounding tracks. Subsequently, this information was shared with train dispatchers manning the neighboring rail sections. Implementation of the action plan resulted in highly selective rail movement in the disrupted rail section (e.g., prioritizing international trains), and gradual redirection of stationary and delayed trains occupying adjacent rail tracks to the nearest available railway station.
Resilience behavior episodes
Using the Resilience Markers Framework by Furniss et al. (2011), two resilience behavior episodes were distinguished for this specific situation: (1) recognition of inappropriate situation handling and avoiding escalation of commitment, and (2) tailoring of existing artifacts to maximize information extraction.
Recognition of inappropriate situation handling and avoiding escalation of commitment (i.e., the tendency to continue a chosen course of action even when changing to a new course would be preferable; Staw 1981) were related to the strategy ‘provision of feedback to enable error correction’ (Blandford and Furniss 2006) and the broader marker of ‘recognizing and responding to failure’. Although the recognition and notification of malfunctioning initiated the corrective actions necessary to manage the performance variability in this situation, the insight came rather late and was only noted by one actor (the post manager) within the corridor team. It could be argued that the post manager has a high level of experience and as such might exceed the operational competence of the other corridor team members. However, the tasks of a post manager and a rail dispatcher are of a different nature, so the post manager’s skills and experience do not translate one-to-one to the abilities and experience of the corridor team members. An alternative explanation could be that the post manager provided a fresh perspective, which led to a broader set of actions. This situation exposed potential vulnerabilities (e.g., maintaining adequate situational overview and awareness under high-pressure demands, acknowledgment of inappropriate actions and/or routines) which could influence learning and anticipation of future resonance and disruption handling. This notion was strengthened by irregularities observed in the levels of operational performance within and between dispatchers and situations. Similar prioritizing decisions could be observed with other dispatchers over different shifts (e.g., answering incoming phone calls rather than prioritizing timetable changes, which would have been more efficient).
Tailoring of existing artifacts to maximize information extraction can be related to the strategies ‘prepare for future work’ (Blandford and Furniss 2006) and ‘cue creation in action’ (Perin 2005), with the broader markers of ‘preparation’ and ‘strategies that maximize information extraction’ (Blandford and Furniss 2006). The awareness of (incoming) data limitations and the proactive steps taken at present (i.e., enhanced monitoring) increased the readiness to adequately respond to ongoing developments (efficient management of the performance variability) and provided the opportunity to anticipate and prepare for future situational demands.
Weak resilience signals
The operational parameters for the workload WRS set by Siegel and Schraagen (2014) were used to determine if the hooligan case could indeed be labeled as a high-pressure situation and if the system detected it as such. A WRS can be defined by looking at three features: a (relatively) long stretch duration, high (average) IWS or XTL scores and discrepancies in the stretch ratio. A graphical representation of the workload stretch measures was generated by plotting all the objective versus the subjective stretches of that day relative to an empirically drawn threshold line (i.e., the rounded sum of the means with one standard deviation above). Since the stretches in Fig. 4 are significantly correlated (r = .94, p < .05), the threshold line serves as a visual guide for selecting the stretches that deserve attention as a WRS. In Fig. 4, the x-axis represents the stretch duration × the IWS scores. The stretch duration is the total number of minutes a stretch lasted, counted in 5-min time slots. In addition, the mean IWS score can be calculated for each stretch. The y-axis indicates the sum of technical system activity measured in a specific stretch, also taking into account the 5-min time slots.
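The stretch coordinates and threshold described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the study: the names `stretch_point`, `cutoff` and `flag_stretches` are hypothetical, the example ratings are invented, and the threshold is read as a per-axis cutoff equal to the rounded mean plus one (sample) standard deviation, which is only one possible reading of the threshold line.

```python
from statistics import mean, stdev

SLOT_MINUTES = 5  # ratings and activity counts are recorded per 5-min time slot


def stretch_point(iws_scores, activity_counts):
    """Return (x, y) for one stretch.

    x = stretch duration in minutes * mean IWS score (subjective measure)
    y = summed technical system activity over the slots (objective measure)
    """
    duration_min = SLOT_MINUTES * len(iws_scores)
    return duration_min * mean(iws_scores), sum(activity_counts)


def cutoff(values):
    """Assumed threshold: rounded mean plus one sample standard deviation."""
    return round(mean(values) + stdev(values))


def flag_stretches(stretches):
    """Return the indices of stretches that exceed the cutoff on either axis."""
    points = [stretch_point(s["iws"], s["activity"]) for s in stretches]
    x_cut = cutoff([x for x, _ in points])
    y_cut = cutoff([y for _, y in points])
    return [i for i, (x, y) in enumerate(points) if x > x_cut or y > y_cut]


# Invented example data: three small stretches and one long, demanding stretch.
example = [
    {"iws": [2, 2], "activity": [3, 4]},
    {"iws": [3], "activity": [5]},
    {"iws": [2, 3], "activity": [4, 4]},
    {"iws": [6, 7, 8, 5], "activity": [20, 30, 25, 15]},
]
print(flag_stretches(example))
```

With these invented values, only the last stretch lies beyond both cutoffs and would be selected for closer inspection.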
From the graph, it becomes clear that most stretches that occurred on 2-4-2014 were small and do not exceed the boundaries of the safe operating envelope (Rasmussen 1997). Ad hoc analysis revealed that five workload stretches in Fig. 4 are caused by the same underlying (decompensation) event, the ignition of fireworks and smoke bombs on the rail tracks and station platform by soccer hooligans (‘Hooligan Case’ in Fig. 4). Looking at these five stretches in relation to the three WRS features, it is evident that all stretches have a (rather) long stretch duration with increased mean IWS scores (circa 5–6, indicating moderate pressure to very busy). In addition to an increased IWS average, all stretches also contained 5-min periods rated with the three highest IWS scores (7 = extreme effort, 8 = struggling to keep up, and 9 = work too demanding). The stretches also have increased levels of technical system activity (i.e., due to telephony and manual re-routing quantities) and enlarged deviations in the stretch ratio (see stretch number 2). All in all, the hooligan case can indeed be classified as a high-pressure situation.
To validate and verify the performance WRS constructs, (log) data were examined to determine whether the decompensation event that unfolded during that day could also have been identified using performance WRS data methods. A spike in delay development was identified for trains in the 1700 series, indicating a segment of 36 trains traveling the same rail trajectory (Fig. 5). The upward slope could be explained by three ‘hooligan trains’ (all part of the 1700 series). Two trains suffered immediate, rapidly increasing timetable delays because they could not leave the station as a direct result of hooligans and fireworks on the tracks. The third train was used by the riot police to forcefully transport soccer hooligans out of the station. In addition, knock-on delays occurred because trains held back from departure occupied the rail platforms. This induced red rail signals (an increase of 9.2 % above average) for connecting trains, forcing trains to wait on rail tracks surrounding the station.
WRS analysis function
The successful identification of the known hooligan event (i.e., it was observed during the ethnographic study) contributes to verification of the WRS method. Deviations from normal operational baseline periods, defined as the steady state of a rail control post in which rail movements occur as planned without any intervention, could be established on the workload as well as the punctuality boundary. However, signaling of the hooligan event does not immediately create insight and understanding into the unknown variables in the normal performance variability that could indicate potential creeping sources of future resonance that may underlie the incident. Relatively long-lasting disruptive events with a big impact factor (i.e., affecting multiple trains and dispatchers) are likely to gain attention among actors in the system even without WRS indications. However, when such an event is already known, attention is needlessly diverted, which may obscure other unidentified potential factors that influence the resilience state. To enhance the organization’s feedback control loop (Doyle et al. 2013) and increase the understanding, tracking and anticipation of potential sources of future resonance and/or the impact factors of the different WRSs indicated by the framework, implementation of WRS analysis functions is proposed. An analysis function is described as an altered frame of reference, based on other or additional performance indicators, which guides the process of selecting WRSs that need to be dealt with. The aim of this analysis function is to exclude the ‘evident, known and obvious’ causes of resonance, and attempt to shift attention and reveal ‘hidden, unknown or ignored’ processes that could affect rail-system resilience. To demonstrate the concept and implementation of this principle, a punctuality WRS analysis function was established for the hooligan scenario, which will be described in more detail in the next section.
It is important to note, however, that the use of analysis functions is not limited to high-pressure situations. Analysis functions are equally applicable to and well suited to uncover (creeping) incident precursors in routine situations.
Implementation of the WRS analysis function allows the frame of reference for the punctuality boundary to be manually altered by excluding trains with exorbitant delays due to well-known escalation events (i.e., the hooligan trains) and by comparing the real-time delay measurements to specified baseline conditions (performance indicators; train series and specified dates). To test the applicability of the WRS analysis function method, the delay data from the second week of observations were re-examined (Fig. 6).
The three hooligan trains, which caused the exorbitant delays, were excluded from the analysis. Ad hoc analysis revealed an upward trend in delay development for the 1700 series. It could be argued that an average delay development increase of 1.7 min (102 s) per train does not exceed the predefined organizational threshold of ≥3 min delay and, as such, does not require further investigation. However, it could be beneficial to examine whether specific trains in this series contribute consistently to this delay development and whether this upward trend continues over time (e.g., over consecutive days or weeks). In addition, the time delays may impact the time buffers built into the predefined timetable and as such influence the rail dispatcher’s workload. Such information could aid in forestalling and anticipating future resonance emerging from ‘seemingly insignificant’ (creeping) change patterns and might even identify commonalities in the operating state preceding well-known events.
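As an illustration of the punctuality analysis function, the following Python sketch excludes trains with a known cause of delay and compares the mean delay development of the remaining trains with a baseline value. The function name `delay_increase`, the train numbers, the individual delay values and the baseline of 38 s are all hypothetical; only the ≥3 min threshold and the 102 s (1.7 min) increase are taken from the text, and the example values were chosen to reproduce that increase.

```python
from statistics import mean

ORG_THRESHOLD_S = 3 * 60  # organizational threshold of >= 3 min delay, in seconds


def delay_increase(delays_s, excluded, baseline_mean_s):
    """Mean delay development (s) of the non-excluded trains relative to a baseline.

    delays_s: mapping of train id -> delay development in seconds
    excluded: ids of trains whose delays have a well-known cause
    baseline_mean_s: mean delay development under the chosen baseline conditions
    """
    kept = [d for train, d in delays_s.items() if train not in excluded]
    return mean(kept) - baseline_mean_s


# Invented data: two trains with exorbitant, already explained delays
# and three trains with a modest upward trend.
delays = {"A": 150, "B": 130, "C": 140, "X": 1800, "Y": 2400}
known_cause = {"X", "Y"}

increase = delay_increase(delays, known_cause, baseline_mean_s=38)
needs_investigation = increase >= ORG_THRESHOLD_S
print(increase, round(increase / 60, 1), needs_investigation)
```

In this sketch the increase stays below the organizational threshold, so the series would not be flagged automatically. That is the kind of creeping trend the analysis function is meant to make visible for further inspection.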
WRSs and WRS analysis functions should be created to (visually) support the train dispatcher’s comprehension of the current operating state and resilience status and to enhance prediction of possible incidents and accidents in the future by guiding attention to aspects that deserve further analysis. They provide a means to an end and will not in themselves present an integrated approach to improve the resilience or related aspects of the system. In other words, rather than directing the domain practitioners along a defined path, exploratory content that allows for comparison between data is provided.
Resilience questionnaire
The response rate to the questionnaire was calculated according to the American Association for Public Opinion Research (2015) RR1 definition. Of the 67 employees contacted, one person no longer worked for the company and a second person abstained due to prolonged absence. This resulted in an RR1 of 22/65 = 34 %, which is acceptable for online surveys (Nulty 2008). The sample demographics were as follows: 16 rail dispatchers, three managers and three front office employees. In total, two females and 20 males answered the resilience questionnaire. Results from the ADAPTER questionnaire (Table 2) are consistent with the resilience baseline conditions and current operating state of the system ascertained during the observations in that the domain practitioners rated the resilience constructs monitoring and responding higher than the resilience constructs anticipating and learning. It should be noted, however, that the scores for anticipating (Mdn = 3.25) and learning (Mdn = 3.17) fall within the average range of the five-point Likert scale, which is at variance with the observational results that indicated underperformance for these constructs. This could indicate miscalibration of resilience levels (Woods and Wreathall 2008; i.e., learning construct α = .70 and SD = .43) within the organization. However, the scores might also be explained by fluctuation in resilient behavior that was observed within and between (senior and junior) rail dispatchers when they were coping with the (dynamic propagation of) random disturbances during real-time operation. A Mann–Whitney U test was conducted to evaluate whether anticipating resilience scores differed between senior and junior rail dispatchers.
Although the results indicated that anticipating resilience scores between senior rail dispatchers (Mdn = 11.83) and junior rail dispatchers (Mdn = 10.00) were not significantly different, U = 30.0, p = .608, r = .11, differences in resilient behavior cannot be ruled out completely since common and socially desirable answers could have been given by the rail dispatchers answering the questionnaire. During real-time operation, specific situational demands could elicit differences between junior and senior rail dispatchers based on experience. Situational demands in themselves might provide an indication as to why the acceptable degree of internal consistency (α = .70; Tabachnick and Fidell 2001) was not met since they could also explain the operational variability observed within individual dispatchers. This essentially reflects the rail dispatchers’ notion that no two situations are alike, even though situations might appear similar to outsiders since they, for example, both entail disruption handling due to a broken rail switch. In addition to the resilience constructs, domain practitioners assessed the relation-oriented abilities (i.e., shared transformational leadership and cooperation with other teams), which are incorporated into the ADAPTER questionnaire to operationalize the concept of team resilience, as the least well represented within the organization. These results are in line with the resilience observations. Transformational leadership has proved to be a leadership style that effectively stimulates knowledge creation and knowledge sharing at the individual and group levels (Bryant 2003). The fact that this ability is under-represented (Mdn = 3.14) could affect the learning capabilities of the organization (Zagoršek et al. 2009) and as such explain the performance variability observed.
The low rating for cooperation with other teams (Mdn = 2.54) indicates opportunities to improve handling across organizational and sub-system boundaries, as illustrated by the communication breakdown that occurred in the hooligan example.
Table 2 ADAPTER questionnaire; descriptive statistics and reliability coefficients