Exploring Resilience at Interconnected System Levels in Air Traffic Management

This chapter raises issues and ideas from various research disciplines for exploring resilience, projecting them onto the domain of air traffic management and aviation at interconnected system levels. It attempts to connect the micro, meso, and macro levels of the aviation sector, identifying corresponding research challenges. Examples from ongoing research show how theory has already been translated into practical methodological use. Connections between the foci of Resilience Engineering, Disaster Resilience, and other research disciplines are projected onto the air traffic management domain to explore how practical benefits can be obtained from these theories and which aspects of operational practice they connect to. The chapter shows that the concept of resilience from various research disciplines has potentially wide application across the system levels of air traffic management, and suggests that resilience be addressed from an interconnected systems perspective to add value to operations.

The SESAR 16.1.2 project developed and applied principles from the Resilience Engineering literature to the safety assessment and design of future technical and operational concepts for air traffic management [1]. The DARWIN project has conducted a worldwide systematic literature review covering more than 400 articles related to resilience and critical infrastructure [2]. DARWIN aims to develop resilience management guidelines [3] and to adapt them to health care and air traffic management. The systematic literature review identified resilience research at the micro, meso, and macro levels, as well as research on resilience in response to a variety of circumstances, from uncertainty and change, to disruptions and crises, to everyday variability. Related work on agile inter- and intra-organisational response to various crises in the aviation industry [4] is also relevant in this respect.

The Added Value of the Term Resilience
The literature is highly diverse in its use of the term resilience [2]. The discussion in this chapter is in line with the position of Woods [5] that resilience is conceptually different from rebound and robustness, although these terms are often used synonymously with resilience. Among other reasons, they lack the adaptive capacity component of resilience, which this chapter regards as a salient aspect of the use of resilience in general and for air traffic management specifically.
This chapter will first try to determine the "added value" of using the term resilience rather than other terms or other uses of the term, by briefly addressing a few definitions and descriptions of resilience that take adaptive capacity as a central theme. These stem from Resilience Engineering, which is mostly based on human factors and safety management, and from Disaster Resilience, which originates in crisis and disaster management.
The definition of resilience as "the intrinsic ability of a system to adjust its functioning prior to, during, or following changes and disturbances, so that it can sustain required operations under both expected and unexpected conditions" [6, p. xxxvi] reflects several standpoints that distinguish Resilience Engineering from traditional approaches to safety [1]. It emphasises the need to react and respond not only when disturbances are observed but also when they are anticipated. It also highlights adjusting performance in relation not only to disturbances but also to more subtle changes, because, given system complexity, common everyday fluctuations in working conditions may coincide and coalesce into hazardous situations [6]. The terms 'required operations' and 'functioning' emphasise the need to appreciate the multiple goals that operations try to balance. These include not only safety but often also productivity/profit, efficiency, security, environmental sustainability, etc. Referring to both 'expected' and 'unexpected' conditions emphasises the need to recognize that not all conditions can be expected and prepared for beforehand, and that unexpected conditions are likely to arise in complex systems. Traditional approaches to safety focus on the anticipation and mitigation of risks, i.e., on preparing for the expected. Resilience Engineering additionally suggests that working conditions and processes may be designed to support coping with unexpected events.
This approach was pursued further in the SESAR 16.1.2 project, which adapted the definition above as follows: "The intrinsic ability of the ATM/ANS functional system to adjust its functioning and performance goals, prior to, during, or following varying conditions" [1, p. 120]. Several key considerations led to this refinement. First, the language needed to be adjusted to fit the SESAR safety assessment methodology, a method based on more traditional safety engineering into which the project was tasked with feeding Resilience Engineering methodology.
Second, performance goals change depending on the situation, which refines Hollnagel's "required operations". For example [1], in everyday operations the Air Navigation Service aims to provide a safe and efficient flow of traffic, but if the controllers' traffic display blacks out, the goal shifts to mainly providing separation (safety) between aircraft using all available means. Third, the term 'varying conditions' is intended, among other things, to capture in a single phrase what Hollnagel's definition calls expected and unexpected conditions. A complex air traffic situation or course of events often has both expected and unexpected aspects, so 'varying conditions' is less binary and captures various degrees of regularity. This leads to another connection to traditional safety engineering, as conditions that have not been documented in safety cases could be called unexpected. These could, however, simply be events that were indeed considered during development work but whose probability and/or severity in the risk matrix was too low for them to be formally addressed as part of the regulated safety assessment. Another issue is that "the operational situation" at any point in time is a complicated aggregate of many conditions, each of which may or may not be seen as expected, important, challenging and/or meaningful, depending on whom you ask. Finally, in complex domains including ATM [1], it seems evident that coping with everyday situations (which include unexpected conditions) relies on controllers' ability to apply previous experience and combine it in new ways with preparations for expected conditions (design features, training, procedures, etc.).
In sum, introducing resilience perspectives poses a challenge: expected and unexpected conditions need to be addressed simultaneously, in a way that connects to traditional methods for addressing expected conditions without oversimplifying complexity and diversity, while keeping the approach practical given the limited resources available for assessing any future ATM concept.
Four necessary and interacting abilities for achieving resilience have been defined: anticipating (knowing what to expect), monitoring (knowing what to look for), responding (knowing what to do), and learning (knowing what has happened) [7]. These abilities tie together several established activities of traditional safety management, such as risk analysis and assessment, safety oversight, safety indicators, and incident investigation. These activities have traditionally focused on failures and have been kept mostly separate from business management processes. Resilience Engineering takes a different view, emphasizing that all processes and outcomes of everyday performance, productivity, and safety need to be understood from an integrative management perspective.
Articulating the importance of unexpected conditions in Resilience Engineering, Woods' 2006 definition focuses on situations that go beyond what the organisation or system has prepared for: "the ability to recognize and adapt to handle unanticipated perturbations that call into question the model of competence, and demand a shift of processes, strategies and coordination" [8, p. 22]. This definition seems to restrict resilience to unexpected, beyond-design-basis processes and strategies. Although this distinction seems welcome as a way to prevent resilience from becoming an overly inclusive term, the preceding discussion of varying conditions that combine various degrees of expectation suggests that it could be difficult in practice to determine when a model of competence is called into question, and what constitutes a shift and what does not.
The emergency and disaster management literature has acknowledged the potential contribution of the concept of resilience for some time [9]. For example, Comfort, Sungu, Johnson, and Dunn [10] discuss public organisations in risky dynamic environments. They emphasise these organisations' need for a balance between anticipation, meaning assessment of vulnerability and safety and (planning for) preventive action, and resilience, meaning (planning for) flexible response ('bouncing back') after a damaging event [10]. Disaster resilience authors seem to share Woods' suggestion to limit resilience to unanticipated conditions requiring adaptation [11].
Despite years of theorizing, agreement on the term resilience across the broad set of research communities that use it [1] seems unlikely in the short term. These distinctions may nevertheless be important if resilience is to contribute beneficially to operations by finding its way into concrete methods in highly regulated domains such as ATM.

Micro Level Resilience: The Controller
At the micro level, the resilience of the controller may be addressed from at least two perspectives: the psychological processes involved in the controller's well-being when handling disturbances, and the cognitive processes involved in actually controlling the traffic. The latter is arguably addressed most effectively at the meso level rather than the micro level, because there the controllers and their technical tools can be described as a joint human-human-tool system performing cognitive tasks in terms of functional units. The former is currently mostly addressed as part of the work on Critical Incident Stress Management (CISM), which has found its way from other domains into ATM [12]. CISM is a peer support program whose objectives are to mitigate the effects of harmful events, facilitate recovery, restore adaptive functioning, and identify who would benefit from additional services or treatment. As Mitchell & Leonhardt [12] describe, it is a multi-faceted, flexible approach comprising anticipatory, monitoring, and pedagogical activities before a harmful event occurs, support in response immediately after the event, and longer-term recovery. As such, it exemplifies increasing resilience at the micro level through engagement at the meso level, i.e., a set of practices linking controllers and organizational processes that covers all of Hollnagel's resilience cornerstones so that the individual controller can be psychologically resilient. It is thus an example of micro-meso interaction that is arguably necessary to establish micro-level (controller) resilience, and that also entails meso-level resilience in the sense that a local ATM organization organizes peer support for all its controllers.
The mechanisms of adaptive capacity and adaptive performance involved may, however, not be easily observable or measurable, because purely psychological processes are difficult to assess, and they need to be addressed at the micro and meso levels simultaneously.

Meso Level Resilience: The Position, Sector, and Tower/Center
The meso level in ATM may be seen as containing several gradations of units of analysis: at least the controller working positions of the sector being controlled, and the air traffic service units (ATSUs), which, simply put, provide three kinds of services: tower control (in the control zone directly surrounding the airport), terminal control (in the wider area around the control zone used by flights approaching and departing airports), and area control (for flights en route or climbing/descending to and from terminal areas). At an area control working position, services are typically provided by two controllers working a sector of airspace containing a limited number of aircraft: an "executive" controller, who talks to the pilots, and a "planner" controller, who assists the executive controller by coordinating with other sectors, anticipating traffic, and pre-emptively implementing or suggesting solutions. Tower and terminal controllers may also work at several closely coordinated working positions. In this way, services such as separation and route management, sequencing, and the handling of pilot requests are provided. To aid them, controllers typically have a suite of technical systems: besides the situation display showing the traffic, there are tools for managing separation between aircraft, planning traffic sequences, managing traffic conflicts, anticipating the efficiency and safety consequences of routing, etc. Because functions such as separation management and planning are performed jointly by the executive and planner controllers with the continuous help of their technical tools, the "cognitive" functions of decision making, planning, attention management, etc., are most meaningfully addressed at the level of the sector as a "joint cognitive system" [13].
The resilience of the operational activity of handling air traffic should thus likely also be addressed at this level. In terms of Hollnagel's resilience cornerstones, anticipating, monitoring, and responding to traffic and its dynamic behaviour are performed jointly by several controllers and their technical systems in a highly intertwined manner, i.e., most actions taken to handle air traffic originate from the system parts acting together.
Aggregating sectors brings the perspective to the ATSU level, which consists of a number of sectors set up as described above, complemented by operational and technical supervisors, with sectors dynamically grouped or split depending on changing traffic demand and circumstances such as weather. Some characteristics of the activities involved in anticipating, monitoring, and responding to events, and especially in learning, cannot meaningfully be expressed at levels lower than the grouped sector or ATSU. When an unexpected event occurs in one sector, for example one requiring large amounts of traffic to be rerouted, help can often be obtained from other sectors, so that several sectors reroute and handle the traffic jointly. This cooperation is not only reactive: in some cases controllers even spot potential problems that traffic in their own sector may cause for controllers several sectors ahead. Regular cooperation between adjacent sectors on the boundaries between ATSU areas of responsibility, such as between area control centers or countries, makes it necessary to address adaptive capacity at the inter-ATSU (macro) level. In these cases, adaptive capacity cannot adequately be understood unless activities at several ATSUs (towers, terminal/area control centers, possibly in different countries) are addressed. Moreover, resilience at the meso level in air traffic management also needs to include operators other than air traffic controllers who represent other stakeholders in the air traffic system, such as supervisors, technicians, pilots, and ground vehicle operators.
At the sector and ATSU levels, the SESAR 16.1.2 project [1] generated guidance on how to address resilience from a Resilience Engineering perspective, using eight principles mostly based on the work of Hollnagel and Woods. These principles cover work-as-done, i.e., the ways operators use procedures and other working methods, strategies, and practices to achieve safety and efficiency; meeting varying conditions; interpreting signals and cues; finding a balance in goal trade-offs; providing adaptive capacity; coping with complex couplings and interactions; managing timing, pacing, and synchronization; and making necessary but approximate adjustments in an environment of under-specification.
Thus the meso level comprises several groupings of relevant units of analysis, which need to be addressed and understood in concert in order to understand ATM resilience and how the system provides adaptive capacity. Principles originating from the Resilience Engineering literature have been operationalized in the ATM domain. Most of these principles apply at several meso levels and should be addressed at these levels jointly in order to obtain a comprehensive resilience perspective.

The Macro Level: National and International Organizations
At the macro level, identified here as the national and international (societal) level, ATM is first of all an international network of ATSU nodes that cooperate and collaborate where needed to handle air traffic. Several organizations and functions operate at this level, such as the Network Manager, a responsibility assigned to Eurocontrol, which performs a number of operational functions to increase the safety and efficiency of the aviation network as a whole. Addressing the resilience of the European ATM network or the European air traffic system would not be meaningful without addressing these aspects. The same holds for handling large-scale crises: the European Aviation Crisis Coordination Cell (EACCC) coordinates the management of crisis response in the European ATM network. The main role of the EACCC is to support the coordination of responses to network crisis situations and to facilitate information sharing, in close cooperation with the corresponding national crisis response functions and agencies. Thus, when crises reach such a scale (possibly after escalation within aviation, or from or to other industries or parts of society), addressing resilience even at the meso level (of sectors and ATSUs) needs to include an understanding of macro-level activities (for example, the rerouting of traffic between countries and restrictions on traffic load affecting several countries). A particularly clear example is the events following the 2010 eruptions of the Icelandic volcano Eyjafjallajökull, which disrupted air travel across Europe and affected several other means of transportation.

Discussion
This chapter has aimed to show that resilience in ATM, and arguably in other safety-critical network-based parts of industry and society, needs to be understood and addressed at the micro, meso, and macro scales, with an appreciation of interconnectedness and cross-scale interactions [8]. Because of the networked nature of ATM, resilience also pertains to many functional levels and system groupings, which makes even the distinction between micro, meso, and macro difficult to draw sharply. Given this diversity and the wide applicability of the approach, the definition of resilience becomes important, while identifying resilience characteristics, and especially metrics and measurements, is particularly challenging. By studying resilience at these diverse interconnected levels and establishing a vocabulary strongly connected to the operational vocabulary at different scales, resilience research may contribute to a better understanding of adaptive capacity and of coping with our increasingly complex world.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.