3.1 Understanding Resilience: When and Where?

Many current efforts to understand and manage risk in complex organisations focus on ideas—and ideals—of resilience. Resilience has been theorised in a variety of different and sometimes conflicting ways, but broadly refers to the capacity of a system to handle disruptions, failures and surprises in ways that avoid total system collapse—and may lead to adaptation and improvement. This chapter addresses one of the most enduring challenges faced in both the theory and practice of resilience in complex sociotechnical systems: ‘when’ and ‘where’ does resilience occur? Do activities of resilience occur solely in response to adverse events, or are they preemptive and proactive? Is resilience characterised by rapid processes of adjustment that unfold over minutes and hours, or long-term reorganisations that take years and decades? And does resilience primarily emerge through the activities of those working on the operational frontline, or through higher-order processes that span entire industries? These questions of time and space are fundamental to how we understand and operationalise resilience. However, to date, these issues have largely been assumed rather than explored. Rather like the broader literature on risk [1], the current literature on resilience represents something of an archipelago. Many small islands of research each examine resilience at different scales of activity, with few systematic attempts to examine the linkages between these. This chapter examines these issues and presents a framework for understanding resilience across different scales of organisational activity and considers the key implications for theory and practice.

Our current theories of resilience address issues of time and space in different ways. Regarding the temporal question of ‘when’ resilience happens, theories differ as to whether resilience happens before or after a disruptive event, and whether it unfolds quickly or slowly [2]. Some emphasise that resilience is solely a reactive capacity, characterised by efforts to respond, recover and repair once disruptive events have occurred [3]. Other characterisations expand the temporal reach of resilience to “the intrinsic ability of a system to adjust its functioning before, during, or after changes and disturbances, so that it can sustain required operations under both expected and unexpected conditions” [4]. Likewise, Comfort, Boin and Demchak [2] define resilience as “the capacity of a social system [...] to proactively adapt to and recover from disturbances that are perceived within the system to fall outside the range of normal and expected disturbances”. More recent analyses have emphasised the need to distinguish between ‘precursor’ resilience, which proactively prevents major system failures from occurring, and ‘recovery’ resilience, which rapidly responds after a system collapse [5]. Relatedly, regarding the spatial question of ‘where’ resilience happens, theories diverge on the location and source of resilience in complex sociotechnical systems. The predominant focus is on the adaptive capacity of frontline personnel who encounter and manage immediate fluctuations in organisational activity [6, 7]. Other analyses focus on specialist supervisory professionals who both oversee and retain close contact with the frontline [5, 8, 9]. Alternative approaches emphasise the role of supra-organisational regulatory bodies [10, 11] or the interconnected capacities of entire social and economic systems [3, 12].

3.2 Moments of Resilience: Situated, Structural and Systemic

To understand resilience at different scales of time and space, this chapter introduces a framework that characterises resilience in terms of the scale and nature of organisational activity that unfolds around a disruption. This framework characterises organisational activities as unfolding within three broad “moments” of resilience: situated, structural and systemic. Each of these moments represents a different scale of organisational activity in terms of duration and reach across a system. Each also represents a different way of enrolling core sociotechnical resources—such as knowledge, tools, data, skills and ideas—into organisational activities.

  • Situated resilience emerges at or close to the operational frontline. It involves mobilising and combining existing sociotechnical resources to detect, adjust to and recover from disruptive events. This can unfold over seconds to weeks.

  • Structural resilience emerges in the monitoring of operational activities. It involves the purposeful redesign and restructuring of sociotechnical resources to adapt to or accommodate disruptive events. This can unfold over weeks to years.

  • Systemic resilience emerges in the oversight of system structure and interaction. It involves reconfiguring or entirely reformulating how sociotechnical resources are designed, produced and circulated. This can unfold over months to decades.

To develop and illustrate this framework, this chapter draws on comparative analysis of three diverse sectors—healthcare, aviation and finance. These sectors differ considerably in terms of institutional landscape, operational practice and social organisation, as well as the nature of the risks to be managed. This helps illustrate both differences and similarities in how resilience is operationalised.

3.3 Situated Resilience

Situated resilience emerges from the practices that unfold around disruptive events close to the operational frontline. It involves integrating and applying existing sociotechnical resources, such as knowledge, data, tools and skills, to detect and respond to disruptive events as they occur. Moments of situated resilience represent organisational activity at the micro-level: the dynamic interactions of people with their immediate work environment, and the adaptation, adjustment and intelligence required to handle unexpected and non-routine events in frontline work [13, 14] by mobilising the requisite sociotechnical resources. Situated resilience can involve the rapid detection and resolution of deviations from plans. For instance, in airline operations, incorrect departure route data in the flight computer may be missed during routine cross-checks, leading the crew to fly an unintended departure route. Ongoing monitoring by air traffic controllers allows such route deviations to be detected and rapidly addressed by calling the aircraft and providing a corrected route. A sequence such as this, lasting only a few minutes, represents a disruption to intended activity that requires multiple actors to deploy existing resources to restore intended operations.

Situated resilience can involve rapidly responding to and organising around rare emergency events. In maternity care, for instance, emergencies such as post-partum haemorrhage are relatively rare but require immediate lifesaving action. This involves rapidly mobilising specialist knowledge, skills and tools and applying them in patterns of interaction to diagnose and treat the emergency, whose precise features this team has probably never encountered in this specific combination. Situated resilience can also involve instituting practices that create spaces supporting the detection of, and recovery from, hidden problems. In financial services, for instance, it is customary for front office staff (those with trading or related responsibilities) to take a continuous break of at least two weeks each year, handing over their trading book to a colleague, in part to provide an opportunity for any irregularities to be identified. Similarly, deal teams involved in large and complex transactions, such as multi-billion-dollar purchases of infrastructure assets, may spend weeks preparing alternative, backup deal documents to accommodate last-minute changes in deal terms that could threaten the transaction.

3.4 Structural Resilience

Structural resilience represents the processes of restructuring and reforming sociotechnical resources and situated practices. These processes can span multiple organisational units and are typically coordinated by groups that monitor and supervise frontline operational activities. Moments of structural resilience represent organisational activity at the meso-level: active processes of examining organisational practices and sociotechnical resources and redesigning them in light of past experience [15], through an effortful structuration process that seeks to shape both situated practice and social structure [16]. Structural resilience can involve the reorganisation of work systems in response to disruptive events. For example, at one airline, a high-speed rejected take-off triggered by a spurious engine fire warning was traced, after detailed investigation, to loose screws on a temperature sensor. During maintenance, a new engineer following company procedure had loosened all of the screws. But after a handover, a different engineer finishing the job tightened only the screws that were part of the local, informal approach to the task. Revealing this gap between plans and practices [15], or between work-as-done and work-as-imagined [17], allowed the work practices, cultural norms and formal processes to be restructured and reformed over several months.

Structural resilience can also involve the design and redesign of sociotechnical resources through the simulation of disruptive events. In healthcare, the on-site or ‘in situ’ simulation of obstetric emergencies is used not only to train individual and team skills but also to support the redesign and restructuring of the wider sociotechnical infrastructure and resources that underpin effective responses to emergency events. For instance, regular in situ simulation of emergencies such as major haemorrhage allows continual testing and improvement of the design of decision aids, the accessibility of equipment and the processes for requesting and receiving blood products. Similar mechanisms of structural resilience are organised across multiple organisations in financial services. For example, regular ‘stress tests’ of financial institutions simulate extreme adverse scenarios and are used to assess the stability and safety of current resources and structures, and to adapt them where necessary [18]. Structural resilience can also involve the restructuring of resources when indicators of potential risk are triggered. In finance, countercyclical capital buffers require firms to build up additional capital reserves during periods of credit expansion (“good times”), both to prepare better for unexpected losses in times of financial stress and to modulate risk taking.

3.5 Systemic Resilience

Systemic resilience represents the fundamental reconfiguration and reform of the processes that design, produce, constitute and circulate the sociotechnical resources that underpin safety. This can take place over years to decades, enrol large numbers of actors and cross many boundaries across an entire industry. Moments of systemic resilience represent organisational activity at the macro-level: lengthy, elaborate and often heavily contested negotiations over the proper configuration of sociotechnical resources, the appropriate means of generating them and the systems that support them. Systemic resilience can involve wide-scale reform of the assumptions, norms and technological systems underlying activities across a sector, reconfiguring the way that disruption itself is handled. For example, in aviation, the disturbing failure to trace and recover wreckage or critical flight data from MH370, a Boeing 777 lost in 2014, has provoked a fundamental reconfiguration of the way aircraft data are traced and recovered globally. This process is unfolding over years and represents the global aviation system slowly reconfiguring and adapting in response to a serious systemic disruption.

Systemic resilience can also involve considerable reconfigurations of the system-wide architecture for detecting and responding to disruptions. In healthcare, for example, a major crisis centred on sustained failures of care at a UK hospital in Mid Staffordshire prompted, through a multi-year public inquiry, a dramatic reshaping of the function of system-wide regulatory and inspection processes [19]. This fundamentally reconfigured the system-wide mechanisms for detecting and uncovering similar problems. A related reconfiguration has involved the creation of an independent, system-wide investigation body to conduct blame-free and systems-focused investigations [20, 21]. Similar reconfigurations took place in financial services following the financial crisis of 2008–2009. These included the design and introduction of the system-wide countercyclical capital buffers discussed above, a new means of detecting, measuring and managing one of the sources of the prior crisis. These all represent fundamental reformulations of the core assumptions and systemic architectures that produce and shape sociotechnical resources, in response to serious systemic crises.

3.6 Organising Resilience: From Disruption to Reconfiguration

Resilience can be understood as happening both quickly and slowly, as a multi-layered set of processes enacted over different time periods and at different scales of activity. Distinguishing three broad ‘moments’ of resilience, each with a distinct function and logic, raises key questions with practical and theoretical implications. First, when does resilience occur? Previously, this question has been answered with reference to some materially disruptive event: either proactive action ‘before’ or reactive action ‘after’. However, the disruptions that provoke resilience can be symbolically, rather than materially, disruptive [9, 22]. Resilience can be provoked by disruptions to expectations, assumptions, norms and beliefs that call into question the safety of current organisational activity. This removes the need to define resilience directly in relation to a materially adverse outcome. Simulating ‘imagined’ emergency scenarios to test and adapt systems provides a clear and direct example of this [23]. However, reliably generating and responding to symbolic disruptions can be challenging. In some sectors, materially adverse events such as air accidents necessarily provoke dramatic symbolic disruption: planes are not supposed to crash. But in other sectors, like healthcare and finance, materially adverse outcomes are an expected part of organisational life. Patients are ill and sometimes do not survive. Loans go bad and markets turn. In many circumstances, deaths and losses need not provoke surprise or symbolic disruption and may be normalised. This emphasises the importance of the difficult, effortful interpretive work required to actively construct and communicate the symbolic disruptions that can act as provocations for resilience across different scales of organisational activity.

Building resilient systems therefore depends, in part, on building an infrastructure that can not only detect and respond to materially adverse events but also continually manufacture, enlarge and circulate symbolically disruptive events around which resilience can be organised: surprises, uncertainties, ambiguities and other challenges to current norms and beliefs. This suggests that, to support resilience, industries need mechanisms, and people, at every level of the system that can generate scale-appropriate symbolic disruptions that provoke resilient, adaptive responses.

Second, organisational life is full of fluctuation, variation and interruption. But when does a mere fluctuation become a disruption, and how does this lead to the enactment of situated, structural or systemic resilience? A defining feature of this analysis is that a disruption is a ‘disruptive interruption’: it interrupts an activity in a way that derails the ongoing flow of that activity and requires the mobilisation of supplementary sociotechnical resources (e.g. expertise, attention, time, tools, data), beyond those ordinarily enrolled in that activity, to restore order and control. Defined this way, disruption is scale-insensitive: it is as relevant to the situated practice of frontline workers as it is to the systemic reorganisation of entire industries. If this is the case, how do disruptions at one scale of activity migrate and enlarge to enact resilience at greater scales? This analysis suggests that the enactment of resilience across different ‘moments’ depends, in part, on ‘scaling up’ a perceived disruption. For fluctuations to become disruptions and provoke situated resilience, they need to represent a perceived loss of situated control: a failure of current situated practice to maintain control and comprehension of activities, creating a perceived need to activate additional sociotechnical resources to re-establish control. Likewise, to enact structural resilience, disruptions need to represent a perceived structural collapse: a failure in the performance, design or functioning of current structures and resources that requires purposeful restructuring. And to enact systemic resilience, disruptions need to pose a perceived systemic crisis: a failure of current system-wide arrangements to properly supply the resources needed for effective control and functioning of the system, requiring broad-based reconfiguration.

In practice, this suggests that operationalising resilience across different moments and scales of activity requires protected spaces and forums that create vertical alignment within industries: spaces in which local disruptions can be transformed into larger-scale disruptions. Because expanding a disruption risks expanding blame, these spaces need to be shielded from contests over liability and from pressures to allocate or deflect blame. Likewise, expanding disruption across time and space requires expanded scales of expertise. Industries need cadres of professionals who work at the interfaces of situated activities, structural supervision and systemic oversight, and who are adept at making linkages between these different levels of analysis and at constructing and communicating compelling symbolic disruptions. Organisational safety units [9, 24], independent accident investigators [25] and reliability professionals [5] provide potential models for these protected spaces and professional groups.

It can be hard to see the relationship between seconds of operational activity and years-long reconfigurations across entire industries. Much work remains to be done to explain and operationalise resilience at different moments and scales of organisational activity. Future efforts might most productively focus on three areas: the transition from normal fluctuation to provocative disruption; the interfaces between different scales of resilient activity; and the nature of resilience as it unfolds in situated, structural and systemic ways. Resilience can be both fast and slow, small and large. Building more resilient systems depends on being able to conceptually pull apart and practically integrate these moments of resilience.