Introduction

In the past decades, major progress in the control of risks of socio-technical and technical systems can be attributed to systematic analysis processes and principles (see, e.g., Olechowski et al. 2016), in particular the risk management process of ISO 31000 (see, e.g., Purdy 2010) and the related method collection in ISO 31010. These processes and principles comprise framings, process requirements, and methods that are recommended to fulfill both the overall and the process-specific requirements (see also Fig. 1 for a similar structure). As a result of years of application, as well as of domain-specific standards and application recommendations, such as for critical infrastructure (Giannopoulos et al. 2012; Giannopoulos and Theocharidou 2015), auditable and insurable risk assessment approaches are now available. Domain-specific standards have even been provided for risk assessment, such as for compliance management (ISO 19600; ISO 14001), business continuity (ISO 22301), the environmental domain (ISO 14001), security management, in particular of supply chains (ISO 28000), emergency and incident management (ISO 22320), and urban resilience (ISO/AWI 22371).

Fig. 1 Joint resilience and risk analysis and management process

An important observation is that tabular or matrix-like methods play a major role in the practical implementation of such classical risk control approaches. Reasons for their widespread use include their applicability in early assessment stages, their systematic presentation options, the option of informed reuse, and often the possibility of an inductive, bookkeeping-like execution. The main reasons for the practical success of tabular approaches are their low implementation costs in practice, their application potential in all phases of product development and deployment (idea, design, production, and operation), the option to use tiered and nested risk assessments (from qualitative to quantitative), and their high level of acceptance in practice. For example, the combination of the abstract five-step risk management process (context analysis, risk identification, risk computation, risk acceptance evaluation, countermeasure selection) with appropriate, mainly tabular (even look-up table-like) and inductive approaches has proven to be a successful model of classical risk assessment and control.

The field of resilience assessment and management and its connection with risk assessment and management is less developed and is still a subject of controversy. Many methods, tools, and alternative conceptualizations have been proposed (see, e.g., Linkov and Trump 2019 for a recent review). For example, the US National Academy of Sciences considers risk as part of the resilience cycle (namely the response/absorb phase, see NAS 2012, Linkov et al. 2014). In contrast, guidance on resilience assessment for the power industry places resilience under reliability, which is managed through risk assessment tools (CIGRE C4.47 2019). Linkov et al. (2018) attempt to connect risk and resilience assessments within the tiered framework already established for risk assessment; however, this was done in general terms, and a more systematic application of tabular approaches to resilience management was not discussed there.

The first attempt to propose mainly classical tabular system analysis methods for supporting resilience assessment was made by Häring and Gelhausen (2018). Examples of such methods include modified and extended hazard lists (HL), preliminary hazard analysis (PHA), system-level hazard analysis (HA), including operating and support hazard analysis (O&SHA) covering maintenance and operation, variants of failure mode and effects analysis (FMEA), and, in particular at the system function and subsystem levels and for risk evaluation, the risk matrix.
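To illustrate the look-up character of such tabular methods, the following minimal sketch evaluates a small hazard list against a risk matrix. The 4 × 4 layout, the frequency and severity class labels, the acceptance categories, and the sample hazards are all hypothetical assumptions chosen for illustration; real risk matrices are defined domain-specifically.

```python
# Hypothetical 4x4 risk matrix: rows = frequency class, columns = severity class.
# Class labels and acceptance categories are illustrative assumptions only.
FREQUENCY = ["improbable", "remote", "occasional", "frequent"]
SEVERITY = ["negligible", "marginal", "critical", "catastrophic"]

# Acceptance category per (frequency, severity) cell, read like a look-up table.
RISK_MATRIX = [
    # negligible  marginal     critical     catastrophic
    ["accept",    "accept",    "tolerable", "tolerable"],  # improbable
    ["accept",    "tolerable", "tolerable", "reduce"],     # remote
    ["tolerable", "tolerable", "reduce",    "reduce"],     # occasional
    ["tolerable", "reduce",    "reduce",    "reduce"],     # frequent
]


def classify(frequency: str, severity: str) -> str:
    """Look up the acceptance category of a hazard from its frequency and severity class."""
    return RISK_MATRIX[FREQUENCY.index(frequency)][SEVERITY.index(severity)]


# A (hypothetical) preliminary hazard list, evaluated row by row, bookkeeping-like.
hazard_list = [
    ("loss of power supply", "occasional", "critical"),
    ("sensor drift", "frequent", "negligible"),
    ("flooding of control room", "remote", "catastrophic"),
]

for hazard, freq, sev in hazard_list:
    print(f"{hazard:28s} {freq:11s} {sev:13s} -> {classify(freq, sev)}")
```

Because every entry is evaluated independently against the same table, the matrix can be reused across assessment iterations and refined from qualitative to quantitative classes without changing the procedure.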

At the very beginning, the risk assessment process asks decision makers to formulate objectives and evaluation criteria for reaching them, and it calls for broad stakeholder engagement in different stages of the process. The risk management process focuses on controlling risks to the objectives identified in the risk assessment step. However, this very subjective identification of objectives, and of the risks of not reaching them, challenges the traditional application of risk management, in particular when it is applied to determine resilience. It can therefore be expected that a more explicit deductive process that already considers resilience concepts would be helpful, in particular when combined with tabular methods, and that it would also be efficient for resilience management. Thus, the question arises whether a system performance (service) function-based process can be defined that supports both more efficient risk assessment and control and resilience assessment and management, for efficient overall control of risks.

The text is structured as follows. Building on the constructive observations of the introduction, Sect. 2 presents motivations for using ISO 31000 as the foundation for a joint risk and resilience management process, along with tabular methods for its implementation. It also shows that the process takes up requirements of known resilience definitions and of assessment and management frameworks, in particular that the process asks for, and can incorporate, quantitative resilience assessments beyond the tabular implementations on which this paper focuses. Section 3 motivates the selection of input data for the tabular approaches within the defined resilience assessment and improvement process. Section 4 presents the data input for the sample cases telecommunication, gas networks, and indoor localization and provides a tabular overview of analysis options for each process step. Section 5 shows in detail which risk and resilience assessments are feasible for the sample cases to which the process was applied. Section 6 reflects on how to successfully implement such a tabular risk and resilience assessment and improvement process and how to further support it with other methods, in particular for implementing countermeasures that are applicable in case of events. Section 7 summarizes the joint risk and resilience management process implemented with tabular methods at an abstract level and outlines standardization options.

Rationale of overall approach and comparison to existing approaches

Summarizing, and taking up the constructive line of argumentation of the introduction, the following drawbacks of classical risk management, analysis, and control need to be considered (see also the more recent and rather fundamental critical reviews of classical risk management in Leitch (2010), Lalonde and Boiral (2012), Hollnagel (2017), Selvaseelan (2018), Aven (2019), and Häring et al. (2020)):

  • Damage events with major effects are often unavoidable due to a wide and dynamic threat landscape, and they are facilitated by the increasing complexity of modern systems;

  • Classical risk analysis and management focuses on prevention (reduction of the frequency of events) as well as protection and robustness (low damage effects in case of events) (see, e.g., Häring 2015), as opposed to considering all resilience dimensions;

  • In governance, risk focuses on management, forecast, and reduction of known threats, and does not have a temporal aspect (Larkin et al. 2015);

  • There is no systematic and explicit leverage of improved absorption, response and recovery, and learning and adaptation options during and after events, i.e., not all resilience cycle phases (preparation, prevention, protection, response, recovery) are considered (Thoma et al. 2016);

  • Modern approaches should employ the deductive potential of system performance expectations instead of the rather vague risk-to-objectives approach of classical risk management;

  • No simple options are available for extending the classical risk management process toward resilience management and resilience generation in a way that is (formally) conformal to and extends ISO 31000;

  • Classical risk analysis and management does not explicitly weigh cost against efficiency for all risk event control and resilience generation options in order to achieve overall system performance objectives efficiently;

  • Classical risk analysis and management makes no explicit attempt to cover unknown events, or even "unknown unknowns", e.g., in terms of their potential effects;

  • Socio-technical resilience capabilities and abilities are challenging to grasp in (technically driven) assessment frameworks, see, e.g., contributions in Nemeth and Hollnagel (2014), even when formulated as technical resilience capabilities (Häring et al. 2016b);

  • Classical risk analysis and management focuses on probabilities and loss reduction and not on improving the system’s ability to absorb, adjust, or recuperate from an event (Baum 2015);

  • In the context of the behavior of complex systems affected by threats (Galaitsi et al. 2020), resilience and risk are the subject of active discussions (Linkov and Trump 2019). In particular, resilience and risk are discussed as complementary concepts that can be connected through a tiered evaluation process (Linkov et al. 2018);

  • Classical risk management lacks coverage and appropriate handling of (deep) uncertainty.

Baum (2015) further compared risk and resilience and found that risk analysis performs worse than resilience analysis under four conditions: (1) when the scenario under investigation includes unknown threats, (2) when probabilities are not available or measurable, (3) when threats can have cascading effects on entire systems or interconnected networks, and (4) when threats are large-scale and cataclysmic (Baum 2015).

In this context, and taking up the major challenges and gaps listed above, Fig. 1 shows the step-wise, performance-based joint resilience and risk management process first proposed and exemplified in Häring et al. (2017a) for the electrical energy transmission domain, the urban transport domain, and coupled infrastructures of a regional area in Canada. The approach has also been motivated in Häring et al. (2016c) and was applied at concept level to local electrical distribution grids in Tomforde et al. (2019) (OCTIKT 2018–2021). In the present paper, applications are discussed in more detail for telecommunication grids, gas transmission systems, and indoor ultrasonic localization systems.

The resilience dimensions used are defined and referenced in Häring et al. (2016a); see also Table 1 for an overview. Resilience dimensions to be considered include resilience cycle phases, e.g., preparation, physical protection, detection, prevention, absorption, response (stabilization), recovery, and adaptation and learning (introduced and discussed in NAS 2012). Examples of resilience capabilities implemented at system level include sensing (e.g., detection), data analysis (e.g., data fusion), situation representation (e.g., spatial and dynamic situation representation), decision making (e.g., rule-based), and action (e.g., activation of protection mechanisms). Examples of system layers are the physical, engineering (technical), cyber, operational, organizational (decision making), and policy layers (modified after Linkov et al. 2013).
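The three resilience dimensions span an assessment grid for every combination of system performance function and threat. The minimal sketch below enumerates that grid; the dimension lists follow the examples in the text, while the function and threat names are hypothetical. It shows how quickly the number of cells grows, which is one reason why tabular methods typically consider the dimensions selectively rather than as a full cross product.

```python
from itertools import product

# Resilience dimensions as exemplified in the text (a selection, not a closed list).
CYCLE_PHASES = ["preparation", "physical protection", "detection", "prevention",
                "absorption", "response", "recovery", "adaptation and learning"]
CAPABILITIES = ["sensing", "data analysis", "situation representation",
                "decision making", "action"]
LAYERS = ["physical", "engineering", "cyber", "operational", "organizational", "policy"]


def assessment_cells(function: str, threat: str):
    """Yield one checklist cell per combination of resilience dimensions
    for a single (system performance function, threat) pair."""
    for phase, capability, layer in product(CYCLE_PHASES, CAPABILITIES, LAYERS):
        yield (function, threat, phase, capability, layer)


# Hypothetical function/threat pair from a telecommunication setting.
cells = list(assessment_cells("voice service", "fiber cut"))
print(len(cells))  # 240 = 8 phases x 5 capabilities x 6 layers
```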

Table 1 Overview of tabular methods for the joint resilience and risk management process

In the following, the present approach is further related to existing definitions, analyses, and improvement management processes for socio-technical system resilience. Regarding the definition of system resilience and resilience aims, it aligns well with ongoing conceptual discussions, e.g., in Haimes (2009), Linkov et al. (2014), Zio (2016), Kröger (2019), and Cottam et al. (2019).

Within a general framework of five steps (threat analysis, resilience capability design, resilience cost evaluation, resilience quantification, resilience improvement) and in the context of supply chain resilience, Hosseini et al. (2016) apply a Bayesian network approach to quantify resilience using different types of inputs under uncertainty, ranging from qualitative (discrete) to quantitative. They show that the approach covers the scope of the framework. A similar quantification approach has also been applied to the resilience assessment of waterways (Hosseini and Barker 2016), showing that absorptive, adaptive, and restorative capacities can be modeled using Bayesian networks. Bayesian network-based quantification has also been applied to an interdependent electrical infrastructure system (Hossain et al. 2019). Compared with the present joint resilience and risk management process, the discussed framework focuses more on a single, well-suited resilience quantification approach.

Henry and Ramirez-Marquez (2012) show how to use system performance functions or system figures of merit (FOMs) to generate dimensionless, time-dependent resilience functions that describe the increase of system performance after disruptions (recovery over initial loss). The present approach also uses system performance functions and, in addition, non-performance functions. However, for reasons of end-user acceptance and to avoid excessively high values, the latter are not transformed into performance functions using the reciprocal (one over figure of merit) expressions proposed for consideration in Henry and Ramirez-Marquez (2012).
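As we read Henry and Ramirez-Marquez (2012), the recovery-over-loss ratio of a figure of merit F is R(t) = (F(t) - F(t_d)) / (F(t_0) - F(t_d)), with t_0 the undisrupted state and t_d the disrupted state. The minimal sketch below evaluates this ratio; the trajectory values and the function name `resilience_ratio` are hypothetical. Note that for a non-performance FOM (lower is better), numerator and denominator both change sign during recovery, so the ratio stays in [0, 1] without a reciprocal transformation.

```python
def resilience_ratio(f_original: float, f_disrupted: float, f_now: float) -> float:
    """Dimensionless recovery-over-loss ratio of a figure of merit F:
    (F(t) - F(t_d)) / (F(t_0) - F(t_d)).
    f_original: F(t_0) before the disruption; f_disrupted: F(t_d) in the
    disrupted state; f_now: F(t) at the time of interest."""
    loss = f_original - f_disrupted
    if loss == 0:
        raise ValueError("no performance loss: ratio is undefined")
    return (f_now - f_disrupted) / loss


# Hypothetical recovery of a service availability FOM in percent (performance-type).
f0, fd = 99.0, 40.0
trajectory = [40.0, 55.0, 75.0, 95.0, 99.0]
print([round(resilience_ratio(f0, fd, f), 3) for f in trajectory])

# Hypothetical non-performance FOM: number of unserved customers (lower is better).
print(resilience_ratio(0.0, 1000.0, 200.0))
```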

Measures of the criticality of single components of critical infrastructure networks, based on the vulnerability and recovery behavior of the overall network, are proposed and computed in Barker et al. (2013). Similar approaches also allow assessing repair prioritization and the effects of repair delay on system resilience (Fang et al. 2016). Within the present joint risk control and resilience analysis approach, such component importance measures for system resilience could be used within Step 6 for those sets of system performance functions and disruption types that cannot be excluded using tabular and matrix approaches alone.
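A much-simplified, vulnerability-only proxy of such component importance is the performance loss caused by removing a single component. The sketch below is not the measure of Barker et al. (2013), which additionally accounts for recovery; the toy network, the source and demand nodes, and the "served demand fraction" performance function are hypothetical assumptions.

```python
from collections import deque

# Hypothetical toy supply network (undirected adjacency list).
NETWORK = {
    "S": ["A", "B"],
    "A": ["S", "C"],
    "B": ["S", "C", "D"],
    "C": ["A", "B", "E"],
    "D": ["B"],
    "E": ["C"],
}
SOURCES = {"S"}
DEMANDS = {"C", "D", "E"}


def served_fraction(network, removed=frozenset()):
    """Fraction of demand nodes still reachable from a source (breadth-first
    search) when the nodes in `removed` have failed."""
    queue = deque(s for s in SOURCES if s not in removed)
    seen = set(queue)
    while queue:
        node = queue.popleft()
        for nxt in network[node]:
            if nxt not in removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(DEMANDS & seen) / len(DEMANDS)


def removal_importance(network):
    """Performance loss caused by removing each non-source node on its own."""
    base = served_fraction(network)
    return {n: base - served_fraction(network, frozenset({n}))
            for n in network if n not in SOURCES}


importance = removal_importance(NETWORK)
print({n: round(v, 3) for n, v in importance.items()})
```

In this toy case, node C is the most important component because it is the only path to E, whereas A is fully redundant thanks to the parallel route via B.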

Simulative assessment results of multilayer analyses of single infrastructures (see, e.g., Nan and Sansavini 2017 for electrical grids), resilience simulations of interlinked infrastructures using functional dependency modeling (see, e.g., Petrenj and Trucco 2014), or generic graph-based approaches (see, e.g., Kong and Simonovic 2018) can also be used within the current process framework along with the tabular approaches. In particular, the tables can be used to collect input for quantitative and simulative assessments.

The present approach takes account of the absorptive, restorative, and adaptive resilience capacities that underlie the resilience metric of a framework for the resilience analysis of engineered and infrastructure systems (Francis and Bekera 2014), by requiring consideration of the resilience dimensions "resilience cycle phases" and "technical resilience capabilities", in particular in Steps 3 to 7 of the system performance function-based process of Fig. 1. The framework elements proposed by Francis and Bekera (2014) (system identification, resilience objective setting, vulnerability analysis, and stakeholder engagement) are also well covered, while the process remains compliant with the classical overall five-step risk management process of ISO 31000. The present approach also aligns with the cornerstones of the framework proposed in Vugrin et al. (2010): the need for context- and system-specific definitions of system performance functions and resilience quantification approaches, and a qualitative and quantitative approach to system resilience assessment and improvement.

For representative sample methods, Table 1 presents the tabular structure, the table or matrix headings, and the relations between the tabular methods that support the joint risk control and resilience improvement and assessment process. Multiple sample method classes and methods beyond tabular approaches are listed in Häring et al. (2017a), including the expected suitability of these methods for the nine resilience analysis and management steps.

Table 1 first proposes generating lists of information items, for instance, system elements; potential system performance functions in terms of service, safety, and security functions; potential threat events; or potential (overall) risk control and resilience, simulation, and improvement measures. Subsequently, combinations of these items are considered in matrix-like or correlation assessments, leading to final evaluations. The tabular approaches can be combined and should be filled out iteratively, each informing the others, as indicated in the overview scheme of Fig. 1.

As shown in Table 1, combinations of item aspects can be used to improve system understanding. Examples include the allocation of users or stakeholders to resilience objectives and system functions, of system elements to system performance functions, the correlation between threats, the correlation between system elements, and the correlation between system functions. The last two examples allow for the assessment of interdependencies and interfaces between system functions and system elements.
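The following minimal sketch shows how an allocation table of system elements to system performance functions can be turned into a function-function correlation matrix whose entries name the shared elements, i.e., the interfaces through which functions depend on each other. All element and function names, and the 0/1 allocation, are hypothetical assumptions for a telecommunication-like system.

```python
# Hypothetical allocation of system elements to system performance functions
# (1 = the element contributes to the function). Names are illustrative only.
FUNCTIONS = ["voice service", "data service", "emergency call"]
ALLOCATION = {
    "core router":    {"voice service": 1, "data service": 1, "emergency call": 1},
    "base station":   {"voice service": 1, "data service": 1, "emergency call": 1},
    "voice gateway":  {"voice service": 1, "data service": 0, "emergency call": 1},
    "billing server": {"voice service": 0, "data service": 1, "emergency call": 0},
}


def function_interfaces(allocation, functions):
    """Derive a function-function correlation matrix (upper triangle only):
    the entry for (f, g) lists the shared elements through which the two
    functions are interdependent."""
    return {
        (f, g): sorted(e for e, row in allocation.items() if row[f] and row[g])
        for f in functions for g in functions if f < g
    }


for (f, g), shared in function_interfaces(ALLOCATION, FUNCTIONS).items():
    print(f"{f} <-> {g}: {shared}")
```

A shared element that appears in many entries (here the core router and the base station) is a natural candidate for closer inspection in later assessment steps.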

The main use of such combinations of items, termed dependency or correlation matrices, is analysis. This includes determining critical system performance functions by relating main system objectives to system functions; determining relevant system elements by relating system performance functions to system elements; determining and evaluating critical threats for critical system performance functions by assessing the expected effect of threats on system performance functions across all relevant resilience dimensions; determining overall risk control and resilience measures by considering the expected relevance of improvement measures for identified critical combinations of system performance functions and threats; and considering the effects of improvement measures by reiterating the process until convergence, as well as for regular monitoring.
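Chaining such matrices can be sketched as follows: stakeholder weights of objectives, combined with an objective-function relevance matrix, yield a criticality per system performance function, which in turn weights a function-threat effect matrix to rank threats. The weights, the 0 to 3 scores, the names, and the weighted-sum aggregation are hypothetical assumptions; other aggregation rules (e.g., maximum instead of sum) are equally possible.

```python
# Hypothetical weights and 0-3 scores; all numbers are illustrative only.
FUNCTIONS = ["emergency call", "voice service", "data service"]
OBJECTIVE_WEIGHT = {"public safety": 0.5, "business continuity": 0.3,
                    "regulatory compliance": 0.2}

# Relevance of each system performance function for each objective (0 = none, 3 = high).
OBJ_TO_FUNC = {
    "public safety":         {"emergency call": 3, "voice service": 2, "data service": 1},
    "business continuity":   {"emergency call": 1, "voice service": 2, "data service": 3},
    "regulatory compliance": {"emergency call": 3, "voice service": 1, "data service": 1},
}

# Expected effect of each threat on each function (0 = none, 3 = loss of function).
THREAT_EFFECT = {
    "fiber cut":    {"emergency call": 2, "voice service": 3, "data service": 3},
    "power outage": {"emergency call": 3, "voice service": 3, "data service": 2},
    "DDoS attack":  {"emergency call": 1, "voice service": 1, "data service": 3},
}


def function_criticality():
    """Objective-weighted relevance of each system performance function."""
    return {f: sum(w * OBJ_TO_FUNC[o][f] for o, w in OBJECTIVE_WEIGHT.items())
            for f in FUNCTIONS}


def threat_criticality(func_crit):
    """Threat effects summed over all functions, weighted by function criticality."""
    return {t: sum(func_crit[f] * e for f, e in row.items())
            for t, row in THREAT_EFFECT.items()}


fc = function_criticality()
tc = threat_criticality(fc)
print({f: round(v, 2) for f, v in fc.items()})
print(sorted(tc, key=tc.get, reverse=True))
```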

Within the present process design of Fig. 1 and Table 1, existing human, organizational, and technical resources, redundancies, and response options are considered within Steps 1 to 3. Consequently, they are taken into account within the risk control and resilience assessment Steps 4 to 6, in particular with respect to their efficiency, as well as within the overall risk evaluation Step 7, which needs to consider risk control and resilience for each potential event and overall. Only if this as-is risk control and resilience is not acceptable do further human and technical intervention options need to be added, possibly together with system advancements such as less exposure, more robustness, fail-operational designs, redundancies, rapidity, or more resources for recovery actions. This highlights that robustness and redundancy, for example, are not assets per se, although they are in many contexts (see, e.g., Cimellaro et al. 2010).

When inspecting the master Table 1 of tables and matrices proposed to implement the joint performance-based risk control and resilience analysis process of Fig. 1, it should be emphasized that it assumes iterative generation and improvement, i.e., returning to the second line once the last line has been reached, in order to assess secondary and higher-order effects.
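The iteration can be illustrated with a threat-to-threat table in which one threat may trigger others. The sketch below repeats passes over the table until no new entry is added (convergence) and records the order of each effect. The triggering relations are hypothetical assumptions, and the sketch covers only this one table, not the full master table.

```python
# Hypothetical threat-to-threat table: a threat (key) can trigger the listed threats.
TRIGGERS = {
    "heavy rain": ["flooding"],
    "flooding": ["power outage", "access road blocked"],
    "power outage": ["cooling failure"],
    "cooling failure": [],
    "access road blocked": ["delayed repair"],
    "delayed repair": [],
}


def higher_order_effects(initial):
    """Repeat passes over the threat table until no new threat is added
    (convergence). Returns {threat: order}, where order 0 is the initial event,
    1 a secondary effect, 2 a tertiary effect, and so on."""
    order = {initial: 0}
    changed = True
    while changed:
        changed = False
        for threat, level in list(order.items()):
            for nxt in TRIGGERS.get(threat, []):
                if nxt not in order:
                    order[nxt] = level + 1
                    changed = True
    return order


print(higher_order_effects("heavy rain"))
```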

Data sources and selection criteria for application cases

The approach is applied to civil infrastructure systems (see, e.g., Gay and Sinha 2013 for further sample systems) and a localization system. Due to its generic nature, it is expected that it could also be applied to community resilience (see, e.g., Berkes and Ross 2013) or even ecological or societal resilience challenges.

For the first sample case, the data sources are expert inputs collected from individuals and from small informal expert rounds of people involved in the EU project RESISTO (2018–2021) on cyber-physical risk control and resilience enhancement of telecommunication infrastructure. More advanced expert opinion gathering could, for example, use Bayesian updating to determine more reliable estimates of system parameters (see, e.g., the overview in Mosleh 2018).
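One simple instance of such Bayesian updating, chosen here purely for illustration and not taken from the sources, is the conjugate Gamma-Poisson model: an expert's estimate of an annual event frequency is encoded as a Gamma prior and updated with observed event counts. The prior parameters and the observation (three events in two years) are hypothetical; Mosleh (2018) surveys more elaborate methods.

```python
def gamma_poisson_update(alpha: float, beta: float, events: int, years: float):
    """Conjugate update of a Gamma(shape=alpha, rate=beta) prior on an event
    frequency [1/year], given a Poisson count of `events` observed over `years`."""
    return alpha + events, beta + years


# Hypothetical expert prior: mean frequency alpha/beta = 0.5 events per year.
alpha0, beta0 = 2.0, 4.0
alpha1, beta1 = gamma_poisson_update(alpha0, beta0, events=3, years=2.0)
print(f"prior mean {alpha0 / beta0:.3f}/yr -> posterior mean {alpha1 / beta1:.3f}/yr")
```

The prior parameters express how much weight the expert judgment carries: a larger beta (more "pseudo-years" of experience) makes the estimate less sensitive to a few new observations.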

In general, standard input gathering consists of the coordinated collection of information and opinions from individual experts in a single tabular document (spreadsheet), followed by its iterative approval in joint expert sessions documented by joint signature processes. Expert sessions can be in-person or virtual. The rationale for using expert data is to avoid resource-intensive field data gathering while still being able to start risk and resilience assessment and improvement in the early phases of project development.
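A possible way to prepare the joint expert sessions from the consolidated spreadsheet is sketched below: the median rating per cell serves as a proposal, and cells with strongly diverging ratings are flagged for discussion. The ratings, the 0 to 3 scale, and the spread threshold are hypothetical assumptions, not part of the described procedure.

```python
from statistics import median

# Hypothetical 0-3 ratings by three experts for (function, threat) cells.
RATINGS = {
    ("voice service", "fiber cut"): [3, 3, 2],
    ("voice service", "power outage"): [1, 3, 3],
    ("data service", "DDoS attack"): [2, 2, 2],
}


def consolidate(ratings, max_spread=1):
    """Return the median per cell as a proposal, plus the list of cells whose
    ratings differ by more than `max_spread` (to be discussed jointly)."""
    proposal, to_discuss = {}, []
    for cell, values in ratings.items():
        proposal[cell] = median(values)
        if max(values) - min(values) > max_spread:
            to_discuss.append(cell)
    return proposal, to_discuss


proposal, to_discuss = consolidate(RATINGS)
print(proposal)
print(to_discuss)
```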

The input includes the telecommunication infrastructure, performance functions, threats, and improvement measures in both the general sense and specifically for the different use cases tested in the EU project “Resilience enhancement and risk control platform for communication infrastructure operators” (RESISTO 2018–2021) (RESISTO D3.9 2020). This use case allows for a more detailed and specific analysis to be completed.

In the example case of the gas grid, based on the EU project SecureGas on "Securing the European Gas network" (SecureGas 2019–2021), data are gathered from project phases focusing on the determination of the main end-user needs to control risks and enhance resilience, in particular to counter critical threats to the main gas grid functions within the given operational and legal requirements. A further focus is the tabular formulation of functional and technical requirements for improved functions of the security and safety systems of gas grids, which draws on the resilience concepts introduced within the joint risk control and resilience enhancement process. Compared with Vugrin et al. (2011), the sample case does not focus on a specific threat type or on selected spatial areas.

For the gas grid example case, in addition to expert input collection using bilateral, round-table, and questionnaire-based data collection, the storyboard methodology is used to link the system non-performance and performance functions to potential threats as well as to improvement measures, expressed as technical and functional requirements for extended capabilities of the safety system, and even to prepare concepts of operation (CONOPS) for these functionalities. The storyboard methodology has been formulated in various flavors and is by now supported by tools (Mohd Yusoff and Salim 2014). In particular, (electronic) templates for its implementation have been proposed for more seamless communication, see, e.g., Roytek (2010). The storyboard methodology can also be placed in the broader context of human-centered technology design (Harte et al. 2017) and has served to generate input for functional-operational requirements and short CONOPS descriptions (Thronesbery et al. 2007).

The gas grid sample case used representative historical cyber events, threats, and attacks drawn from sources such as the database of the European Gas Pipeline Incident Data Group (EGIG 2020), the United Kingdom Onshore Pipeline Operators Association (UKOPA 2020), the European Joint Research Centre (JRC) natural hazard list (Poljanšek et al. 2019) tailored to the gas grid domain, the International Disaster Database (EM-DAT 2021), and the US Pipeline and Hazardous Materials Safety Administration (PHMSA 2021), as well as the hazard list developed within a Greek project on targeted actions for enhancing the protection of national characterized European critical infrastructure (NCECI 2017–2020).

Regarding the methodology for identifying and assessing potential risks, the present approach takes up the JRC methodology of relevance and impact screening of natural hazards (Poljanšek et al. 2019) and the security risk assessment methodology recommended by Gas Infrastructure Europe (GIE) (KPMG 2021). Furthermore, guidance was taken from hazard identification (HAZID) (CCPS 2010), preliminary hazard analysis (PHA) (Ericson 2016), and the Hazard and Operability Study (HAZOP) approach according to IEC 61882 as applied to petrochemical facilities (Crawley and Tyler 2015). The implementation was supported by the ALOHA (Areal Locations of Hazardous Atmospheres) software package, which covers toxic dispersion, fire, and explosion scenarios (ALOHA 2021), and by the use of generic event trees as proposed in Vílchez et al. (2011). The main addition was to also consider risks after event occurrence.

The search for potential threats and their assessment was further supported by publications focusing on potential threats and disruptions, including statistical analyses of events for long-distance pipelines (Dai et al. 2017); effects of large-scale disasters, of single and compound hazards on gas infrastructure due to extreme weather (Moftakhari and AghaKouchak 2019), and of major disasters (ICF 2019); seismic effects (Urlainis et al. 2015); disruptions caused by conflicts, crises, and civil unrest (Carvalho et al. 2014; Lochner and Dieckhöner 2012); threats to the energy infrastructure by cyber-attacks, conventional warfare, unconventional warfare, and criminal activity (Staff 2014); terrorist cyber and physical attacks (Dancy and Dancy 2017; Pirani et al. 2009); and different types of cyber-attacks (ENISA 2020).

The third example case is an indoor ultrasound localization system (Bordoy et al. 2020; Ens et al. 2015; Hoeflinger et al. 2015), which offers high localization accuracy compared with alternative technologies, e.g., Wi-Fi-based fingerprinting approaches (Tiku et al. 2020), Bluetooth, ZigBee, ultra-wideband (UWB), and vision- and acoustic-based approaches (Zafari et al. 2019). It localizes ultrasound transmitters on objects (e.g., goods, transport systems, robots) using receivers on the ceiling by applying time difference of arrival (TDOA) algorithms for the case of known receiver positions. The time differences of the ultrasound signals are determined using autocorrelation analysis of chirp signals. Data are exchanged between the receivers, senders, and a gateway using the 6.78 MHz ISM (industrial, scientific, and medical) radio band. After cloud-based transmission, the data are analyzed and visualized, e.g., with a standard PC. Input data collection was conducted within a project on quantitative resilience indicators for technical systems (Resilience Measures 2016–2018) and a project on multimodal resilient indoor localization systems for industrial applications (MERLIN 2019–2021).
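The position estimation step of TDOA localization with known receiver positions can be sketched as a least-squares fit of the measured arrival-time differences. The sketch below assumes that the time differences have already been extracted from the chirp signals (the signal processing is omitted), uses a brute-force grid search instead of the system's actual solver, and assumes a known transmitter height; the receiver layout, speed of sound, and grid resolution are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C

# Hypothetical ceiling receivers (x, y, z) in metres; positions assumed known.
RECEIVERS = [(0.0, 0.0, 3.0), (6.0, 0.0, 3.0), (0.0, 6.0, 3.0), (6.0, 6.0, 3.0)]


def tdoas(pos, receivers=RECEIVERS, c=SPEED_OF_SOUND):
    """Arrival-time differences relative to receiver 0 for a transmitter at pos."""
    t = [math.dist(pos, r) / c for r in receivers]
    return [ti - t[0] for ti in t[1:]]


def locate(measured, z=1.0, step=0.05, size=6.0):
    """Grid search over (x, y) at an assumed known transmitter height z,
    minimising the sum of squared TDOA residuals (least squares)."""
    best, best_cost = None, math.inf
    n = int(round(size / step))
    for i in range(n + 1):
        for j in range(n + 1):
            cand = (i * step, j * step, z)
            cost = sum((a - b) ** 2 for a, b in zip(tdoas(cand), measured))
            if cost < best_cost:
                best, best_cost = cand, cost
    return best


# Simulated noise-free measurement for a hypothetical transmitter position.
true_pos = (2.0, 3.5, 1.0)
estimate = locate(tdoas(true_pos))
print(tuple(round(v, 2) for v in estimate))
```

Disturbances such as noise or signal-blocking barriers, as investigated for this system, would appear in such a sketch as perturbed or missing TDOA values, which makes the resulting position error a natural candidate for a system performance function.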

The main data used comprise the system service and technical functions of the localization system in industrial (e.g., logistics, production) and consumer applications (e.g., restaurants), potential threats and disruptions, recommended measurements, and measurement results regarding the resilience behavior in critical scenarios. System knowledge needed to determine the main system performance functions and potential threats was collected in expert rounds as input for estimating the criticality of combinations of system functions and threats. Some of the threats, e.g., noise and barriers, have also been investigated using a simulative approach (Jain 2018). Experimental results of the assessment of critical combinations, along with a dimensionless resilience measure, are documented in Scheithauer (2018).

The danger of insufficiently contextualized expert inputs, e.g., different assessments of seemingly similar scenarios, is countered in the proposed approach by asking experts to refer explicitly to the background information in already existing tables, see, e.g., Table 2, columns two and three.

Table 2 Telecommunication grid sample data entries and tabular and matrix assessment examples

Sample data sources used

Telecommunication grid

In the example case of a telecommunication critical infrastructure, Table 2 lists the type of data collected for the proposed minimal set of tables and matrices, which are printed in bold in Table 1. Examples are given for each entry type. Even within this slim approach, it is evident that many tables and matrices can be understood as extensions of simpler versions generated in earlier assessment phases, when less information is available. Similar observations can be made when applying tabular approaches to achieve functional safety in the context of IEC 61508. The same holds in the domain of IT security, for example, when using the system performance function and risk-based cyber security approach HEAVENS, developed for the automotive domain within a Swedish research project on healing vulnerabilities to enhance software security and safety in automotive embedded systems (Lautenbach and Islam 2016).

Table 2 can be grouped into tables that collect socio-technical system information (columns 1 to 3), tables that collect information on potential failures, threats, and disruptions considering all resilience dimensions (columns 4 and 5), the system-specific assessment of threats in terms of their effects on system performance functions (columns 6 and 7), and the selection and pre-assessment of improvement measures (columns 8 and 9). The tables can also be grouped into those whose information is available in rather early steps of the analysis (columns 1, 3, 4, and 8) and those that need detailed system knowledge (columns 2, 5, 6, 7, and 9).

All tables are related to each other. Most tables additionally contain dependencies within themselves. These dependencies describe the relation of system, subsystems and components in the component and system table (column 2), the relation of system functions, subsystems and components (column 3), and the relation of threats to components, subsystems, and other threats (column 5). Most prominent are the assessment of the combinations of system performance functions with threats (column 6) and the consideration of several such combinations (column 7), as well as the selection and assessment of resilience improvement measures (column 9).

Because the nine tables or matrices (columns labeled 1 to 9 in Table 2) summarize the tables and matrices proposed in Table 1, they can be mapped to the nine resilience assessment and management steps presented in Fig. 1:

Step 1 on context analysis is supported by the stakeholder and objectives table (column 1 of Table 2, entitled "1. Stakeholder and Objectives"), which also determines criteria for the degree of fulfillment of objectives and the assessment criteria used throughout the process.

Step 2 on system analysis is supported by the table on subsystems and components (column 2) and the table on system (non-)performance functions (column 3), which determine the system boundaries, interfaces, and functions.

Step 3 on system performance function identification determines the most important system performance functions (column 3) as well as expectations regarding their performance with respect to the relevant resilience dimensions (column 4).

Step 4 on disruption identification aims at collecting known potential threats, threats empirically observed in other contexts (exemplary events), and at least the potential effects of unknown threats (e.g., loss of recovery capability of unknown origin); it is covered by the tables on resilience dimensions and attributes (column 4) and on threats (column 5).

Step 5 on the pre-assessment of risk control and resilience is conducted by the matrix that considers all combinations of system performance functions and threats (column 6), in particular in all resilience cycle phases, considering all technical resilience capabilities and all system layers.

Step 6 on overall resilience quantification is partly covered by the table on overall risk and resilience assessment (column 7), which considers not only single combinations of system performance functions and threats but also, for instance, all threats for a single system performance function, the effect of a single threat on all system performance functions, and the combination of threat effects in terms of effects on persons (see Sect. 5.1.1 on analysis options for details).

Steps 7 and 8 on resilience evaluation and resilience improvement option selection are supported by the tables on improvement measures (column 8) and improvement measure selection (column 9). For the latter, the evaluation criteria provided in Step 1 are used.

Step 9 on resilience measure development and implementation is not supported by a table. The tables on improvement measures (column 8) and on their specific selection (column 9) are expected to take into account implementability, monitorability, and the potential for incremental improvement. As improvement measures are very context- and system-specific, domain-specific approaches are expected to be used.
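As a minimal sketch, the step-to-table mapping listed above for the telecommunication example can be encoded as a plain dictionary and checked for gaps. The names `STEP_TO_COLUMNS`, `unsupported_steps`, and `unused_columns` are hypothetical; the column numbers only restate the assignments given in the text (column labels 1 to 9 of Table 2).

```python
# Hypothetical encoding of the step-to-column mapping (columns 1-9 of Table 2).
STEP_TO_COLUMNS = {
    1: [1],      # stakeholder and objectives
    2: [2, 3],   # subsystems/components, system (non) performance functions
    3: [3, 4],   # performance functions, resilience dimensions
    4: [4, 5],   # resilience dimensions/attributes, threats
    5: [6],      # matrix of function-threat combinations
    6: [7],      # overall risk and resilience assessment (partial coverage)
    7: [8, 9],   # improvement measures and their selection
    8: [8, 9],
    9: [],       # development and implementation: no table
}


def unsupported_steps(mapping: dict[int, list[int]]) -> list[int]:
    """Return the process steps that no table or matrix supports."""
    return [step for step, cols in sorted(mapping.items()) if not cols]


def unused_columns(mapping: dict[int, list[int]], n_columns: int = 9) -> list[int]:
    """Return table columns that no process step refers to."""
    used = {c for cols in mapping.values() for c in cols}
    return [c for c in range(1, n_columns + 1) if c not in used]


print(unsupported_steps(STEP_TO_COLUMNS))  # [9]
print(unused_columns(STEP_TO_COLUMNS))     # []
```

Keeping the mapping explicit makes it easy to see at a glance that every table is used and that only Step 9 relies on domain-specific methods outside the tables.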

Gas grid

In the gas transmission grid example, Table 3 lists the data entries used in tabular and matrix assessments within the steps of the joint risk and resilience management process of Fig. 1. They suffice to generate user requirements; system functions, mainly non-performance functions; threats and potential disruptions; the critical combinations of system non-performance functions and threats; and functional and technical requirements for improving the security system. This generation is guided by the improvement of system performance in all resilience cycle phases before, during, and after events, taking several further resilience dimensions into account.

The data are based on the report SecureGas D1.1 (2019) on regulatory, organizational, and operational requirements of gas grid security systems designed as a service-oriented architecture (SOA; platform as a service, PaaS) (ISO 22301; Indu et al. 2018; Bean 2010). It reports expert feedback collected via a questionnaire on the three identified types of requirements, the requirements of nine similar EU projects, and an additional expert workshop conducted on the basis of the consolidated feedback with a focus on operational requirements (SecureGas D1.2 2019). In addition, technical requirements are considered as provided in SecureGas D1.2 (2019). Together with a threat, risk, and vulnerability assessment (SecureGas D1.3 2019), key functional requirements of the gas grid security system, formulated as key performance indicators (KPIs), are extracted from SecureGas D2.3 (2019). The system functions, mainly non-performance but also performance functions, are also based on SecureGas D2.3 (2019).

Table 3 Gas network sample data entries and tabular and matrix assessment examples

In the gas grid application case, 6 tables and matrices (columns labeled 1 to 6 of Table 3) cover the scope of Fig. 1 using the tabular approaches proposed in Table 1. In overview, they cover the 9 resilience assessment and management steps as follows:

Step 1 on context analysis is supported by the User/Stakeholder requirements ranking table, see column with label 1 of Table 3. Inspection of the requested entries shows that, besides the ID, title, and short description of each requirement, the business domain of the application showcase (context of application), the involved users/stakeholders, and the main requirement type (legal/regulatory, organizational, operational) are recorded. For each main user requirement type, respective types (2 + 2 + 7 = 11) and even sub-types (15 + 16 + 48 = 79) are used. The requirements are ranked using a semi-quantitative scale.

Step 2 on system analysis as well as Step 3 on system performance function identification are supported by the table covering System (non) performance functions and related (sub) components, see column with label 2 of Table 3. In this sample case, mainly non-performance functions are used. For each system function, an ID, title, and short description are requested. The functions are related to four asset management phases, five main gas infrastructure components, and sub-components (4 + 5 + 5 + 9 = 23). This shows which system components are relevant in which asset management phase for delivering the system functionality. In addition, the functions are related to user requirements, i.e., it is recorded which system functions are relevant for fulfilling given requirements, which allows tracing the coverage of requirements by system functions. Note that the affected business value chain elements are also included; they are strongly related to the system elements considered in a given asset management phase.

Step 4 on disruptions identification is covered by a table named Threats and disruptions ranking using resilience dimensions, see column with label 3 of Table 3. Each threat, with ID, title, and description, is classified at high resolution using 12 threat categories and in total 98 event categories (threat sub-categories). In addition, 5 resilience dimensions with in total 26 attributes are used to classify the threats, e.g., whether threats mainly challenge the technical detection capability, which system layer they affect, and in which resilience cycle phase they act. For each case, the relevancy of the attributes, and hence of the threats, is ranked.

Step 5 on the pre-assessment of risk control and resilience is conducted by the matrix entitled Critical combinations of (non-)performance functions and threats, see column with label 4 of Table 3. It covers the IDs and titles of the (non) performance functions and threats, respectively, and, for each combination, a semi-quantitative assessment of its criticality. The assessment considers the information provided in the three preceding tables.

Step 6 on overall resilience quantification is again only partly covered by the table on Critical combinations of (non-)performance functions and threats. For instance, for each performance function, the relevancy of all threats or potential disruptions is considered and can hence be jointly evaluated as described along with Eq. (1) below. Using this equation, the effect of given threats on all performance functions can also be assessed. However, the equation assumes that the threat events occur independently of each other. Nevertheless, if a performance function is affected by more than one threat event type, it is a strong candidate for further quantitative assessment, which, by definition of the presented approach, is not covered within the pre-assessment Step 5 but is subject of Step 6, as exemplarily shown in Sect. 5.1.4.

Step 7 on resilience evaluation and Step 8 on resilience improvement option selection are supported by a table on Technical requirements for security and safety improvement measures, including their coverage of user requirements and their main relevancy for the performance functions and threats addressed, as well as by a table on Key performance indicators (KPIs) for the technical requirements, see columns with label 5 and 6 of Table 3, respectively. In detail, the technical requirements table structures each requirement using 13 requirement domains related to technical security solutions (e.g., capabilities of UAV-based detection, fiber-sensor and simulation-based technical solutions) with in total 147 highly specific technical subdomains. Drawing on the tables already introduced, each technical requirement can be related to users, application use case contexts, main end user requirements, supported system performance functions, and countered threats. In addition, sorting with respect to 6 standard technical requirement types (e.g., functional, interfacing) is feasible. This is further supported by quantitative metrics for the requirements in terms of KPIs covering 12 technical requirement domains; no KPIs are provided for the implementation of standard component requirements (IMPS). For each of the 49 KPI fields, between 1 and 7 indicators are provided, resulting in 78 indicators. Each indicator is described, and a metric and a numeric threshold are provided.

Step 9 on resilience measure development and implementation is not supported by a table. As in the telecommunication infrastructure application case, it is assumed that the detailed technical requirements table and the quantitative KPI table just described provide sufficient input for a technical development supported by domain-specific standards.
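The Step 5 and Step 6 logic of the gas grid walk-through can be illustrated with a small sketch: list the critical combinations of the function-threat matrix and flag functions affected by more than one threat as candidates for quantitative assessment. The 0 to 3 criticality scale, the threshold of 2, and the labels `F1`…`F3` and `T1`…`T3` are assumptions for illustration and are not taken from Table 3.

```python
# Hypothetical semi-quantitative criticality scores for combinations of
# (non-)performance functions (rows) and threats (columns):
# 0 = not relevant, 1 = low, 2 = medium, 3 = high.
FUNCTIONS = ["F1", "F2", "F3"]
THREATS = ["T1", "T2", "T3"]
CRITICALITY = [
    [2, 0, 3],
    [0, 1, 0],
    [1, 1, 2],
]


def critical_combinations(threshold=2):
    """Step 5: (function, threat, score) triples at or above the threshold."""
    hits = [
        (f, t, score)
        for f, row in zip(FUNCTIONS, CRITICALITY)
        for t, score in zip(THREATS, row)
        if score >= threshold
    ]
    # Most critical first; Python's sort is stable, so ties keep matrix order.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


def multi_threat_functions():
    """Step 6 candidates: functions affected by more than one threat (score > 0)."""
    return [f for f, row in zip(FUNCTIONS, CRITICALITY)
            if sum(score > 0 for score in row) > 1]


print(critical_combinations())   # [('F1', 'T3', 3), ('F1', 'T1', 2), ('F3', 'T3', 2)]
print(multi_threat_functions())  # ['F1', 'F3']
```

Functions flagged by `multi_threat_functions` are exactly those for which the independence assumption behind Eq. (1) is most worth checking in a quantitative follow-up.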

Indoor localization system

Using the very first example entries given in Häring et al. (2017b), Table 4 gives an overview of the tables and matrices used for the indoor localization sample system introduced in Sect. 3.

Table 4 Indoor localization sample data entries and tabular and matrix assessment examples

For the indoor ultrasonic localization system, 6 tables and matrices (columns labeled 1 to 6 in Table 4) support the joint risk control and resilience improvement process of Fig. 1, using the tables and matrices proposed in Table 1 and covering all 9 steps of the process:

Step 1 on context analysis, Step 2 on system analysis, and Step 3 on system performance function identification of Fig. 1 are supported by two tables. The table on System functions ranking and related users (see column with label 1 in Table 4) covers 8 functions of the localization system at the operational system level, as relevant for applications in the industrial, service, and health sectors, as well as 11 technical system functions and the related stakeholder users. It is strongly linked with the table on System functions and related system elements (see column with label 2 in Table 4), which lists the elements necessary for realizing the system functions, e.g., the transmitter tags to be localized. Independent of the technical functionalities, the system service functions are ranked on a semi-quantitative scale from 1 to 10. As described in Sect. 3, the system is divided into 5 subsystems and 8 main component types that are used in the various subsystems.

Step 4 on disruptions identification is covered by the table on Failures, disturbances, and disruptions ranking (see column with label 3 in Table 4). Events have been categorized into system failures, including systematic design failures, degradation, or errors; external disturbances or disruptions; and intentional disturbances. In total, 42 sub-types are provided. The ranking is conducted at the level of sub-types, considering all system functions.

Step 5 on the pre-assessment of risk control and resilience is covered by the matrix on Critical combinations of system functions and disturbance causes (see column with label 4 in Table 4). It resolves the relevancy of disruptions and disturbances for the service functions, mainly for the non-performance function absolute localization error. Note that for each combination both the probability of the disturbance and the expected consequences at the system service level are estimated.

In this application case of a smaller technical distributed system, Step 6 on overall resilience quantification is conducted by the sequential experimental assessment of disruption events of different kinds that are ranked most relevant across all application domains regarding probability and potential effects, see the table on Disruption effects quantification experiment ranking (column with label 5 in Table 4). Besides the threat description, experimental set-up characteristics are added, including room geometry, ultrasound reflection properties, and the time evolution of the threats.

Finally, Step 7 on resilience evaluation and Step 8 on resilience improvement option selection are supported by a ranked table of improvement measures (column with label 6 in Table 4), taking into account all ranked and experimentally assessed potential disruptions as well as the system functions and related components subject to potential modification. Considered are minor architecture and interface changes, software and algorithm changes, and improvements of hardware components. As improvement measures are additionally linked to users and system service functions, their feasibility can be assessed.

Data analyses performed

This section shows, for the telecommunication grid, the security system of the gas transmission grid, and an indoor localization system, how to employ the data provided in Tables 2, 3, and 4 using some of the tabular and matrix approaches proposed in Table 1 to fulfill the joint risk and resilience management process of Fig. 1.

Telecommunication grid

Assessments and quantities accessible for the telecommunication domain

Several (simple) analysis options and related examples are given. They mainly refer to the telecommunication domain, the sample input data of which are provided in Table 2.

Regarding qualitative and discrete analysis, the following (basic) numbers are accessible:

  • The number of stakeholders, objectives, assessment criteria, components, subsystems, system functions, system performance functions, threats, resilience dimensions and their respective attributes, combinations of system performance functions and threats that need improvement measures, and iterations of the overall resilience assessment and improvement cycle. All these numbers should be greater than one to ensure minimum formal coverage.

  • Objectives per stakeholder, system performance functions per objective, subsystems and components per system performance function, and threats per system performance function, all of which should be greater than or equal to one to ensure consistency and coverage.

  • Critical threats per system performance function; number of threats affecting a resilience attribute for each performance function and each resilience dimension, e.g., to answer which resilience cycle phases, resilience capabilities, or system layers are most often affected by threats (this has also been used, e.g., within the EU project SecureGas (2019–2021) on securing the European gas network, for the vulnerability, risk, and resilience analysis of potential threats and disruptions); number of improvement measures per critical combination of threats and system performance functions; number of improvement measures per resilience attribute for each resilience dimension (for each critical combination or overall).
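The counting and coverage checks listed above reduce to simple operations on the linked tables. The sketch below assumes the table contents are available as Python dictionaries of IDs; all identifiers (`O1`, `PF1`, `T1`, …) are hypothetical placeholders, not entries of Table 2.

```python
# Hypothetical, heavily reduced table contents (IDs only).
stakeholder_objectives = {"operator": ["O1", "O2"], "regulator": ["O3"]}
objective_functions = {"O1": ["PF1"], "O2": ["PF1", "PF2"], "O3": []}
function_threats = {"PF1": ["T1", "T2"], "PF2": ["T2"]}


def coverage_gaps(mapping):
    """Return keys whose list of linked items is empty (count < 1)."""
    return sorted(key for key, items in mapping.items() if len(items) < 1)


# Basic numbers from the first bullet.
counts = {
    "stakeholders": len(stakeholder_objectives),
    "objectives": len(objective_functions),
    "performance functions": len(function_threats),
    "threats": len({t for ts in function_threats.values() for t in ts}),
}
print(counts)
# {'stakeholders': 2, 'objectives': 3, 'performance functions': 2, 'threats': 2}

# Consistency checks from the second bullet: every ratio should be >= 1.
print(coverage_gaps(objective_functions))  # ['O3']: no performance function yet

# Per-function counts as in the third bullet.
threats_per_function = {f: len(ts) for f, ts in function_threats.items()}
print(threats_per_function)  # {'PF1': 2, 'PF2': 1}
```

Running such checks after each iteration of the cycle is one way to track whether the quantities converge, e.g., whether coverage gaps disappear.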

Most assessment quantities are expected to converge only after iterating the process. For instance, the number of critical threats will decrease with iterations. The ambition is that, with consideration of improvement measures, all critical threats per system performance function can be reduced, e.g., to acceptable threats.

Regarding system analyses based on the table contents, (topological) graphs are available that illustrate the level of dependence, for example, among components, subsystems, and system functions, of system functions on subsystems and components, and of objectives on system functions. The links can be used to express the level of relation, allowing for a more accurate assessment of the expected effects of threats. Examples are given in Fehling-Kaschek et al. (2019).
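As a minimal sketch of such a dependency graph, the links from the tables can be stored as an adjacency list ("A depends on B") and traversed breadth-first to find every subsystem, function, and objective that a given component ultimately supports. The edge list and element names below are hypothetical, and the sketch ignores link weights.

```python
from collections import deque

# Hypothetical dependency edges "A depends on B":
# objective -> performance function -> subsystem -> component.
DEPENDS_ON = {
    "O1": ["PF1"],
    "O2": ["PF2"],
    "PF1": ["S1"],
    "PF2": ["S1", "S2"],
    "S1": ["C1", "C2"],
    "S2": ["C3"],
}


def dependents_of(element, depends_on):
    """Return all elements that directly or indirectly depend on `element`."""
    # Invert the edges so we can walk from a component up to the objectives.
    used_by = {}
    for parent, children in depends_on.items():
        for child in children:
            used_by.setdefault(child, []).append(parent)
    # Breadth-first search over the inverted graph.
    seen, queue = set(), deque([element])
    while queue:
        node = queue.popleft()
        for parent in used_by.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen)


print(dependents_of("C2", DEPENDS_ON))  # ['O1', 'O2', 'PF1', 'PF2', 'S1']
print(dependents_of("C3", DEPENDS_ON))  # ['O2', 'PF2', 'S2']
```

A threat affecting component `C2` would thus propagate to both objectives, whereas `C3` only touches `O2`.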

Further assessment options are presented in Table 1, which lists several relation matrices that are not made explicit in Table 2, e.g., the relation between threats and improvement measures.

Regarding semi-quantification and quantification, the use of overall resilience quantities is recommended. Resilience quantities should cover sufficient resilience dimensions for all system performance functions and threats. Examples of resilience dimensions are system layers, resilience cycle phases, and (technical) resilience capabilities; see, e.g., Häring et al. (2016a) for further resilience dimensions. In this way, the total overall risk, considering system performance functions, threats, and resilience dimensions, reads

$${R}_{res}=\sum_{i=1}^{{N}_{threat}}{P}_{i, PF\left(i\right),T\left(i\right), RD\left(i\right), RA(i)} {C}_{i, PF\left(i\right), T\left(i\right), RD\left(i\right), RA(i)},$$
(1)

where \({N}_{threat}\) is the number of different threats (risk events) considered, indexed by \(i\), for which probabilities (frequencies, likelihoods) and consequences (effects, impacts) are determined. In Eq. (1), \(PF\left(i\right)\) is the set of performance functions affected by the threat, \(T\left(i\right)\) the set of threat types the identified threat belongs to, \(RD\left(i\right)\) the set of resilience dimensions relevant for the categorization of the threat, and \(RA\left(i\right)\) the sets of attributes (one set for each resilience dimension) relevant for each resilience dimension. For each risk event, all sets are required to be non-empty: \(\left|PF\left(i\right)\right|\ge 1\), \(\left|T\left(i\right)\right|\ge 1\), \(\left|RD\left(i\right)\right|\ge 1\), and \(\left|RA\left(i\right)\right|\ge 1\). Thus, each risk event affects at least one system (non) performance function, can be attributed to at least one threat category, is categorized with at least one resilience dimension (e.g., system layers), and can be sorted into at least one resilience dimension attribute (e.g., physical layer). The total risk on risk control and resilience objectives in (1) is intended to be exhaustive in the sense that the consideration of further potential risk events, threat types, and resilience dimensions will not significantly increase the risk, i.e., will not add significant risk contributions.
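Eq. (1) and its non-emptiness conditions can be sketched as follows. The `RiskEvent` class and `total_risk` function are hypothetical names, the resilience dimensions \(RD(i)\) are represented as the keys of a dictionary whose values are the attribute sets \(RA(i)\), and the example events and numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class RiskEvent:
    """One summand i of Eq. (1): P_i, C_i and the sets PF(i), T(i), RD(i), RA(i)."""
    probability: float               # P_i, e.g. expected frequency per year
    consequence: float               # C_i, e.g. monetary loss per event
    functions: set[str]              # PF(i): affected performance functions
    threat_types: set[str]           # T(i): threat categories
    attributes: dict[str, set[str]]  # keys = RD(i), values = RA(i) per dimension

    def validate(self) -> None:
        """Enforce |PF(i)|, |T(i)|, |RD(i)|, |RA(i)| >= 1 as required by Eq. (1)."""
        if not (self.functions and self.threat_types and self.attributes):
            raise ValueError("PF(i), T(i) and RD(i) must be non-empty")
        if any(not attrs for attrs in self.attributes.values()):
            raise ValueError("each resilience dimension needs at least one attribute")


def total_risk(events: list[RiskEvent]) -> float:
    """R_res = sum over i of P_i * C_i (Eq. (1)); events are validated first."""
    for event in events:
        event.validate()
    return sum(e.probability * e.consequence for e in events)


# Hypothetical events; labels and numbers are illustrative only.
events = [
    RiskEvent(0.5, 100.0, {"voice"}, {"cyber"}, {"phase": {"response"}}),
    RiskEvent(0.25, 800.0, {"voice", "data"}, {"physical"},
              {"layer": {"physical"}, "phase": {"recovery"}}),
]
print(total_risk(events))  # 250.0
```

Validating before summing keeps incompletely categorized events from silently entering the total.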

Let \({N}_{PF}\) be the number of all performance functions considered. Then for \(1\le j\le {N}_{PF}\)

$${R}_{res}\left(j\right)=\sum_{i=1}^{{N}_{threat}}{\left(j\in PF\left(i\right)\right)}_{lb} {P}_{i, PF\left(i\right),T\left(i\right), RD\left(i\right), RA(i)} {C}_{i, PF\left(i\right), T\left(i\right), RD\left(i\right), RA(i)},$$
(2)

is the total risk for each system performance function, where the logic bracket evaluates to one if the statement is true and to zero otherwise. As risk events can be attributed to several performance functions, one only has \({\sum }_{j=1}^{{N}_{PF}}{R}_{res}\left(j\right)\ge {R}_{res}\). However, Eq. (2) provides a risk ranking of the performance functions.
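A short sketch of Eq. (2), reading the logic bracket as true whenever function \(j\) belongs to the set \(PF(i)\), shows both the per-function ranking and why the sum over functions can exceed \(R_{res}\). The events are hypothetical tuples (P_i, C_i, PF(i)); the other index sets are omitted because Eq. (2) only filters on the performance functions.

```python
# Hypothetical events: (P_i, C_i, PF(i)).
EVENTS = [
    (0.5, 100.0, {"voice"}),
    (0.25, 800.0, {"voice", "data"}),
    (0.125, 80.0, {"data"}),
]


def risk_per_function(events):
    """R_res(j): sum of P_i * C_i over all events i whose PF(i) contains j."""
    risks = {}
    for p, c, functions in events:
        for j in functions:  # logic bracket evaluates to one for these j
            risks[j] = risks.get(j, 0.0) + p * c
    # Rank the performance functions by accumulated risk, highest first.
    return dict(sorted(risks.items(), key=lambda kv: kv[1], reverse=True))


per_function = risk_per_function(EVENTS)
total = sum(p * c for p, c, _ in EVENTS)  # R_res from Eq. (1)
print(per_function)                              # {'voice': 250.0, 'data': 210.0}
print(sum(per_function.values()), ">=", total)   # 460.0 >= 260.0
```

The second event affects two functions and is therefore counted twice in the per-function sums, which is exactly the source of the inequality.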

In a similar way as in (2), threat types can be ranked using the number of threat types \({N}_{T}\), resilience dimensions by relevancy using the number of resilience dimensions \({N}_{RD}\), and resilience attributes using the number of resilience attributes within each resilience dimension \({N}_{RA}\left(k\right), 1\le k\le {N}_{RD}\). Examples of system performance functions, threats to be considered, and the resilience dimensions and attributes used are given for each application case in Sects. 4.1, 4.2, and 4.3, with respective sample tables in Sects. 5.1.2, 5.2.2, and 5.3.2.

Equations (1) and (2) employ risk addition (superposition) of single risks to achieve coverage of all risk aspects on system level. They are not normalized and can be used for relative comparison of system modifications, system versions, and system improvement options when anticipating their effects and when assessing their effects in a second iteration of the overall resilience management process.

For further illustration, Eqs. (1) and (2) can also be applied solely to the phases up to the absorption of a disruption event (e.g., prevention, building protection, detection, immediate consequences) to cover risk control, and solely to the resilience cycle phases during and after events (e.g., response (stabilization), recovery, adaptation, and learning) to cover resilience improvement. Similarly, one can, for instance, distinguish between threats affecting engineering-technical system layers (e.g., physical, engineering, and cyber) and those affecting non-technical layers (e.g., operational, decision making, and policy). Furthermore, socio-technical resilience capabilities could be assessed separately, considering the options to detect (sensing, situation awareness), to represent, model, and decide (representing, sense making, and decision making), and to act and improve (activation, reconfiguration, adaptation, action). Using these perspectives, a strong (even somewhat redundant) focus can be placed on post-event assessment and post-event capabilities, thus thoroughly covering the resilience aspect in addition to classical risk control.
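The pre-event and post-event views described above are sub-sums of Eq. (1) restricted to events tagged with the respective resilience cycle phases. In the sketch below, the phase labels, the event tuples, and the function name `phase_subsum` are illustrative assumptions; an event tagged with phases from both groups contributes to both sub-sums, which reflects the "somewhat redundant" focus mentioned in the text.

```python
# Illustrative phase labels for the two views.
PRE_EVENT = {"prevention", "protection", "detection", "absorption"}
POST_EVENT = {"response", "recovery", "adaptation", "learning"}

# Hypothetical events: (P_i, C_i, resilience cycle phases in RA(i)).
EVENTS = [
    (0.5, 100.0, {"detection"}),
    (0.25, 800.0, {"response", "recovery"}),
    (0.125, 80.0, {"protection", "recovery"}),  # counted in both views
]


def phase_subsum(events, phases):
    """Sub-sum of Eq. (1) over events tagged with at least one phase in `phases`."""
    return sum(p * c for p, c, tags in events if tags & phases)


print(phase_subsum(EVENTS, PRE_EVENT))   # 60.0  (risk control view)
print(phase_subsum(EVENTS, POST_EVENT))  # 210.0 (resilience view)
```

The same filter can be reused for system layers or resilience capabilities by swapping the label sets.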

Using Eq. (1) and related sub-sums, classical risk matrix plots become accessible, e.g., of all risks identified post event for a selected performance function, or of all risks for a selected post-event phase. By providing acceptance criteria (e.g., green: acceptable; yellow: improve if feasible and reasonable; red: should be reduced), single risks can be evaluated, e.g., in terms of expected monetary loss; see the examples given in Sects. 5.1.3 and 5.1.4.
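The three acceptance categories can be sketched as a simple threshold function on single risk values \(P_i C_i\). The two limits used here (50 and 150 units of expected loss per year) are hypothetical; in practice they would follow from the criteria defined in Step 1.

```python
# Hypothetical acceptance limits on single risks P_i * C_i
# (e.g. expected monetary loss per year); real limits are project-specific.
ACCEPTABLE_BELOW = 50.0
MUST_REDUCE_FROM = 150.0


def evaluate(risk: float) -> str:
    """Map a single risk value to the green/yellow/red acceptance categories."""
    if risk < ACCEPTABLE_BELOW:
        return "green"   # acceptable
    if risk < MUST_REDUCE_FROM:
        return "yellow"  # improve if feasible and reasonable
    return "red"         # should be reduced


single_risks = {"E1": 0.5 * 100.0, "E2": 0.25 * 800.0, "E3": 0.125 * 80.0}
print({event: evaluate(r) for event, r in single_risks.items()})
# {'E1': 'yellow', 'E2': 'red', 'E3': 'green'}
```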

Data visualization can help users digest risks. Bubble charts can be used, for instance, to visualize risks within a system performance function versus threat matrix, with bubble size depending on risk, for all combinations or for sets of combinations of threats, e.g., to compare natural, anthropogenic, accidental, intentional (sabotage, criminal), and terrorist threats; see Fig. 6 for an example.

For overall or group risk assessment, besides risk matrix determination and evaluation, the use of modified FN diagrams and related criteria is proposed (see, e.g., Proske 2008 for an introduction). For separate consequence categories (e.g., injured, fatalities, monetary loss, environmental damage), the following tuples are convenient:

$$\left({C}_{cat},P(\text{events with} {C}_{i, PF\left(i\right), T\left(i\right), RD\left(i\right), RA(i)}\ge {C}_{cat})\right), {C}_{cat}>0,$$
(3)

where \({C}_{cat}\) is typically increased stepwise by a factor (e.g., 2 or 10) to allow double-logarithmic plotting of the group risk tuples. In (3), the probabilities of all events reaching \({C}_{cat}\) are combined across all system performance function and threat combinations. This is acceptable given that the effects are measured using the same consequence category.
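The tuples of Eq. (3) can be generated with a few lines of code. This sketch assumes that "combined" means summing the frequencies of all events whose consequence reaches \(C_{cat}\), the usual FN-curve convention; the event list, the grid parameters, and the name `fn_tuples` are hypothetical.

```python
# Hypothetical events: (frequency per year, consequence in one category,
# e.g. number of injured persons).
EVENTS = [(1e-2, 1), (5e-3, 3), (1e-3, 12), (2e-4, 40)]


def fn_tuples(events, c_start=1.0, factor=10.0, c_max=1e3):
    """Return (C_cat, F(C >= C_cat)) pairs on a geometric grid of C_cat."""
    tuples = []
    c_cat = c_start
    while c_cat <= c_max:
        # Sum the frequencies of all events whose consequence reaches C_cat.
        freq = sum((f for f, c in events if c >= c_cat), 0.0)
        tuples.append((c_cat, freq))
        c_cat *= factor  # geometric steps suit double-logarithmic plots
    return tuples


for c_cat, freq in fn_tuples(EVENTS):
    print(f"C >= {c_cat:g}: F = {freq:.3g} per year")
# C >= 1: F = 0.0162 per year
# C >= 10: F = 0.0012 per year
# C >= 100: F = 0 per year
# C >= 1000: F = 0 per year
```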

Overall comparisons are possible if there are joint scales for probabilities and consequence categories, respectively. For instance, regarding fatalities (and related financial quantifications), classical FN criteria can be used, e.g., the Dutch FN criterion with F(1)=\(1.0\times {10}^{-3}\) per year and aversion factor 2, hence, e.g., F(10)=\(1.0\times {10}^{-5}\) per year; see, e.g., Trbojevic (2005). For further criteria, see, e.g., Spouge et al. (2014) on risk levels and acceptance criteria for passenger ships as an example of a domain-specific FN criterion selection.
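With F(1) and the aversion factor given in the text, the criterion line is \(F(N)=F(1)/N^{2}\), which reproduces F(10)=\(10^{-5}\) per year. The sketch below checks a hypothetical FN curve for fatalities against this line; the curve values and function names are assumptions for illustration.

```python
def dutch_fn_limit(n: float, f1: float = 1.0e-3, aversion: float = 2.0) -> float:
    """Criterion line F(N) = F(1) / N**aversion (per year)."""
    return f1 / n ** aversion


def violations(fn_curve, limit=dutch_fn_limit):
    """Return the (N, F) points of an FN curve lying above the criterion line."""
    return [(n, f) for n, f in fn_curve if f > limit(n)]


print(dutch_fn_limit(10))  # 1e-05

# Hypothetical FN curve for fatalities: (N, F(>= N) per year).
curve = [(1, 5e-4), (10, 2e-5), (100, 5e-8)]
print(violations(curve))   # [(10, 2e-05)]
```

Passing a different `limit` function allows domain-specific criteria, such as those for passenger ships, to be checked in the same way.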

Sample tables for the telecommunication domain

The sample tables used in this section come from one of the use cases of the RESISTO project, see RESISTO D3.9 (2020). Further examples of performance-based resilience analysis can be found in Häring et al. (2020). For each use case in RESISTO, a full analysis has been completed, including the tables mentioned in this paper. In this general example, a telecommunication network experiences a multitude of common threats faced by telecommunication network operators, including cyber-attacks such as distributed denial-of-service (DDoS) attacks and physical threats like cable cuts or break-ins. As mentioned in Sect. 3, the input for these tables comes from experts as well as from the use case description.

The main objectives and related stakeholders within the sample application are summarized in Table 5. The stakeholders and objectives were determined with a survey sent to telecommunication network operators within the RESISTO project. Survey questions that helped with the context analysis included the industry the company operates in, whether the company has separate or joint physical and cyber security teams, and the share of the IT budget earmarked specifically for IT security. The main objective is a refinement of the use cases with the goal of identifying critical threats, in particular those that can be addressed with the instruments provided by project partners. The joint risk and resilience management approach was applied to determine the as-is risk control regarding system objectives and the resilience ranking, considering measures already implemented in 4G telecommunication standards. The ranking did not consider the feasibility of countermeasures and improvement measures as planned during the project or, similarly, in real-world applications. However, it was used as input for such decision making to determine the most efficient countermeasures and improvement measures.

Table 5 Examples of the main stakeholders and their objectives

The process of obtaining this information from the operators began with the creation of spreadsheet tables, which were designed with the operators’ time constraints in mind. Drop-down menus were predefined, and comments with explanations were added to each column. The operators were given a deadline by which to return the filled-out tables. Once the tables were returned, the use case descriptions were added. In this way, a full picture of the scenarios and the operators’ specific networks was created. The tables were then sent back to the operators for evaluation. In many cases, one-on-one meetings were held with each operator to discuss the specifics.

Figure 2 is an excerpt from the table for the system components. For each component, additional information is acquired, as discussed in Table 2, including the subsystem, the type, the quantity, and the interconnections. Some of these columns are formatted as drop-down menus to limit the responses to a few categories. This is done for the subsystems, where responses were limited to networks common in telecommunication. The component type also has a drop-down menu and is limited to common types such as software or hardware. This use case had other system components besides those seen in the figure, including workstations and servers, network security equipment, and equipment shelters and sensors (Miller et al. 2020). This corresponds to the input for row 3 of Table 1 on system analysis, namely the provision of system elements such as subsystems and components and their relations.

Fig. 2

A few of the system components provided for the sample use case

When filling out this first table, operators had to decide how detailed to make their list of components. In a few cases, to save time, operators entered only “broadband network” or simply “fiber optic (FO) infrastructure.” Often, operators did not break down the different components or enter the smallest components, but listed them in the description instead. For example, if TV headend was the main component, the description might include “coders, multiplexors, and others.” This presents a challenge for later steps of the resilience management cycle, as the components listed in the spreadsheet tables may be too general for the network flow diagrams provided and thus may not be the ones that are simulated.

Figure 3 provides the system functions for the use case. Many of the use cases shared similar system functions, such as voice services, mobile and fixed data services, and connectivity. This is because all operators and telecommunication networks are providing a service for customers and need to be able to measure the availability of the service. This corresponds to row 4 of Table 1 on system function analysis.

Fig. 3

Some of the system functions defined for the sample use case

An important column when defining the system functions is that of the linked system components. Defining how the system functions link to the system components defined in the earlier table is the basis of the correlation matrix that is determined later. As seen in Fig. 3, multiple components can be linked to a single system function and components can be linked to multiple functions.

When discussing the system functions, operators mentioned Service Level Agreements (SLAs) and Service Level Objectives (SLOs). SLAs are the agreements the operators have with their customers that define the performance requirements. SLOs define the penalties if the objective or agreement is not met. The SLOs can be used when defining the system functions. Examples of SLOs include monthly availability, downtime incidents, packet loss, and jitter. Using the SLOs when completing the tables gives the operators more context when selecting the impact that a threat may have on the network, as many of the penalties are economic.

Threats are defined in Fig. 4. They were defined by the operators and based on the use case scenarios. The threats have many characteristics, as mentioned in Table 2. Figure 4 covers the input to Table 1 on threats and disruptions and the pre-assessment of such events up to the overall assessment (if no cumulative risk and resilience analysis is conducted), i.e., Steps 4 to 6 of the joint risk and resilience assessment. The frequency and impact (either economic or social) are used to create a hazard matrix, as seen in the next section. For the correlations, the linked components and system functions are recorded. To obtain a full picture of the threats experienced by telecommunication networks, cyber, physical, and cyber-physical threats are listed. The cause of each hazard is also classified as natural, man-made (attack), man-made (accidental), or technical/system failure.

Fig. 4

An example of the threats provided for the sample use case and their assessments

It is interesting to note that when compiling threats, a few operators mentioned difficulties in assessing the economic impact. For example, in the case of data exfiltration, the economic impact can vary depending on which data are leaked. If confidential customer data are leaked, this could lead to fines within the EU; otherwise, there would be no fine. Since these fines can be substantial, the two cases have very different economic impacts. In this case, two different threats were created: one data exfiltration leading to a fine and one not leading to a fine.

Improvement measures are defined in Fig. 5. Each improvement measure has components and threats linked to it. Therefore, all of the tables are now linked together, and the relationships between all four tables can be visualized. These improvement measures may already be implemented in today’s networks. Some operators had difficulty listing improvement measures, especially for 5G networks. 5G networks have many improvements over 4G, such as the ability to create new virtual components on demand and generally more virtualization. The question was raised as to whether these characteristics of 5G networks could be used as specific improvement measures.

Fig. 5

Excerpt from the improvement measures defined by operators for the sample use case

Sample matrix assessments and quantities for the telecommunication domain

Once the tables have been filled in by the operators, an analysis of correlations and relationships can be completed using many of the tools mentioned in Sect. 5.1.1. The results below include a correlation matrix in the form of a bubble chart, as described in Sect. 5.1, and a risk matrix plot, also described in the previous section. These results were then shared with the operators.

The correlation matrix, Fig. 6, depicts the relationships between the threats and the performance functions. The dark blue circles mark the more critical combinations, meaning that these combinations should be investigated further in the resilience cycle. For example, the relationship between power outages and voice services has a very dark circle, indicating a critical combination. The remaining steps of the resilience management cycle, such as the simulations and resilience quantification, would focus on this relationship. The simulations would use voice services as the performance function and a power outage as the simulated threat.

Fig. 6

The correlation matrix for the threats and the system functions. The bubble size and color give the measure of criticality. The bubbles that are dark blue and larger indicate a more critical combination

To get a better idea of the most relevant threats, the threats are ranked with the following equation, where EI is the economic impact, SI is the social impact, and FQ is the frequency of the threats. All of these attributes are defined by the experts in the threats table (Fig. 4). Each of these attributes is originally defined on a scale (low, medium, high) which is then transformed to numerical values. Depending on which aspects of the threat are most relevant for the particular inquiry, the equation to calculate the risk and resilience score may be adjusted:

$$Score=\left(EI+SI\right) FQ.$$
(4)

Comparing with Eq. (1) introduced above, Eq. (4) can be seen as a special case in which the economic and societal impacts incorporate the costs of response and recovery as well as of any improvement measures. The users decided not to separate the cost assessments for risk control and for resilience generation. However, it was ensured that they considered the costs up to full recovery from threats and disruptions, in particular for the more frequent ones.

In Fig. 7, the score is shown in green, together with the threat attributes that make up each score: frequency (FQ) in light blue, social impact (SI) in dark blue, and economic impact (EI) in black. The threats are ranked from the highest to the lowest score. The highest-ranking threat is a DDoS attack, which is plausible as it has the highest frequency. Threats considered include fiber optic (FO) infrastructure cuts, power outages at mobile switching center (MSC) sites, and many more (see the y-axis labels of Fig. 7).
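The ranking by Eq. (4) can be sketched in a few lines of Python. The sketch assumes a hypothetical mapping of the qualitative ratings low, medium, and high to the numbers 1, 2, and 3; the source does not state which numerical values were used. The threat names follow Fig. 7, but the ratings are illustrative and are not the operators' assessments.

```python
# Hypothetical mapping of the qualitative scale to numbers (an assumption, not from the source).
SCALE = {"low": 1, "medium": 2, "high": 3}


def threat_score(ei: str, si: str, fq: str) -> int:
    """Eq. (4): Score = (EI + SI) * FQ, after mapping each rating to a number."""
    return (SCALE[ei] + SCALE[si]) * SCALE[fq]


def rank_threats(threats: dict) -> list:
    """Return (threat, score) pairs sorted from the highest to the lowest score."""
    scored = [(name, threat_score(*attrs)) for name, attrs in threats.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Illustrative (EI, SI, FQ) ratings; hypothetical values.
threats = {
    "DDoS attack": ("low", "medium", "high"),
    "FO infrastructure cut": ("medium", "medium", "low"),
    "Power outage at MSC site": ("medium", "high", "low"),
}

for name, score in rank_threats(threats):
    print(f"{name}: {score}")
```

With these assumed ratings, the DDoS attack ranks first because its high frequency multiplies the summed impacts, even though its economic impact is low.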

Fig. 7

The threat ranking for the use case. A score is calculated for each threat that incorporates the economic impact, the frequency, and the social impact. All scores are measured on the same scale

Finally, the threats can be organized in a risk matrix plot (for an introduction to risk matrix plots, see Sect. 5.1). Within this matrix, the threats are positioned according to their economic impact and frequency, considering all resilience cycle phases. A high economic impact combined with a high frequency places a threat in the red zone. Most of the threats for this particular use case fall in the green zone, indicating that they are low-risk hazards. DDoS attacks, however, fall into the orange zone, indicating that this threat should be mitigated. As the economic impact is already low, this can be done by reducing the frequency. The hazard matrix thus offers another way to organize the threats and determine which ones need to be addressed further.
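Placing threats in such a hazard matrix can be sketched similarly. The numeric scale and the zone boundaries below (a product of economic impact and frequency of 6 or more is red, 3 to 4 is orange, otherwise green) are assumptions for illustration; the source does not specify how the zones of Fig. 8 were delimited. The last line lowers the DDoS frequency by one level, as an anti-DDoS appliance would, to show how a threat can leave the orange zone without any change in economic impact.

```python
# Hypothetical numeric mapping and zone thresholds (assumptions, not from the source).
SCALE = {"low": 1, "medium": 2, "high": 3}


def risk_zone(ei: str, fq: str) -> str:
    """Place a threat in a 3x3 hazard matrix by economic impact times frequency."""
    product = SCALE[ei] * SCALE[fq]
    if product >= 6:  # high/high, high/medium, medium/high
        return "red"
    if product >= 3:  # e.g. low impact combined with high frequency
        return "orange"
    return "green"


# Illustrative (EI, FQ) ratings; hypothetical values.
threats = {
    "DDoS attack": ("low", "high"),
    "FO infrastructure cut": ("medium", "low"),
    "Power outage at MSC site": ("medium", "low"),
}

for name, (ei, fq) in threats.items():
    print(name, risk_zone(ei, fq))

# Lowering the DDoS frequency by one level moves it out of the orange zone.
print("DDoS attack, mitigated:", risk_zone("low", "medium"))
```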

Once the results were shared with the operators, the discussion centered on their accuracy. For example, the highest-ranking threat is a DDoS attack (Fig. 7), which also stands out in the hazard matrix (Fig. 8). However, this is a common attack that the operators expect to occur, and they therefore have mitigation measures in place to prevent or reduce the damage of such an attack. These mitigation measures are not considered within the threat ranking or the hazard matrix. In this sense, the threat analyses do not give clear answers on which threats need further investigation, as highly ranked threats may already be well covered by mitigation measures.

Fig. 8

The hazard matrix for the different threats. This hazard matrix is based on frequency and economic impact; however, the matrix can also be created with the social impact or with a combination of these impacts

Simulation supporting tabular and matrix assessments for the telecom domain

To determine the impact of the implementation of improvement measures, the analysis is run again. To do this, the values in the spreadsheet tables are updated. For a DDoS attack, an improvement measure would be an anti-DDoS appliance. When this improvement measure is incorporated into the analysis, the frequency of DDoS attacks decreases. This change in frequency can be seen in Figs. 9 and 10. While a DDoS attack is still ranked the highest, it is no longer in the orange region in the hazard matrix. For a data exfiltration, the improvement measures of more training, governance, and alerts can reduce the economic impact of the attack. When improvement measures are incorporated, the attack has a smaller impact, and data exfiltration also changes position in the hazard matrix. This change is also evident in Figs. 9 and 10.

Fig. 9

The new threat ranking with improvement measures incorporated for a DDoS attack and a data exfiltration attack. A score is calculated for each threat that incorporates the economic impact, the frequency, and the social impact. All scores are measured on the same scale

Fig. 10

The new hazard matrix for the different threats with improvement measures incorporated for a DDoS attack and a data exfiltration attack. This hazard matrix is based on the frequency and economic impact; however, the matrix can also be created with the social impact or the combination of the economic and social impacts

To see the effects of the improvement measures more clearly, a simulation was completed, the results of which can be seen in Fig. 11. This simulation highlights how improvement measures taken against a DDoS attack can have an effect on the repair time and/or the probability of an attack. The best results occur when improvement measures affect both the repair time and the probability of attack (see the green line in Fig. 11). For more details about the simulation, see Fehling-Kaschek et al. (2020).
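The qualitative trend in Fig. 11 can be illustrated with a deliberately simple expected-value model, which is not the network simulation of Fehling-Kaschek et al. (2020): the expected yearly outage time is taken as the attack rate times the mean time to repair, assuming that outages do not overlap. All rates and repair times below are hypothetical.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760


@dataclass
class Scenario:
    name: str
    attacks_per_year: float  # expected number of successful DDoS attacks per year
    mttr_hours: float        # mean time to repair per attack

    def expected_downtime(self) -> float:
        """Expected outage hours per year = attack rate x mean time to repair."""
        return self.attacks_per_year * self.mttr_hours

    def availability(self) -> float:
        """Fraction of the year the service is up (non-overlapping outages assumed)."""
        return 1.0 - self.expected_downtime() / HOURS_PER_YEAR


# Hypothetical values: improvements halve the repair time, the attack rate, or both.
scenarios = [
    Scenario("original", 12, 4.0),
    Scenario("improved MTTR", 12, 2.0),
    Scenario("improved probability", 6, 4.0),
    Scenario("both improved", 6, 2.0),
]

for s in scenarios:
    print(f"{s.name:22s} downtime={s.expected_downtime():5.1f} h "
          f"availability={s.availability():.5f}")
```

Halving either factor halves the expected downtime, and improving both yields the largest reduction, which matches the ordering of the curves described for Fig. 11.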

Fig. 11

Simulation of a DDoS attack with improvement measures included. Improvement measures can either improve the mean time to repair or the probability of occurrence or both; see the blue lines, orange lines, and green lines, respectively. The original curve is a dashed black line. The analysis considers the removal of 4 different representative nodes of a telecommunication grid consisting of about 20 nodes. The effect of the removal is shown for three cases: the effect on the most important connected component; on all working components; and on the voice services of the overall system

As the analysis output shows, much of the analysis can be completed based on the tables alone. Only halfway through the resilience management process, clear information has already been uncovered regarding the relations of components, system functions, threats, and improvement measures. The most critical combinations are determined, setting the stage for the second half of the resilience management process, during which counter and improvement measures are implemented and analyzed. It was shown that, at least within this telecommunication sample case, the main issues and expected improvements can be identified even before detailed risk and resilience simulation and quantification take place.

In addition, the application case showed how more detailed tabular as well as numerical assessments allow for a more accurate determination of the level of risk control of and resilience to identified threats and disruptions. Such analysis was also shown to be useful for the selection of countermeasures and improvement measures. In particular, in the example given, the detailed analysis of the DDoS attacks confirmed the tabular analytical assessment of the effects of countermeasures (see Figs. 9 and 10).

This example shows how to conduct the tabular overall risk and resilience assessment (Step 6) and the resilience acceptance evaluation (Step 7) of the risk and resilience management process, and it demonstrates the selection of improvement measures (Step 8); see Table 1. It also illustrates how to design the overall risk and resilience assessment and improvement measure selection matrix tables (according to Table 2).

Gas grid

Assessments and quantities accessible for gas grid

In the case of the gas network, based on the input data of Sect. 4.2 as detailed in Table 3, using only tables and matrices, at least the following assessments are accessible:

  • Prioritization and categorization of regulatory, organizational and operational requirements for gas grids with focus on security and safety systems;

  • Identification of stakeholders for given business cases and requirements;

  • System function prioritization in terms of fulfillment of requirements;

  • Identification of system components necessary for system functions;

  • Threat prioritization taking account of several resilience dimensions (considering all system functions);

  • Identification of critical combinations of system functions and threats;

  • Security system functional and technical requirements prioritization;

  • Identification of system components necessary for fulfillment of security system functions;

  • Coverage of user requirements by security system functions;

  • Threats countered by security functions;

  • Key performance indicators (KPIs) to quantify requirements for security system functions.

Sample tables and matrices for some of these assessments are provided in Sect. 5.2.2.

Sample tables and matrices for gas grid

Table 6 shows the legal and regulatory, organizational, and operational requirements for security- and safety-related systems of transmission gas grid networks, prioritized from an end user and stakeholder perspective. It gives examples of selected, representative, highly prioritized requirements. Regarding regulatory and legal requirements, further examples include EU Directive 2004/67/EC, Council Directive 2008/114/EC, EU Regulation 2009/715, EU Regulation 2010/994, EU Regulation 2016/67, EU NIS (network and information security) Directive 2016/1148, as well as the Charter of Fundamental Rights of the European Union 2010/C 83/02 (SecureGas D1.1 2019). An example of a national regulation is the Italian Law No. 481/1995 covering competition rule compliance of utility services.

Table 6 Examples of legal/regulatory, organizational and operational requirements

For organizational requirements, one example is given in Table 6. The security management system is also asked to align operationally with the standards ISO 9001, ISO 14001, ISO 22301, ISO 22396, ISO 27000, ISO 31000, and ISO 55000, covering in particular asset management, risk control, and business continuity best practices. In addition, organizational requirements are formulated regarding management systems for pipeline integrity, IT security, emergency/disaster, life cycle, operations integrity, and asset integrity, including operation within the context of the operation and maintenance manual and the crisis manual.

Operational requirements cover a wide range (see row 9 of Table 3 for an overview). Beyond the six examples in Table 6, the categories identified comprise:

  • Confidentiality, data protection, and safety (CONF), e.g., software and hardware secure, safe and resilient, authentication and authorization, encryption;

  • Conditions (COND), e.g., “plug and operate” for example when relocating and adding new sensors, various/all threats, flexible with respect to legacy and new system elements, different extensions of facilities from 1 km to 1000s of kilometers, resource scalability;

  • Interoperability (INTER), e.g., with existing systems, generating output for existing systems, interoperability with mobile device, operational interoperability;

  • Detection, situational awareness, and decision support (DSD), e.g., cyber threats/attacks, landslide hazards, intrusion and motion detection, third-party interference detection, leak detection, drone detection, fire/heat/explosion detection, asset manipulation, alerting, alert confirmation, accuracy of detection localization, risk level of event generation, decision support and action recommendation, sharing information with the public, simulation capability, compliant storage of supervision data, manual alert, detection of non-available subsystems/sensors;

  • Usability (USA), e.g., user friendly, multilingual interface, maintainability of security system, modularity, accurate information, replaceability/back-up, short recovery time (less than a couple of hours), high availability, training;

  • Information management (INFOR), e.g., information filtering based on the entity involved, filtering based on importance/criticality, based on response level, information classification and categorization, combination of security management data with other data such as maps, CBRNE risks, exposition, weather forecast, event register, incident reporting; and

  • Cost (COST), e.g., cost efficiency and low after-sales service costs.

Examples of management systems include Pipeline Integrity Management System, Safety Management System, Security Management System, Emergency/Disaster Management System, Life Cycle Management System, Operations Integrity Management System, Asset Integrity Management System, Operation and Maintenance Manual for Natural Gas Distribution Networks, and Crisis Manual for Natural Gas Distribution Networks.

In total, 79 requirements, consisting of 15 legal and regulatory, 16 organizational, and 48 operational requirements, are prioritized using the three categories high (absolutely needed), medium (important), and low (interesting) (SecureGas D1.1 2019). They are used together with technical requirements and the threat criticality assessment to prioritize system (resilience) functions.

Table 7 shows sample entries of gas security and safety non-performance functions. Other examples of mainly non-performance system functions include (SecureGas D2.3 2019) the number of unauthorized interferences with the pipeline; number of leaks; number of damage events due to operator failure; number of validated or non-validated security threats or alarms; number of attacks on various IT devices; pipeline temperature; average time to complete tasks; mean times to detection, to appearing in the control system, to response, and to repair; downtime and availability of pipeline sections, nodes, and consumer supply at main nodes; number of unplanned stops; times allocated to training, administration, and management; delayed repair or renovation works; amount or ratio of valves' remote control; cost benefit ratio for prevention and for mitigation; cost per incident; and operational cost.

Table 7 Sample gas security system non-performance functions and related gas infrastructure components

Given the legal, regulatory, organizational, and operational requirement to cover all threats, a further option for prioritizing technical requirements is to assess the level of risk control and resilience achieved regarding known threats and potential disruptions. To this end, threats are categorized into 12 categories and 98 sub-categories or threat event types (see the corresponding rows of Table 3). Using 5 resilience dimensions with a total of 26 attributes and a semi-quantitative scale, the level of coverage of threats by current best-practice transmission grids can be assessed. Table 8 shows high-ranking resilience dimension attributes for sample threats.

Table 8 Sample threats/disruptions and how they are assessed using resilience dimensions and attributes

Table 9 shows examples assessing several combinations of gas system non-performance functions and threats or disruptions based on the inputs of Tables 6, 7, and 8. In addition, requirements for technical improvement measures and related functional key performance indicators (KPIs) are added, which are further detailed in Tables 10 and 11. For brevity, Table 9 gives only the title of each entry, without further resolution regarding the ID coding of the entities and the further classifications detailed in Table 3 (see the column labeled 4).

Table 9 Examples for critical combinations of system non-performance functions and threats
Table 10 Sample technical requirements and how they cover the legal, organizational, and operational risk control and resilience enhancement requirements
Table 11 Key performance indicators (KPIs) for improvement measures (security system)

Within the context of operational requirements, technical requirements mainly cover qualitative and quantitative descriptions of additional system functions and resilience functions that are expected to support risk control and resilience enhancement of gas distribution systems. According to Table 3, a ranked listing and coverage assessment of user requirements, as given in Table 10, is feasible. In addition, as listed in Table 3, the complete coverage of all user requirements can be assessed by listing, for each user requirement, the technical requirements that contribute to its coverage.
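The coverage assessment can be sketched as an inversion of the mapping from technical to user requirements. The identifiers below are hypothetical, loosely following the category abbreviations of the operational requirements (CONF, DSD, USA, COST) and the UAV and blockchain domains mentioned for Table 10; any user requirement left with an empty list is not covered by any technical requirement.

```python
# Hypothetical IDs: which user requirements (UR-...) each technical requirement (TR-...)
# contributes to. Illustrative only, not entries of Tables 6 or 10.
contributes_to = {
    "TR-UAV-01": {"UR-DSD-03", "UR-DSD-05"},
    "TR-BC-02": {"UR-CONF-01"},
    "TR-GUI-04": {"UR-USA-02", "UR-DSD-05"},
}
user_requirements = ["UR-CONF-01", "UR-DSD-03", "UR-DSD-05", "UR-USA-02", "UR-COST-01"]


def coverage(user_reqs, mapping):
    """For each user requirement, list the technical requirements that cover it."""
    covered_by = {ur: [] for ur in user_reqs}
    for tr, urs in mapping.items():
        for ur in urs:
            if ur in covered_by:  # ignore references to unknown user requirements
                covered_by[ur].append(tr)
    return covered_by


covered_by = coverage(user_requirements, contributes_to)
uncovered = [ur for ur, trs in covered_by.items() if not trs]
print(covered_by)
print("Not covered:", uncovered)
```

Counting the entries per user requirement also gives a simple ranking by how many technical requirements support it.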

In total, 7 technical requirement types with 148 sub-types were considered to cover all user requirements (see Table 3 for an overview).

Combining technical requirements with key performance indicators (KPIs) allows the improvement measures to be quantified as input for security system specifications and developments. To this end, for each technical requirement domain (e.g., requirements related to UAV functionalities or to blockchain; see Table 10) and for each relevant requirement type (e.g., functional or interface), indicators are described along with a metric and a target value. Again, Table 3 gives an overview of the different entry types. In this way, 78 indicators are defined, belonging to 49 fields of indicators that are related to 12 technical domains of security systems; see Table 11 for example entries.
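A KPI entry of this kind (indicator, metric, and target value per technical domain and requirement type) maps naturally onto a small record type. The sketch below additionally assumes that each KPI records whether higher values are better, so that a measured value can be checked against its target; the indicators, units, targets, and measurements are hypothetical and are not taken from Table 11.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KPI:
    domain: str             # technical domain, e.g. UAV or blockchain
    req_type: str           # requirement type, e.g. functional or interface
    indicator: str
    unit: str
    target: float
    higher_is_better: bool  # assumption: direction in which the target must be reached

    def met(self, measured: float) -> bool:
        """True if the measured value reaches the target in the required direction."""
        if self.higher_is_better:
            return measured >= self.target
        return measured <= self.target


# Hypothetical entries; indicators and targets are illustrative only.
kpis = [
    KPI("UAV", "functional", "leak detection probability", "%", 95.0, True),
    KPI("UAV", "interface", "alert transmission latency", "s", 5.0, False),
    KPI("blockchain", "functional", "transaction confirmation time", "s", 10.0, False),
]
measured = [97.0, 7.5, 8.0]

for kpi, value in zip(kpis, measured):
    status = "met" if kpi.met(value) else "not met"
    print(f"{kpi.domain}/{kpi.indicator}: {value} {kpi.unit} (target {kpi.target}) -> {status}")
```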

Sample assessments and quantities for gas grid

The main goal of applying the tabular approach to the gas grid is a ranking of technical requirements for several technical domains, refined during the application of the approach, for advancing gas security and safety systems that counter cyber-physical threats. To this end, user requirements are ranked (see Table 6 as an example), as are system non-performance functions (Table 7) and threats (Table 8). In addition, critical combinations of system performance functions and potential threats and disruptions are assessed (Table 9). This allows a ranking of security system technical functions (Table 10), including the provision of quantitative KPIs (Table 11).

The overview and discussion of assessment options in Table 3 and Sect. 5.2.1 show that as few as 6 tables and matrices suffice to achieve the main goal for the gas grid application case.

Indoor localization system

Assessments and quantities accessible for indoor localization system

Based on data collected according to Table 4, the following assessments are accessible:

  • User and stakeholder listing and ranking;

  • System analysis in terms of subsystems and components;

  • Identification of system functions and of the technical functions, subsystems, and components involved;

  • Ranking of system functions and related technical functionalities;

  • Overall disruption, failure, and disturbance categorization and ranking;

  • Criticality ranking of combinations of system functions and disruptions;

  • Ranked list of tentative experiments to assess critical combinations (critical scenarios);

  • Experimental assessments of critical combinations;

  • Overall risk control and resilience assessment of critical combinations;

  • Ranked list of improvement options based on experimental results;

  • Relevancy of improvement options for system functions, technical functionalities, subsystems and components.

For each critical scenario, experiments can be repeated to generate representative resilience answers of the system. As an example, for the main non-performance measure, the time-dependent absolute localization error, a quantitative assessment can be obtained as follows. First, for each distinct disruption scenario, the types of resilience answers are identified. Second, for each resilience answer type, the phases of the answer are identified. Third, for each phase, quantities are extracted from the non-performance function to characterize the phase quantitatively. For instance, in the absorption phase, high resilience can in the present case be associated with a small increase of the localization error. This increase should also occur within a short time compared with the time scale relevant to the user's applications, so that the localization system does not respond to disturbances with excessive latency.
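The third step, extracting quantities for one phase, can be sketched for a sampled localization-error curve. The sketch computes the phase duration, the change of the error, the mean slope, and the area of excess error above an undisturbed baseline (trapezoidal rule), each divided by a reference scale to make it dimensionless. The function name, the data, and the scale values are hypothetical; since the delimitation of phases is not specified here, the phase boundaries are simply passed in and must coincide with sample times.

```python
def phase_quantities(t, err, t_start, t_end, baseline, scales):
    """Dimensionless quantities characterizing one phase of a resilience answer.

    t, err        : sampled times (ascending) and localization errors
    t_start/t_end : phase boundaries; here they must be sample times in t
    baseline      : localization error of the undisturbed system
    scales        : reference scales keyed 'dt', 'dq', 'm', 'A'
    """
    i0, i1 = t.index(t_start), t.index(t_end)
    ts, es = t[i0:i1 + 1], err[i0:i1 + 1]
    dt = t_end - t_start   # phase duration
    dq = es[-1] - es[0]    # change of the non-performance function
    slope = dq / dt        # mean slope over the phase
    # Trapezoidal area of the error in excess of the baseline (performance loss).
    area = sum((ts[k + 1] - ts[k]) * (es[k] + es[k + 1] - 2 * baseline) / 2
               for k in range(len(ts) - 1))
    return {"dt": dt / scales["dt"], "dq": dq / scales["dq"],
            "m": slope / scales["m"], "A": area / scales["A"]}


# Hypothetical absorption phase: the error rises from 0.1 m to 0.6 m between t = 1 s and t = 3 s.
t = [0, 1, 2, 3, 4]
err = [0.1, 0.1, 0.4, 0.6, 0.6]
q = phase_quantities(t, err, 1, 3, baseline=0.1,
                     scales={"dt": 1.0, "dq": 0.1, "m": 0.1, "A": 0.1})
print(q)
```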

Let \(i = 1,\,2, \cdots ,N_{\text{event}}\) index the different event types, let \({N}_{ij}\) be the number of phases considered for each event type, and let

$${q}_{ijk}=\left\{\begin{array}{c}\Delta {t}_{ijk}/{\Delta t}_{ijk}^{scale}\\ \Delta {q}_{ijk}/{q}_{ijk}^{scale}\\ {m}_{ijk}/{m}_{ijk}^{scale}\\ {A}_{ijk}/{A}_{ijk}^{scale}\\ \cdots \end{array}\right.$$
(5)

for \(k = 1,\,2, \cdots ,N_{ijk}\), where \(N_{ijk}\) is the number of dimensionless quantities considered for each phase of each event type to assess the risk control and resilience achieved for that event type. The sample quantities given in (5) are measures of time duration, of the change of the (non-)performance function, of slopes, and of areas of performance loss, respectively, each measured on a scale relevant for the event type and phase considered, e.g., the time scale \({\Delta t}_{ijk}^{scale}\). For each quantity, it can be decided whether it should be minimized or maximized for overall risk control and resilience. A total measure of resilience then reads

$${R}_{\text{tot}}=\sum_{i=1}^{{N}_{event}}{P}_{i} {C}_{i}=\sum_{i=1}^{{N}_{event}}{P}_{i} \prod_{j=1}^{{N}_{ij}}\prod_{k=1}^{{N}_{ijk}}{{q}_{ijk}}^{{a}_{ijk}} ,$$
(6)

where \({P}_{i}\) is the probability (frequency, likelihood) of an event type, \({C}_{i}\) is a measure for its consequence, \({a}_{ijk}=1\), if the quantity should be minimized (using direct proportionality) and \({a}_{ijk}=-1\), if the quantity should be maximized (using indirect proportionality).

In Eq. (6), an additive or a logarithmic measure could also be used instead of the multiplicative measure for each event type. In the former case, the last two product operators in Eq. (6) are replaced by sums. In the latter case, a logarithm is additionally applied to the addends. In the example application for the localization system use case, Eq. (6) sums over threat events already identified as critical for all system performance functions and could be further resolved according to Eqs. (1) and (2).
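Eq. (6) and its additive and logarithmic variants can be sketched as follows. The sketch assumes that the quantities \(q_{ijk}\) of each event type are already dimensionless and positive and are passed as a flat list of \((q, a)\) pairs over phases and quantities, which gives the same result as the double product or double sum. The probabilities and quantities in the example are hypothetical.

```python
import math


def event_consequence(quantities, mode="product"):
    """Consequence C_i of one event type from (q_ijk, a_ijk) pairs.

    quantities: list of (q, a), q > 0 dimensionless, a = +1 (minimize) or -1 (maximize),
                flattened over phases j and quantities k.
    mode: 'product' (Eq. 6), 'sum' (additive variant), or 'log' (logarithmic variant).
    """
    terms = [q ** a for q, a in quantities]
    if mode == "product":
        return math.prod(terms)
    if mode == "sum":
        return sum(terms)
    if mode == "log":
        return sum(math.log(x) for x in terms)
    raise ValueError(f"unknown mode: {mode}")


def total_measure(events, mode="product"):
    """R_tot = sum over event types of P_i * C_i, events given as (P_i, quantities)."""
    return sum(p * event_consequence(qs, mode) for p, qs in events)


# Two hypothetical event types: (probability, [(q, a), ...]).
events = [
    (0.2, [(2.0, 1), (4.0, -1)]),
    (0.05, [(3.0, 1), (0.5, -1)]),
]
for mode in ("product", "sum", "log"):
    print(mode, total_measure(events, mode))
```

The logarithmic variant becomes negative whenever the terms \(q^{a}\) fall below one, so it lends itself to ranking scenarios rather than to interpreting absolute values.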

Sample tables and matrices for indoor localization system

The main user and stakeholder groups identified include actual users, often without any technical background; persons responsible in teams for the use of localization systems, who are interested in simple instructions and seamless operation; technical staff interested in efficient maintenance; and managers interested in data-driven digital production or service process optimization (see the categories given in Table 4).

Regarding system service functions, the main functions turned out to be cost-efficient coverage of areas or volumes and a localization error low enough for the application context, typically on the order of decimeters. In contrast, localization error visualization, for instance, was assessed as rather complicated for many applications. Similar arguments were given against assigning very high priority to other system performance measures listed in Table 4.

Based on the table of ranked system performance functions, ranked potential disruptions, and mainly the matrix of critical combinations of system functions and disruptions, Table 12 of experiments was generated to experimentally assess the criticality of scenarios.

Table 12 Sample critical combinations of system function and disruptions

Sample experimental resilience assessment quantities for indoor localization system

For the sample experimental assessment, the probabilities in Eq. (6) can be estimated. All other quantities in (6) are extracted from experimental data on the system response to the critical disruptions identified in Table 12. Figure 12 shows how barriers at different positions affect the localization error for a given tag position. The localization error increases from ca. 10 cm up to significant fractions of a meter, and even to several meters if several receivers are covered.

Fig. 12

Example absolute experimental localization errors for disruption scenarios that were assessed as potentially critical

Figure 13 shows the assessment of all sample disruption scenarios using the total resilience measure defined in (6). Three different measures are used, which can be distinguished as follows: (1) uses all options, namely scaled time durations, performance ratios measuring the performance change before and after disruptions, scaled slopes, and the scaled total performance loss area; (2) uses scaled time differences, performance ratios, and areas; and (3) uses only the scaled performance loss area. The last two measures clearly identify 5 scenarios as critical, and all three measures assess the seventh experimental scenario as the most critical. In summary, for the example shown, the last measure of risk control and resilience for event types was the most robust one, whereas the second was more sample specific. The first introduced terms that diverged without being related to major application-specific implications.

Fig. 13

Three different normalized measures of the consequences of a lack of risk control and resilience for each experimental scenario of Table 12

The assessment of the criticality of the events can be used to conduct (minor) design changes and in particular localization algorithm changes for given geometries. In the present case, for instance the rate of localization updates was increased, the influence of past positions was decreased, and a software flaw was removed that disturbed the switching on and registration of tags.

As the probability of scenarios strongly depends on the application context (e.g., localization of material for production versus localization of customers in restaurants), the sample experimental quantifications, identified as relevant using the tabular and matrix approach in Sect. 5.3.2, focus on the different scenarios only. When different design options are compared, it is favorable to use the total measure of risk control and resilience given in Eq. (6), including all known potential disruption scenarios with estimated probability factors.

Recommendations, practical implementation proposals, and managerial insights

The presentation of the approach shown here provides direct implementation guidance by providing well-defined process steps and supporting tables for each step. For implementation, the following tabular framework is deemed sufficient (see also the bold-typed tables and matrices in Table 1), and recommended for practical implementation:

  • Generation of the table of process steps, which should include process step names, objectives, and approaches used, and in particular supporting tables and matrices. It can be based on Fig. 1 and Table 1, which contains a superset of the tables to be used.

  • Generation of a master table as provided for the application examples in Tables 2, 3, and 4, where duplications should be carefully avoided and similar entries should be systematically reused.

  • Filling of tables as planned, within a spreadsheet application or using a statistical computing environment such as R with the shiny package (Chang et al. 2019), as used in the first application example. See the examples in Sects. 5.1.2, 5.2.2, and 5.3.2.

  • Evaluation of tables as discussed for the examples in Sects. 5.1.3, 5.1.4, 5.2.3, and 5.3.3.

  • Executive verbal summary and evaluation of overall risk reduction in terms of classical risk control and resilience improvement.

From a practical management perspective, a main advantage of the presented approach is that it consists of the iterative generation and updating of tables and matrices that do not per se require demanding methodologies and tools. Furthermore, the joint risk control and resilience analysis and management process has been shown to conform with ISO 31000 (Häring et al. 2017a). From a management perspective, this facilitates the identification of responsible persons, as ISO 31000 and related standards are by now well established.

The tables and matrices can summarize and incorporate existing in-depth analyses as well as identify the need for further such assessments and quantifications; see Sects. 5.1.4 and 5.3.3 for examples. A further main advantage is thus a reduced overall effort, since most cases require only concise and well-documented expert assessment instead of resource-intensive risk and resilience quantification. From a management perspective, this summarizing capability is well suited for the overall steering and control of the resources deployed.

A further advantage is that tabular and matrix approaches are well established in the form of different types of (preliminary) hazard analyses, HAZOP, and FMEA-type assessments; see, e.g., Ericson (2016), Crawley and Tyler (2015), Tietjen and Decker (2020), and Carlson (2012). Thus, the approach builds on known and lived practices, often only by adding columns, as the examples given in the application cases show. Further general arguments for the suitability of the analytical approaches are given in Häring and Gelhausen (2018).

The approach enables management stakeholders to consider business-relevant systems and business cases, including already implemented risk control and improvement options. For such systems, they can identify existing risk control and resilience measures that have not been considered so far (including potentially implicit approaches) as well as the need for further security and safety systems. Both applications can be used to document compliance with standards. In addition, the approach can generate technical specifications of improvement measures and related KPIs, as shown for the gas transmission network use case.

The most concise summary of the criticality of threats is the pre-assessment of the criticality of combinations of system performance functions and (multiple) threats, as required in Step 5 of Fig. 1, taking into account all options for risk control and resilience improvement offered by the different resilience concepts and the respective resilience dimensions and attributes (see Sect. 2). In further iterations, this assessment incorporates more quantitative results from Step 6. It thus provides a matrix of the level of criticality of events for business services that is well suited for evaluation at management level in Step 7. In particular, it allows management to identify risks to key system service functions that are relevant from a business perspective, as well as the key threats to such services.

Conclusions

This paper confirms that tabular and matrix approaches within the process framework of traditional risk management, such as the hazard list, hazard analyses, and FMEAs, have substantially contributed to the success of risk management. We argue that tabular and matrix approaches can also be leveraged for performance-based resilience management, which both incorporates and substantially extends traditional risk management. This is reasonable because classical risk control may be defined to focus on successful disruption event avoidance, i.e., extending the mean time to failure, and reducing the initial loss, increasing robustness, and reducing vulnerability, whereas supplemental improvement in system resilience can focus on fast response, i.e., stabilization, fast recovery, and even improvement and learning.

This extension from risk control of components to overall systemic risk management driven by resilience concepts opens new innovative ways to achieve the overall objective of highly available, reliable, resource-efficient, safe, and secure systems. This can be accomplished by, for instance, very fast recovery, short-time redundancy only on demand, and/or reconfiguration while using limited resources.

The material presented in this paper provides process steps, process step objectives, and several tables or correlation (dependency) matrices for each process step, including the column and row headings of each table or matrix. This work documents how the approach has been implemented for the telecommunication domain, for gas networks, and for an indoor localization system. In addition, further examples are discussed. A minimum set of tables and matrices has been proposed in the form of a master table (Table 1) that accompanies the joint risk control and resilience analysis and management scheme (Fig. 1). For each application, case-specific master tables have been derived (Tables 2, 3, and 4).

The advantages of tabular approaches were demonstrated for qualitative, discrete, semi-quantitative, and quantitative evaluations. It was shown that risk and resilience quantities can be obtained for single threats (e.g., threat rankings in terms of the performance functions, resilience cycle phases, or system layers affected), for performance functions (e.g., an extended risk matrix covering all resilience cycle phases), and for overall risk (e.g., overall risk per performance function, overall consequences of threats, extended and modified risk matrices, and FN diagrams). Completeness and consistency requirements can be defined and assessed, as can the convergence of the iterative assessment and improvement approach. Owing to the highly interlinked nature of the approach, implicit knowledge is also expected to surface, in particular through the many dependency matrices generated (e.g., relations between components, subsystems, system functions, and threats).
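To make the quantitative evaluations concrete, the following minimal sketch aggregates a hypothetical scenario table into overall risk per performance function, overall consequences per threat, and the points of an FN diagram, and adds a simple completeness check. It assumes that risk is expressed as expected frequency times consequence, that each row represents a distinct (non-overlapping) event, and that consequences are measured in an arbitrary common unit; the names `ThreatScenario`, `risk_by_function`, `risk_by_threat`, `fn_curve`, and `missing_pairs`, as well as all example numbers, are illustrative assumptions rather than the paper's implementation or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreatScenario:
    threat: str
    function: str       # affected performance function
    frequency: float    # expected events per year (hypothetical unit)
    consequence: float  # loss per event in a common unit (hypothetical)


def risk_by_function(scenarios: list[ThreatScenario]) -> dict[str, float]:
    """Overall risk per performance function: sum of frequency x consequence."""
    risk: dict[str, float] = {}
    for s in scenarios:
        risk[s.function] = risk.get(s.function, 0.0) + s.frequency * s.consequence
    return risk


def risk_by_threat(scenarios: list[ThreatScenario]) -> dict[str, float]:
    """Overall consequences of each threat, summed over all affected functions."""
    risk: dict[str, float] = {}
    for s in scenarios:
        risk[s.threat] = risk.get(s.threat, 0.0) + s.frequency * s.consequence
    return risk


def fn_curve(scenarios: list[ThreatScenario]) -> list[tuple[float, float]]:
    """FN points (N, F(N)): F(N) is the total frequency of events with consequence >= N.

    Assumes every scenario is a distinct event, so frequencies may be added.
    """
    levels = sorted({s.consequence for s in scenarios})
    return [(n, sum(s.frequency for s in scenarios if s.consequence >= n)) for n in levels]


def missing_pairs(
    scenarios: list[ThreatScenario], threats: list[str], functions: list[str]
) -> list[tuple[str, str]]:
    """Completeness check: (threat, function) pairs that have not been assessed."""
    assessed = {(s.threat, s.function) for s in scenarios}
    return [(t, f) for t in threats for f in functions if (t, f) not in assessed]


# Invented example values, for illustration only.
scenarios = [
    ThreatScenario("power outage", "voice service", 0.5, 10.0),
    ThreatScenario("power outage", "data service", 0.5, 20.0),
    ThreatScenario("cyber attack", "data service", 0.25, 100.0),
    ThreatScenario("flooding", "voice service", 0.125, 1000.0),
]
```

For these invented rows, `risk_by_function(scenarios)` yields 130.0 for the voice service and 35.0 for the data service, and `missing_pairs` reports the two threat-function combinations that still lack an assessment. If one event can affect several functions at once, the FN points should instead be built from per-event total consequences to avoid double counting.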

The present approach is suitable for identifying which system-model-based quantitative risk and resilience computations and simulations should be conducted. In addition, it can support the collection of the information and data needed to set up the models. This was detailed through sample tables and matrices developed within the EU project RESISTO for critical telecommunication infrastructure, and through the definition of critical scenarios for the experimental determination of the resilience of a localization system.

In the application case of the gas transmission grid, it was shown within the EU project SecureGas that the approach is useful for identifying improvement measures for security systems that counter cyber and physical threats to critical distributed infrastructure. Considering the rich application context, a variety of mainly non-performance functions of gas transmission grids were identified that are sensitive to potential threats. Together with a highly structured threat assessment, this enabled the identification of functional requirements and related quantitative indicators, as well as the technical requirements of the most promising improvement measures.

The application of the approach to an indoor ultrasound localization system allowed the identification and ranking of its key performance functions and relevant threats. The criticality matrix assessment of system performance functions versus events ranging from disturbing to disruptive led to a ranking of scenarios for experimental assessment. The quantitative evaluation of the experimental scenarios made it possible to identify technical improvement needs and options.

The user experience described in this paper shows that the approach can be applied successfully to real-world implementations. The approach was generally accepted by the end users. In particular, they found it very helpful for structuring system knowledge in terms of system elements and functions and for identifying threats and disruptions that require more advanced analytical and quantitative analyses. End users also found the approach useful for supporting the selection of efficient countermeasures and improvement measures, and because it builds on analytical approaches similar to those already familiar from classical risk control.

With respect to generic requirements such as documentability, reusability, scalability, tailorability, extendability, responsibility sharing and documentation, auditability, certifiability, litigability, insurability, and financeability, high levels can be reached in all cases owing to the tabular nature of the approach. In particular, even checklists can be generated rather easily from the tables for use in similar application domains, e.g., in business continuity and consultancy applications.
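As a small illustration of how a checklist might be derived from such a table, the sketch below groups hypothetical table rows by performance function and emits one checkbox item per countermeasure. The row layout (threat, performance function, resilience cycle phase, improvement measure), the names `TableRow` and `checklist`, and the example entries are assumptions chosen for illustration, not the table format used in the applications above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRow:
    threat: str
    function: str  # performance function
    phase: str     # resilience cycle phase (hypothetical labels)
    measure: str   # countermeasure or improvement measure


def checklist(rows: list[TableRow]) -> list[str]:
    """Group rows by performance function and emit one checkbox line per measure."""
    grouped: dict[str, list[TableRow]] = defaultdict(list)
    for row in rows:
        grouped[row.function].append(row)
    lines: list[str] = []
    for function in sorted(grouped):          # functions in alphabetical order
        lines.append(f"{function}:")
        for row in grouped[function]:         # rows keep their table order
            lines.append(f"  [ ] {row.measure} against {row.threat} ({row.phase})")
    return lines


# Invented example rows, for illustration only.
rows = [
    TableRow("power outage", "voice service", "respond", "backup power supply"),
    TableRow("cyber attack", "data service", "prepare", "intrusion detection"),
    TableRow("power outage", "data service", "recover", "prioritized restart procedure"),
]
```

Printing `"\n".join(checklist(rows))` gives a short, function-oriented checklist that could be reused in a similar application domain by exchanging the table rows.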

Standardization of resilience assessments and their integration into structures already in use in traditional risk management may be important for consolidating this emerging field. Future studies could focus on closing the methodological gap in the quantitative assessment of integrated risk and resilience, especially with regard to integrating social and physical/engineering science methodologies and tools. Last but not least, the approach could serve as one input for standardization in the domain of technical-science-driven resilience quantification and the strengthening of socio-technical systems.