1 Introduction

1.1 Background and motivation

In process plants, unexpected external or internal events, such as natural disasters, catastrophic failure of critical equipment, supply chain interruptions, or cyber attacks, may cause process stoppage and extended downtime, disrupting operations for significant time intervals. Such perturbations can rapidly propagate through supply chains, causing huge economic losses, impairing firms' competitiveness, or even forcing them out of business. In this respect, technological accidents triggered by natural hazards (Na-Tech events) are of particular interest because of the severe consequences they can generate. For instance, in 2000 a minor fire contaminated the clean room systems of the New Mexico semiconductor manufacturing plant of the Philips NV Group. This disrupted the production of cellular phone chips and interrupted supply to Philips' customers for over 9 months. Apart from the resulting $40 million in lost sales for Philips, the loss to one of its final customers, LM Ericsson, amounted to $2.34 billion and forced its retreat from the cellular business (Sheffi 2005). In Japan, the 1995 Kobe earthquake hugely impacted the industrial infrastructure. A city-wide power interruption lasted 7 days, while the industrial water and gas supply required more than 80 days to be restored (Cole et al. 2013). This caused severe impact at the individual plant level (Cole et al. 2016; Kajitani et al. 2013). In general, earthquake damage to industrial buildings may severely impact the operations of manufacturing plants (Saitta et al. 2012). In such cases traditional risk assessment measures, which focus on vulnerability and immediate consequences, may not be sufficient, whereas resilience is a performance measure that provides a more complete picture. Resilience measures both the robustness of a system, i.e. its ability to survive unexpected disruptive events, and its capability of rapidly restoring capacity after the event has occurred (Bhamra et al. 2011; Bristow and Hay 2017; Hosseini et al. 2016b; Ivanov et al. 2017; Woods 2015). By accounting for long-term effects as well, and by explicitly factoring in the system capabilities of absorption, adaptation and restoration, resilience is of great relevance to operations managers, plant engineers, public administrators, and industrial actors in general. Sheffi (2005, 2015) reports several practical examples of the devastating consequences of a lack of resilience in manufacturing enterprises and the industrial sector. Therefore, predicting the effects of unexpected disruptions by computing resilience-based metrics is a priority for industrial enterprises (Nauck et al. 2021) and a prerequisite for mitigating the impact of adverse events by building more resilient systems.

While research on the resilience of industrial systems started more than two decades ago, it was initially aimed at civil infrastructures, networked systems and service distribution infrastructures (e.g. telecommunication and road transportation networks, gas and electricity distribution grids) (Argyroudis et al. 2020, 2021; Sun et al. 2020). Subsequently, attention shifted to supply chain resilience, while research on resilience estimation of industrial plants has been comparatively scarce in both the process and manufacturing sectors. In fact, as noted by El-Halwagi et al. (2020), “very little work has targeted the resilience of the manufacturing processes. Even less work has addressed the topic of process design approaches to create disaster-resilient industrial processes”. The present paper therefore contributes to filling this gap by providing a complete framework for a fully probabilistic, detailed resilience estimation of process plants under seismic Na-Tech events.

A review of the literature reveals some noteworthy distinctions among the available approaches to resilience modeling, which allow a broad classification. For the sake of clarity, it is useful to distinguish two main categories of approaches, i.e. non-flow-based and flow-based.

Non-flow-based methods focus on the logical interaction between plant components and often neglect the physical architecture of the plant, thus making an estimation of process flows awkward if not impossible, especially if different process flows coexist. Moreover, they do not allow the recovery period and the capacity recovery trend to be computed precisely, because interactions between the recovery activities, that is, the dependency of the recovery stages on specific end-start constraints, are neglected. In fact, the recovery process is often modelled by resorting to predefined analytical functions (Cimellaro et al. 2006; Cimellaro 2016), which may be unrealistic. A continuous rather than discrete recovery curve is thus obtained, which is often unrelated to the actual process structure and does not include a schedule of recovery actions dictated by logical constraints on the sequence of recovery tasks. Economic consequences of equipment failure are often neglected (Cimellaro et al. 2009). Within non-flow-based models, the main approaches utilize Bayesian networks (BN) or reliability-based modeling. Generally, BNs do not describe the physical structure of the system but map the logical cause-consequence structure between the damage caused by the disruptive event and its consequences. They often provide only a probabilistic aggregate indicator of system resilience, and evaluating this indicator at different time intervals allows the evolution of resilience to be plotted over time, instead of a time trend of capacity based on the progress of actual recovery activities. Among non-flow-based models, the dynamic object-oriented BN has been applied to the Fukushima Daiichi nuclear power plant accident (Abimbola and Khan 2019) to model resilience (actually the failure probability considered as its proxy) as the joint probability of the resilience capacities (absorption, adaptation, and restoration) as a function of time. The Chevron refinery accident was analysed in a similar manner (Tong et al. 2020). BNs were also used to estimate the resilience of a sulphuric acid manufacturing plant (Hosseini et al. 2016a) as well as electric infrastructures (Hossain et al. 2019a), maritime ports (Hossain et al. 2019b), inland waterway ports (Hosseini and Barker 2016), and a composites production facility (Yodo and Wang 2016).

In contrast, flow-based approaches rely on modeling the physical structure of the system and quantifying one or more types of flows passing through it. This kind of approach is thus inherently suited to process plant modeling. Flow-based models replicate the system structure by mapping physical and logical interconnections between system components and often use balance equations to compute flows across branches and nodes. Such models can also be supplemented by mathematical programming approaches, in which the values of decision variables are selected to optimize a performance measure during system design or during recovery after a disruption. In the estimation of the seismic resilience of an industrial plant, either epistemic uncertainty or randomness can be included by relying on Monte Carlo simulations. Caputo and Paolacci (2017) developed a process flow-based method based on functional block diagrams and activity networks to model both the impact of disruptions on system capacity and the discontinuous time trend of capacity recovery, as dictated by the actual interaction between the equipment restoration activities. The model is specifically conceived for process plants under natural hazards and allows the economic loss to be computed in terms of reconstruction and business interruption costs. The model was also applied to a nitric acid plant impacted by earthquakes (Caputo et al. 2020), in a deterministic way, by analyzing the most probable seismic damage scenarios. Mussini (2019) developed a similar approach, also including multi-level analysis and detailed structural analysis of equipment failure under earthquake risk. However, in this model the interaction of recovery actions is captured only approximately.

1.2 Scope of the work

When modelling the resilience of process plants, a flow-based approach seems preferable, especially in the presence of catastrophic events like earthquakes, because it bases the vulnerability and resilience computation on the actual physical structure of the plant and accounts for the effect of equipment failure on process flows, which in turn determine production loss costs. Moreover, to compute the duration of the recovery period reliably, the mutual interactions between the restoration activities of single units must be factored in. The possibility of computing the actual time trend of capacity recovery also makes it possible to compute different resilience metrics and to easily assess the contribution of individual equipment to plant resilience. Nevertheless, a fully probabilistic approach that exploits all the capabilities of a flow-based approach is currently missing.

For the above reasons, the present paper proposes a probabilistic flow-based model for process plant resilience estimation under seismic Na-Tech events. It relies on a simplified graph-based network representing the recovery tasks and showing the interdependencies between the reconstruction activities of different equipment. For its implementation, a specific probabilistic equipment recovery model is formulated. The model also estimates the economic loss from repair activities, material loss and business interruption, and defines a schedule of recovery tasks that is useful for project management purposes. It is further coupled with a hazard analysis module that generates multiple damage scenarios in a realistic manner for seismic Na-Tech events. This enables plant designers and safety managers to explore the possible impact of high-impact, low-probability events and to plan effective measures to increase the resilience of industrial plants under earthquakes. The main novelties of the work with respect to Caputo et al. (2020) are essentially: (i) a full integration of seismic risk analysis in the Monte Carlo simulation (MCS) approach for the resilience assessment, (ii) the presence of multiple damage states in the definition of all recovery phases, (iii) incorporation of the randomness of recovery cost and time, (iv) use of a simplified recovery process in the resilience quantification, (v) the quantitative resilience analysis of a new case study for the validation of the method.

The paper is organized as follows. Section 2 provides the adopted definition of resilience for industrial plants. Section 3 deals with a general formulation of probabilistic seismic resilience analysis (PSRA) of process plants, whereas Sect. 4 provides the formulation of the simplified PSRA model for process plants. The seismic resilience of a black carbon plant under earthquakes is analyzed in Sect. 5 in order to show the practical application of the methodology and its capabilities. A discussion of the results and prospects for future research conclude the paper.

2 Definition of resilience for major-hazard process plants

Industrial plants are complex facilities which process raw materials to produce different types of final products. The seismic resilience of an industrial plant can be defined as the ability of the plant to withstand a low-frequency, high-impact disruptive seismic event and to recover rapidly, so as to maximize the total production capacity in case of physical damage. Plant operational capacity, which represents the maximum physical production output of the plant, can be used to calculate the resilience of the system. To illustrate this concept, Fig. 1 shows a representative time trend of plant operational capacity C(t), starting from the time t0 when the seismic event occurs until a control time th. The plant operational capacity curve can be subdivided into five states. The first is the pre-earthquake state, defined by the nominal plant capacity C(t0). In this state, the plant operational capacity can be unaltered, higher than the original one if upgrades are made at the request of stakeholders, or lower than the original one due to aging effects (Ayyub 2014). Earthquake occurrence marks the start of the damage propagation state, which ends at time td. In this state, seismic damage and related consequences (e.g. explosions or fire) cause a loss of plant operational capacity. Industrial plants deal with hazardous materials, and when an earthquake occurs the plant equipment can suffer damage with loss of containment, which can trigger damage propagation effects known as domino effects (Antonioni et al. 2009; Alessandri et al. 2018; Paolacci et al. 2018; Chen et al. 2020). If damage propagation effects are not considered, td is equal to t0. If domino effects are considered, weather conditions and the plant emergency response play an important role in the estimation of the plant residual operational capacity C(td). Moreover, plant robustness, plant topology and the intensity of the seismic event directly influence the operational capacity loss.

Fig. 1 Plant operational capacity vs time

The post-earthquake steady state, from td to ta, corresponds to the inspection and planning phase, which is necessary to identify all damage and decide the recovery plan. The duration of this state is mainly influenced by the preparedness of plant emergency managers and the availability of an existing recovery plan. Moreover, the resilience of other systems, such as critical infrastructure or transportation infrastructure, can have a certain influence (Wu et al. 2020). Damage to the road infrastructure can make the plant inaccessible, which can stop production because workers cannot reach the plant, even if the plant itself has not been damaged. When the plant is both damaged and inaccessible, the recovery process is delayed by the time needed to repair the transportation infrastructure that connects the plant with the nearby community. Damage to critical infrastructure such as the electric network, water network, internet, or gas supply can cause the plant to stop production or can delay the recovery process. For example, the plant cannot function without electricity if the backup measures are available only for a safe shutdown, and it cannot operate if the water pressure at the fire hydrants does not fulfill the requirements. The influence of the community is even more complex, as it not only directly affects the plant but is also interdependent with the other systems (Guidotti et al. 2019; Sharma and Tabandeh 2020). When an earthquake occurs, the nearby community is affected: for example, people might be injured and unavailable to work, or might need to relocate because of damage to their apartments or lack of utilities (Guidotti et al. 2019). For this reason, after an earthquake the plant might lack operators or workers, which directly affects plant operability. Moreover, damage in the community also affects the recovery process (e.g., through a lack of skilled and unskilled labor), not only of the industrial plant but also of other systems such as transportation and critical infrastructure, as they are all interdependent (Guidotti et al. 2016; Sharma et al. 2019, 2021).

Delays due to impeding factors (post-earthquake inspection, engineering, permitting, financing, contractor mobilization) can influence the duration of this phase, which ends at time ta, corresponding to the start of the recovery phase. In industrial plants the recovery path is gradual, following a step function from ta until the complete recovery of the operational capacity at time tr. In fact, the recovery process involves the equipment sequentially: although items may be in different damage states (DS), from a functional point of view they have binary states (working or failed), so that as soon as an item is put back into service its contribution to the recovery of capacity is instantaneous and thus discrete. This approach has also been adopted by other authors (Mussini 2019). Each step corresponds to the recovery of one or more pieces of equipment, which directly influences the production capacity. The recovery path is influenced by plant topology, availability of workers, market availability of the required equipment, and the selected recovery strategy. Moreover, stakeholders can decide the target level of operational capacity with respect to the original one (equal, lower, or higher). Finally, the last state is the post-recovery steady state, which has a duration (th − tr), where th is a control time generally decided by stakeholders (Cimellaro et al. 2009) and useful for comparing, on a consistent basis, the resilience associated with different disruptive events.

A plant resilience index (R) can be adopted for the quantification of resilience and can be calculated in different manners, as already mentioned in the introduction. In the present paper R is calculated as the time integral of the operational capacity curve, divided by the length of the control interval, using one of the most popular expressions in the literature (Cimellaro et al. 2009), according to Eq. (1). This choice is motivated by the non-dimensional form of R and by its straightforward computation through a simple integral.

$$R = \frac{1}{{t_{h} - t_{0} }}\mathop \int \limits_{{t_{0} }}^{{t_{h} }} C\left( t \right)dt$$
(1)
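
As an illustration of Eq. (1), the following minimal Python sketch integrates a piecewise-constant (step) capacity curve of the kind described above. The function name `resilience_index`, its arguments and the numerical values are hypothetical and serve only as an example; capacity is assumed to be normalized to the nominal value.

```python
def resilience_index(t0, th, steps, c_initial):
    """Evaluate Eq. (1) for a piecewise-constant capacity curve C(t).

    steps: list of (time, capacity) pairs sorted by time; each capacity
    holds from its time until the next step.
    c_initial: capacity right after the event, i.e. from t0 to the first step.
    """
    area = 0.0
    t_prev, c_prev = t0, c_initial
    for t, c in steps:
        t = min(max(t, t0), th)          # clip steps to the control window
        area += c_prev * (t - t_prev)    # rectangle under the current plateau
        t_prev, c_prev = t, c
    area += c_prev * (th - t_prev)       # post-recovery steady state up to th
    return area / (th - t0)


# Hypothetical curve: capacity drops to 0.4 at t0 = 0, rises to 0.7 at day 30
# and to 1.0 at day 90; the control time th is 365 days.
R = resilience_index(0.0, 365.0, [(30.0, 0.7), (90.0, 1.0)], c_initial=0.4)
```

For this hypothetical curve the area under C(t) is 0.4·30 + 0.7·60 + 1.0·275 = 329 days, so R = 329/365. Because the curve is a step function, the integral reduces to a sum of rectangles and no numerical quadrature is needed.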

Nevertheless, besides the calculation of a resilience metric, other aspects are relevant, such as the residual operational capacity C(td), the time to full capacity recovery (tr), the maximum recovery interval (Tmax = tr − td) and the economic losses. In particular, the total economic loss of the plant (EL) can be computed as the sum of the direct economic loss (DC), due to equipment reconstruction cost, and the business interruption loss (BI), as defined in FEMA P-58 (2012).

3 Probabilistic seismic resilience analysis (PSRA) of process plants

In classical risk analysis, the Risk of process plants is defined as a combination of Hazard, Vulnerability and Exposure. Resilience can in turn be defined as the combination of these Risk properties with the Recovery process. A reliable model for seismic resilience quantification should consider both epistemic and aleatory uncertainties. Epistemic uncertainty, or state-of-knowledge uncertainty, is related to the lack of knowledge about the adopted models (e.g. fragility models, restoration models, plant recovery model), whereas aleatory uncertainty is typically related to the seismic action. While aleatory uncertainties are difficult to reduce, epistemic uncertainties due to lack of information can be explicitly taken into consideration. The model should account for uncertainties in Hazard (H), Vulnerability (V), Exposure (E) and Recovery (RE).

Aleatory hazard uncertainties are related to the different possible seismic sources, the earthquake magnitude, and the site-to-source distance. To account for them, all possible seismogenic zones should be considered, and the earthquake magnitude distribution should be defined together with the site-to-source distance distribution. Randomly combining these data through the attenuation law of Akkar and Bommer (2010) defines the probability distribution of the ground motion intensity. Given that industrial plants contain different types of equipment with different natural periods, the use of different intensity measures (IMs) should be envisaged (Bakalis et al. 2018; Karaferis et al. 2022; Kazantzi et al. 2022; Melissianos et al. 2022; Phan and Paolacci 2016; Phan et al. 2021). This is generally not convenient, especially when expeditious methodologies for risk analysis are employed. Moreover, in most cases the literature offers fragility curves in terms of peak ground acceleration (PGA) (Syner-G 2014). Consequently, in the present paper all fragility curves are expressed in terms of PGA, with the idea of using a more appropriate IM for equipment found to be critical after the risk assessment, for which a more refined analysis will be required.

Vulnerability uncertainty is related to the equipment performance under different earthquakes, and it can be expressed in terms of fragility curves, generally defined as lognormal cumulative distribution functions (Porter 2015). Exposure uncertainties are mainly epistemic. They arise when only the most critical equipment of a plant is included in the analysis and other equipment is neglected. These uncertainties can be reduced by considering every single element of the industrial plant, but this may not be feasible because of the computational cost. The uncertainties in Recovery include uncertainties in the recovery duration of equipment and activities due to working delays, weather conditions, supply chain delays, etc. Furthermore, uncertainties in the recovery costs of equipment and activities, due to unexpected construction works or to the variable cost of equipment spare parts and raw materials, should be considered during the recovery interval (TR).
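
Purely as an illustration of how a lognormal fragility curve can be used to sample a damage state for a given PGA, a minimal Python sketch is given below. The function names, the median and dispersion values, and the use of a single uniform draw across ordered damage states are assumptions made here for the example; they are not parameters or procedures taken from the paper.

```python
import math
import random


def fragility(pga, median, beta):
    """Lognormal CDF: probability of reaching or exceeding a damage state."""
    z = math.log(pga / median) / beta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample_damage_state(pga, curves, rng):
    """Draw a damage state: 1 = no damage, 2 = first damaged state, and so on.

    curves: list of (median, beta) pairs ordered from the mildest to the most
    severe damage state. The curves are assumed not to cross, so that the
    exceedance probabilities decrease with severity.
    """
    u = rng.random()
    state = 1
    for k, (median, beta) in enumerate(curves, start=2):
        if u < fragility(pga, median, beta):
            state = k            # the more severe state is also exceeded
    return state


# Hypothetical fragility parameters (median PGA in g, lognormal dispersion).
curves = [(0.30, 0.5), (0.60, 0.5)]
rng = random.Random(0)
states = [sample_damage_state(0.45, curves, rng) for _ in range(1000)]
```

Using one uniform number per equipment keeps the damage states mutually exclusive: the sampled state is the most severe one whose exceedance probability is larger than the draw.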

The general formulation of probabilistic resilience metrics due to seismic hazard, accounting for all uncertainties mentioned above, can be expressed as:

$$E\left[ {R\left( t \right)} \right] = \nu \int ..\int r\left( t \right)\, \times f\left( {R\left( t \right){|}TR} \right)\,f\left( {TR{|}DM} \right)\,f\left( {DM{|}EDP} \right)\,f\left( {EDP{|}IM} \right)f\left( {IM{|}d,m} \right)\, \times \,f\left( d \right)\,f\left( m \right)\,dR\,dTR\,dDM\,dEDP\,dIM\,dD\,dM$$
(2)

where E[R(t)] is the expected annual value of the resilience index R(t), which is a function of several random variables and of the time t elapsed from t0; ν is the annual rate of occurrence of seismic events with magnitude greater than a given minimum value Mmin, which is provided by the Gutenberg-Richter law. The symbol f indicates the probability density function (PDF) of each random variable. These are, respectively: the recovery interval (TR), the damage measure (DM), the engineering demand parameter (EDP), the intensity measure (IM), the magnitude (M) and the distance from the fault (D). The result of Eq. (2) can be easily extended to multiple seismic sources by summing the results obtained for each of them.
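
For reference, the Gutenberg-Richter law in its common form log10 N(M ≥ m) = a − b·m gives the rate ν directly, and its doubly truncated version can be sampled by inverse transform to obtain magnitudes for a simulation. The sketch below is only an example under stated assumptions: the values of a, b and Mmin are hypothetical, and the upper bound `m_max` is introduced here for the truncation and is not a quantity defined in the text.

```python
import math
import random


def annual_rate(a, b, m_min):
    """Annual rate of events with magnitude >= m_min: 10**(a - b * m_min)."""
    return 10.0 ** (a - b * m_min)


def sample_magnitude(b, m_min, m_max, rng):
    """Inverse-transform sample from the doubly truncated exponential law."""
    beta = b * math.log(10.0)
    u = rng.random()
    scale = 1.0 - math.exp(-beta * (m_max - m_min))
    return m_min - math.log(1.0 - u * scale) / beta


# Hypothetical source parameters, for illustration only.
a, b, m_min, m_max = 4.0, 1.0, 4.5, 7.5
nu = annual_rate(a, b, m_min)
rng = random.Random(1)
magnitudes = [sample_magnitude(b, m_min, m_max, rng) for _ in range(1000)]
```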

A similar integral can be formulated for the expected annual losses E[L(t)], where the economic loss function L(t) depends on the above-mentioned random variables and on the recovery cost (C). In this formulation domino effects and environmental effects are not explicitly considered; they will be addressed in future work.

$$E\left[ {L\left( t \right)} \right] = \nu \,\int ..\int l\left( t \right) \times \,f\left( {L\left( t \right){|}TR,C} \right)\,f\left( {TR{|}DM} \right)\,f\left( {C{|}DM} \right)\,f\left( {DM{|}EDP} \right)\,f\left( {EDP{|}IM} \right)\, \times f\left( {IM{|}d,m} \right)\,f\left( d \right)\,f\left( m \right)\,dL\,dTR\,dC\,dDM\,dEDP\,dIM\,dD\,dM$$
(3)

The above-mentioned integrals can be solved numerically via Monte Carlo Simulation (MCS). Each simulation consists of sampling simultaneously from the PDFs of the uncertain variables (magnitude M, distance D, IM, equipment damage DM, recovery time TR and recovery cost C), obtaining a random operational capacity curve for each simulation. As described by Caputo et al. (2020), the recovery time and cost are generated for each recovery task to be carried out. At the end of the MCS, the statistics of resilience and economic losses are determined. The expected annual losses can also be easily evaluated. The advantage of this method is its generality: a single framework yields the statistics of resilience or economic losses, the mean annual frequency of a given damage scenario, the frequency of occurrence of a given damage state, etc. The drawbacks are the large number of simulations needed to obtain reliable results, since a large number of random variables is involved, and the difficulty of obtaining the actual PDFs of most of them.

A more practical approach consists of selecting a certain number of IM values and evaluating the statistics of L and R in a discrete manner. In this approach the uncertainties of the seismic event are accounted for by performing a classical probabilistic seismic hazard analysis (PSHA) (Cornell 1968; Baker 2008). Then, for each i-th PGA value (scenario), an MCS is conducted by sampling random recovery times, recovery costs and seismic damage scenarios. Consequently, for each IM value a set of resilience curves is obtained, from which the statistics (mean and standard deviation) can be determined. The logical sequence of the MCS is detailed in the subsequent Section and synthesized therein in Fig. 4. In this case the convolution of vulnerability and hazard is not automatic but needs to be evaluated a posteriori: based on the considered scenarios and the predefined hazard curve, a frequency of occurrence can be associated with each IM value, and the mean annual frequency of the economic losses and resilience metrics can thus be evaluated. The accuracy of the method strictly depends on the number of PGA values analyzed. With a limited number of PGA values, it can be computationally more convenient than the general approach.
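
The a posteriori convolution can be sketched as follows. This is a minimal example under assumptions made here: each PGA scenario is taken to represent a bin of the hazard curve, the annual frequency of occurrence of a bin is the difference between the exceedance rates at its two edges, and all numerical values are hypothetical. Events beyond the last bin edge are ignored in this simple form.

```python
def occurrence_rates(exceedance_at_edges):
    """Annual occurrence rate of each PGA bin.

    exceedance_at_edges: annual exceedance rates read from the hazard curve
    at the n + 1 bin edges (increasing PGA, hence non-increasing rates).
    """
    return [lam_lower - lam_upper
            for lam_lower, lam_upper in zip(exceedance_at_edges[:-1],
                                            exceedance_at_edges[1:])]


def expected_annual_value(exceedance_at_edges, scenario_means):
    """Sum over scenarios of (occurrence rate) x (mean metric from the MCS)."""
    rates = occurrence_rates(exceedance_at_edges)
    if len(rates) != len(scenario_means):
        raise ValueError("need exactly one scenario mean per PGA bin")
    return sum(rate * mean for rate, mean in zip(rates, scenario_means))


# Hypothetical hazard-curve ordinates at four bin edges and hypothetical mean
# losses (in monetary units) obtained from the MCS of the three scenarios.
exceedance = [1.0e-2, 4.0e-3, 1.0e-3, 2.0e-4]
mean_loss = [0.5, 3.0, 12.0]
eal = expected_annual_value(exceedance, mean_loss)
```

With these numbers the bin rates are 6.0e-3, 3.0e-3 and 8.0e-4 per year, giving an expected annual loss of 0.0216 monetary units. The same function can be applied to any scenario-level statistic, for example the mean loss of resilience (1 − R).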

The approach can be further simplified by reducing the number of random recovery activities to be carried out. The combination of a discrete number of seismic scenarios and the simplification of the recovery activities framework represents the central novelty of the paper for the PSRA of industrial facilities. Section 4 explains this integrated approach in detail, whereas Sect. 5 reports the application of the proposed method to a Black Carbon plant.

4 Simplified PSRA model for major-hazard process plants

As mentioned before, the simplified approach can be more efficient in terms of computational time for probabilistic seismic resilience estimation, and for this reason it is explained in more detail in this section. The model includes the steps listed below, which prepare and then carry out the Monte Carlo simulations:

  • Process mapping and plant topology representation

  • Definition of initial residual capacity of the plant

  • Formulation of plant recovery model

  • Formulation of the plant recovery function

  • Definition of resilience index and economic loss model

  • Probabilistic Seismic Hazard Analysis and evaluation of seismic damage scenarios

  • Monte Carlo Simulation for estimation of probabilistic resilience metrics

The following subsections will provide the necessary elements for the implementation of the proposed approach.

4.1 Process mapping and plant topology representation

First, all equipment of the plant should be identified and classified based on structural configuration and process function, for example steel storage tanks, horizontal vessels, vertical vessels, piping systems, etc. This helps to build the proper seismic vulnerability functions, as explained in Sect. 4.6. Then, the Process Flow Diagram (PFD) of the plant should be constructed, including the main equipment of the plant and mapping the different material flows that pass through the equipment and give rise to marketable output product flows. A piece of equipment can belong to a single process flow (PF) or to multiple process flows simultaneously. In order to represent all the equipment included in the f-th process flow, a unique set of equipment S[f] should be defined (Caputo et al. 2020).

After the sets of equipment of all PFs have been defined, the Capacity Block Diagram (CBD) of each PF should be constructed. A CBD contains process stages (PS) connected in series, where the capacity of the entire PF corresponds to the capacity of the PS having the lowest capacity. A process stage contains a single piece of equipment or a group of equipment, based on their operational functionality and their influence on the operational capacity of the PF. The most common arrangements of PS are: PS with units in series (PSS); PS with fully redundant units (PSR); PS with k-out-of-n redundant units (PSRk); PS with fractionated-capacity units in parallel (PSP). A plant process flow can have a single PSS, while the number of PSR, PSRk and PSP can vary from plant to plant. Readers interested in greater detail can consult Caputo et al. (2020), while the practical application of the CBD is illustrated by the case study of the black carbon plant in Sect. 5.

Moreover, in this phase the possible damage states (DS) of the equipment to be considered in the analysis should also be defined. The damage scenarios can be generated by any kind of natural event, but this work focuses on damage caused by earthquakes. In this respect, different DS of equipment can be considered. A damage scenario vector DSV = {γ1, …, γN} is defined in order to assign a damage state to the selected units, where γi is the damage state of the i-th unit. The i-th component of the vector DSV can assume different values depending on the damage level caused by the earthquake:

$$\gamma_{i} = \begin{cases} 1 & \text{if the } i\text{-th equipment is in the 1st DS (no damage)} \\ 2 & \text{if the } i\text{-th equipment is in the 2nd DS} \\ \vdots & \\ n & \text{if the } i\text{-th equipment is in the } n\text{-th DS} \end{cases}$$
(4)

At the beginning of the simulations all equipment will be considered undamaged, γi = 1.

Furthermore, a vector SV = {δ1, …, δN} is defined in order to assign a functionality state to all equipment. The functionality state variable δi is a binary variable, as shown in Eq. (5). Initially all δi are equal to 1, which corresponds to the no-damage scenario. Then, for a given damage scenario vector, which has several damaged equipment in different DS, the initial SV is modified by setting to zero the functionality state variables (δi = 0) of the damaged equipment (γi > 1). During the recovery phase, the state variable of each damaged equipment switches from 0 to 1 as soon as the equipment is brought back to operational status, in order to represent the time-varying system status.

$$\delta_{i} = \begin{cases} 0 & \text{if the } i\text{-th equipment is damaged } (\gamma_{i} > 1) \\ 1 & \text{if the } i\text{-th equipment is not damaged } (\gamma_{i} = 1) \end{cases}$$
(5)

4.2 Definition of initial residual capacity of the plant

Defining the capacity block diagram of each process flow allows the plant operational capacity to be computed. Having defined the set of equipment S[f] and the corresponding PSs of the f-th process flow, the normalized capacity of the f-th process flow (Cf) can be calculated as:

$$C_{f} = \min \left\{ C\left\{ \mathbf{PS}^{\mathbf{S}}[\mathbf{f}] \right\};\; C\left\{ \mathbf{PS}_{\mathbf{j}}^{\mathbf{R}}[\mathbf{f}] \right\};\; C\left\{ \mathbf{PS}_{\mathbf{j}}^{\mathbf{Rk}}[\mathbf{f}] \right\};\; C\left\{ \mathbf{PS}_{\mathbf{j}}^{\mathbf{P}}[\mathbf{f}] \right\} \right\}$$
(6)

where \(C\{\mathbf{PS}^{\mathbf{S}}[\mathbf{f}]\}\) is the normalized capacity of the PSS of the f-th PF; \(C\{\mathbf{PS}_{\mathbf{j}}^{\mathbf{R}}[\mathbf{f}]\}\) is the normalized capacity of the j-th PSR allocated in the f-th PF; \(C\{\mathbf{PS}_{\mathbf{j}}^{\mathbf{Rk}}[\mathbf{f}]\}\) is the normalized capacity of the j-th PSRk allocated in the f-th PF; \(C\{\mathbf{PS}_{\mathbf{j}}^{\mathbf{P}}[\mathbf{f}]\}\) is the capacity of the j-th PSP allocated in the f-th PF. The normalized capacity of each configuration of process stages can be evaluated as described in Caputo et al. (2020). Furthermore, for a given DSV and the corresponding SV, the residual overall plant capacity C(td) can be evaluated using Eq. (7), where Of is the fraction of absolute capacity allocated to the f-th PF (Caputo et al. 2020). This procedure allows the plant operational capacity to be calculated automatically during the recovery process as well.

$$C(t_{d} ) = \mathop \sum \limits_{f} O_{f} C_{f} \left( {t_{d} } \right)$$
(7)
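A minimal Python sketch of Eqs. (6) and (7) is given below. The block capacity rules are assumptions made for illustration (a series block works only if all members work, a fully redundant block works if at least one member works, a parallel block delivers the sum of the shares of its working members); the actual rules are those of Caputo et al. (2020), and the partially redundant blocks PSRk are omitted. The plant layout, the equipment names A to F and the fractions O are hypothetical.

```python
def series_capacity(sv, members):
    # Assumed rule: a series block works only if every member works.
    return float(min(sv[e] for e in members))


def redundant_capacity(sv, members):
    # Assumed rule: a fully redundant block works if any member works.
    return float(max(sv[e] for e in members))


def parallel_capacity(sv, shares):
    # Assumed rule: working members contribute their share of capacity.
    return min(1.0, sum(share * sv[e] for e, share in shares.items()))


def flow_capacity(sv, flow):
    """Eq. (6): the flow capacity is the minimum over its process stages."""
    caps = [series_capacity(sv, flow["series"])]
    caps += [redundant_capacity(sv, block) for block in flow["redundant"]]
    caps += [parallel_capacity(sv, block) for block in flow["parallel"]]
    return min(caps)


def plant_capacity(sv, flows):
    """Eq. (7): weighted sum of flow capacities with weights O_f."""
    return sum(flow["O"] * flow_capacity(sv, flow) for flow in flows)


# Hypothetical two-flow plant sharing equipment A.
flows = [
    {"O": 0.4, "series": ["A"], "redundant": [["B", "C"]],
     "parallel": [{"D": 0.5, "E": 0.5}]},
    {"O": 0.6, "series": ["A", "F"], "redundant": [], "parallel": []},
]
all_ok = {name: 1 for name in "ABCDEF"}
print(plant_capacity(all_ok, flows))             # undamaged plant
print(plant_capacity(dict(all_ok, D=0), flows))  # one parallel unit lost
print(plant_capacity(dict(all_ok, A=0), flows))  # shared series unit lost
```

Because equipment A sits in the series block of both flows, its failure drives the plant capacity to zero, whereas the loss of a redundant or parallel unit only reduces it.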

4.3 Formulation of plant recovery model

As stated above, the definition of the recovery model is an essential ingredient of resilience analysis. In this respect, different recovery models can be implemented for the recovery process of industrial plants (Fig. 2). The Overall Reconstruction Activity Network (ORAN) model was first introduced by Caputo and Paolacci (2017) and was later applied to a nitric acid plant case study (Kalemi et al. 2019a; Caputo et al. 2020). Kalemi et al. (2019b) applied a probabilistic version of the ORAN recovery model to calculate the resilience metrics of predefined seismic damage scenarios of a nitric acid plant.

Fig. 2

a Example of equipment subtask grouping of a fictitious equipment; b Simplified activity network of a fictitious equipment; c SPEREN: all equipment start recovering instantaneously; d SPEREN: some equipment start recovering after the recovery of the equipment they depend on. Legend: A = reconstruction activity, PT = preparatory task, ET = external task, ST = subtask

The ORAN model describes the recovery process in terms of an activity network and is based on the CPM/PERT method (Vanhoucke 2012) for calculating the date when a damaged equipment is recovered (TCRi). CPM and PERT are graph-based methods widely used in project management practice to visualize the logical sequence of the multiple activities constituting a complex project and their mutual interactions in terms of end-start constraints. In such applications an acyclic graph is built where arrows represent activities and nodes represent events associated with the end or start of activities. An activity departing from a node cannot start until all preceding activities ending in that node are completed. All paths between the initial and the final node represent parallel sequences of activities required to complete the project, and the longest sequence is defined as the critical path, whose length determines the duration of the project. This approach is useful to link, in a mechanistic manner, the length of the recovery process to the actual specific activities to be carried out which, in turn, are dictated by the process structure, instead of arbitrarily estimating the length of the recovery period. Another advantage is the availability of an algorithm to establish the start and finish date of each activity, which makes it easy to compute the recovery date of each equipment.
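The forward pass that yields the finish dates and the critical path can be sketched as follows. For brevity the sketch stores the network as a precedence list (activity-on-node form) rather than the arrow diagram described above, which gives the same dates; the activity names and durations are hypothetical and the network is assumed to be acyclic.

```python
def earliest_finish(durations, predecessors):
    """Forward pass: an activity starts when all its predecessors finish."""
    finish = {}

    def visit(activity):
        if activity not in finish:
            start = max((visit(p) for p in predecessors[activity]), default=0)
            finish[activity] = start + durations[activity]
        return finish[activity]

    for activity in durations:
        visit(activity)
    return finish


def critical_path(finish, predecessors):
    """Walk back from the last activity along the latest-finishing predecessors."""
    node = max(finish, key=finish.get)
    path = [node]
    while predecessors[node]:
        node = max(predecessors[node], key=finish.get)
        path.append(node)
    return path[::-1]


# Hypothetical reconstruction activities (durations in days).
durations = {"inspect": 10, "finance": 30, "order": 60, "install": 20}
predecessors = {
    "inspect": [],
    "finance": ["inspect"],
    "order": ["inspect"],
    "install": ["finance", "order"],
}
finish = earliest_finish(durations, predecessors)
project_duration = max(finish.values())
print(finish, project_duration, critical_path(finish, predecessors))
```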

The ORAN model is very detailed and precise, as it accounts for the interdependencies between reconstruction activities. Its drawback is that it requires a large amount of information and effort to be applied, which may not be practical when quick risk and resilience analyses are needed. For this reason, different levels of simplification can be implemented. For example, it is often possible to group sub-tasks (Fig. 2a), deriving a simplified activity network as shown in Fig. 2b, where the symbols PT, ST and ET stand for preparatory task, subtask and external task, respectively.

Even when simplified, this recovery model still accounts for external interferences and delays at the activity level. For this reason, a simpler approach, similar to the one proposed by Almufti and Willford (2013), is herein adopted. In this model a Simplified Plant Equipment Reconstruction Network (SPEREN) is constructed, accounting for a preparatory task, named inspection and planning, and for the equipment recovery tasks. In this case, the recovery process of each damaged equipment is considered as a single task, with a duration related to the damage state. The recovery of damaged equipment starts only after inspection and planning are finished, with two possible options:

  (i) All equipment start recovering simultaneously, as shown in Fig. 2c.

  (ii) There is an interdependency between the recovery of different equipment, whereby the recovery of an equipment can start only after the equipment it depends on has been recovered, as shown in Fig. 2d.

The duration of inspection and planning is defined based on subtasks such as post-earthquake inspections, financing, contractor mobilization, access cleaning, site cleaning and recovery planning, following the longest path. The equipment recovery duration can be calculated based on sub-tasks such as engineering, permitting, delivery and installation, whose durations can be taken from the literature (e.g. Hazus 2022) or defined based on engineering judgment. Accordingly, probability distribution functions are assumed for the durations of the inspection and planning phase and of the equipment recovery phase, specifying distribution shape, mean and standard deviation, for the implementation of Monte Carlo simulations.

This simplified recovery model is very practical: it does not require building a detailed ORAN, it allows the adoption of empirical recovery duration curves from the literature, and it simplifies the construction of resilience curves from the computational point of view. In the present paper, the restoration functions in terms of time and cost provided by Hazus (2022) for oil and gas system components have been used, which are normal distribution functions characterized by mean µ and standard deviation σ.

The only drawback is that the interdependency between equipment can be imposed only at the start of an equipment recovery process or when an equipment is fully recovered, since the model does not account for possible interconnections at intermediate stages of the recovery process of each equipment.

For the implementation of the SPEREN model the following procedure has been adopted. At first, all possible durations TP(i,j) of the preparatory activities (inspection and planning) of the i-th equipment in the j-th damage state DS, and the corresponding recovery costs CP(i,j), are identified and defined through Gaussian probability density functions with mean (μ) and standard deviation (σ). The user can adopt any other kind of distribution, provided that it is truncated in order to avoid negative values. Analogously, the activity/equipment recovery durations TA(i,j) and the activity/equipment recovery costs RC(i,j) need to be defined, accounting for all possible damage states of the equipment.
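One simple way to draw from a Gaussian truncated at zero is rejection sampling, sketched below with the Python standard library. The function name, the dictionary `TA_PARAMS` and its (mean, standard deviation) values are hypothetical placeholders; rejection sampling is just one possible truncation scheme and is not stated to be the one used in the original study.

```python
import random


def sample_truncated_normal(mean, std, rng, lower=0.0):
    """Draw from a normal distribution, redrawing until the value exceeds `lower`."""
    while True:
        value = rng.gauss(mean, std)
        if value > lower:
            return value


# Hypothetical parameters: (equipment i, damage state j) -> (mean, std) in days.
TA_PARAMS = {(1, 2): (30.0, 6.0), (3, 3): (120.0, 24.0)}

rng = random.Random(42)  # fixed seed only to make the sketch reproducible
ta_samples = {
    key: sample_truncated_normal(mean, std, rng)
    for key, (mean, std) in TA_PARAMS.items()
}
print(ta_samples)
```

When the standard deviation is small compared with the mean, negative draws are extremely rare and the truncation has a negligible effect on the distribution.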

The next step is the determination of the interdependencies between activities/equipment recovery tasks, which are defined through the activities/equipment interdependency matrix (AI). The matrix contains 0/1 entries, where, for example, AI[i, k] = 1 means that the start of the i-th recovery activity depends on the completion of the k-th activity. If the i-th activity does not depend on any other activity, all entries of the i-th row of AI are 0.

An example is shown in Fig. 3a, where a fictitious activity network has a total of 3 reconstruction activities, corresponding, in the spirit of SPEREN, to 3 different equipment. Activities A1 and A2 are related to equipment 1 and 2, respectively, both in damage state DS2, while activity A3 corresponds to equipment 3 in damage state DS3. For the sake of simplicity, in the subsequent numerical application we have considered only one possible damage state of equipment. The 1st and 3rd rows of AI contain only zeros, which means that activities A1 and A3 do not depend on any other equipment recovery activity, excluding the preparatory activities, which are considered separately as described in the following subsection. Meanwhile, AI[2,1] = 1 means that the start of the recovery activity of the 2nd equipment (A2) depends on the finishing date of the recovery activity of the 1st equipment (A1).

Fig. 3

a Fictitious SPEREN and probabilistic Operational capacity curve; b Fictitious distributions of all possible TP(i,j); c Fictitious distribution of all possible TA(i,j)

Generally, the equipment-activities interdependency should be defined in order to assign the activities to equipment. However, differently from the ORAN method, in SPEREN the interdependencies between activities and equipment are excluded a priori, and thus this phase is not considered.

4.4 Formulation of the plant recovery function

The recovery model is implemented only once, by defining all the possible distributions of TP(i,j), TA(i,j), CP(i,j), RC(i,j), and the AI matrix. Then, for a given DSV and its corresponding SV, the vector TCR must be calculated, which contains the recovery date of each equipment; for undamaged equipment the corresponding recovery date is 0.

At first, TP is calculated based on DSV and the corresponding probability distribution functions TP(i,j), as shown in Fig. 3b. In this respect, the duration of the common preparatory activity is governed by the equipment which most influences the inspection and planning phase, and is calculated as TP = max[TP(i,j)]. Then, the duration TA(i) of the i-th activity is generated, based on DSV, SV and the probability distribution functions of TA(i,j), see Fig. 3c. Using Eq. (8), it is possible to create the vector TCR of recovery dates for all equipment, where the value corresponding to undamaged equipment is 0. The duration TP of the preparatory activities, however, should also be included among the terms in the square brackets of Eq. (8) for those equipment reconstruction activities having preparatory activities as predecessors.

$$T_{CR} \left( i \right) = T_{A} \left( i \right) + \max \left[ {T_{CR} \left( 1 \right)AI\left( {i,1} \right), \ldots , T_{CR} \left( {N_{A} } \right)AI\left( {i,N_{A} } \right)} \right]$$
(8)
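Equation (8) is recursive, so it can be evaluated with a small memoized function. The sketch below reproduces the three-activity network of Fig. 3a, with AI[2,1] = 1, under some assumptions: the durations TA and TP are hypothetical values in days, TP is added inside the brackets for every damaged equipment, and AI is assumed to contain no circular dependencies.

```python
def recovery_dates(ta, ai, sv, tp=0.0):
    """Evaluate Eq. (8): T_CR(i) = T_A(i) + max over predecessors (and TP)."""
    n = len(ta)
    tcr = [None] * n

    def date(i):
        if tcr[i] is None:
            if sv[i] == 1:
                tcr[i] = 0.0  # undamaged equipment: recovery date is 0
            else:
                preds = [date(k) for k in range(n) if ai[i][k] == 1]
                tcr[i] = ta[i] + max(preds + [tp])
        return tcr[i]

    return [date(i) for i in range(n)]


# Network of Fig. 3a: only A2 depends on A1 (0-based row 1, column 0).
AI = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
]
TA = [40.0, 25.0, 90.0]  # hypothetical sampled durations, days
TP = 15.0                # hypothetical inspection and planning duration, days

print(recovery_dates(TA, AI, sv=[0, 0, 0], tp=TP))
print(recovery_dates(TA, AI, sv=[1, 0, 0], tp=TP))  # equipment 1 undamaged
```

In the second call equipment 1 is undamaged, so A2 is delayed only by the preparatory activity.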

In order to compute the time trend of capacity recovery, at each time value t = TCR(i), the corresponding equipment state variable is set back to δi = 1, the SV is updated, the PF capacity is updated using Eq. (6), and plant residual capacity is updated through Eq. (7). A graphical representation of this procedure is shown in Fig. 3a, where the operational capacity curve is constructed based on the TCR vector. This curve will be randomly generated during the Monte Carlo simulation, as better described in Sect. 4.6.
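The update loop described above amounts to building a step function with one breakpoint per recovery date. A minimal sketch follows; the function `capacity_curve` and the stand-in capacity rule passed as `capacity_fn` are hypothetical (in the methodology the capacity would come from Eqs. (6) and (7)), and the recovery dates are illustrative values measured from the time of the event.

```python
def capacity_curve(tcr, sv0, capacity_fn):
    """Return [(time, capacity)] breakpoints of the operational capacity curve."""
    sv = list(sv0)
    curve = [(0.0, capacity_fn(sv))]  # initial residual capacity
    damaged_dates = sorted({t for t, delta in zip(tcr, sv0) if delta == 0})
    for t in damaged_dates:
        for i, t_i in enumerate(tcr):
            if t_i == t:
                sv[i] = 1             # equipment i is back in operation
        curve.append((t, capacity_fn(sv)))
    return curve


def toy_capacity(sv):
    # Hypothetical stand-in for Eqs. (6)-(7): equipment 1 and 2 in series
    # feed half of the output, equipment 3 alone feeds the other half.
    return 0.5 * min(sv[0], sv[1]) + 0.5 * sv[2]


curve = capacity_curve([55.0, 80.0, 105.0], [0, 0, 0], toy_capacity)
for t, c in curve:
    print(f"t = {t:6.1f} d  capacity = {c:.2f}")
```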

4.5 Definition of resilience index and economic loss model

For each seismic damage scenario, the resilience index is calculated using Eq. (1), while the EL is the sum of direct costs (DC) and Business Interruption (BI) cost. DC is calculated as the sum of the cost of preparatory tasks (CP) and the equipment reconstruction costs (RCi), as given in Eq. (9). In the general case, RCik is the cost of the k-th restoration activity required to recover the i-th equipment. In the SPEREN model there is just one total recovery cost for the i-th equipment (RCi). The vector RC has a dimension equal to the total number of reconstruction activities NA (or the total number of equipment in the case of the simplified recovery model), and in the initial state all RC(i) values are zero. For a given damage scenario vector with a corresponding state vector SV, the vector RC is updated by taking the corresponding random values from the predefined distributions RC(i,j) for the equipment reconstruction activities that need to be carried out (δi = 0). On the other hand, CP is calculated as the maximum value of CP(i,j) that corresponds to DSV, using the same reasoning as for TP.

BI is computed as the contribution margin of the production lost during the flow interruption period. In greater detail, we assume that the operating costs of a firm include both fixed costs, which are paid irrespective of the production level, and variable costs, which are incurred only if production takes place and are proportional to the production level (e.g. materials and energy costs). Consequently, when production is interrupted the company loses the revenues from the sales of the lost production, but also saves the variable costs which are not incurred because of the interruption, while it continues to pay the fixed costs. Therefore, the actual BI economic loss is the revenue minus the saved variable cost of the lost production. This quantity is known as the contribution margin because it represents the net revenue which is available to cover the fixed costs and make a profit. Accordingly, in Eq. (10), Cf(t) is the capacity of the f-th process flow at time t, Cnf is the nominal production output of the f-th process flow, Cvuf is the variable unit production cost of the f-th process flow, pf is the unit selling price of the f-th process flow, and Δtz is the duration of the z-th time interval between the functional recovery of two successive units (Caputo et al. 2020).

$$DC = CP + \mathop \sum \limits_{i} RC_{i}$$
(9)
$$BI = \mathop \sum \limits_{f} \mathop \sum \limits_{z} \left( {p_{f} - Cvu_{f} } \right)\left[ {Cn_{f} - C_{f} \left( t \right)} \right]\Delta t_{z}$$
(10)
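Equations (9) and (10) can be sketched as follows. The sketch assumes that the capacity curve of each flow is stored as normalized step-function breakpoints, so that the lost output in an interval is Cnf·(1 − Cf), and that losses are accumulated up to a chosen time horizon; both are assumptions of this illustration. The unit price, the variable cost and the nominal outputs are those quoted in Sect. 5.1, while the capacity curves and the cost figures are hypothetical.

```python
def direct_cost(cp, rc):
    """Eq. (9): cost of preparatory tasks plus equipment reconstruction costs."""
    return cp + sum(rc)


def business_interruption(flows, horizon):
    """Eq. (10): contribution margin of lost production, summed over flows and intervals."""
    total = 0.0
    for flow in flows:
        margin = flow["price"] - flow["var_cost"]       # p_f - Cvu_f, EUR/t
        steps = flow["curve"] + [(horizon, None)]       # close the last interval
        for (t0, cap), (t1, _) in zip(steps, steps[1:]):
            dt = max(0.0, min(t1, horizon) - min(t0, horizon))
            total += margin * flow["nominal"] * (1.0 - cap) * dt
    return total


flows = [
    {"price": 1273.0, "var_cost": 1111.0, "nominal": 60.0,
     "curve": [(0.0, 0.0), (100.0, 1.0)]},               # hypothetical curve
    {"price": 1273.0, "var_cost": 1111.0, "nominal": 100.0,
     "curve": [(0.0, 0.0), (50.0, 0.5), (100.0, 1.0)]},  # hypothetical curve
]
dc = direct_cost(50000.0, [200000.0, 0.0, 350000.0])     # hypothetical costs, EUR
bi = business_interruption(flows, horizon=730.0)
print(dc, bi, dc + bi)
```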

4.6 Probabilistic Seismic Hazard Analysis and evaluation of seismic damage scenarios

The simplified methodology proposed here relies on the generation of different damage scenarios to be used in the Monte Carlo framework of Fig. 4. To this purpose, three sub-steps are necessary:

  • Probabilistic Seismic Hazard Analysis (PSHA) of the plant site

  • Vulnerability analysis of all equipment

  • Evaluation of seismic damage scenarios of the plant

Fig. 4

Logical sequence of Monte Carlo Simulation

Seismic hazard estimation methods are usually based on the classical Probabilistic Seismic Hazard Analysis (PSHA) approach proposed by Cornell (1968), which will be employed in Sect. 5. In this respect, the Matlab script Mathazard has been used (Paolacci et al. 2022). This open-source software has been developed by the research group of Roma Tre University and is available upon request.

Seismic vulnerability of structures and equipment is usually expressed in terms of fragility curves (Baker 2015). The latter are synthetic tools to assess the equipment failure modes and the probability of exceeding the related damage states. For the generation of fragility curves different approaches can be employed, such as judgmental (Hazus 2022), empirical (Salzano et al. 2003; D’Amico and Buratti 2018) and analytical formulations (Alfanda et al. 2022; Karaferis et al. 2022; Kazantzi et al. 2022; Phan et al. 2019, 2017, 2020; Caprinozzi et al. 2021; Bakalis et al. 2017, 2018; Di Sarno and Karagiannakis 2020; Farhan and Bousias 2020). For each plant equipment, all the damage states to be taken into consideration should be defined. As stated in Sect. 3, in order to perform expeditious risk analysis, Peak Ground Acceleration (PGA) is used as intensity measure (IM) for the fragility curves of all equipment.

Hazard and fragility data can be used jointly to generate different seismic damage scenarios in the plant and to feed the simplified probabilistic resilience evaluation process. To this purpose, the approach formulated by Alessandri et al. (2018) for the quantitative seismic risk analysis of major-hazard process plants can be profitably used. In this respect, in order to generate different seismic damage patterns in the plant and then quantify the statistical distribution of the resilience index and of the economic losses, the software PRIAMUS, proposed in Corritore et al. (2017), is employed. This software can easily generate damage patterns within the plant, also including possible damage propagation effects (domino effects). In brief, the method employs Monte Carlo simulations to generate samples of damage scenarios involving the equipment of a plant subjected to an earthquake, based on which the related statistics are built.

The following section will explain in detail the integration of all above sub-steps by using a Monte Carlo simulation approach and how the probabilistic resilience metrics can be quantified.

4.7 Monte Carlo Simulation for estimation of probabilistic resilience metrics

After defining the seismic hazard curve of the selected site, a suitable range of PGA values, compatible with the set of damage states considered in the analysis for the different equipment, and their corresponding mean annual frequencies of occurrence (λ) are defined. Different seismic scenarios, corresponding to different PGA values, are used in the analysis. The greater the number of points selected, the greater the precision, but at the expense of computational efficiency. Consequently, the Monte Carlo Simulation (MCS) should be set up so as to balance the numerical effort and the accuracy of the results.

In the simplified approach, as stated before, the probabilistic resilience metrics are identified with the economic losses (EL) and the resilience index (R). The mean value of the economic losses can be expressed analytically as the sum of Eqs. (11) and (12), which provide, respectively, the expected values of the business interruption loss (BI) and of the direct cost (DC), given PGAj, at time t. Both refer to random damage scenarios, herein identified with the previously defined Damage Scenario Vector (DSV), which represents, for each j-th sampled PGA, the ensemble of damage states DS of the equipment.

In Eq. (11) the term \(f\left( {BI{|}{\varvec{T}}_{{{\varvec{CR}}}} } \right)\) is the probability density function of BI conditional on TCR; \(f\left( {{\varvec{T}}_{{{\varvec{CR}}}} {|}{\varvec{DSV}}} \right)\) represents the probability density function of TCR conditional on DSV; \(f\left( {{\varvec{DSV}}{|}PGA} \right)\) is the probability density function of the DSV conditional on PGA. Similarly, in Eq. (12), for a given damage scenario vector (DSV) and its corresponding equipment recovery cost vector (RC), the DC is evaluated using Eq. (9). Moreover, \(f\left( {DC{|}{\varvec{RC}}} \right)\) is the probability density function of DC conditional on RC; finally, \(f\left( {{\varvec{RC}}{|}{\varvec{DSV}}} \right)\) represents the probability density function of RC conditional on DSV.

$$E[BI\left( t \right)|PGA_{j} ] = \mathop \int \limits_{BI}^{{}} \mathop \int \limits_{{{\varvec{T}}_{{{\varvec{CR}}}} }}^{{}} \mathop \int \limits_{{{\varvec{DSV}}}}^{{}} bi\left( t \right) \times \,f\left( {BI{|}{\varvec{T}}_{{{\varvec{CR}}}} } \right)\,f\left( {{\varvec{T}}_{{{\varvec{CR}}}} {|}{\varvec{DSV}}} \right)\,f\left( {{\varvec{DSV}}{|}PGA} \right)\,dBI\,d{\varvec{T}}_{{{\varvec{CR}}}} \,d{\varvec{DSV}}$$
(11)
$$E[DC|PGA_{j} ] = \mathop \int \limits_{DC}^{{}} \mathop \int \limits_{{{\varvec{RC}}}}^{{}} \mathop \int \limits_{{{\varvec{DSV}}}}^{{}} dc \times \,f\left( {DC{|}{\varvec{RC}}} \right)\,f\left( {{\varvec{RC}}{|}{\varvec{DSV}}} \right)\,f\left( {{\varvec{DSV}}{|}PGA} \right)\,dDC\,d{\varvec{RC}}\,d{\varvec{DSV}}$$
(12)

The expected resilience index for a given PGAj, is provided by Eq. (13). The term R(t) is the resilience index at time t, (Eq. (1)) for a given damage scenario vector (DSV) and its corresponding equipment recovery time duration vector (TCR); \(f\left( {R{|}{\varvec{T}}_{{{\varvec{CR}}}} } \right)\) is the probability density function of R conditional to TCR.

$$E[R\left( t \right)|PGA_{j} ] = \mathop \int \limits_{R}^{{}} \mathop \int \limits_{{{\varvec{T}}_{{{\varvec{CR}}}} }}^{{}} \mathop \int \limits_{{{\varvec{DSV}}}}^{{}} r\left( t \right) \times \,f\left( {R{|}{\varvec{T}}_{{{\varvec{CR}}}} } \right)\,f\left( {{\varvec{T}}_{{{\varvec{CR}}}} {|}{\varvec{DSV}}} \right)\,f\left( {{\varvec{DSV}}{|}PGA} \right)\,dR\,d{\varvec{T}}_{{{\varvec{CR}}}} \,d{\varvec{DSV}}$$
(13)

For each seismic scenario of interest, characterized by PGAj, a Monte Carlo Simulation (MCS) is conducted in order to evaluate the integrals of Eqs. (11), (12) and (13). Its logical sequence is illustrated in Fig. 4. At each step of the MCS, a random damage state DS is sampled for each equipment through the fragility curves, enabling the construction of the damage scenario vector DSV and of its corresponding equipment state vector SV, as shown in Fig. 4 (Alessandri et al. 2018).
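The sampling of a damage state from a set of fragility curves can be sketched as follows. The sketch assumes lognormal fragility curves defined by a median PGA and a lognormal standard deviation, and compares a single uniform random number with the exceedance probabilities; it is an illustration of the general technique and not the PRIAMUS implementation. The fragility parameters and the three-equipment plant are hypothetical.

```python
import math
import random


def exceedance_probability(pga, median, beta):
    """Lognormal fragility: P(DS >= ds | PGA) = Phi(ln(PGA / median) / beta)."""
    z = math.log(pga / median) / beta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample_damage_state(pga, fragility, rng):
    """Return the damage state (1 = no damage) for one equipment.

    `fragility` lists (median, beta) for DS2, DS3, ... in order of severity.
    """
    x = rng.random()  # X ~ U(0, 1)
    ds = 1
    for k, (median, beta) in enumerate(fragility, start=2):
        if x < exceedance_probability(pga, median, beta):
            ds = k    # the state is exceeded: keep the most severe one
    return ds


# Hypothetical fragility parameters for DS2..DS5 (median PGA in g, beta).
FRAGILITY = [(0.2, 0.5), (0.4, 0.5), (0.8, 0.5), (1.2, 0.5)]

rng = random.Random(1)  # fixed seed only for reproducibility of the sketch
dsv = [sample_damage_state(0.4, FRAGILITY, rng) for _ in range(3)]
sv = [1 if gamma == 1 else 0 for gamma in dsv]
print(dsv, sv)
```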

Then, random equipment/activity recovery times (TA(i,j)) and equipment/activity recovery costs (RC(i,j)) are generated. Having defined the DSV and SV, the corresponding random vectors of equipment/activity recovery duration TA and equipment/activity recovery cost RC can be built as previously described. Moreover, if there is any preparatory or external task, the random TP and CP are also generated as described above. Knowing the TA vector and TP, the recovery dates of each equipment (TCR) can be evaluated using Eq. (8). Then the operational capacity curve versus time can be constructed. Moreover, for each simulation, economic losses such as direct costs (DC) and business interruption (BI) can be estimated as described in Sect. 4.5. The procedure is repeated until the convergence criterion of the MCS is reached (Alessandri et al. 2018). Finally, statistics of the resilience metrics can be estimated, such as the distributions of the resilience index, of the initial residual capacity C(td), of the maximum duration of plant full recovery (Tmax), and of the economic losses.

The Monte Carlo simulations will be repeated NPGA times, and statistics of resilience metrics and economic losses (direct and indirect) can be calculated at any PGA scenario.

Finally, the mean annual losses (EALm) can be directly calculated as integral of the loss exceedance curve versus the mean annual frequency of exceeding PGA, λ. In this respect, given the limited number of seismic scenarios, EAL can be approximated by Eq. (14). The same procedure can be adopted for calculating separately business interruption BI and direct costs DC.

$$EAL_{m} \cong \mathop \sum \limits_{j = 2}^{{N_{PGA} }} \frac{{EL_{m} \left( {PGA_{j} } \right) - EL_{m} \left( {PGA_{j - 1} } \right)}}{2}\left[ {\lambda \left( {PGA_{j} } \right) - \lambda \left( {PGA_{j - 1} } \right)} \right]$$
(14)
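For orientation, the numerical integration of a loss curve against the annual frequency of exceedance can be sketched with the conventional trapezoidal rule. The sketch below averages the losses of adjacent scenarios and multiplies them by the absolute decrement of λ; this conventional form is an assumption of the sketch and is not a term-by-term transcription of Eq. (14). The loss and frequency values are hypothetical.

```python
def expected_annual_loss(losses, rates):
    """Trapezoidal integration of mean loss over annual exceedance frequency.

    `losses[j]` is the mean loss at PGA_j and `rates[j]` is lambda(PGA_j);
    scenarios are ordered by increasing PGA, so the rates decrease.
    """
    points = list(zip(losses, rates))
    return sum(
        0.5 * (loss0 + loss1) * abs(rate1 - rate0)
        for (loss0, rate0), (loss1, rate1) in zip(points, points[1:])
    )


# Hypothetical scenarios: mean losses in EUR and annual exceedance frequencies.
losses = [0.0, 2.0e6, 10.0e6]
rates = [1.0e-2, 1.0e-3, 1.0e-4]
eal = expected_annual_loss(losses, rates)
print(eal)
```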

5 Illustrative example

A black carbon plant is used as a case study (Karagiannakis et al. 2020), and it is assumed to be ideally located in Priolo Gargallo, a highly seismic zone of Italy. The plant produces different types of black carbon, which are widely used for coloring products in black tones, or as additives in various rubber products, such as car tires and car dashboards. A plan view of the plant with the indication of the main equipment is shown in Fig. 5. The principal process can be summarized as follows: first, the feed stock oil needs to be stored in the fuel storage tanks; secondly, the oil needs to be preheated and transported to the reactors (oil pumps, pipe rack, heat exchangers); in the reactors, an incomplete combustion of the oil produces the desired carbon black powder and tail gas (horizontal and vertical reactors); the carbon black powder then needs to be separated from the tail gas (bag collectors) and compacted into pellets (milling towers); finally, the pellets need to be stored in the silos before they can be shipped to the customer.

Fig. 5

Milling tower (left) and plan view of the Black Carbon Plant (Right)

5.1 Process plant mapping

In total, 23 major equipment are considered in the analysis, as shown in Fig. 6 and named in Table 1. In the plant two distinct process flows (PFs) can be identified, corresponding to two separate physical production lines sharing the pumping facilities, as shown in Fig. 6, so that the plant can produce two different types of black carbon at the same time. PF1 has a production capacity of 60 t/d while PF2 has a production capacity of 100 t/d. Both types of carbon black analyzed in this case study have a variable unit production cost estimated at around 1111 €/t and a market selling price of 1273 €/t, based on information provided by plant engineers. The two sets of equipment of the process flows are S[1] = {E-1, E-2, E-7, E-9, E-10, E-13, E-14, E-15, E-18, E-20, E-22} and S[2] = {E-3, E-4, E-5, E-6, E-7, E-8, E-11, E-12, E-16, E-17, E-19, E-21, E-23}.

Fig. 6

Simplified process flow diagram of black carbon plant

Table 1 Fragility curves parameters for different damage state of equipment

Figure 7 shows the Capacity Block Diagram of the black carbon plant. PF1 has two blocks with two fully redundant equipment each (PS1R [1] = {E-1, E-2}; PS2R [1] = {E-9, E-10}), a block with three equipment in parallel with a capacity of 33.3% each (PSP [1] = {E-13, E-14, E-15}) and a block with equipment in series (PSS [1] = {E-7, E-8, E-18, E-20, E-22}). PF2 has one block with four fully redundant equipment (PS1R [2] = {E-3, E-4, E-5, E-6}), one block with two fully redundant equipment (PS2R [2] = {E-11, E-12}) and a block with equipment in series (PSS [2] = {E-7, E-8, E-16, E-17, E-19, E-21, E-23}). It can be noticed that equipment E-7 and E-8 are used by both PFs, so if either of them fails, the entire plant stops operating.
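As a worked illustration of Eqs. (6) and (7) on this diagram, suppose that only E-13 is out of service. The calculation assumes that the capacity of the parallel block is the sum of the 33.3% shares of its working units, and that O1 and O2 are proportional to the nominal capacities of 60 t/d and 100 t/d quoted in Sect. 5.1; both are assumptions made here for illustration.

$$C\{ PS^{P} [1]\} = \tfrac{2}{3},\quad C_{1} = \min \left\{ {1;\;1;\;\tfrac{2}{3};\;1} \right\} = \tfrac{2}{3},\quad C_{2} = 1$$

$$C = O_{1} C_{1} + O_{2} C_{2} = \frac{60}{160} \cdot \frac{2}{3} + \frac{100}{160} \cdot 1 = 0.25 + 0.625 = 0.875$$

Under these assumptions the plant would retain 87.5% of its capacity, whereas the failure of E-7 or E-8 alone would reduce both C1 and C2, and hence C, to zero.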

Fig. 7

Capacity block diagram of black carbon plant

5.2 Seismic hazard and vulnerability of the equipment

The scenario-based resilience analysis is applied to the black carbon plant. As mentioned previously, the black carbon plant is assumed to be located close to the town of Priolo Gargallo, at a latitude of 39.17° and a longitude of 15.17°, on soil type B. The seismic hazard curve of the selected site is shown in Fig. 8a (Alessandri et al. 2018). In the same figure the PGA values selected for the analysis, ranging from 0.05 to 1.79 g, are indicated with red dashed vertical lines. PGA values smaller than 0.05 g have been neglected, because they are not capable of generating appreciable seismic damage to the equipment. Conversely, the maximum value of PGA considered in the analysis, 1.79 g, corresponds to a very small annual probability of exceedance, approximately \(10^{-6}\). Intermediate values of PGA can generate relevant seismic damage scenarios with a non-negligible frequency of occurrence.

Fig. 8

a Seismic hazard curve of Priolo Gargallo site with the selected PGA; b Hazus (2022) Fragility curves of oil tanks

Seismic vulnerability of equipment is expressed in terms of fragility curves. Equipment with multiple damage states have been considered here, as summarized in Table 1. The definition of the damage states of equipment is adopted from Hazus (2022), where DS1 corresponds to no damage, DS2 to slight/minor damage, DS3 to moderate damage, DS4 to extensive damage, and DS5 to complete damage. For more details about seismic vulnerability and damage states of the main equipment of hazardous facilities, the reader can refer to Paolacci et al. (2013). The fragility curve parameters of steel storage tanks, oil pump stations, horizontal HEX and vertical HEX are derived from the Hazus (2022) manual. Meanwhile, for the reinforced concrete pipe rack and for steel frame structures such as the vertical reactor, bag collector, milling tower and silo, the fragility curve parameters are based on numerical models (Karagiannakis et al. 2020). Table 1 summarizes the median value PGAm and the lognormal standard deviation of the fragility curves of each group of process equipment. As an example, the fragility curves of the oil tanks are illustrated in Fig. 8b.

5.3 Recovery model

The simplified recovery model (SPEREN) is used, and equipment recovery durations and costs are summarized in Table 2. Mean values of the recovery duration and recovery cost of each equipment are provided by plant engineers based on their experience (Karagiannakis et al. 2020). In this study, both equipment recovery durations and costs are modeled as normal distributions, truncated to have positive values. The use of a Gaussian distribution seems reasonable, given that a recovery activity duration can deviate from the mean for random reasons and with equal probability in either direction, while larger deviations are less likely. However, another candidate distribution often used in maintenance engineering is the lognormal.

Table 2 Mean values of recovery time and recovery cost for each DS of equipment

A standard deviation of 20% of the mean value is assumed for recovery durations and recovery costs. It is assumed that there is no limitation in workforce, so that the reconstruction of all damaged equipment can start at the same time, after the inspection and planning activity, without any interdependencies between them. A control time equal to 2 years (th = 730 d) is selected for analyzing the operational capacity curve of the plant and calculating the seismic resilience index.
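As an order-of-magnitude check, the unit price and variable cost of Sect. 5.1 give an upper bound on the business interruption loss within the control time. The bound below assumes, for illustration only, that both process flows are completely stopped over the whole control time; the symbol BImax is introduced here and does not appear in the original formulation.

$$BI_{max} = \left( {1273 - 1111} \right)\,\frac{\text{€}}{\text{t}} \times \left( {60 + 100} \right)\,\frac{\text{t}}{\text{d}} \times 730\,\text{d} = 162 \times 160 \times 730\;\text{€} \approx 18.9 \times 10^{6} \;\text{€}$$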

5.4 Analysis of the results and discussion

For each of the selected PGA scenarios, a Monte Carlo Simulation is conducted using the PRIAMUS software (Corritore et al. 2017) to randomly generate seismic damage scenarios. A random number Xi is sampled from a standard uniform distribution U(0, 1), and the damage state of each equipment is defined based on the equipment fragility curves. Along with the equipment damage states, a random recovery duration and a random recovery cost are generated for each damaged equipment, based on the predefined distributions. The probability of exceeding the different damage states of the equipment for PGA = 0.5 g is illustrated in Fig. 9.

From Fig. 9, it is clear that storage tanks are rather vulnerable equipment across the different damage states. For example, for storage tanks, the probabilities of being in damage state DS3, DS4 or DS5 are 37, 28 and 17%, respectively. The horizontal heat exchangers, the reactors, the vertical reactors and the vertical heat exchangers can also be considered rather vulnerable elements, especially for damage states DS3 and DS4, as clearly shown in Fig. 9. Based on the 19 scenarios generated with the help of the PRIAMUS software, the statistics of the resilience quantities have been determined.

Fig. 9

Seismic Damage Probability of exceeding damage states (DS) in the equipment for PGA = 0.5 g

Figure 10a shows the probabilistic operational capacity curves of the black carbon plant for different confidence intervals, corresponding to PGA = 0.5 g. In this condition, the resilience indexes corresponding to the 10th, 20th, 50th, 80th and 90th percentiles are 81.0, 77.4, 69.5, 45.0 and 25.9%, respectively. It is worth mentioning that the operational capacity curves of the different percentiles represent realistic curves which may occur, and these curves are used later to calculate the probabilistic economic losses and the probabilistic expected annual losses of the plant. We can see from Fig. 10a that the 90th percentile operational capacity curve does not reach full capacity even after 730 d. This is due to delays arising from the probabilistic recovery of damage scenarios with at least a vertical reactor, a reactor and vertical heat exchanger, or a milling tower in DS4, which have a mean recovery time of 660 d.

Fig. 10

a Probabilistic Operational Capacity curves of Black Carbon plant for MCS corresponding to PGA = 0.5 g; b Probabilistic resilience index curves for different PGA

Figure 10b summarizes the probabilistic resilience index for different PGA levels. It can be noticed that up to PGA = 0.15 g the resilience indexes of the different percentiles do not vary much, while for PGA larger than 0.15 g the change is quite significant. Regarding the 80th and 90th percentile resilience indexes, a difference of around 10–20% can be noticed in the range of PGA from 0.35 g to 0.7 g, while for other values the difference is smaller. Moreover, for PGA equal to or larger than 1 g the 80th and 90th percentile resilience indexes of the black carbon plant drop almost to 0%.

Figure 11a shows the distribution of the resilience index for the MCS with PGA = 0.5 g, which has a peak around 75%. Figure 11b shows the maximum recovery interval for PGA = 0.5 g, and it confirms that for some simulations the maximum time needed for the plant to be fully recovered is larger than the control time th = 730 d. This is the reason why the 90th percentile operational capacity curve of Fig. 10a does not reach 100% operational capacity. The maximum recovery time distribution appears to be bimodal, with one peak around 350 d and a second one around 700 d. The first peak at 350 d is governed by the recovery of steel storage tanks in DS4 or DS5, vertical HEX in DS4 and bag collector in DS4, while the second peak at 700 d is governed by the recovery of vertical reactor in DS4, reactor plus vertical HEX in DS4 and milling tower in DS4.

Fig. 11 a Resilience index distribution for MCS corresponding to PGA = 0.5 g; b Distribution of maximum recovery interval (Tmax) for MCS corresponding to PGA = 0.5 g

Figure 12a shows the distribution of economic losses for PGA = 0.5 g. Direct costs have a mean of 4.81 million euros, BI losses have a mean of 7.17 million euros and EL have a mean of 11.98 million euros.

Fig. 12 a Distribution of Economic Losses for PGA = 0.5 g; b Probabilistic Business Interruption losses for different PGA
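The reported means are consistent with EL being the sum of DC and BI in each simulation (4.81 + 7.17 = 11.98 million euros). The sketch below, which assumes this per-simulation sum and uses hypothetical loss samples, illustrates that means add up across the two components while percentiles in general do not, so percentile curves of EL cannot be obtained by simply adding the percentile curves of DC and BI.

```python
import math
import statistics


def percentile(values, p):
    """Nearest-rank percentile (p in 0..100) of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical losses per simulation, in million euros (not case-study data)
dc = [2.0, 4.0, 5.0, 6.0, 9.0]    # direct costs
bi = [12.0, 1.0, 8.0, 3.0, 10.0]  # business interruption losses

# Assumption: economic loss of each simulation is the sum of its DC and BI
el = [d + b for d, b in zip(dc, bi)]

mean_dc = statistics.mean(dc)
mean_bi = statistics.mean(bi)
mean_el = statistics.mean(el)
print("mean EL:", mean_el, "| mean DC + mean BI:", mean_dc + mean_bi)

p80_sum = percentile(dc, 80) + percentile(bi, 80)
p80_el = percentile(el, 80)
print("80th percentile EL:", p80_el, "| sum of 80th percentiles:", p80_sum)
```

The two quantities on the last line differ because the simulations with the largest DC are not necessarily those with the largest BI.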

Figure 12b shows the variation of probabilistic BI losses for different levels of PGA. For PGA larger than 1 g the 80th and 90th percentile BI loss curves reach a plateau of maximum values around 18–18.5 million euros. Regarding the mean and 50th percentile curves, for PGA between 0.7 g and 1 g the BI losses are almost the same, while for PGA larger than 1 g the 50th percentile BI losses are around 1 million euros higher than those of the mean curve. Meanwhile, for PGA smaller than 0.7 g the mean BI losses are higher than those of the 50th percentile curve.

Figure 13a shows the variation of probabilistic Direct Cost losses for different levels of PGA. The 50th percentile and mean curves have almost the same values. The maximum values of DC for the 10th, 20th, 50th, 80th and 90th percentiles are 12.2, 13.3, 15.0, 16.4 and 17.0 million euros, respectively, while the maximum DC losses for the mean curve are 14.8 million euros.

Figure 13b shows the variation of probabilistic EL for different levels of PGA. For PGA larger than 1.2 g the 80th and 90th percentile EL curves reach a plateau of maximum values around 32–35 million euros. Regarding the mean and 50th percentile curves, for PGA between 0.7 g and 1 g the EL are almost the same, while for PGA larger than 1 g the 50th percentile EL are around 1 million euros higher than those of the mean curve. Meanwhile, for PGA smaller than 0.7 g the mean EL are higher than those of the 50th percentile curve. The maximum values of EL for the 10th, 20th, 50th, 80th and 90th percentiles are 26, 29, 32, 34 and 35 million euros, respectively, while the maximum EL for the mean curve are 31.3 million euros.

Fig. 13 a Probabilistic Direct Cost losses for different PGA; b Probabilistic Economic Losses for different PGA

Fig. 14a summarizes the mean curves of DC, BI and EL of the black carbon plant for different levels of PGA. For small levels of PGA, from 0.05 g up to 0.3 g, the BI losses account for approximately 70 to 80% of EL, while for PGA levels larger than 0.7 g the BI losses are only slightly higher than DC. By combining the hazard curve of Fig. 8a and the mean economic loss curve of Fig. 14a, the mean expected loss exceedance curve is constructed as shown in Fig. 14b. Using Eq. (14), the mean expected annual losses (EALm) of the carbon black plant are calculated to be around 27,976 €, of which 19,599 € are due to business interruption and 8378 € are due to the direct cost of reconstructing damaged equipment. As business interruption losses cause around 70% of the total expected losses, it is important to have an efficient recovery plan in order to minimize the maximum recovery time and the business interruption losses.

Fig. 14 a Mean economic loss curve for different PGA; b Mean annual frequency of exceedance of economic losses of the Black Carbon Plant

Additionally, EAL are also calculated for the 10th, 20th, 50th, 80th and 90th percentiles, with values of 5990 €, 8940 €, 18,972 €, 39,410 € and 60,114 €, respectively. The difference between EALm and the EAL of the 50th percentile is around 10,000 €, which is due to the higher EL values of the mean curve at lower PGAs compared to the 50th percentile curve. It is important to highlight that the smaller PGAs influence EALm more because of their higher probability of exceedance. For the black carbon plant, PGAs of 0.5 g and smaller cause around 93% of the total EALm.
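The combination of a hazard curve with a loss curve, and the share of the expected annual loss attributable to lower intensities, can be sketched numerically as follows. This is a generic trapezoidal discretization and not necessarily the form of Eq. (14), which is not reproduced in this section; the PGA grid, exceedance frequencies and losses are hypothetical, and contributions below the first and above the last grid point are neglected.

```python
def eal_contributions(frequencies, losses):
    """Expected annual loss contributed by each PGA bin.

    frequencies: mean annual frequencies of exceedance on an ascending PGA grid
    losses: losses at the same PGA values
    Each bin contributes its average loss times the annual frequency of events
    falling inside the bin (difference of exceedance frequencies).
    """
    bins = []
    for i in range(1, len(losses)):
        d_lambda = frequencies[i - 1] - frequencies[i]
        bins.append(0.5 * (losses[i - 1] + losses[i]) * d_lambda)
    return bins


# Hypothetical hazard and loss curves (not the case-study values)
pga = [0.1, 0.3, 0.5, 1.0]                # g
frequency = [0.02, 0.005, 0.001, 0.0001]  # 1/year
loss = [0.0, 2.0e6, 10.0e6, 25.0e6]       # euros

bins = eal_contributions(frequency, loss)
eal = sum(bins)

# Share of the EAL from bins whose upper bound does not exceed 0.5 g
threshold = 0.5
low = sum(b for b, upper in zip(bins, pga[1:]) if upper <= threshold)
share_low = low / eal
print(f"EAL = {eal:.0f} euros, share from PGA <= {threshold} g = {share_low:.0%}")
```

Even though losses grow with PGA, the frequency of exceedance falls much faster, so the low-intensity bins dominate the sum in this example, in line with the trend described above.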

6 Conclusions

The effects of catastrophic events on industrialized communities can be extremely costly in terms of both human lives and economic losses. Consequently, the mitigation strategies against natural hazards and climate change, primary causes of such events, need to be rethought. In fact, the interaction between critical elements and the presence of multiple damage conditions that characterize those events can amplify the damage propagation effects. Moreover, quantifying the restoration capacity of infrastructures and the resilience of the community is, under these conditions, a very difficult task.

A particular case is represented by Na-Tech events in major-hazard facilities, such as process plants, which have been the object of many studies in the past, especially in the presence of earthquakes. Nevertheless, methods for the quantitative assessment of the resilience of such complex systems are scarce, and they often do not account for all possible randomness in the quantification of the hazard, vulnerability and reconstruction costs.

For these reasons, the present paper proposed a Probabilistic Seismic Resilience Analysis (PSRA) method for the quantification of the operational capacity of hazardous process plants and of the direct and indirect economic losses generated by the seismic damage conditions. The method is a generalization of the Performance-based Earthquake Engineering method that also includes the restoration phase. It is derived from a method already proposed by the authors, which now incorporates the probabilistic nature of the problem.

One of the relevant aspects of the proposed method is its capacity to account, in a probabilistic manner, for functional interactions between the damaged equipment when quantifying the operational capacity of the system over time. To this end, a Simplified Plant Equipment Reconstruction Network (SPEREN) approach has been proposed, which allows the use of empirical recovery costs and durations from the literature. Given the large number of random variables and the complex interaction between the equipment and the reconstruction activities, the method has been formulated adopting a Monte Carlo Simulation approach.

Consequently, using a scenario-based approach, samples of damage scenarios and reconstruction activities are generated, and the statistics of operational capacity and economic losses (direct and indirect) are evaluated.
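The scenario-based sampling can be illustrated with a deliberately simplified Monte Carlo sketch. It is not the SPEREN approach: it ignores functional interactions and reconstruction sequencing, assumes that each item of equipment is damaged independently with a fixed probability at the given PGA, draws lognormal recovery times, and takes the plant recovery time as the longest individual recovery. The equipment table, probabilities, medians and dispersion are all hypothetical.

```python
import math
import random

# Hypothetical equipment data at a fixed PGA: probability of reaching the
# damage state of interest and median recovery time in days.
EQUIPMENT = {
    "storage_tank": {"p_damage": 0.30, "median_days": 350.0},
    "vertical_reactor": {"p_damage": 0.10, "median_days": 660.0},
    "bag_collector": {"p_damage": 0.20, "median_days": 300.0},
}


def simulate_max_recovery(equipment, n_samples, sigma=0.3, seed=0):
    """Sample the maximum recovery time (days) of the plant for each scenario.

    Assumptions: independent damage of each item, lognormal recovery time with
    log-standard deviation sigma, plant recovery governed by the slowest item.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        t_max = 0.0  # stays at zero when nothing is damaged in the scenario
        for item in equipment.values():
            if rng.random() < item["p_damage"]:
                t = rng.lognormvariate(math.log(item["median_days"]), sigma)
                t_max = max(t_max, t)
        results.append(t_max)
    return results


samples = simulate_max_recovery(EQUIPMENT, n_samples=1000)
control_time = 730.0
undamaged = sum(1 for t in samples if t == 0.0) / len(samples)
late = sum(1 for t in samples if t > control_time) / len(samples)
print(f"no damage: {undamaged:.1%}, recovery beyond control time: {late:.1%}")
```

Because different items have very different median recovery times, the sampled maximum recovery time tends to cluster around more than one value, which is the mechanism behind a multimodal distribution of the recovery interval.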

The resilience value can inform decision making in a valuable manner. Each plant manager can establish an acceptable threshold for resilience. No objective and absolute value can be stated, and no standard or legal requirement has been established on the matter, because it depends on the impact that the capacity loss and the interruption duration have on the customer satisfaction level and on the overall company business. In safety-critical plants one should strive for very high values of resilience, while a manufacturer of consumer goods can accept a lower value, provided that the resulting economic loss does not jeopardize the survival of the company. Based on their perception of the criticality of production interruption and capacity loss, plant owners can judge whether the computed resilience level is satisfactory and decide to undertake resilience improvement projects, such as modifying the plant structure (e.g. introducing redundancy) to reduce capacity loss, improving equipment robustness or installing protective devices. Capacity recovery plans can also be revised in order to shorten the duration of the recovery process. The possibility of quantitatively comparing resilience before and after such interventions provides a measure of their cost/effectiveness ratio.

A Black Carbon plant has been selected as a representative case study for the application and validation of the proposed method. The selection is related to the evident strategic role of such a facility and to its complexity. The plant, ideally located in Priolo Gargallo (Italy), includes different seismically vulnerable elements (storage tanks, pumps, stacks, piperack, etc.), whose fragility curves have been derived either empirically or analytically. According to the proposed method the following conclusions can be drawn:

  • The operational capacity (OC) curve is rather dispersed, with an evident asymmetry with respect to the mean curve. This strongly depends on the dispersion of the recovery time of the equipment and on the prevalent damage states of the equipment in the aftermath of a seismic event. For example, in the analyzed case, for PGA = 0.5 g, the 90th percentile OC curve presents a total recovery time that is out of scale (> 720 d). This is due to delays in the probabilistic recovery of damage scenarios with at least a vertical reactor, a reactor and vertical heat exchanger, or a milling tower in DS4, which have a mean recovery time of 660 d.

  • The variation of the resilience index (R) with the intensity measure (PGA) is straightforward. There is a lower zone where R is insensitive to PGA. On average, this zone extends up to PGA = 0.15 g, with a 90% confidence interval between 0.1 g and 0.3 g. After that, R drops almost linearly up to (on average) 1.3 g. Subsequently, R remains practically unchanged. This upper invariance zone ranges from 0.8 g up to PGA > 1.8 g.

  • The probability distribution functions of R and of the recovery interval Tmax appear bimodal, as shown in the example for PGA = 0.5 g. This depends essentially on the different damage states of the equipment and on the different recovery functions. In the example, the first peak (350 d) is governed by the prevalent damage states (DS4/DS5) of the storage tanks and their recovery phases. The second one is instead governed by the prevalent damage state (DS4) and recovery function of vertical reactors and reactor + vertical heat exchanger.

  • The proposed method made it easy to evaluate the statistics of economic losses of both direct (seismic damage of the equipment, DC) and indirect (business interruption, BI) nature. For example, similarly to R and Tmax, the probability distribution function for PGA = 0.5 g showed a clear prevalence of BI in terms of mean value, with an increment with respect to DC of about 50%.

  • In the investigated range of PGA, the economic losses showed a prevalence of BI up to 0.3 g; beyond that, the difference decreases to 10% after PGA = 1.0 g. This means that for seismic intensities whose return periods cover the limit states typically used for civil structures (up to collapse prevention), BI represents the predominant economic loss.

  • The mean expected annual losses (EALm) indicate that more than 90% of the economic loss is related to seismic events with PGA ≤ 0.5 g, due to their higher frequency of occurrence. Moreover, around 70% of EALm is due to BI, so it is important to have an efficient recovery strategy in order to minimize the economic losses.

In summary, the potential of the proposed method for decision-making analysis of process plants vulnerable to seismic action is clear. The outcomes in terms of operational capacity and economic losses can be profitably used to identify the most critical components of a plant and to investigate the most effective mitigation strategy, in terms of both fragility reduction and recovery time control. This aspect, along with the influence of possible domino effects, will be the object of further investigations, as will the dynamic interaction between interconnected equipment.