1 Introduction

Multi-device contingencies in power systems can have disproportionate impact via cascading failure (e.g., additional generators must shut down to protect themselves if system frequency gets too low) and are particularly difficult to mitigate. Protective measures for power system resilience fall into discrete and continuous categories. Discrete options include intentional tripping of generators, transmission lines, transformers, and load, as well as hardening decisions to prevent failures of these grid components in the first place. Continuous options include manipulation of device controls, e.g., generator voltage and power set-point adjustment. The work we present advances power system resilience research through the integration of multiple mathematical optimization paradigms to stabilize the grid over several possible contingencies.

Static power flow models, which assume constant power generation and load consumption, are typically used for planning and daily operations purposes. For example, unit commitment (Knueven et al. 2020b) and economic dispatch models use the DC optimal power flow approximations to schedule a grid’s fleet of generators to serve forecasted load at minimal cost under normal operating conditions. Power system restoration, on the other hand, uses AC optimal power flow approximations to generate a plan for restoring the grid after an extreme outage or blackout has occurred (Patsakis et al. 2018). These processes occur on timescales of minutes or more and models can therefore rely on static power flow assumptions for tractability, without needing to consider system dynamics which occur on a timescale of seconds.

Models aimed at measuring or minimizing the operational impacts of disruptive events to a power system, and particularly at understanding or preventing cascading failure due to protective device tripping, require a full dynamic model to represent the short-term behaviors that could cause such impacts. At a minimum, dynamic models capture the inertial behavior of generators as they convert mechanical energy into electrical energy, and the resulting fluctuation in metrics such as AC frequency, current, and voltage throughout the system. If these metrics exceed certain upper or lower bounds, they can cause additional impacts through mechanisms such as voltage collapse, protective load shed, and/or tripping of generators or transmission lines, which in turn have follow-on effects. Dynamics-informed resilience metrics related to voltage levels, voltage stability, frequency levels, and power transmission levels are commonly used to detect when the system is vulnerable to instability (Kundur et al. 1994).

The transient stability constrained optimal power flow (TSCOPF) problem, rooted in dynamic simulation stability research in the 1990s, was developed to perform optimal economic dispatch while maintaining transient stability guarantees in the event of specific contingency scenarios (Gan et al. 2000; Abhyankar et al. 2017; Geng et al. 2017). TSCOPF is governed by differential algebraic equations (DAE) that model dynamic power generation and transmission over the grid, as well as constraints that keep transient stability, power and voltages, and/or other metrics within acceptable bounds. Effectively, TSCOPF optimizes the initial conditions of DAE contingency simulations subject to these constraints.

Since the introduction of multi-contingency (MC-) TSCOPF (Yuan et al. 2003), most TSCOPF work has been concerned with simultaneously guaranteeing stability over multiple contingency scenarios. However, in practice each contingency scenario usually represents failure/disconnect of a single device. Failure of multiple devices is much harder to protect against via initial conditions alone and is almost never considered. Our model is similar to MC-TSCOPF in that it optimizes system operating conditions prior to multiple potential contingencies, but different in that it considers additional decision variables as discussed below. Furthermore, optimizing generation cost subject to stability and operational constraints can leave the system with very little margin with respect to those constraints, and therefore vulnerable to additional perturbations. Our model is primarily concerned with severe event resilience and therefore optimizes operational margins rather than the generation cost as in (MC-) TSCOPF.

The transient stability emergency control (TSEC) model described in Abhyankar et al. (2017) and Gan et al. (2018) optimizes emergency control of system dynamics after a contingency has occurred. Like TSEC, our model incorporates dynamic post-contingency control, specifically of exciter voltage and governor mechanical torque set points. Unlike TSEC or TSCOPF, we solve the pre-contingency problem and each post-contingency recourse/emergency control problem simultaneously. In this regard our work is similar to dynamic preventive-corrective control (Yuan and Xu 2020; Arguello et al. 2021). Like Arguello et al. (2021), we use the full system DAE rather than a One Machine Infinite Bus (OMIB) representation, allowing us to optimize and constrain each device and consider metrics beyond transient stability. For convenience of implementation, we use the same controls in our pre- and post-contingency optimization stages. Thus, rather than optimizing pre-contingency real and reactive power like TSCOPF, our model directly optimizes the generator controls that determine real and reactive power. As a result, our model can determine the best control path to the desired pre-contingency power levels within a given timeframe, rather than assuming attainability of desired power levels.

There has been some progress in adding discrete planning decisions to dynamically-constrained problems. Transient stability constrained unit commitment (TSCUC) (Abhyankar et al. 2017; Geng et al. 2017; Xu et al. 2015) combines TSCOPF with unit commitment for dynamic stability-constrained economic scheduling and dispatch. Others have explored how to combine discrete variables with stability constraints for other purposes (Dehghanian et al. 2015; Li et al. 2016; Lu et al. 2018; Paramasivam et al. 2013; Kamali et al. 2018). Our model is different from these works in that it allows resilience-focused device hardening in addition to optimal high-fidelity generator control.

For dynamic preventive-corrective control, we leverage the model of Arguello et al. (2021), which performs two-stage stochastic optimization over multiple contingencies. We essentially extend the first stage of this model with discrete hardening of components to protect them from failure in any contingencies. With minor modification our model can extend to other discrete decisions such as intentional generator, transmission line, transformer, and load tripping.

In the next section of this paper, we present our model in stages. First, we summarize the nomenclature and basic power system dynamics model followed by a description of the discretization method that was applied to the differential equations. Next, we describe the resilience metrics that drive our optimization, how we model outages through disjunctive programming, and the mitigation component of our model. We then put all previous modeling components together and describe our solution methodology using a combination of nonlinear programming, generalized disjunctive programming, and stochastic programming techniques. We apply our approach to the 9-bus Western System Coordinating Council (WSCC) test system. This small system provides clear interpretation of results so that the reader can easily see the benefits of both mitigation and high-fidelity generator control in the seconds surrounding a failure. Finally, we present our conclusions and suggestions for future work.

2 Power system resilience and mitigation model

We use a dynamic power system model similar to the one described in Arguello et al. (2021), including a fourth-order flux decay generator, turbines with no reheating, and dynamic loads. This model is extended, through the use of disjunctive programming, to allow discrete mitigation decisions. The model nomenclature is defined in Tables 1–6. Note that all power dynamic and control variables are time-dependent. For brevity, we omit time indices and function notation.

Table 1 Grid components
Table 2 Bus mappings
Table 3 Grid Parameters
Table 4 Power dynamic variables
Table 5 Control variables
Table 6 Investment and switching variables

2.1 Power system dynamics model

In this section, we introduce the basic power system dynamics DAE model, a differential-algebraic system of equations that includes generator dynamics, load dynamics, AC power transmission, and power balance at buses. Discretization, resilience metrics, outages, and mitigation features are then discussed in turn. The section finishes by integrating all features into a stochastic optimization model that optimizes over a set of scenarios.

2.1.1 Generator model

Differential Eqs. (1)–(4) model the transformation of mechanical energy to electrical energy in a generator using the reduced-order flux-decay model from Sauer et al. (2017). This system of differential equations is derived by performing a Park’s transformation over a balanced, symmetrical, three-phase generator with a field winding and three damper windings on its rotor. Through this dynamic system, rotor frequency is related to mechanical torque from the governor, internal voltages, internal currents, a reference voltage, and the generator’s voltage at its connection to the rest of the grid. Dynamic simulators usually regard the reference voltage \(V_{{ref}_g}\) as a constant parameter, but our model allows it to vary over time, serving as a control variable.

$$\begin{aligned}&\frac{d\delta _{g}}{dt} = \omega _{g} - \omega _{s} \quad&\end{aligned}$$
(1)
$$\begin{aligned}&\frac{d\omega _{g}}{dt} = \frac{T_{M_{g}}}{M_{g}} - E^{'}_{q_{g}} \frac{I_{q_{g}}}{M_{g}} - I_{d_{g}}I_{q_{g}} \frac{X_{q_{g}} - X^{'}_{d_{g}}}{M_{g}} - D_{g} \frac{\omega _{g} - \omega _{s}}{M_{g}}&\end{aligned}$$
(2)
$$\begin{aligned}&\frac{dE^{'}_{q_{g}}}{dt} = -\frac{E^{'}_{q_{g}}}{T^{'}_{do_{g}}} - I_{d_{g}}\frac{X_{d_{g}} - X^{'}_{d_{g}}}{T^{'}_{do_{g}}} + \frac{E_{fd_{g}}}{T^{'}_{do_{g}}}&\end{aligned}$$
(3)
$$\begin{aligned}&\frac{dE_{fd_{g}}}{dt} = -\frac{E_{fd_{g}}}{T_{A_{g}}} + (V_{ref_{g}} - V_{b_g})\frac{K_{A_{g}}}{T_{A_{g}}}&\end{aligned}$$
(4)

\(\forall g \in {\mathcal {G}}\)

2.1.2 Governor model

Shaft mechanical torque \(T_{M_g}\) is often used as a parameter in dynamic simulators. Similarly to \(V_{{ref}_g}\), our model allows it to vary over time. Unlike \(V_{{ref}_g}\), however, our model does not directly vary \(T_{M_g}\). Instead, we include a non-reheat steam turbine model to smoothly vary \(T_{M_g}\) through \(P_{{ref}_g}\) using differential Eq. (5). If \(P_{{ref}_g}\) were fixed, \(T_{M_g}\) would exponentially decay to \(P_{{ref}_g}\). Given that we allow optimization over \(P_{{ref}_g}\), this governor model will force \(T_{M_g}\) to dynamically gravitate towards \(P_{{ref}_g}\) in an exponential decay.

$$\begin{aligned}&\frac{dT_{M_{g}}}{dt} = \frac{P_{ref_{g}}-{T_{M_{g}}}}{T_{ch_{g}}}&\end{aligned}$$
(5)

\(\forall g \in {\mathcal {G}}\)
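
As a brief worked step (added here for illustration, assuming \(P_{{ref}_g}\) is held at a constant value \(\bar{P}\) and the torque starts from \(T_{M_g}(0)\); both symbols are introduced only for this example), Eq. (5) is a linear first-order equation with the closed-form solution

$$\begin{aligned}&T_{M_{g}}(t) = \bar{P} + \left( T_{M_{g}}(0) - \bar{P}\right) e^{-t/T_{ch_{g}}}&\end{aligned}$$

Differentiating gives \(\frac{dT_{M_g}}{dt} = (\bar{P} - T_{M_g})/T_{ch_g}\), which matches Eq. (5). The gap between torque and set point therefore shrinks by a factor of e every \(T_{ch_g}\) seconds, which is the sense in which the governor smooths changes in \(P_{{ref}_g}\).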

2.1.3 Stator equations

The flux-decay model stator equations in (6)–(7) result from applying Kirchhoff’s Voltage Law to the generator’s dynamic equivalent circuit. They model real and reactive voltage drop across the generator’s circuit elements.

$$\begin{aligned}&V_{b_g} \sin (\delta _{g} - \theta _{b_g}) + R_{s_{g}}I_{d_{g}} - X_{q_{g}}I_{q_{g}} = 0&\end{aligned}$$
(6)
$$\begin{aligned}&E^{'}_{q_{g}} - V_{b_g} \cos (\delta _{g} - \theta _{b_g}) - R_{s_{g}}I_{q_{g}} - X^{'}_{d_{g}}I_{d_{g}} = 0&\end{aligned}$$
(7)

\(\forall g \in {\mathcal {G}}\)

2.1.4 Exponential recovery load model

To include dynamic real and reactive power consumption on the grid, we use the exponential recovery load model (8)–(11) from Hill (1993). This model is based on the empirically observed exponential recovery of an aggregated load after a step change in voltage. Equations (10)–(11) express power consumption as a linear combination of the dynamic variable x and a power function of V. In turn, \(\frac{dx}{dt}\) is a linear combination of x and power functions of V through (8)–(9). This model has the effect of \(P_{L_d}\) and \(Q_{L_d}\) recovering towards \(Po_{L_d}\) and \(Qo_{L_d}\) respectively after a change in voltage.

$$\begin{aligned}&\frac{dx_{p_{d}}}{dt} = -\frac{x_{p_{d}}}{Tp_{L_{d}}} + Po_{L_{d}}V_{b_d}^{\alpha _{s_{d}}} - Po_{L_{d}}V_{b_d}^{\alpha _{t_{d}}}&\end{aligned}$$
(8)
$$\begin{aligned}&\frac{dx_{q_{d}}}{dt} = -\frac{x_{q_{d}}}{Tq_{L_{d}}} + Qo_{L_{d}}V_{b_d}^{\beta _{s_{d}}} - Qo_{L_{d}}V_{b_d}^{\beta _{t_{d}}}&\end{aligned}$$
(9)
$$\begin{aligned}&P_{L_{d}} = \frac{x_{p_{d}}}{Tp_{L_{d}}}+ Po_{L_{d}}V_{b_d}^{\alpha _{t_{d}}}&\end{aligned}$$
(10)
$$\begin{aligned}&Q_{L_{d}} = \frac{x_{q_{d}}}{Tq_{L_{d}}}+ Qo_{L_{d}}V_{b_d}^{\beta _{t_{d}}}&\end{aligned}$$
(11)

\(\forall d \in {\mathcal {D}}\)

2.1.5 Power flow equations

The power form of Ohm’s law for AC power flow applied across transmission lines yields power flow Eqs. (12) and (14). They calculate the real and reactive power at bus i from the line between buses i and j. Similarly, (13) and (15) calculate the real and reactive power at bus j from the line between buses i and j. Note that these equations can include modeling of transformers, line charging susceptance, and shunt devices.

$$\begin{aligned}&P_{ij}= \frac{1}{\tau _{ij}}\left( V_{i}^{2}\frac{G_{ij}}{\tau _{ij}} - V_{i}V_{j}G_{ij}\cos (\theta _{i}- \theta _{j}) - V_{i}V_{j}B_{ij}\sin (\theta _{i}- \theta _{j})\right)&\end{aligned}$$
(12)
$$\begin{aligned}&P_{ji}= V_{j} ^{2}G_{ij} - \frac{1}{\tau _{ij}}(V_{i}V_{j}G_{ij}\cos (\theta _{i}- \theta _{j}) + V_{i}V_{j}B_{ij}\sin (\theta _{i}- \theta _{j}))&\end{aligned}$$
(13)
$$\begin{aligned}&Q_{ij}= \frac{1}{\tau _{ij}}\left( -V_{i} ^{2}\frac{B_{ij}-\frac{Bs_{ij}}{2}}{\tau _{ij}} + V_{i}V_{j}B_{ij}\cos (\theta _{i}- \theta _{j}) - V_{i}V_{j}G_{ij}\sin (\theta _{i} - \theta _{j})\right)&\end{aligned}$$
(14)
$$\begin{aligned}&Q_{ji}= -V_{j}^{2}\left( B_{ij} -\frac{Bs_{ij}}{2}\right) + \frac{1}{\tau _{ij}}(V_{i}V_{j}B_{ij}\cos (\theta _{i}- \theta _{j}) + V_{i}V_{j}G_{ij}\sin (\theta _{i}- \theta _{j}))&\end{aligned}$$
(15)

\(\forall i,j \in {\mathcal {L}}\)
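
To make the structure of Eqs. (12)–(15) concrete, the following minimal Python sketch evaluates them for a single line. It is a direct transcription of the equations as printed, not the authors' implementation; the function name `line_flows`, its argument names, and the numerical values in the usage lines are hypothetical and chosen only for illustration.

```python
import math


def line_flows(v_i, v_j, th_i, th_j, g, b, bs=0.0, tau=1.0):
    """Evaluate Eqs. (12)-(15) for one line (i, j).

    Voltages, conductance g, susceptance b and line charging bs are in
    per unit; angles are in radians; tau is the transformer tap ratio.
    """
    c = math.cos(th_i - th_j)
    s = math.sin(th_i - th_j)
    p_ij = (v_i**2 * g / tau - v_i * v_j * g * c - v_i * v_j * b * s) / tau   # Eq. (12)
    p_ji = v_j**2 * g - (v_i * v_j * g * c + v_i * v_j * b * s) / tau          # Eq. (13)
    q_ij = (-v_i**2 * (b - bs / 2) / tau + v_i * v_j * b * c
            - v_i * v_j * g * s) / tau                                         # Eq. (14)
    q_ji = -v_j**2 * (b - bs / 2) + (v_i * v_j * b * c
                                     + v_i * v_j * g * s) / tau                # Eq. (15)
    return p_ij, p_ji, q_ij, q_ji


# Hypothetical line data: flat voltage profile, no tap change, no charging.
flat = line_flows(1.0, 1.0, 0.0, 0.0, g=1.0, b=-10.0)
# Same line with a small line charging susceptance.
charged = line_flows(1.0, 1.0, 0.0, 0.0, g=1.0, b=-10.0, bs=0.2)
```

With equal voltage magnitudes and angles at both ends, no tap change, and no line charging, all four flows evaluate to zero, which is a quick sanity check on any implementation of these equations.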

2.1.6 Balance equations

The power-balance network Eqs. (16) and (17) enforce conservation of power at each bus. Together with the power flow equations, they model the transmission of real and reactive power from generators, through lines, to loads throughout the grid.

$$\begin{aligned}&{\sum _{g \in {\mathcal {G}}_b}}(I_{d_{g}}V_{b_g} \sin (\delta _g - \theta _b) + I_{q_{g}}V_{b_g} \cos (\delta _{g}- \theta _b)) - \; \; {\sum _{(i,j) \in {\mathcal {L}} \vert i = b}} P_{ij} \; \; \nonumber \\&- \; \; {\sum _{(i,j) \in {\mathcal {L}} \vert j = b}} P_{ji} - {\sum _{l \in {\mathcal {D}}_b}}P_{L_{l}} = 0 \end{aligned}$$
(16)
$$\begin{aligned}&{\sum _{g \in {\mathcal {G}}_b}}(I_{d_{g}}V_{b_g} \cos (\delta _g - \theta _b) - I_{q_{g}}V_{b_g} \sin (\delta _g - \theta _b)) - \; \; {\sum _{(i,j) \in {\mathcal {L}} \vert i = b}} Q_{ij} \; \;\nonumber \\&- \; \; {\sum _{(i,j) \in {\mathcal {L}} \vert j = b}} Q_{ji} - {\sum _{l \in {\mathcal {D}}_b}}Q_{L_{l}} = 0 \quad \end{aligned}$$
(17)

\(\forall b \in {\mathcal {B}}\)

2.1.7 Variation restriction

Finally, we add two inequality constraints that limit the rate of change of \(V_{ref}\) and \(P_{ref}\) so that feasible solutions include practical control schemes that avoid high levels of oscillation.

$$\begin{aligned}&\left| \frac{dV_{ref_{g}}}{dt} \right| \le K_{V_g} \end{aligned}$$
(18)
$$\begin{aligned}&\left| \frac{dP_{ref_{g}}}{dt} \right| \le K_{P_g} \end{aligned}$$
(19)

\(\forall g \in {\mathcal {G}}\)

2.2 Discretization

To incorporate this power system dynamics model into an optimization framework to maximize resilience, we discretize the DAE model and approximate the differential equations with algebraic equations. The time horizon [0, T] is partitioned through a finite set of points \({\mathcal {P}} = {\mathcal {T}} \cup {\mathcal {T}}_{fail}\) where \({\mathcal {T}} = \{t_1 = 0, t_2, \dots , t_{n-1}, t_n = T\}\) and \({\mathcal {T}}_{fail}\) is a set of failure times. For simplicity, we use uniform spacing between the points of \({\mathcal {T}}\). Every derivative in the power system dynamics model is then approximated using either a finite difference or collocation discretization scheme. See Arguello et al. (2021) for more details.
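
As an illustration of how a differential equation becomes a set of algebraic equations on the mesh \({\mathcal {P}}\), the sketch below applies a backward finite difference to the governor Eq. (5) with a constant set point and compares the result with the closed-form solution of that linear equation. This is a simplified stand-alone example, not the Pyomo.DAE transformation used in this work; the function names and all numerical values (`t_ch`, `p_ref`, `t_m0`, the number of mesh points) are assumptions made for the example.

```python
import math


def build_mesh(horizon, n_points, fail_times):
    """Sorted union of a uniform grid on [0, horizon] and the failure times."""
    uniform = [horizon * k / (n_points - 1) for k in range(n_points)]
    return sorted(set(uniform) | set(fail_times))


def governor_backward_euler(mesh, p_ref, t_m0, t_ch):
    """Backward-difference form of Eq. (5) on a (possibly non-uniform) mesh.

    Each step solves (T[k+1] - T[k]) / h = (p_ref - T[k+1]) / t_ch for T[k+1].
    """
    t_m = [t_m0]
    for t_prev, t_next in zip(mesh, mesh[1:]):
        h = t_next - t_prev
        t_m.append((t_m[-1] + h / t_ch * p_ref) / (1.0 + h / t_ch))
    return t_m


# A failure time that is not on the uniform grid is added as an extra point.
small_mesh = build_mesh(horizon=3.0, n_points=4, fail_times=[1.5])

# Hypothetical data: 3 s horizon, failure at 1.5 s, 0.01 s spacing.
mesh = build_mesh(horizon=3.0, n_points=301, fail_times=[1.5])
t_m = governor_backward_euler(mesh, p_ref=1.0, t_m0=0.8, t_ch=0.5)
exact_final = 1.0 + (0.8 - 1.0) * math.exp(-3.0 / 0.5)
```

In an optimization model the update inside the loop would be posted as an equality constraint at every mesh point rather than evaluated sequentially, so that the solver can choose the set points and the trajectory together.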

2.3 Resilience metrics

After applying a discretization scheme, the power system variables defined in Table 4 are available at points in \({\mathcal {P}}\) to form resilience metrics for use in the objective function. The following resilience metrics (\(M_{v}\), \(M_{\omega }\), and \(M_{d}\)) penalize deviations from nominal values of voltage, frequency, and load respectively.

$$\begin{aligned}&M_{v}(t_1, t_2) = \sum _{t \in \{\tau \in {\mathcal {P}} \vert t_1 \le \tau < t_2\}} \sum _{b \in {\mathcal {B}}}\left( \frac{1 - V_{b, t}}{\eta _1}\right) ^{\gamma _1} \end{aligned}$$
(20)
$$\begin{aligned}&M_{\omega }(t_1, t_2) = \sum _{t \in \{\tau \in {\mathcal {P}} \vert t_1 \le \tau < t_2\}} \sum _{g \in {\mathcal {G}}} \left( \frac{\omega _{g, t} - \omega _{s}}{\omega _{s} \cdot \eta _2}\right) ^{\gamma _2} \end{aligned}$$
(21)
$$\begin{aligned}&M_{d}(t_1, t_2) = \sum _{t \in \{\tau \in {\mathcal {P}} \vert t_1 \le \tau < t_2\}} \sum _{d \in {\mathcal {D}}} \left[ \left( \frac{P_{L_{d, t}} - Po_{L_{d}} }{\eta _3}\right) ^{\gamma _3} + \left( \frac{Q_{L_{d, t}} - Qo_{L_{d}}}{\eta _4}\right) ^{\gamma _4} \right]&\end{aligned}$$
(22)

The \(\gamma \) parameters are chosen to be positive and even. Their magnitude determines the rate at which deviation from nominal values is penalized. Meanwhile, the \(\eta \) parameters can be used to shape the resilience metrics. Minimizing these metrics improves power quality while increasing voltage and frequency margins from their limits. This helps prevent protective tripping, which in turn may prevent other outages or even cascading failures. The objective function of our optimization model is formed from a linear combination of these resilience metrics taken over appropriate time intervals as shown in Sect. 2.6.3.
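
The following short Python sketch shows how metrics of the form (20) and (21) can be evaluated from discretized trajectories. It is an illustrative example only: the function names, the data layout (one row of values per mesh point), the use of Hz for frequency, and the parameter values \(\eta _1 = 0.05\), \(\eta _2 = 0.01\), and \(\gamma = 2\) are assumptions made here, not values reported in this paper.

```python
def window(times, rows, t1, t2):
    """Keep the rows whose mesh point t satisfies t1 <= t < t2."""
    return [row for t, row in zip(times, rows) if t1 <= t < t2]


def voltage_metric(voltages, eta=0.05, gamma=2):
    """Eq. (20): voltages[k][b] is the p.u. voltage at bus b, mesh point k."""
    return sum(((1.0 - v) / eta) ** gamma for row in voltages for v in row)


def frequency_metric(freqs, omega_s=60.0, eta=0.01, gamma=2):
    """Eq. (21): freqs[k][g] is the frequency of generator g, mesh point k."""
    return sum(((w - omega_s) / (omega_s * eta)) ** gamma
               for row in freqs for w in row)


# Hypothetical trajectories on three mesh points for two buses / generators.
times = [0.0, 1.0, 2.0]
volts = [[1.0, 0.95], [1.05, 1.0], [0.5, 0.5]]
freqs = [[60.0, 60.6], [60.0, 60.0], [55.0, 65.0]]

m_v = voltage_metric(window(times, volts, 0.0, 2.0))
m_w = frequency_metric(window(times, freqs, 0.0, 2.0))
```

With these assumed parameters, a bus voltage that deviates from 1 p.u. by exactly \(\eta _1\) contributes 1 to \(M_v\), and larger deviations are penalized quadratically; the half-open window excludes the mesh point at \(t_2\), as in the summation index of Eqs. (20)–(22).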

2.4 Outages

Component trips within the power system are modeled using disjunctions through generalized disjunctive programming (GDP). Each disjunction consists of two disjuncts, each containing a block of constraints. A feasible solution satisfies exactly one of the two disjuncts within each disjunction. To keep track of which disjunct is satisfied, we use binary indicator variables \(\zeta \). If the first disjunct of a disjunction is satisfied, \(\zeta = 1\), otherwise \(\zeta = 0\).

We begin by modeling generator tripping:

$$\begin{aligned} \left[ \begin{array}{ll} (6) \\ (7) \\ \zeta _{g} = 1 \end{array} \right] \vee \left[ \begin{array}{ll} I_{q_{g}} = 0\\ I_{d_{g}} = 0 \\ \zeta _{g} = 0\end{array} \right] \qquad \forall g \in {\mathcal {G}} \end{aligned}$$
(23)

This disjunction either models the conversion of mechanical energy into electrical energy through the stator Eqs. (6)–(7) or specifies that no electrical energy is generated through \(I_{q_{g}} = 0\) and \(I_{d_{g}} = 0\).

Next, we model line tripping:

$$\begin{aligned} \left[ \begin{array}{ll} (12) \\ (13) \\ (14) \\ (15) \\ \zeta _{ij} = 1 \end{array} \right] \vee \left[ \begin{array}{ll} P_{ij} = 0\\ P_{ji} = 0\\ Q_{ij} = 0\\ Q_{ji} = 0\\ \zeta _{ij} = 0 \end{array} \right] \qquad \forall i,j \in {\mathcal {L}} \end{aligned}$$
(24)

This disjunction either models power transmission through the power flow equations for real and reactive power or specifies that no power is being transmitted through a line by setting the power flow variables equal to 0.
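
For illustration only, one common way to turn such a disjunction into mixed-integer constraints is a big-M reformulation. The sketch below shows it for the real power flow \(P_{ij}\), writing \(f_{ij}(V,\theta )\) for the right-hand side of Eq. (12) and \(M\) for a sufficiently large constant; both symbols are introduced here for the example and are not part of the model nomenclature.

$$\begin{aligned}&-M(1-\zeta _{ij}) \le P_{ij} - f_{ij}(V,\theta ) \le M(1-\zeta _{ij}) \nonumber \\&-M\zeta _{ij} \le P_{ij} \le M\zeta _{ij}, \qquad \zeta _{ij} \in \{0,1\} \end{aligned}$$

When \(\zeta _{ij} = 1\), the first pair of inequalities forces \(P_{ij} = f_{ij}(V,\theta )\) while the second pair is relaxed; when \(\zeta _{ij} = 0\), the second pair forces \(P_{ij} = 0\) and the first is relaxed. The reformulation is valid only if \(M\) bounds both \(|P_{ij}|\) and \(|P_{ij} - f_{ij}(V,\theta )|\) over the feasible region, and overly large values of \(M\) tend to weaken the continuous relaxation.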

Finally, we model load tripping:

$$\begin{aligned} \left[ \begin{array}{ll} (8) \\ (9) \\ (10) \\ (11) \\ \zeta _{d} = 1 \end{array} \right] \vee \left[ \begin{array}{ll} P_{L_{d}} = 0\\ Q_{L_{d}} = 0\\ x_{p_{d}} = 0\\ x_{q_{d}} = 0\\ \zeta _{d} = 0 \end{array} \right] \qquad \forall d \in {\mathcal {D}} \end{aligned}$$
(25)

This disjunction either models the consumption of power at a load through the exponential recovery load model or specifies that no power is consumed by setting all variables in the exponential recovery load model to 0.

All of the disjunctions described above are indexed by particular time indices depending on the component trip scenario being modeled. This time index is described in more detail in Sect. 2.6.

2.5 Mitigation

The outage variables \(\zeta \) indicate whether or not an outage has occurred since the outage disjuncts are mutually exclusive. Consequently, these variables can be used as switches to either de-energize or protect individual components within a particular outage scenario. For example, if a grid component c is expected to become de-energized at time t unless a mitigation has been put in place, then the on/off status of c is determined through the constraint

$$\begin{aligned} \zeta _{c,t} = z_{c} \end{aligned}$$
(26)

where \(z_c\) is the binary decision variable being optimized. Now assume the set of components \({\mathcal {C}}\) is vulnerable in a scenario and will become de-energized unless mitigation investments are made. If only B such investments can be made, the constraint

$$\begin{aligned} \sum _{c \in {\mathcal {C}}} z_{c}\le B \end{aligned}$$
(27)

will choose up to B components from \({\mathcal {C}}\) to keep energized while de-energizing the rest.

As another alternative, the set of vulnerable components \({\mathcal {C}}\) can be partitioned into type-specific component sets, \({\mathcal {C}}_g = \{c \in {\mathcal {C}} \vert c \in {\mathcal {G}}\}\) for generator mitigations, \({\mathcal {C}}_l = \{c \in {\mathcal {C}} \vert c \in {\mathcal {L}}\}\) for line mitigations, and \({\mathcal {C}}_d = \{c \in {\mathcal {C}} \vert c \in {\mathcal {D}}\}\) for load mitigations. Type-specific investment constraints can be used:

$$\begin{aligned}&\sum _{c \in {\mathcal {C}}_g} z_{c}\le B_g \end{aligned}$$
(28)
$$\begin{aligned}&\sum _{c \in {\mathcal {C}}_l} z_{c}\le B_l \end{aligned}$$
(29)
$$\begin{aligned}&\sum _{c \in {\mathcal {C}}_d} z_{c}\le B_d \end{aligned}$$
(30)

A third alternative is to associate a cost, \(p_c\), with protecting each grid component and use a weighted sum constraint as a more detailed mitigation decision mechanism:

$$\begin{aligned}&\sum _{c \in {\mathcal {C}}} p_c z_{c} \le B \end{aligned}$$
(31)
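
To illustrate how the budget constraints restrict the mitigation decisions, the following Python sketch enumerates every binary vector \(z\) that satisfies a weighted budget constraint of the form (31). Setting all costs to 1 recovers the cardinality constraint (27). The component names, costs, and budgets are hypothetical, and brute-force enumeration is used only because the example is tiny; it is not how the optimization model selects mitigations.

```python
from itertools import product


def feasible_mitigations(costs, budget):
    """List all binary plans z with sum(costs[c] * z[c]) <= budget."""
    names = sorted(costs)
    plans = []
    for bits in product((0, 1), repeat=len(names)):
        if sum(costs[c] * z for c, z in zip(names, bits)) <= budget:
            plans.append(dict(zip(names, bits)))
    return plans


# Hypothetical protection costs p_c for three vulnerable components.
costs = {"gen_1": 3, "line_4_5": 1, "load_5": 2}
weighted_plans = feasible_mitigations(costs, budget=3)

# Unit costs with B = 1 correspond to constraint (27).
unit_plans = feasible_mitigations({c: 1 for c in costs}, budget=1)
```

In this example the weighted budget admits five plans (protect nothing, any single component, or the line and the load together), whereas the unit-cost budget of one admits four.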

2.6 Stochastic mitigation and power system dynamics control optimization model

Finally, we extend the deterministic power system dynamics control and mitigation model detailed above to a two-stage stochastic program with recourse as in Arguello et al. (2021). In this extension, a finite scenario set \(\Xi = \{\xi _1, \dots , \xi _n\}\) models outage scenarios where each \(\xi _i\) occurs with probability \(p_i\) and has an associated outage set \({\mathcal {C}}_i\). For ease of experimentation we assume that all failures occur at a common time \(t_f\), which allows us to partition the time horizon into a first stage, \({\mathcal {T}}_f = \{t \in {\mathcal {P}} \vert t < t_f\}\), and a second stage, \({\mathcal {T}}_s = \{t \in {\mathcal {P}} \vert t \ge t_f\}\). This event model represents simultaneous failures and idealized control with no delay in event detection or response. Our stochastic model can be generalized to include failure and control delays along with scenario- and grid component-specific failure times. We leave this to future research.

2.6.1 First-stage

The first stage of our stochastic program captures decisions to be made before we know which outage scenario is realized. First-stage decision variables must be nonanticipative (i.e., the same across all scenarios). In our model the first-stage variables are the control actions before \(t_f\) and the mitigation decisions. Control actions taken during this stage adjust the grid to be optimally prepared given the set of scenarios considered. More specifically, this includes \(V_{ref}\) and \(P_{ref}\) for all generators and \(t \in {\mathcal {T}}_f\). Mitigation decisions determine which components are hardened or protected from failing. The mitigation investment variable is defined as \(z_c\) for \(c \in {\mathcal {C}}\) where \({\mathcal {C}} = \bigcup _{i=1}^n {\mathcal {C}}_i\).

Our first-stage model includes all power system dynamics and control constraints (1)–(19). For mitigation investment decisions we use component type-specific budgets as shown in constraints (28)–(30).

2.6.2 Second-stage

The second stage solves for scenario-specific control decisions that occur after \(t_f\). There are no mitigation decisions in this stage and components that were protected in the first stage remain protected. Control decisions include all power dynamic and control variables (see variable Tables 4 and 5) for each \(t \in {\mathcal {T}}_s\).

Second-stage constraints include power system dynamics and control together with outage disjunctions that model either outage or protection depending on values of \(\zeta \). Power system dynamic constraints include (1)–(19) for each \(\xi _i \in \Xi \) and \(t \in {\mathcal {T}}_s\). To protect through mitigation decisions, we include (26) and outage disjunctions for each \(c \in {\mathcal {C}}_i\) in scenario \(\xi _i \in \Xi \) and for every \(t \in {\mathcal {T}}_s\).

2.6.3 Objective function

Finally, our objective function seeks to optimize grid resilience in the first stage through prepositioning while also optimizing expected resilience through emergency control after a scenario has occurred at \(t_f\). Using our resilience metrics \(M_v\), \(M_\omega \), and \(M_d\), our stochastic objective is

$$\begin{aligned}&\min _{V_{ref}, P_{ref}, \zeta } M_v(0,t_f) + M_\omega (0, t_f) + M_d(0,t_f)\nonumber \\&\qquad + \sum _{i=1}^n p_{i}\left( M_v(t_f, T) + M_\omega (t_f, T) + M_d(t_f, T) \right) \end{aligned}$$
(32)

Note that \(t_{f}\) can be any time period in [0, T]. In our experiments, we have selected \(t_{f}\) to be the midpoint. However, if economic costs were included, a significantly longer first stage could be considered to model longer term decisions in preparation for a set of scenarios \(\Xi \) that occur with more advanced notice. Depending on the severity of the scenarios, a longer second stage could also be studied to measure the efficacy of the emergency controls.
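
As a small numerical illustration of the structure of objective (32), the Python sketch below combines first-stage metric values with probability-weighted second-stage values. The function name and all metric values are hypothetical; in the actual model these quantities are expressions in the decision variables rather than fixed numbers.

```python
def expected_objective(first_stage, second_stage, probs):
    """First-stage metric sum plus expected second-stage metric sum.

    first_stage  : (M_v, M_omega, M_d) evaluated on [0, t_f)
    second_stage : one (M_v, M_omega, M_d) tuple per scenario, on [t_f, T)
    probs        : scenario probabilities p_i, which must sum to 1
    """
    if len(probs) != len(second_stage):
        raise ValueError("one probability is needed per scenario")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    recourse = sum(p * sum(m) for p, m in zip(probs, second_stage))
    return sum(first_stage) + recourse


# Hypothetical metric values for a baseline and three outage scenarios.
first = (0.2, 0.1, 0.0)
second = [(0.0, 0.0, 0.0), (4.0, 2.0, 2.0), (1.0, 1.0, 2.0), (8.0, 0.0, 0.0)]
value = expected_objective(first, second, probs=[0.25] * 4)
```

Because the first-stage terms are shared by all scenarios, they enter once with weight one, while each scenario's post-failure terms are weighted by its probability.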

3 Solution methodology

The model we present in this paper is a stochastic nonconvex nonlinear generalized disjunctive program. There are widely used techniques for solving each subtype of this problem class. Depending on problem structure and size, algorithms for solving stochastic programs include the L-shaped method of Birge and Louveaux (2009), progressive hedging from Rockafellar and Wets (1991), and direct LP or MIP solution of the extensive form. Nonconvex nonlinear problems can be solved through a combination of interior point methods, gradient descent algorithms, and active set algorithms (Nocedal and Wright 1999). Finally, GDPs have traditionally been solved through big-M (Trespalacios and Grossmann 2015) and convex hull relaxations (Grossmann and Lee 2003).

To express and solve our model, we use the Python-based mathematical programming language Pyomo (Bynum et al. 2021). Pyomo enables the expression of complex optimization problems and contains several extensions for representing and solving specific types of models. Pyomo extensions used in this work include mpi-sppy (Knueven et al. 2020a), Pyomo.DAE (Nicholson et al. 2018), Pyomo.GDP (Chen et al. 2022), and GDPopt (Chen et al. 2018). The mpi-sppy package facilitates the expression of stochastic programs and implements the L-shaped method and progressive hedging algorithms. Pyomo.DAE enables easy expression of differential equations and transformations for approximating derivatives using algebraic equations. Pyomo.GDP allows users to represent disjuncts and disjunctions and provides general implementations of the big-M and convex hull relaxations. Finally, GDPopt provides logic-based decomposition approaches for solving nonlinear GDP models. Each of these Pyomo extensions has been demonstrated individually but to our knowledge this is the first time they have all been combined to efficiently implement and solve the novel optimization formulation described in this work.

Our solution approach consisted of the following steps:

  1. Implement the deterministic power system resilience and mitigation model from Arguello et al. (2021) through Pyomo.DAE and Pyomo.GDP, including all differential equations and disjunctions.

  2. Use the Pyomo.DAE collocation transformation to approximate all derivatives using a collocation over finite elements discretization. Note that Pyomo.DAE also allows forward, backward, and central finite difference methods.

  3. Extend to a stochastic program by duplicating the deterministic problem to form different scenarios and providing mpi-sppy with scenario probabilities and parameters, including which variables belong to the first stage.

  4. Apply a nested sequence of solvers. For our model, mpi-sppy required a subsolver capable of solving the nonlinear GDP model in each scenario. We used GDPopt, which in turn required mixed-integer linear programming (MILP), mixed-integer nonlinear programming (MINLP), and nonlinear programming (NLP) subsolvers. Our GDPopt subsolver choices included Gurobi (Gurobi 2022), Bonmin (Bonami et al. 2008), and Knitro (Byrd et al. 2006).

4 Experimentation and results

We demonstrate the utility of our model through a multiple scenario failure contingency experiment on the WSCC 9-bus power system starting at a steady state. Our experiments optimize grid resilience while preemptively mitigating worst-case failures given a budget.

4.1 Outage scenarios

In our experiment, we synthesize four notional scenarios. The first scenario, which we refer to as the baseline scenario, is the system at steady state with no failures. This scenario is included since, in practice, the outcome that no failure occurs should be considered.

The three other scenarios each consist of three grid component failures that collectively cover all generators and load within the system. See Fig. 1a–c. For our experiment, we assume \(t_f = 1.5\) s and the full time horizon is 3 s long.

Fig. 1 The three outage scenarios considered in addition to a baseline scenario with no outages. These scenarios collectively fail all generators, all loads, and three transmission lines

4.2 Dynamics with no resilience control and no mitigation

To see the impact of control and mitigation optimization, we first show the power system dynamics for scenarios 2–4 with no control and no mitigation by fixing the \(V_{ref}\) and \(P_{ref}\) parameters and using a budget of zero. See Figs. 2, 3 and 4. For scenario 2, the voltage at bus 2 drops significantly to 0.8 p.u. at \(t_f\) and then rapidly increases past 1.05 p.u., both well outside the stability margin of 0.95–1.05 p.u. This causes the frequency of generator 2 to increase and diverge beyond the nominal frequency of 60 Hz, while the frequencies of generators 1 and 3 decrease to compensate. Similar behavior among the generators is observed in scenario 3, except that the bus voltages vary more erratically, with bus 8 decreasing and all other buses increasing to compensate, followed by oscillation of the voltage at each bus. Scenario 4 also has frequency divergence, except now with generator 3 increasing past nominal frequency. Unlike scenarios 2 and 3, in scenario 4 all buses increase in voltage followed by a steady decline. The extreme voltages and frequencies seen in these scenarios could lead to further cascading failures in real power systems.

Given these failure scenarios and no mitigation or control, the optimization problem has no feasible solution, because the system cannot be stabilized. This is most evident in the frequency divergence observed in each scenario.

Fig. 2 Scenario 2 dynamics with no controls or mitigation

Fig. 3 Scenario 3 dynamics with no controls or mitigation

Fig. 4 Scenario 4 dynamics with no controls or mitigation

4.3 Resilience control and mitigation budget of \(B = 1\)

With a mitigation budget of \(B=1\), the optimization problem can protect a single line, a single load, and a single generator across all scenarios. For simplicity, each of the four scenarios is assigned probability \(p=0.25\). Fig. 5 shows the mitigation choices made by the model relative to the grid. The model chooses to protect the lower left portion of the grid, namely the load at bus 5, the line between buses 4 and 5, and generator 1. These components were determined to have the largest impact on grid resilience if tripped, and protecting them yields the greatest reduction in expected system deviation.
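The budget rule and the probability weighting described above can be illustrated with a short sketch. It assumes that the budget applies separately to each component type and that the objective weights each scenario's deviation by its probability; the function names, component labels, and per-scenario deviation values are hypothetical placeholders, not results from this study.

```python
def within_budget(protected, budget):
    """True if no component type has more than `budget` protected items."""
    return all(len(items) <= budget for items in protected.values())


def expected_deviation(probabilities, deviations):
    """Probability-weighted deviation over the scenarios."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(p * d for p, d in zip(probabilities, deviations))


# Mitigation choice reported in the text for B = 1 (labels are hypothetical).
protected = {"line": {"4-5"}, "load": {"bus 5"}, "generator": {"gen 1"}}
feasible = within_budget(protected, budget=1)

# Placeholder per-scenario deviations (made up, not results from the paper).
probabilities = [0.25, 0.25, 0.25, 0.25]
deviations = [1.0, 2.0, 3.0, 2.0]
expected = expected_deviation(probabilities, deviations)
print(feasible, expected)
```

In the full model the deviations are themselves outcomes of the dynamic constraints, so this sketch only shows how a candidate mitigation choice would be screened and scored, not how it is optimized.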

Fig. 5 Mitigation choices for \(B=1\), stars denote the protected components

Figures 6, 7, 8 and 9 show the outcome of combined resilience control and mitigation optimization. Note that the control profiles (\(V_{ref}\) and \(P_{ref}\)) and the dynamics (\(V\) and \(\omega \)) are identical across scenarios before \(t_f = 1.5\) s. This is expected, since \(V_{ref}\) and \(P_{ref}\) before \(t_f\) are first-stage variables.
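This first-stage property can be verified on solved profiles. The sketch below assumes that every scenario is sampled on the same time grid and treats samples with \(t < t_f\) as first-stage; the function name `agree_before` and the sample \(P_{ref}\) values are hypothetical and not taken from the figures.

```python
def agree_before(times, profiles, t_f, tol=1e-9):
    """True if all scenario profiles coincide at every sample with t < t_f."""
    series = list(profiles.values())
    reference = series[0]
    for k, t in enumerate(times):
        if t >= t_f:
            continue  # second-stage samples may differ between scenarios
        if any(abs(s[k] - reference[k]) > tol for s in series[1:]):
            return False
    return True


# Made-up P_ref profiles (illustrative only): identical before t_f = 1.5 s,
# different afterwards.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
p_ref = {
    "baseline": [0.70, 0.68, 0.66, 0.90, 0.72],
    "scenario 2": [0.70, 0.68, 0.66, 0.55, 0.60],
}
shared = agree_before(times, p_ref, t_f=1.5)
print(shared)
```

Whether the sample at exactly \(t_f\) belongs to the first or second stage depends on the discretization, which this sketch does not attempt to reproduce.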

Fig. 6 Baseline scenario dynamics with \(B=1\)

Even in the baseline scenario, where no component fails, the dynamics and control profiles have several noteworthy features. First, as an artifact of pre-positioning the grid in anticipation of scenarios 2–4, frequency gradually decreases. Normally the model would be disincentivized from allowing frequency to dip far below the nominal value of 60 Hz. However, because scenarios 2 through 4 each involve load being tripped, the model prepares for the sudden loss of load by allowing the generators to slow down. Second, after \(t_f = 1.5\) s there is a spike in \(P_{ref}\): once the baseline scenario is realized with no failures, a single generator speeds up by momentarily ramping \(P_{ref}\) to bring voltage and frequency closer to nominal values.

Fig. 7 Scenario 2 dynamics with \(B=1\)

Fig. 8 Scenario 3 dynamics with \(B=1\)

Fig. 9 Scenario 4 dynamics with \(B=1\)

Note that the recourse actions after \(t_f\) and the system responses differ between scenarios because the outages differ. With appropriate control actions, however, voltages and generator frequencies remain within stable margins in all scenarios. The optimal deviation cost across all four scenarios was 3.331. See Sect. 4.5 for additional details.

4.4 Resilience control and mitigation budget of \(B = 2\)

Increasing the mitigation budget to \(B=2\) improves resilience while exhibiting diminishing marginal utility of mitigation investments. Now that an additional line, load, and generator can be protected, the model chooses to protect the left side of the system together with the load at bus 6. See Fig. 10. However, it chooses not to protect the generator at bus 2, even though the budget allows it. This illustrates the diminishing marginal utility of budget for grid resilience.

Fig. 10 Mitigation choice for \(B=2\), stars denote the protected components

As with budget \(B=1\), we see a resilience effect from combining mitigation and control. However, with additional components protected, the preventative control is less extreme. See Figs. 11, 12, 13 and 14. The optimal deviation cost across all four scenarios was 0.484. See Sect. 4.5 for additional details.

Fig. 11 Baseline scenario dynamics with \(B=2\)

Fig. 12 Scenario 2 dynamics with \(B=2\)

Fig. 13 Scenario 3 dynamics with \(B=2\)

Fig. 14 Scenario 4 dynamics with \(B=2\)

We further demonstrate the impact of controls and mitigation on the dynamics of the system in Sect. 1.

4.5 Results summary

The optimal objective function values obtained at each budget level \(B\) are summarized in Table 7 and broken down into voltage deviation cost \(M_v\), frequency deviation cost \(M_\omega \), and load deviation cost \(M_d\). With \(B=0\), the solver is unable to find a feasible solution, suggesting that the set of failure scenarios leads to severe grid instability. With a single hardening investment per component type (\(B=1\)), the grid becomes stable with a total objective cost of 3.331. Increasing the mitigation investment to \(B=2\) lowers the total objective cost to 0.484, a reduction of approximately 85%. Frequency deviation accounts for the largest share of the cost for both \(B=1\) and \(B=2\), so the cost improvement comes largely from reducing \(M_\omega \). Given a cost for hardening each component, one could quantify the trade-off between grid instability and hardening investment and identify the policy that minimizes their combined cost.
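The quoted reduction follows directly from the two reported totals. A minimal sketch of the arithmetic, using only the values stated above (the helper name `relative_reduction` is ours):

```python
def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


cost_b1 = 3.331  # total objective cost reported for B = 1
cost_b2 = 0.484  # total objective cost reported for B = 2

reduction = relative_reduction(cost_b1, cost_b2)
print(f"{reduction:.1%}")
```

The absolute improvement is 3.331 - 0.484 = 2.847, which is roughly 85% of the \(B=1\) cost.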

Table 7 Comparison of optimal results

Table 8 reports the maximum and minimum voltages and frequencies across all four scenarios for each level of investment. As \(B\) increases, the range of values for both voltage and frequency decreases. The frequency range shrinks the most, consistent with the larger reduction in frequency deviation cost seen in Table 7.
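Ranges of this kind can be computed by pooling the samples of every scenario at a given budget level. The sketch below is illustrative only; the function name `envelope` and the frequency samples are hypothetical and do not reproduce the entries of Table 8.

```python
def envelope(trajectories):
    """Return (minimum, maximum, range) over all samples in all scenarios."""
    samples = [v for series in trajectories.values() for v in series]
    low, high = min(samples), max(samples)
    return low, high, high - low


# Made-up generator frequency samples in Hz (not data from the paper).
frequency_hz = {
    "baseline": [60.0, 59.9, 60.0],
    "scenario 2": [60.0, 59.9, 60.2],
    "scenario 3": [60.0, 59.9, 59.7],
}

low, high, spread = envelope(frequency_hz)
print(low, high, round(spread, 3))
```

Comparing `spread` between budget levels gives the kind of range reduction discussed above.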

Table 8 Comparison of voltage and frequency ranges

For each of these solves, an extensive form solution methodology was used with the following solver settings:

figure a

The model had 14,484 variables and 14,106 constraints. Each solve took approximately 25 min.

5 Conclusion

We have incorporated binary mitigation variables into a two-stage stochastic optimization model with dynamic constraints to improve system resilience. The model combines GDP to represent mitigation decisions, stochastic programming to account for uncertain scenarios, and DAEs to model the dynamics themselves. We have demonstrated that, under a constrained budget, the model can protect certain components asymmetrically across scenarios and can leave part of the budget unused when certain combinations of components are vulnerable to being tripped simultaneously. This modeling framework is the first of its kind to study the interdependencies between optimal hardening mitigation, pre-positioning, and emergency control with respect to a set of failure scenarios.

5.1 Future research

This paper demonstrates the utility of our model on a small test system. Future research will leverage progressive hedging, variable screening, and reduced-order dynamically equivalent modeling to scale to larger and more realistic systems. Additional discretization points could also extend the time frame that is studied as well as increase the fidelity of results. As an example, additional points around a failure time could improve accuracy. Model fidelity can be improved by incorporating details on device types, outage effects, device damage, restorability, and remedial action schemes. These remedial action schemes can be modeled through hybrid dynamic modeling. Finally, constraints and objectives can be added to model transient, small-signal, and voltage stability.
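As an illustration of placing additional discretization points around a failure time, the sketch below builds a uniform coarse grid over the 3 s horizon and adds a finer set of points in a window around \(t_f = 1.5\) s. The function name `refined_grid`, the window half-width, and the point counts are hypothetical choices, not the discretization used in this study.

```python
def refined_grid(horizon, t_f, n_coarse, n_fine, half_width):
    """Uniform grid on [0, horizon] plus extra points within half_width of t_f."""
    coarse = [horizon * k / n_coarse for k in range(n_coarse + 1)]
    lo = max(0.0, t_f - half_width)
    hi = min(horizon, t_f + half_width)
    fine = [lo + (hi - lo) * k / n_fine for k in range(n_fine + 1)]
    # Round before merging so that near-duplicate floats collapse to one point.
    return sorted({round(t, 9) for t in coarse + fine})


grid = refined_grid(horizon=3.0, t_f=1.5, n_coarse=6, n_fine=4, half_width=0.1)
print(grid)
```

Each added point introduces further variables and constraints in every scenario, so refinement of this kind trades solve time for accuracy near the failure.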