1 Introduction

Risk evaluation for a complex system of multiple agents or subsystems is a fundamental problem relevant to many fields. A crucial question is the assessment of the total risk of the system, taking into account the risk of each agent and the risk associated with the efficient operation of the system as a whole. A challenge arises when the risk evaluation is based on confidential or proprietary information of the individual agents. Extensive literature exists on the properties of risk measures and their use in finance. Our goal is to address situations arising in robotics, energy systems, business systems, and logistics problems, where heterogeneous sources of risk may exist. The nature and complexity of the relations in those systems may prevent direct application of the theory and methods developed in finance. In many systems, the source of risk is associated with a highly non-trivial aggregation of the features of its agents, which may not be available in analytical form. For example, in automated robotic systems, the exchange of information may be limited or distorted due to the speed of operation, the spatial distance between agents, or other reasons. Another difficulty arises when the risk of one agent stems from sources of uncertainty of a different nature. The question of aggregating those risk factors into one loss function does not have a straightforward answer.

The main objective of this paper is to suggest a new approach to risk-averse optimization of a distributed complex system. While building on the developments thus far, our goal is to identify a framework that is theoretically sound but also amenable to efficient numerical computation. We propose a set of axioms for functionals defined on the space of random vectors. The random vector comprises risk factors of various sources or represents the loss of each agent in a multi-agent system. While axioms for random vectors have been proposed earlier, our set of axioms differs from those in the literature, most notably with respect to the translation equivariance condition. When the dimension of the random vector is one, the resulting systemic risk measures reduce to coherent measures of risk for scalar-valued random variables. We derive the dual representation of the systemic risk measures under fewer assumptions than those known for multivariate risks. We also propose several ways to construct systemic risk measures and analyze their properties.

A risk-averse two-stage optimization problem with a structure reflecting a distributed system of loosely coupled subsystems is formulated. The proposed distributed numerical method lets each subsystem optimize its operation with minimal information exchange among the subsystems (agents). The method demonstrates that distributed calculation of the systemic risk is possible without a significant computational burden. We also consider a two-stage wireless communication network model formulated for a team of robots. It addresses a situation in which a team of robots explores an area, and each robot reports relevant data. The goal is to determine a few reporting points so that the communication is conducted most efficiently while managing the risk of losing data. Similar problems have been considered in the literature before [19, 24, 36]. Our two-stage formulation of the information exchange problem is novel and has not been explored in prior research. We conduct several numerical experiments to compare various systemic risk measures.

Our paper is organized as follows. In Sect. 2, we provide preliminary information on coherent risk measures for scalar-valued random variables and survey existing methods for risk evaluation of complex systems. Section 3 contains the set of axioms, the dual representation associated with the resulting systemic risk measures, and two ways to construct such measures in practice. Section 4 provides a theoretical comparison of the new systemic measures of risk to other notions.

In Sect. 5, we formulate a risk-averse two-stage stochastic programming problem and propose a distributed method for solving the problem. It also contains a description of the wireless information exchange model and our numerical comparison of several systemic measures of risk.

2 Preliminaries

2.1 Coherent Risk Measures

The risk of one loss function can be evaluated using a coherent measure of risk or other classical risk measures, such as Value-at-Risk (VaR). The axiomatic framework for coherent measures of risk proposed in [2] is widely accepted. It is further extended and analyzed in [8, 15, 23, 29, 33, 34], and many other works. It is worth noting that another axiomatic approach was initiated in [20], and this line of thinking was developed into another framework in [31]. We refer to [15, 35] for an extensive treatment of risk measures for scalar-valued random variables. In [35], risk-averse optimization problems are presented as well, and we adopt the setting and exposition of that source. Let \(\mathcal {L}_p(\varOmega , \mathcal {F}, P)\) be the space of real-valued random variables, defined on the probability space \((\varOmega ,\mathcal {F}, P)\), that have finite p-th moments, \(p \in [1, \infty )\), where random variables that coincide almost surely are identified. We shall assume that the random variables represent random costs or losses. A lower semi-continuous functional \(\varrho : \mathcal {L}_p(\varOmega , \mathcal {F}, P) \rightarrow \mathbb {R}\cup \{+\infty \}\) is a coherent risk measure if it is convex, positively homogeneous, monotonic with respect to the a.s. comparison of random variables, and satisfies the following translation property:

$$\begin{aligned} \varrho [Z + a] = \varrho [Z] + a \text { for all } Z \in \mathcal {L}_p(\varOmega , \mathcal {F}, P),\; a \in \mathbb {R}. \end{aligned}$$
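A standard example satisfying all of these properties is the Average Value-at-Risk at level \(\alpha \in (0,1)\), \(\textsf{AVaR}_\alpha [Z] = \min _t \{ t + \alpha ^{-1}\mathbb {E}[(Z-t)_+]\}\). The following sketch (our illustration, applied to an empirical distribution with equally likely samples, and not part of the formal development) checks the translation, homogeneity, and monotonicity properties numerically:

```python
import numpy as np

def avar(z, alpha=0.1):
    """Empirical Average Value-at-Risk of losses z: min_t t + E[(z-t)_+]/alpha.
    A (1-alpha)-quantile of z attains the minimum."""
    t = np.quantile(z, 1 - alpha)
    return t + np.mean(np.maximum(z - t, 0.0)) / alpha

rng = np.random.default_rng(0)
z = rng.normal(size=100_000)                 # sampled random losses

# translation property: rho[Z + a] = rho[Z] + a
assert abs(avar(z + 3.0) - (avar(z) + 3.0)) < 1e-8
# positive homogeneity: rho[t Z] = t rho[Z]
assert abs(avar(2.0 * z) - 2.0 * avar(z)) < 1e-8
# monotonicity: Z >= Z' a.s. implies rho[Z] >= rho[Z']
assert avar(z) >= avar(z - 1.0)
```

On the empirical distribution the identities hold up to floating-point error, since the sample quantile itself is translation and scale equivariant.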

If \(\varrho [\cdot ]\) is monotonic, convex, and satisfies the translation property, it is called a convex risk measure. The space \(\mathcal {L}_p(\varOmega , \mathcal {F}, P)\) equipped with its norm topology is paired with \(\mathcal {L}_q(\varOmega , \mathcal {F}, P)\) equipped with the weak\(^*\) topology, where \(\frac{1}{p} + \frac{1}{q} = 1\). For any \(Z \in \mathcal {L}_p(\varOmega , \mathcal {F}, P) \) and \(\xi \in \mathcal {L}_q(\varOmega , \mathcal {F}, P)\), we use the bilinear form

$$\begin{aligned} \langle \xi , Z \rangle = \int _\varOmega \xi (\omega ) Z(\omega ) dP(\omega ). \end{aligned}$$

Every proper lower semicontinuous coherent risk measure \(\varrho \) has a dual representation of the form

$$\begin{aligned} \varrho [Z] = \sup _{\xi \in \mathcal {A}_\varrho } \langle \xi , Z \rangle , \quad Z \in \mathcal {L}_p(\varOmega , \mathcal {F}, P), \end{aligned}$$
(1)

where \(\mathcal {A}_\varrho \subset \{ \xi \in \mathcal {L}_q(\varOmega , \mathcal {F}, P) ~ | ~ \xi \ge 0 \text { a.s.}, ~ \int _\varOmega \xi (\omega )dP(\omega ) = 1 \}\) is the convex-analysis subdifferential \(\partial \varrho [0]\).
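For \(\textsf{AVaR}_\alpha \), the dual set is known to be \(\{\xi \;|\; 0 \le \xi \le \alpha ^{-1} \text { a.s.},\ \mathbb {E}[\xi ]=1\}\). With finitely many equally likely scenarios, the supremum in (1) is attained greedily by placing the maximal density \(1/\alpha \) on the largest losses. A sketch (our illustration) verifying that this dual value matches the primal formula:

```python
import numpy as np

def avar_primal(z, alpha):
    """min_t t + E[(z-t)_+]/alpha, evaluated at an optimal t (a quantile)."""
    t = np.quantile(z, 1 - alpha)
    return t + np.mean(np.maximum(z - t, 0.0)) / alpha

def avar_dual(z, alpha):
    """sup <xi, Z> over densities 0 <= xi <= 1/alpha with E[xi] = 1.
    With n equally likely scenarios, the sup is attained greedily by
    putting the maximal density 1/alpha on the largest losses."""
    n = len(z)
    xi = np.zeros(n)
    budget = 1.0                              # remaining expectation mass of xi
    for k in np.argsort(z)[::-1]:             # largest losses first
        xi[k] = min(1.0 / alpha, budget * n)  # cap the density at 1/alpha
        budget -= xi[k] / n
        if budget <= 1e-15:
            break
    return np.mean(xi * z)

rng = np.random.default_rng(1)
z = rng.exponential(size=2_000)
assert abs(avar_primal(z, 0.05) - avar_dual(z, 0.05)) < 1e-8
```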

Risk measures have also been defined by specifying a set of acceptable values for the random quantity in question; this set is called an acceptance set. This concept is commonly used in finance, and a random outcome is deemed acceptable if it does not require any additional capital. Given the acceptance set \(\mathcal {K}_\varrho \subset \mathcal {L}_p(\varOmega , \mathcal {F}, P)\), the risk of a random variable Z is defined as

$$\begin{aligned} \varrho [Z] = \inf \{ d \in \mathbb {R}~|~ Z - d\,\mathbb {I}\in \mathcal {K}_\varrho \}, \end{aligned}$$
(2)

where \(\mathbb {I}\in \mathcal {L}_p(\varOmega , \mathcal {F}, P)\) stands for the random variable with \(\mathbb {I}(\omega )=1\) for all \(\omega \in \varOmega .\) In finance, this notion of risk is interpreted as the minimum amount of capital that needs to be invested to make the final position acceptable. It is shown that \(\varrho [\cdot ]\) defined in (2) is a coherent measure if and only if \(\mathcal {K}_\varrho \) is a convex cone (see, e.g., [14]).
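As an illustration of (2) (our construction, not taken from the cited sources), take the acceptance cone \(\mathcal {K}_\varrho = \{Z \;|\; \textsf{AVaR}_\alpha [Z] \le 0\}\); by the translation property, the infimum in (2) then recovers \(\textsf{AVaR}_\alpha [Z]\) itself, which a bisection over d confirms numerically:

```python
import numpy as np

def avar(z, alpha=0.1):
    t = np.quantile(z, 1 - alpha)
    return t + np.mean(np.maximum(z - t, 0.0)) / alpha

def acceptable(z, alpha=0.1):
    """Acceptance cone K = {Z : AVaR_alpha(Z) <= 0} (a convex cone)."""
    return avar(z, alpha) <= 0.0

def risk_via_acceptance(z, alpha=0.1, lo=-1e3, hi=1e3, tol=1e-9):
    """inf{ d in R : Z - d in K }, computed by bisection over d."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if acceptable(z - mid, alpha):
            hi = mid                     # mid is feasible, push downward
        else:
            lo = mid
    return hi

rng = np.random.default_rng(2)
z = rng.normal(size=5_000)
assert abs(risk_via_acceptance(z) - avar(z)) < 1e-6
```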

2.2 Risk Measures for Complex Systems

Risk is not additive; therefore, when we deal with distributed complex systems, the question of risk evaluation for the entire system needs to be addressed. In the financial literature, this risk is usually called systemic, and the measures for its evaluation are termed systemic risk measures. Furthermore, the risk of the system may include components characterizing the system as a whole, such as the risk associated with completing a common task.

Assume that the system consists of m agents (subsystems). One approach to evaluating the total risk is to use an aggregation function, \(\varLambda :\mathbb {R}^m\rightarrow \mathbb {R}\), together with a univariate risk measure. More precisely, let \(X \in \mathcal {L}_p(\varOmega , \mathcal {F}, P; \mathbb {R}^m)\) be an m-dimensional random vector, where each component \(X_i\) corresponds to the costs of one agent. One chooses a univariate risk measure \(\varrho _0\) and applies it to the aggregated cost \(\varLambda (X).\) Using an acceptance set \(\mathcal {K}_\varrho \) as in (2), the systemic risk can be defined as:

$$\begin{aligned} \varrho [X] = \varrho _0[\varLambda (X)] = \inf \big \{ d \in \mathbb {R}~|~ \varLambda (X) - d\,\mathbb {I}\in \mathcal {K}_\varrho \big \}. \end{aligned}$$
(3)

This point of view is analyzed in [7] for finite probability spaces, where it is shown that any monotonic, convex, positively homogeneous function provides a risk evaluation as in (3) for an appropriate set \(\mathcal {K}_\varrho \). Further analysis is provided in [11], where the authors analyze convex risk measures defined on general probability spaces and propose examples of aggregation functions suitable for a financial system. In both studies, the essential results are established when the aggregation function \(\varLambda \) satisfies properties similar to the axioms postulated for risk measures. The aggregation function is suitably chosen for each specific problem. For example, in [4], a particular aggregation function and an evaluation method are proposed to deal with the risk associated with the cumulative externalities endured by financial institutions. One can also analyze the maximal risk over a class of aggregation functions rather than using one specific function. We refer to [32] for an overview of the risk measures constructed this way.
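A small sketch of construction (3) with two illustrative aggregation functions (the sum of the losses and the worst single loss; the choices of \(\varLambda \), the base measure \(\varrho _0 = \textsf{AVaR}_\alpha \), and the Gaussian data are our assumptions):

```python
import numpy as np

def avar(z, alpha=0.1):
    t = np.quantile(z, 1 - alpha)
    return t + np.mean(np.maximum(z - t, 0.0)) / alpha

rng = np.random.default_rng(3)
# m = 3 agents with correlated Gaussian losses (scenarios in rows)
C = np.array([[1.0, 0.6, 0.6], [0.6, 1.0, 0.6], [0.6, 0.6, 1.0]])
X = rng.normal(size=(50_000, 3)) @ np.linalg.cholesky(C).T

total = avar(X.sum(axis=1))     # Lambda(x) = sum of the components
worst = avar(X.max(axis=1))     # Lambda(x) = worst single component

# the two aggregations rank risk differently; for this correlated example
# the total-loss view is the more conservative one
assert total > worst
# subadditivity of the base measure: avar(sum) <= sum of componentwise avars
assert total <= sum(avar(X[:, i]) for i in range(3)) + 1e-9
```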

The translation property for constant vectors is introduced in [5] for convex risk measures defined for bounded random vectors. A similar approach is taken in [12], where law-invariant risk measures for bounded random vectors are investigated to obtain a Kusuoka representation. The axioms proposed in [5, 12] are closest to ours, and we provide a more detailed discussion in Sect. 3.

Another approach to risk evaluation of complex systems consists of evaluating agents’ individual risks first and then aggregating the obtained values. This method is used, for example, in [3] and in [13]. In [21], convex risk functionals are defined for portfolios of risk vectors following this aggregation principle. In Sect. 3, we shall show that our approach also accommodates this point of view. Using the notion of an acceptance set, the systemic risk measure in [3] is defined as follows:

$$\begin{aligned} \varrho [X]&= \inf \Big \{ \sum _{i=1}^m d_i ~|~ d \in \mathbb {R}^m, ~\varLambda (X - d\,\mathbb {I}) \in \mathcal {K}_\varrho \Big \}. \end{aligned}$$

A further extension in [3] replaces the constant vector \(d \in \mathbb {R}^m\) by a random vector \(Y \in \mathcal {C},\) where \(\mathcal {C}\) is a given set of admissible allocations. This formulation of the risk measure makes it possible to choose scenario-dependent allocations: the total amount \(\sum _{i=1}^m d_i\) can be determined ahead of time, while the individual allocations \(d_i\) may be decided when the uncertainty is revealed. In [13], a set-valued counterpart of this approach is proposed by defining the systemic risk measure as the set of all vectors that make the outcome acceptable. Once the set of all acceptable allocations is constructed, one can derive a scalar-valued efficient allocation rule by minimizing the weighted sum of the components of the vectors in the set. Set-valued risk measures were proposed in [18]; see also [1, 17] for duality theory, including the dual representation of specific set-valued risk measures. The systemic risk depends on the choice of the aggregation function \(\varLambda \) and on how well it captures the interdependence between the components. The dependence can also be captured using copula theory, as shown in [28]. There, it is assumed that independent operation does not carry systemic risk; hence, each agent can optimize its local risk independently. The systemic risk measures are then constructed based on the copulas of the distributions.
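For finitely many equally likely scenarios, the allocation problem above can be written as a linear program by introducing epigraph variables for the Average Value-at-Risk. The following sketch assumes \(\varLambda (x) = \max _i x_i\) and the acceptance set \(\{Z \;|\; \textsf{AVaR}_\alpha [Z] \le 0\}\); these are illustrative choices of ours, not the formulation of [3]:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
n, m, alpha = 400, 2, 0.1
X = rng.normal(loc=[0.0, 1.0], size=(n, m))     # equally likely loss scenarios

# min sum(d)  s.t.  AVaR_alpha( max_i (X_i - d_i) ) <= 0, via the epigraph LP
# variables: (d_1..d_m, t, u_1..u_n) with
#   t + (1/(alpha*n)) * sum_k u_k <= 0
#   X[k,i] - d_i - t - u_k <= 0      for all scenarios k and agents i
#   u_k >= 0;  d and t free
nv = m + 1 + n
c = np.zeros(nv)
c[:m] = 1.0                                     # objective: total allocation

rows, rhs = [], []
top = np.zeros(nv)
top[m] = 1.0
top[m + 1:] = 1.0 / (alpha * n)
rows.append(top); rhs.append(0.0)
for k in range(n):
    for i in range(m):
        r = np.zeros(nv)
        r[i] = -1.0; r[m] = -1.0; r[m + 1 + k] = -1.0
        rows.append(r); rhs.append(-X[k, i])

bounds = [(None, None)] * (m + 1) + [(0.0, None)] * n
res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs), bounds=bounds)
assert res.success
d = res.x[:m]

# check: the allocation d makes the aggregated position acceptable
z = (X - d).max(axis=1)
t = np.quantile(z, 1 - alpha)
assert t + np.mean(np.maximum(z - t, 0.0)) / alpha <= 1e-6
```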

Another line of work includes methods that use some multivariate counterpart of the univariate risk measures. The main notion here is the Multivariate Value-at-Risk (\(\textsf{mVaR}\)) for random vectors, identified with the set of p-efficient points. Let \(F(X;\cdot )\) be the right-continuous distribution function of a random vector X with realizations in \(\mathbb {R}^m\). A p-efficient point for X is a point \(v\in \mathbb {R}^m\) such that \(F(X;v)\ge p\) and there is no point \(z\le v\), \(z\ne v\), satisfying \(F(X;z)\ge p\), where the comparison is component-wise. This notion plays a key role in optimization problems with chance constraints (see [9]). Multivariate Value-at-Risk satisfies the properties of translation equivariance, positive homogeneity, and monotonicity. This notion is used to define Average Value-at-Risk for multivariate distributions (\(\textsf{mAVaR}\)) in [22, 27, 30]. Let \(Z_p\) be the set of all points that dominate some p-efficient point component-wise:

$$\begin{aligned} Z_p = \bigcup _{v \in \textsf{mVaR}_p(X)} (v + \mathbb {R}^m_+). \end{aligned}$$

In [22], the authors define \(\textsf{mAVaR}\) of a random vector X at level \(p \in (0,1)\) as

$$\begin{aligned} \textsf{mAVaR}_p(X) = \mathbb {E}[ \varLambda (X) ~|~ X \in Z_p ], \end{aligned}$$
(4)

where \(\varLambda \) is assumed to be such that \(\mathbb {E}[\varLambda (X)]\) is finite for all \(X\in \mathcal {L}_p(\varOmega , \mathcal {F}, P; \mathbb {R}^m)\). They show that \(\textsf{mAVaR}\) is translation equivariant and positively homogeneous, and that it is subadditive only when all components of the random vector are independent.
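A rough empirical sketch of definition (4); we approximate the p-efficient points by searching over the sample points themselves and take \(\varLambda (x) = \sum _i x_i\) (both are our simplifying assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
p = 0.9
X = rng.normal(size=(4_000, 2))                    # two-dimensional sample

def cdf(v):
    """Empirical distribution function F(v) = P(X <= v component-wise)."""
    return np.mean(np.all(X <= v, axis=1))

# approximate p-efficient points: sample points with F(v) >= p,
# reduced to the component-wise minimal (Pareto-efficient) ones
cand = np.array([v for v in X if cdf(v) >= p])
eff = np.array([v for v in cand
                if not any(np.all(w <= v) and np.any(w < v) for w in cand)])

# Z_p is the union of v + R^m_+ over the p-efficient points v
in_Zp = np.array([any(np.all(x >= v) for v in eff) for x in X])

# mAVaR_p(X) = E[ Lambda(X) | X in Z_p ]  with  Lambda(x) = x_1 + x_2
mavar = X[in_Zp].sum(axis=1).mean()
assert eff.shape[0] >= 1 and np.isfinite(mavar)
```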

The authors of [25] propose a vector-valued Multivariate Average Value-at-Risk (\(\textsf{vmAVaR}\)) using the notion of p-efficient points and the extremal representation of the Average Value-at-Risk. Specifically, given probability \(p \in (0,1)\) and corresponding p-efficient points \(v \in \textsf{mVaR}_p(X)\), we consider the vectors:

$$\begin{aligned} \textsf{mAVaR}_p(X; v) = v + \frac{1}{p} \mathbb {E}[(X-v)_+], \end{aligned}$$

where \((X-v)_+\) is the random vector whose i-th component is \(\max (0, X_i - v_i) \). The vector-valued Multivariate Average Value-at-Risk (\(\textsf{vmAVaR}\)) is calculated by solving the following optimization problem:

$$\begin{aligned} \textsf{vmAVaR}_p(X) = \min \{ \textsf{mAVaR}_p(X; v) ~|~ v\in \textsf{mVaR}_p(X) \}, \end{aligned}$$
(5)

where the minimum in (5) is taken in the sense of Pareto efficiency. The vector-valued Multivariate Average Value-at-Risk is monotonic, positively homogeneous, and translation equivariant, but it is not subadditive. Note that both \(\textsf{mVaR}\) and \(\textsf{vmAVaR}\) are set- or vector-valued, so one must use a scalarization function to obtain a scalar value for the risk.
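A sketch of (5) in the same empirical spirit: we approximate \(\textsf{mVaR}_p(X)\) by sample points, evaluate \(\textsf{mAVaR}_p(X;v)\) for each such v, and keep the Pareto-minimal vectors (the sample-based approximation is our assumption):

```python
import numpy as np

rng = np.random.default_rng(6)
p = 0.9
X = rng.normal(size=(4_000, 2))

def cdf(v):
    """Empirical distribution function F(v) = P(X <= v component-wise)."""
    return np.mean(np.all(X <= v, axis=1))

cand = np.array([v for v in X if cdf(v) >= p])     # approximate mVaR_p(X)
eff = np.array([v for v in cand
                if not any(np.all(w <= v) and np.any(w < v) for w in cand)])

# mAVaR_p(X; v) = v + (1/p) E[(X - v)_+], one candidate vector per v
vals = np.array([v + np.mean(np.maximum(X - v, 0.0), axis=0) / p
                 for v in eff])

# vmAVaR_p(X): the Pareto-minimal elements among these vectors
vm = np.array([a for a in vals
               if not any(np.all(b <= a) and np.any(b < a) for b in vals)])
assert vm.shape[0] >= 1 and np.all(np.isfinite(vm))
```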

We shall compare our proposal to the aforementioned risk measures in Sect. 4.

3 Axiomatic Approach to Risk Measures for Random Vectors

In this section, we propose a set of axioms for measures of risk for random vectors with realizations in \(\mathbb {R}^m\). This framework is analogous to the properties of coherent risk measures for scalar-valued random variables. If \(m=1\), the proposed set of axioms exactly coincides with those in [35]. We denote by \(\mathcal {Z}=\mathcal {L}_p(\varOmega , \mathcal {F}, P;\mathbb {R}^m)\) the space of random vectors with realizations in \(\mathbb {R}^m\), defined on \((\varOmega ,\mathcal {F}, P)\). Throughout the paper, we shall consider a risk measure \(\varrho \) for random vectors in \(\mathcal {Z}\) to be a lower semi-continuous functional \(\varrho : \mathcal {Z}\rightarrow \mathbb {R}\cup \{+\infty \}\) with non-empty domain. Slightly abusing notation, we denote by \(\textbf{1}\) the m-dimensional vector whose components are all equal to one, and by \(\mathbb {I}\) the random vector with all realizations equal to \(\textbf{1}\).

Definition 3.1

A proper lower semi-continuous functional \(\varrho : \mathcal {Z}\rightarrow \mathbb {R}\cup \{+\infty \}\) is a coherent risk measure with preference to small outcomes if it satisfies the following axioms:

  1. A1.

    Convexity: For all \(X, Y \in \mathcal {Z}\) and \( \alpha \in (0,1)\), the following holds: \( ~ \varrho [\alpha X + (1-\alpha )Y] \le \alpha \varrho [X] + (1 - \alpha )\varrho [Y]. \)

  2. A2.

    Monotonicity: For all \(X, Y \in \mathcal {Z}\), if \(X_i \ge Y_i\) a.s. for all components \(i = 1, \ldots , m\), then \(\varrho [X] \ge \varrho [Y]\).

  3. A3.

    Positive homogeneity: For all \(X \in \mathcal {Z}\) and \(t > 0\), \(\varrho [tX] = t\varrho [X]\).

  4. A4.

    Translation equivariance: For all \(X \in \mathcal {Z}\) and for all \(a \in \mathbb {R}\), the following equality holds \(\varrho [X + a\mathbb {I}] = \varrho [X] + a\varrho [\mathbb {I}]\).

A proper lower semi-continuous functional \(\varrho : \mathcal {Z}\rightarrow \mathbb {R}\cup \{+\infty \}\) is a convex risk measure if it satisfies axioms A1, A2, and A4.

The axioms of convexity and positive homogeneity are defined similarly to the corresponding properties of coherent risk measures, while for the monotonicity property, the random vectors are now compared component-wise. The main difference is the translation equivariance axiom. It stipulates that if the random loss of every component increases by the same constant a, then the risk increases by \(a\varrho [\mathbb {I}]\).
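For a concrete functional satisfying A1–A4, consider \(\varrho [X] = \textsf{AVaR}_\alpha [\max _i X_i]\) (our example); since \(\max _i(X_i + a) = \max _i X_i + a\) and \(\varrho [\mathbb {I}] = 1\), axiom A4 holds, which the sketch below checks numerically:

```python
import numpy as np

def avar(z, alpha=0.1):
    t = np.quantile(z, 1 - alpha)
    return t + np.mean(np.maximum(z - t, 0.0)) / alpha

def rho(X, alpha=0.1):
    """A candidate systemic measure: AVaR of the worst component."""
    return avar(X.max(axis=1), alpha)

rng = np.random.default_rng(7)
X = rng.normal(size=(20_000, 3))
a = 2.5
one = np.ones_like(X)              # realizations of the random vector I

# A4: rho[X + a*I] = rho[X] + a*rho[I]   (here rho[I] = avar(1) = 1)
assert abs(rho(X + a * one) - (rho(X) + a * rho(one))) < 1e-8
# A3: positive homogeneity
assert abs(rho(2.0 * X) - 2.0 * rho(X)) < 1e-8
```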

3.1 Dual Representation

In order to derive a dual representation of the multivariate risk measure, we pair the space of random vectors \(\mathcal {Z}\) with the space \(\mathcal {Z}^* = \mathcal {L}_q(\varOmega , \mathcal {F}, P; \mathbb {R}^m)\), where \(q \in (1, \infty ]\) is such that \(\frac{1}{p} + \frac{1}{q} = 1\), \(q = \infty \) for \(p = 1\). For \(X \in \mathcal {Z}\) and \(\zeta \in \mathcal {Z}^*\) the bilinear form \(\langle \cdot , \cdot \rangle \) on the product space \(\mathcal {Z}\times \mathcal {Z}^*\) is defined as follows:

$$\begin{aligned} \langle \zeta , X \rangle = \int _\varOmega \langle \zeta (\omega ), X(\omega ) \rangle dP(\omega ). \end{aligned}$$

The Fenchel conjugate function \(\varrho ^*: \mathcal {Z}^* \rightarrow \mathbb {R}\cup \{+\infty \}\) of the risk measure \(\varrho \) is given by

$$\begin{aligned} \varrho ^*[\zeta ] = \sup _{X \in \mathcal {Z}} \big \{ \langle \zeta , X \rangle - \varrho [X] \big \}, \end{aligned}$$

and the conjugate of \(\varrho ^*\) (the bi-conjugate function) is

$$\begin{aligned} \varrho ^{**}[X] = \sup _{\zeta \in \mathcal {Z}^*} \big \{ \langle \zeta , X \rangle - \varrho ^*[\zeta ] \big \}. \end{aligned}$$

The Fenchel–Moreau theorem implies that if \(\varrho [\cdot ]\) is convex and lower semi-continuous, then \(\varrho ^{**} = \varrho \) and

$$\begin{aligned} \varrho [X] = \sup _{\zeta \in \mathcal {A}_\varrho } \big \{ \langle \zeta , X \rangle - \varrho ^*[\zeta ] \big \}, \end{aligned}$$
(6)

where \(\mathcal {A}_\varrho = \textrm{dom}(\varrho ^*)\) is the domain of the conjugate function \(\varrho ^*\). Based on the Fenchel–Moreau theorem and the axioms proposed in this paper, we prove the following theorem.

Theorem 3.1

Suppose \(\varrho : \mathcal {Z}\rightarrow \mathbb {R}\cup \{+\infty \}\) is a convex proper lower semicontinuous risk functional. Then the following holds:

  1. (i)

    Property A2 is satisfied iff \(\zeta \ge 0\) a.s. for all \(\zeta \) in the domain of \(\varrho ^*\).

  2. (ii)

    Property A3 is satisfied iff \(\varrho ^*\) is the indicator function of \(\mathcal {A}_\varrho \), i.e.,

    $$\begin{aligned} \varrho [X] = \sup _{\zeta \in \mathcal {A}_\varrho } \{ \langle \zeta , X \rangle \}. \end{aligned}$$
    (7)
  3. (iii)

    Property A4 is satisfied if and only if \(\varrho [\mathbb {I}] = \langle \mathbb {I}, \mu _\zeta \rangle \) for all \(\zeta \in \mathcal {A}_\varrho \), where \(\mu _\zeta = \int _\varOmega \zeta (\omega ) P(d\omega )\).

Proof

Since \(\varrho [\cdot ]\) is convex and lower semicontinuous and we have assumed that it has a non-empty domain, the representation (6) holds by virtue of the Fenchel–Moreau theorem.

(i) Suppose \(\varrho \) satisfies the monotonicity condition. Assume that \(\zeta _i(\omega )< 0 \) for \(\omega \in \Delta \in \mathcal {F}\) with \(P(\Delta ) > 0\) for some component \(i = 1,\ldots ,m\). Define \(\bar{X}_i\) equal to the indicator function of the event \(\Delta \) and \(\bar{X}_j = 0 \) for \(j \ne i\), \(j = 1,\ldots , m\). Take any X with support in \(\Delta \) such that \(\varrho [X]\) is finite, and define \(X_t := X - t\bar{X}\). Then for \(t \ge 0\), we have \(X \ge X_t\) a.s. and \(\varrho [X] \ge \varrho [X_t]\) by monotonicity. Consequently,

$$\begin{aligned} \varrho ^*[\zeta ]&\ge \sup _{t \in \mathbb {R}_+} \Big \{ \langle \zeta , X_t \rangle - \varrho [X_t] \Big \} \ge \sup _{t\in \mathbb {R}_+} \Big \{\langle \zeta , X \rangle - t\langle \zeta ,\bar{X}\rangle -\varrho [X] \Big \}\\&= \sup _{t\in \mathbb {R}_+} \Big \{\langle \zeta , X \rangle - t \int _\Delta \zeta _i(\omega )P(d\omega ) -\varrho [X] \Big \} = +\infty . \end{aligned}$$

It follows that \(\varrho ^*[\zeta ] = +\infty \) for every \(\zeta \in \mathcal {Z}^*\) with at least one negative component, thus \(\zeta \notin \textrm{dom}\varrho ^*\). Conversely, suppose that \(\zeta \in \mathcal {Z}^*\) has realizations in \(\mathbb {R}^m\) with nonnegative components a.s. Then whenever \(X \ge X'\) a.s., we have:

$$\begin{aligned} \langle \zeta , X \rangle&= \int _\varOmega \langle \zeta (\omega ), X(\omega ) \rangle dP(\omega ) \ge \int _\varOmega \langle \zeta (\omega ), X'(\omega )\rangle dP(\omega )= \langle \zeta , X' \rangle . \end{aligned}$$

Consequently,

$$\begin{aligned} \varrho [X]&= \sup _{\zeta \in \mathcal {Z}^*} \Big \{ \langle \zeta , X \rangle - \varrho ^*[\zeta ]\Big \} \ge \sup _{\zeta \in \mathcal {Z}^*} \Big \{ \langle \zeta , X' \rangle - \varrho ^*[\zeta ] \Big \} = \varrho [X']. \end{aligned}$$

Hence, the monotonicity condition holds.

(ii) Suppose the positive homogeneity property holds. If \(\varrho [tX] = t\varrho [X]\) for all \(X \in \mathcal {Z}\), then for any fixed \(t>0\), we get

$$\begin{aligned} \varrho ^*[\zeta ]&= \sup _{X \in \mathcal {Z}} \Big \{ \langle \zeta , X \rangle - \varrho [X] \Big \} = \sup _{X \in \mathcal {Z}} \Big \{ \langle \zeta , tX \rangle - \varrho [tX] \Big \} \\&= \sup _{X \in \mathcal {Z}} t \Big \{ \langle \zeta , X \rangle - \varrho [X] \Big \} = t\varrho ^*[\zeta ]. \end{aligned}$$

Hence, if \(\varrho ^*[\zeta ]\) is finite, then \(\varrho ^*[\zeta ] = 0\) as claimed. Conversely, if \(\varrho [X] = \sup _{\zeta \in \textrm{dom}\varrho ^*}\langle \zeta , X \rangle \), then \(\varrho \) is positively homogeneous as the support function of a convex set.

(iii) Suppose the translation property is satisfied, i.e., \(\varrho [X + t\mathbb {I}] = \varrho [X] + t\varrho [\mathbb {I}]\) for any \(X \in \mathcal {Z}\) and any constant \(t \in \mathbb {R}\). Then for any \(k \in \mathbb {R}\) and \(\zeta \in \mathcal {Z}^*\), we get:

$$\begin{aligned} \varrho ^*[\zeta ]&= \sup _{X \in \mathcal {Z}} \Big \{ \langle \zeta , X + k \mathbb {I}\rangle - \varrho [X+k\mathbb {I}] \Big \} \\&= \sup _{X \in \mathcal {Z}} \Big \{ \int _\varOmega \langle \zeta (\omega ), X(\omega ) + k\mathbb {I}\rangle P(d\omega ) - \varrho [X] - k\varrho [\mathbb {I}] \Big \} \\&= \sup _{X \in \mathcal {Z}} \Big \{ \langle \zeta , X \rangle + k \int _\varOmega \langle \mathbb {I}, \zeta (\omega ) \rangle P(d\omega ) - \varrho [X] - k\varrho [\mathbb {I}] \Big \} \\&= \varrho ^*[\zeta ] + k \Big ( \int _\varOmega \langle \mathbb {I}, \zeta (\omega ) \rangle P(d\omega ) - \varrho [\mathbb {I}] \Big ). \end{aligned}$$

If \(\varrho ^*[\zeta ]\) is finite, then \(\varrho [\mathbb {I}] = \int _\varOmega \langle \mathbb {I}, \zeta (\omega ) \rangle P(d\omega )\).

Let us denote \(\mu _\zeta = \int _\varOmega \zeta (\omega )P(d\omega )\), then we obtain

$$\begin{aligned} \varrho [\mathbb {I}] = \langle \mathbb {I}, \int _\varOmega \zeta (\omega ) P(d\omega ) \rangle = \langle \mathbb {I}, \mu _\zeta \rangle \quad \text {for all }\zeta \in \mathcal {A}_\varrho . \end{aligned}$$
(8)

Conversely, suppose \(\varrho [\mathbb {I}] = \langle \mathbb {I}, \mu _\zeta \rangle \). Then for any \(X \in \mathcal {Z}\) and \(k \in \mathbb {R}\):

$$\begin{aligned} \varrho [X + k\mathbb {I}]&= \sup _{\zeta \in \mathcal {Z}^*} \Big \{ \langle \zeta , X + k \mathbb {I}\rangle - \varrho ^*[\zeta ] \Big \} \\&= \sup _{\zeta \in \mathcal {Z}^*} \Big \{ \int _\varOmega \langle \zeta (\omega ), X(\omega ) + k\mathbb {I}\rangle P(d\omega ) - \varrho ^*[\zeta ] \Big \} \\&= \sup _{\zeta \in \mathcal {Z}^*} \Big \{ \langle \zeta , X \rangle + k \int _\varOmega \langle \mathbb {I}, \zeta (\omega ) \rangle P(d\omega ) - \varrho ^*[\zeta ] \Big \} \\&= \sup _{\zeta \in \mathcal {Z}^*} \Big \{ \langle \zeta , X \rangle - \varrho ^*[\zeta ] + k\varrho [\mathbb {I}] \Big \} = \varrho [X] + k\varrho [\mathbb {I}]. \end{aligned}$$

Hence, the translation property is satisfied. \(\square \)

It follows from Theorem 3.1 that if the systemic risk measure \(\varrho \) is proper, lower semicontinuous, and satisfies the axioms of monotonicity, convexity, translation equivariance, and positive homogeneity, then representation (7) holds with \(\mathcal {A}_\varrho \) satisfying:

$$\begin{aligned} \mathcal {A}_\varrho \subset \Big \{ \zeta \in \mathcal {Z}^* \;|\; \int _\varOmega \zeta (\omega ) dP(\omega ) = \mu _\zeta , \; \zeta \ge 0 \text { a.s.}, \; \langle \mathbb {I}, \mu _\zeta \rangle =r \Big \}, \end{aligned}$$

where \(r\in \mathbb {R}\) is a constant such that \(r=\varrho [\mathbb {I}].\)

Corollary 3.1

If a multivariate risk measure \(\varrho [\cdot ]\) is coherent, then \(\varrho [0] = 0\) and \(\mathcal {A}_\varrho = \partial \varrho [0].\)

Proof

We see that \(\varrho \) is the support function of \(\mathcal {A}_\varrho = \textrm{dom}(\varrho ^*)\). Hence,

\(\varrho [0] = \sup _{\zeta \in \mathcal {Z}^*} \big \{ \langle 0,\zeta \rangle - \varrho ^*[\zeta ] \big \} = 0.\) To show the form of the set \(\mathcal {A}_\varrho \), recall that

$$\begin{aligned} \partial \varrho [0]&= \{ \zeta \in \mathcal {Z}^* \;|\; \langle \zeta , X - 0 \rangle \le \varrho [X] - \varrho [0] \quad \forall X \in \mathcal {Z}\} \\&= \{ \zeta \in \mathcal {Z}^* \;|\; \langle \zeta , X \rangle \le \varrho [X] \quad \forall X \in \mathcal {Z}\}. \end{aligned}$$

Hence, for all \(\zeta \in \mathcal {A}_\varrho \), (7) implies that \(\zeta \in \partial \varrho [0]\). On the other hand, if \(\zeta \in \partial \varrho [0]\), then \(\zeta \in \mathcal {A}_\varrho \) by the definition of a support function. \(\square \)

We shall consider the following additional property.

  • Normalization: A coherent systemic measure of risk \(\varrho : \mathcal {Z}\rightarrow \mathbb {R}\cup \{+\infty \}\) is normalized if \(\varrho [\mathbb {I}] = 1\).

The following statement follows from equation (8) assuming the normalization property.

Corollary 3.2

For a normalized coherent systemic measure of risk \(\varrho [\cdot ]\), we have \(\int _\varOmega \langle \textbf{1}, \zeta (\omega ) \rangle P(d\omega ) =1\) for all \(\zeta \in \mathcal {A}_\varrho \).

The normalization entails that for all \(\zeta \in \mathcal {A}_\varrho \), \(\zeta P\) can be interpreted as a probability measure on the space \(\varOmega \times \{ 1, 2,\ldots , m \}\).

There are only a few works in the existing literature that consider a translation property for risk measures for random vectors. For example, in [5], the authors have adopted the following translation axiom.

Property T. For any constant \(\alpha \in \mathbb {R}\) and any vector \(e^i\) whose i-th component is 1 (\(i=1,\ldots , m\)) and all other components are zero, the systemic measure of risk satisfies \(\varrho [X+\alpha e^i] = \varrho [X] + \alpha .\)

Theorem 3.2

Assume that \(\varrho \) is a proper lower-semicontinuous convex risk functional. Property T holds if and only if \(\int _\varOmega \zeta _i(\omega ) dP(\omega ) =1\) for all \(i=1,\ldots , m\) and all \(\zeta \in \textrm{dom}\varrho ^*\). Furthermore, if Property T holds, then

  1. (i)

    For all \( \zeta \in \textrm{dom}\varrho ^*\), \(\int _\varOmega \langle \textbf{1}, \zeta (\omega ) \rangle dP(\omega ) =\varrho [\mathbb {I}]=m\).

  2. (ii)

    For all \(X\in \mathcal {Z}\) and all \(a\in \mathbb {R}^m,\) \(\varrho [X+ a] = \varrho [X] + \varrho [a]. \)

  3. (iii)

    The systemic risk measure \(\varrho [\cdot ]\) is linear on deterministic vectors.

Proof

Suppose Property T holds. Then for a random vector X in the domain of \(\varrho \) and every \(\zeta \in \mathcal {Z}^*\), we have

$$\begin{aligned} \varrho ^*[\zeta ]&\ge \sup _{\alpha \in \mathbb {R}}\big \{\langle \zeta ,X+\alpha e^i\rangle -\varrho [X+ \alpha e^i]\big \} \\&=\sup _{\alpha \in \mathbb {R}} \Big \{\int _\varOmega \alpha \zeta _i(\omega ) P(d\omega )+\langle \zeta ,X\rangle -\varrho [X] - \alpha \Big \} \\&= \sup _{\alpha \in \mathbb {R}} \alpha \Big \{\int _\varOmega \zeta _i(\omega ) P(d\omega )- 1\Big \}+\langle \zeta ,X\rangle -\varrho [X]. \end{aligned}$$

It follows that \(\varrho ^*[\zeta ]=+\infty \) for any \(\zeta \in \mathcal {Z}^*\) such that \(\int _\varOmega \zeta _i(\omega ) P(d\omega ) \not = 1\) for some \(i\in \{1,\ldots ,m\}\). This entails the following for every constant vector \(a\in \mathbb {R}^m\):

$$\begin{aligned} \varrho [a]= \varrho \Big [\sum _{i=1}^m a_ie^i\Big ] = \sum _{i=1}^m a_i. \end{aligned}$$
(9)

The other direction is straightforward. Indeed,

$$\begin{aligned} \varrho [X+\alpha e^i]&= \sup _{\zeta \in \mathcal {Z}^*}\big \{\langle \zeta ,X+\alpha e^i\rangle -\varrho ^*[\zeta ]\big \} \\&= \sup _{\zeta \in \mathcal {Z}^*} \Big \{\int _\varOmega \alpha \zeta _i(\omega ) P(d\omega )+\langle \zeta ,X\rangle -\varrho ^*[\zeta ]\Big \} \\&= \sup _{\zeta \in \mathcal {Z}^*} \Big \{\alpha +\langle \zeta ,X\rangle -\varrho ^*[\zeta ]\Big \} = \varrho [X]+ \alpha . \end{aligned}$$

This implies that \(\varrho [\mathbb {I}] = \int _\varOmega \langle \textbf{1}, \zeta (\omega ) \rangle dP(\omega ) = \sum _{i=1}^m \int _\varOmega \zeta _i(\omega ) dP(\omega ) = m\) which shows (i). Due to equation (9), for all \(X\in \mathcal {Z}\) and \(a\in \mathbb {R}^m\), we obtain

$$\begin{aligned} \varrho [X+a] = \varrho \left[ X+\sum _{i=1}^m a_ie^i \right] = \varrho [X] + \sum _{i=1}^m a_i = \varrho [X] + \varrho [a], \end{aligned}$$

which shows (ii). A special case of (ii) implies \(\varrho [a + b] = \varrho [a]+ \varrho [b]\) for all \(a,b\in \mathbb {R}^m.\) This combined with the fact that \(\varrho [0]=0\) and the positive homogeneity of the risk measure proves statement (iii). \(\square \)

In [12], the authors have analyzed law-invariant risk measures for bounded random vectors. They have introduced the set of axioms closest to ours: it comprises our axioms together with the two normalization properties \(\varrho [\mathbb {I}]=1\) and \(\varrho [0]=0\). We do not need these normalization properties to establish the dual representation for general random vectors with finite p-th moments, \(p\ge 1\); instead, we derive from the dual representation that the risk of the deterministic zero vector is zero. The property of strong coherence of risk measures, introduced in that paper, implies in particular that \(\varrho [a + b] = \varrho [a]+ \varrho [b],\) which appears to be a strong assumption.

3.2 Systemic Risk Measures Obtained Via Sets of Linear Scalarizations

In this section, we consider systemic risk measures based on sets S of linear scalarization vectors c taken from the simplex

$$\begin{aligned} S^m_+ =\left\{ c \in \mathbb {R}^m \;\bigg |\; \sum _{i=1}^m c_i = 1, \; c_i \ge 0, \; i = 1, \ldots , m \right\} . \end{aligned}$$

Let \(\varrho :\mathcal {L}_p(\varOmega ,\mathcal {F}, P)\rightarrow \mathbb {R}\cup \{ +\infty \}\) be a lower semi-continuous univariate risk measure. For any fixed closed set \(S\subset S^m_+\), we define the systemic risk measure

$$\begin{aligned} \varrho _S[X]=\varrho [X_S],\quad \text {where } X_S (\omega )= \max _{c\in S} c^\top X(\omega ),\;\; \omega \in \varOmega . \end{aligned}$$
(10)

It is straightforward to see that \(X_S\in \mathcal {L}_p(\varOmega ,\mathcal {F}, P)\) entailing that the systemic risk measure \(\varrho _S[\cdot ]\) is well-defined on \(\mathcal {L}_p(\varOmega , \mathcal {F}, P; \mathbb {R}^m).\)
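The construction (10) is straightforward to evaluate when the scenario set is finite. The sketch below is an illustration only, not part of the formal development: the scenario data, the finite scalarization set S, and the choice of \(\varrho \) as the Average Value-at-Risk at level p (computed via the variational formula AVaR_p(Z) = min_v { v + E[(Z - v)_+]/p }) are all assumptions.

```python
# Sketch: evaluating rho_S[X] = rho[X_S] from (10) on equiprobable sample data.
# Assumptions: N equiprobable scenarios, a finite scalarization set S,
# and rho chosen as the Average Value-at-Risk (AVaR).

def avar(z, p):
    """AVaR_p via the variational formula min_v v + E[(z - v)_+] / p.
    For a discrete distribution the minimum is attained at a sample point."""
    n = len(z)
    return min(v + sum(max(x - v, 0.0) for x in z) / (n * p) for v in z)

def aggregate(X, S):
    """X_S(omega) = max_{c in S} c^T X(omega), computed scenario by scenario."""
    return [max(sum(ci * xi for ci, xi in zip(c, x)) for c in S) for x in X]

def rho_S(X, S, p):
    return avar(aggregate(X, S), p)

# Hypothetical data: 4 scenarios of a 2-dimensional loss vector.
X = [[1.0, 4.0], [2.0, 3.0], [3.0, 2.0], [4.0, 1.0]]
S = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # finite scalarization set in S^m_+
print(rho_S(X, S, p=0.5))
```

For these data, X_S takes the values 4, 3, 3, 4 across the scenarios, and the AVaR at level 0.5 averages the worst half of them.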

Theorem 3.3

If \(\varrho :\mathcal {L}_p(\varOmega ,\mathcal {F}, P)\rightarrow \mathbb {R}\cup \{ +\infty \}\) is a coherent (convex) univariate measure of risk, then for any closed set \(S\subset S^m_+\), the systemic risk measure \(\varrho _S[X]=\varrho [X_S]\) defined in (10) is coherent (convex) according to Definition 3.1.

Proof

For two random vectors \(X,Y\in \mathcal {Z}\) with \(X\le Y\) componentwise a.s., we have \(c^\top X\le c^\top Y\) a.s. for all \(c\in S \subset S^m_+\). This implies that \(\max _{c\in S} c^\top X\le \max _{c\in S} c^\top Y\) a.s. as well. Hence, \(\varrho [X_S]\le \varrho [Y_S]\) by the monotonicity property of \(\varrho [\cdot ]\). Thus, the monotonicity axiom in Definition 3.1 is satisfied.

Given two random vectors \(X, Y \in \mathcal {Z}\) and \(\alpha \in (0,1)\), consider their convex combination \(\alpha X + (1-\alpha )Y\). Due to the convexity and monotonicity of \(\varrho [\cdot ]\), we have

$$\begin{aligned} \varrho _S [\alpha X + (1-\alpha )Y]&= \varrho [\max _{c\in S} c^\top (\alpha X + (1-\alpha )Y)] \\&\le \varrho [\alpha \max _{c\in S} c^\top X + (1-\alpha )\max _{c\in S} c^\top Y] \\&\le \alpha \varrho [\max _{c\in S} c^\top X] + (1-\alpha )\varrho [\max _{c\in S} c^\top Y]\\&= \alpha \varrho _S[X] + (1-\alpha )\varrho _S[Y]. \end{aligned}$$

Thus, the convexity axiom is satisfied.

Given a random vector \(X \in \mathcal {Z}\) and a constant \(a \in \mathbb {R}\), it follows that:

$$\begin{aligned} \varrho _S[X+a\mathbb {I}]&= \varrho [\max _{c\in S} c^\top (X+a\mathbb {I})] = \varrho [\max _{c\in S} (c^\top X + a c^\top \mathbb {I})]\\&=\varrho [\max _{c\in S} c^\top X + a] = \varrho _S[X] + a. \end{aligned}$$

Positive homogeneity follows in a straightforward manner. \(\square \)

If the set S is a singleton \(\{c\}\) with \(c\in S^m_+\), then the systemic risk measure has a simple form, which we denote \(\varrho _c[X]=\varrho [c^\top X]\). We observe the following properties of aggregation by a single linear scalarization vector.

Proposition 3.1

Given a coherent risk measure \(\varrho :\mathcal {L}_p(\varOmega ,\mathcal {F}, P)\rightarrow \mathbb {R}\cup \{+\infty \}\) and a scalarization vector \(c\in S^m_+\), for any random vector \(X \in \mathcal {Z}\) the risk \(\varrho [c^\top X]\) does not exceed the maximal risk of the components of X measured by \(\varrho [\cdot ]\). More precisely, the two aggregation methods are related by \( \varrho [c^\top X] \le \sum _{i=1}^m c_i \varrho [X_i] \le \max _{1\le i\le m }\varrho [X_i].\)

Proof

The dual representation implies the following:

$$\begin{aligned} \varrho [c^\top X]&= \sup _{\xi \in \mathcal {A}_\varrho } \int _\varOmega \xi (\omega ) \sum _{i=1}^m c_i X_i(\omega ) P(d\omega ) = \sup _{\xi \in \mathcal {A}_\varrho } \sum _{i=1}^m c_i \int _\varOmega \xi (\omega ) X_i(\omega ) P(d\omega ) \\&\le \sum _{i=1}^m \sup _{\xi \in \mathcal {A}_\varrho } c_i \int _\varOmega \xi (\omega ) X_i(\omega ) P(d\omega ) = \sum _{i=1}^m c_i \varrho [X_i] \le \max _{1\le i\le m }\varrho [X_i]. \end{aligned}$$

The last inequality holds because the weights \(c_i\) are nonnegative and sum to one. \(\square \)
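The chain of inequalities in Proposition 3.1 can be verified numerically. The sketch below is illustrative only: the scenario data are hypothetical, and \(\varrho \) is taken as the Average Value-at-Risk over equiprobable scenarios.

```python
# Numerical check of Proposition 3.1:
#   rho[c^T X] <= sum_i c_i rho[X_i] <= max_i rho[X_i],
# with rho = AVaR on equiprobable scenarios (hypothetical data).

def avar(z, p):
    """AVaR_p via min_v v + E[(z - v)_+]/p over a discrete sample."""
    n = len(z)
    return min(v + sum(max(x - v, 0.0) for x in z) / (n * p) for v in z)

p = 0.25
c = [0.3, 0.7]                                        # a point of S^m_+
X = [[1.0, 5.0], [4.0, 2.0], [0.0, 3.0], [6.0, 1.0]]  # N = 4 scenarios, m = 2

scalarized = avar([sum(ci * xi for ci, xi in zip(c, x)) for x in X], p)
weighted = sum(ci * avar([x[i] for x in X], p) for i, ci in enumerate(c))
worst = max(avar([x[i] for x in X], p) for i in range(2))

# The inequalities of Proposition 3.1 hold on this sample.
assert scalarized <= weighted + 1e-12 <= worst + 1e-12
print(scalarized, weighted, worst)
```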

We also show the following useful result, which implies that we can use statistical methods to estimate the systemic risk measure \(\varrho _S[X].\)

Proposition 3.2

If \(\varrho :\mathcal {L}_p(\varOmega ,\mathcal {F}, P)\rightarrow \mathbb {R}\cup \{ +\infty \}\) is a law-invariant risk measure, then for any closed set \(S\subset S^m_+\), the systemic risk measure \(\varrho _S[X]=\varrho [X_S]\) is law-invariant.

Proof

It is sufficient to show that, for two random vectors X and Y with the same distribution, the respective random variables \(X_S\) and \(Y_S\) have the same distribution. Since X and Y have the same distribution as random vectors, the probability of any event determined by X equals the probability of the corresponding event determined by Y; in particular, the random variables \(c^\top X\) and \(c^\top Y\) have the same distribution for any fixed vector \(c\in \mathbb {R}^m\). Hence, for any \(r\in \mathbb {R}\), the following relations hold:

$$\begin{aligned} P(X_S\le r)&= P(c^\top X\le r,\; \forall c\in S) = P(c^\top Y\le r,\; \forall c\in S) = P(Y_S\le r). \end{aligned}$$

This shows the equality of the distribution functions of \(X_S\) and \(Y_S\). \(\square \)

Using the dual representation (1) of the coherent risk measures \(\varrho \) for scalar-valued random variables, we obtain the following relation:

$$\begin{aligned} \begin{aligned} \varrho [c^\top X]&= \sup _{\xi \in \mathcal {A}_\varrho } \int _\varOmega \xi (\omega ) c^\top X(\omega ) P(d\omega ) = \sup _{\zeta \in \tilde{\mathcal {A}}} \int _\varOmega \langle \zeta (\omega ), X(\omega ) \rangle P(d\omega ), \quad \end{aligned} \end{aligned}$$

with \(\tilde{\mathcal {A}}= \{ \xi c: \xi \in \mathcal {A}_\varrho \}.\)

The multifunction \(\omega \mapsto \arg \max _{c \in S} c^\top X(\omega )\) is measurable with respect to \(\mathcal {F}\) and has non-empty and closed images for every \(\omega \). Hence, a measurable selection \(\nu _X(\omega ) \in \arg \max _{c \in S} c^\top X(\omega )\) exists by the Kuratowski–Ryll-Nardzewski theorem. We use the notation \(\nu _X (\cdot ) \in S\) for any such selection and obtain:

$$\begin{aligned} \begin{aligned} \varrho [\max _{c\in S} c^\top X]&= \sup _{\xi \in \mathcal {A}_\varrho } \int _\varOmega \xi (\omega ) \max _{c\in S} c^\top X(\omega ) P(d\omega )\nonumber = \sup _{\xi \in \mathcal {A}_\varrho } \int _\varOmega \xi (\omega ) [\nu _X^\top X](\omega ) P(d\omega )\nonumber \\&= \sup _{\zeta \in \tilde{\mathcal {A}}^\prime } \int _\varOmega \langle \zeta (\omega ), X(\omega ) \rangle P(d\omega ), \quad \text {with }\nonumber \tilde{\mathcal {A}}^\prime = \{ \xi \nu _X: \xi \in \mathcal {A}_\varrho \}. \end{aligned} \end{aligned}$$

Notice that the derived representations have the form of the dual representation in (7). However, we have not established that \(\tilde{\mathcal {A}}\) and \(\tilde{\mathcal {A}}^\prime \) coincide with the domains of the conjugate functions.

3.3 Systemic Risk Measures Obtained Via Nonlinear Aggregation of Risks

The second aggregation method that falls within the scope of our axiomatic framework is a nonlinear aggregation of risks. This class of risk measures cannot be obtained by first aggregating the random vector itself via a nonlinear function, and it does not fit the axiomatic approaches in [7] or in [5]. Furthermore, we shall see that this method of evaluating systemic risk makes it possible to maintain fairness among the system's participants whenever this is a desired property.

We define \(\varOmega _m = \{1,\ldots ,m\}\) and consider a probability space \((\varOmega _m,\mathcal {F}_c, c)\), where \(c\in S^m_+\) and \(\mathcal {F}_c\) contains all subsets of \(\varOmega _m\). We view c as a probability mass function of the space \(\varOmega _m\). Given a collection of m univariate measures of risk \(\varrho _i:\mathcal {L}_p(\varOmega ,\mathcal {F}, P)\rightarrow \mathbb {R}\), \(i=1,\ldots , m,\) we associate with every random vector \(X \in \mathcal {Z}= \mathcal {L}_p(\varOmega , \mathcal {F}, P; \mathbb {R}^m)\), a random variable \(X_R\) on the space \(\varOmega _m\) as follows:

$$\begin{aligned} X_R(i) = \varrho _i[X_i],\quad i=1,\ldots ,m. \end{aligned}$$

Choosing a scalar coherent measure of risk \(\varrho _0:\mathcal {L}_\infty (\varOmega _m,\mathcal {F}_c, c)\rightarrow \mathbb {R}\), the measure of systemic risk \(\varrho _\textrm{sys}:\mathcal {L}_p(\varOmega , \mathcal {F}, P; \mathbb {R}^m)\rightarrow \mathbb {R}\) is defined as follows:

$$\begin{aligned} \varrho _\textrm{sys} [X] = \varrho _0[X_R]. \end{aligned}$$
(11)

This is a nonlinear aggregation of the individual risks \(\varrho _i[X_i]\). Hence, this approach falls within the category of methods that evaluate the risk of each component first and then aggregate their values. The measure \(\varrho _\textrm{sys}[X]\) satisfies the axioms postulated for systemic risk measures.
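A minimal sketch of construction (11) follows, under the assumption of finitely many equiprobable scenarios; the data are hypothetical, and every \(\varrho _i\) as well as \(\varrho _0\) is chosen as an Average Value-at-Risk for illustration. Note that \(\varrho _0\) acts on the finite space \(\varOmega _m\) equipped with the probability mass function c.

```python
# Sketch of the nonlinear aggregation (11): X_R(i) = rho_i[X_i] lives on the
# finite space Omega_m = {1,...,m} with probability mass function c, and
# rho_sys[X] = rho_0[X_R].  All risk measures are AVaR here (illustrative).

def avar_weighted(values, probs, p):
    """AVaR_p of a discrete random variable via min_v v + E[(Z - v)_+]/p."""
    return min(v + sum(q * max(x - v, 0.0) for x, q in zip(values, probs)) / p
               for v in values)

def rho_sys(X, c, p_local, p0):
    m = len(X[0])
    n = len(X)
    uniform = [1.0 / n] * n
    # Step 1: the individual risks rho_i[X_i] form the random variable X_R.
    X_R = [avar_weighted([x[i] for x in X], uniform, p_local) for i in range(m)]
    # Step 2: aggregate X_R with rho_0 on (Omega_m, F_c, c).
    return avar_weighted(X_R, c, p0)

X = [[1.0, 5.0], [4.0, 2.0], [0.0, 3.0], [6.0, 1.0]]   # hypothetical scenarios
c = [0.5, 0.5]                                          # pmf on Omega_m
print(rho_sys(X, c, p_local=0.25, p0=0.5))
```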

Theorem 3.4

Suppose the univariate measures of risk \(\varrho _i:\mathcal {L}_p(\varOmega ,\mathcal {F}, P)\rightarrow \mathbb {R}\), \(i=0,\ldots , m\) are coherent and let \(\varrho _\textrm{sys}[\cdot ]\) be defined as in (11). Then \(\varrho _\textrm{sys}[\cdot ]\) satisfies properties (A1)–(A4).

Proof

(i) Given any \(X, Y \in \mathcal {Z}\) and \(\alpha \in (0,1)\), we consider the random vector \(Z = \alpha X + (1-\alpha )Y\). It follows that \(Z_R(i) = \varrho _i[Z_i] \le \alpha \varrho _i[X_i] + (1-\alpha )\varrho _i[Y_i]\) by the convexity of \(\varrho _i[\cdot ]\) for all \(i = 1, \ldots , m\). Now define a random variable \(Z^\prime \) on \(\varOmega _m\) by setting \(Z^\prime (i) = \alpha \varrho _i[X_i] + (1-\alpha )\varrho _i[Y_i]\) for \(i=1,\ldots ,m\). It follows that \(Z_R\le Z^\prime \). Then, using the monotonicity and convexity of \(\varrho _0,\) we obtain

$$\begin{aligned} \varrho _0[Z_R]\le \varrho _0[Z^\prime ] \le \alpha \varrho _0[X_R] + (1-\alpha )\varrho _0[Y_R]. \end{aligned}$$

Hence \(\varrho _\textrm{sys}[\alpha X + (1-\alpha )Y] \le \alpha \varrho _\textrm{sys}[X] + (1-\alpha )\varrho _\textrm{sys}[Y]\).

(ii) Suppose the vectors \(X, Y \in \mathcal {Z}\) satisfy \(X\le Y\) a.s. This implies that \(X_i \le Y_i\) a.s. and, hence, \(\varrho _i[X_i] \le \varrho _i[Y_i]\) for all \(i = 1, \ldots , m\) by the monotonicity property of \(\varrho _i\). This further implies that \(X_R\le Y_R\), entailing that \(\varrho _0[X_R]\le \varrho _0[Y_R]\). Thus (A2) is satisfied.

(iii) Given a random vector \(X \in \mathcal {Z}\), \(t > 0\), we have

$$\begin{aligned} \varrho _\textrm{sys}[tX] = \varrho _0[(tX)_R] = \varrho _0[t (X_R)]=t\varrho _0[X_R], \end{aligned}$$

where we have used the positive homogeneity of \(\varrho _i[\cdot ]\) for all \(i=0,1, \ldots , m\).

(iv) Given a random vector \(X \in \mathcal {Z}\) and a real constant a, we have \((X+a\mathbb {I})_R (i) = \varrho _i[X_i+a]= \varrho _i[X_i]+a\). Hence \(\varrho _0[(X+a\mathbb {I})_R]= \varrho _0[X_R] + a.\) This shows property (A4). \(\square \)

Remark 3.1

Note that we can define the systemic measure of risk by using the maximum of a family of coherent univariate risk measures on \(\mathcal {L}_{\infty }(\varOmega _m,\mathcal {F}_c, c)\) instead of a single measure \(\varrho _0\), i.e. \(\varrho _\textrm{sys} [X] = \max _{\alpha \in J}\varrho _\alpha [X_R],\) where J stands for a family of univariate coherent measures of risk on \(\mathcal {L}_{\infty }(\varOmega _m,\mathcal {F}_c, c)\). Similarly, a weighted linear combination could replace \(\varrho _0\) in definition (11). Furthermore, recall that expectation is also a coherent measure of risk and could be used instead of \(\varrho _0\). Alternatively, we could use \(\varrho _i[X_i] =\mathbb {E}[X_i]\) for \(i=1,\ldots , m\) and a coherent univariate measure of risk \(\varrho _0\) in the construction of \(\varrho _\textrm{sys}.\) All these constructions satisfy axioms (A1)–(A4).

4 Relations to Multivariate Measures of Systemic Risk

In this section, we compare the proposed risk measures with the multivariate notions mentioned in Sect. 2.2.

Consider the Multivariate Value-at-Risk (\(\textsf{mVaR}\)) given by the set of p-efficient points of the respective probability distribution. For a random vector \(X \in \mathcal {Z}\), any scalarization vector \(c \in S^m_+\) provides a p-efficient point as the solution of the following problem:

$$\begin{aligned} \textsf{mVaR}_{p}(X)&= \textrm{argmin} \big \{ c^\top z ~|~ P(X \le z) \ge p \big \}. \end{aligned}$$
(12)

For every \(c \ge 0\), the solution set of problem (12) is nonempty and contains a p-efficient point corresponding to the vector c (cf. [9]).
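For a sample-based distribution, a p-efficient point for a given scalarization c in (12) can be found by brute force: it suffices to search the grid of observed coordinate values. The following sketch (with hypothetical bivariate data) is an illustration of this search, not an efficient algorithm.

```python
# Brute-force search for a p-efficient point solving (12) on a bivariate
# sample: minimize c^T z subject to the empirical P(X <= z) >= p.
# Candidate points z may be restricted to the grid of observed coordinates.
from itertools import product

def p_efficient(sample, c, p):
    n = len(sample)
    grids = [sorted({x[i] for x in sample}) for i in range(len(sample[0]))]
    best = None
    for z in product(*grids):
        # empirical probability of the componentwise event {X <= z}
        covered = sum(all(xi <= zi for xi, zi in zip(x, z)) for x in sample)
        if covered / n >= p:
            cost = sum(ci * zi for ci, zi in zip(c, z))
            if best is None or cost < best[0]:
                best = (cost, z)
    return best   # (c^T z, z)

# Hypothetical sample of a 2-dimensional random vector.
sample = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (3.0, 3.0)]
cost, z = p_efficient(sample, [0.5, 0.5], p=0.5)
print(cost, z)
```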

Proposition 4.1

Given a random vector \(X \in \mathcal {Z}\) with a continuous distribution, the following relations hold:

  1. (i)

    For \(\textsf{mAVaR}_p(X)\) defined in (4) with an increasing aggregation function \(\varLambda \), the inequality \(\textsf{mAVaR}_p(X) \ge \textsf{AVaR}_p(\varLambda (X))\) holds. In particular, \(\textsf{mAVaR}_p(X)\ge \varrho _S[X]\) when \(\varrho _S[X]= \textsf{AVaR}_p[X_S]\) and \(\varLambda (X) = \max _{c\in S} c^\top X\) for any closed convex set \(S\subset S^m_+\) containing a vector \(c>0.\)

  2. (ii)

    For any \(c \in S^m_+\), the scalarization of \(\textsf{vmAVaR}\) provides larger values than \(\textsf{AVaR}_p(c^\top X)\).

Proof

(i) Consider the \(\textsf{mAVaR}\) defined in (4). If \(X \in Z_p\) a.s., then for any p-efficient point v of \(Z_p\), we have \(P(\varLambda (X) \le \varLambda (v)) \ge p\) by the definition of a p-efficient point and the monotonicity of the function \(\varLambda \). Denote the p-quantile of \(\varLambda (X)\) by \(\eta (p)\) and observe that \(\eta (p) \le \varLambda (v).\) Under the assumptions of the proposition, the cumulative distribution function of \(\varLambda (X)\) is continuous at \(\eta (p)\) and, hence, we obtain

$$\begin{aligned} \mathbb {E}[\varLambda (X) ~|~ X \in Z_p] \ge \mathbb {E}[\varLambda (X) ~|~ \varLambda (X) \ge \eta (p) ] = \textsf{AVaR}_p(\varLambda (X)). \end{aligned}$$

Since the aggregation defined in (10) is strictly increasing whenever S contains a vector c with all components positive, we obtain in particular that \(\textsf{mAVaR}_p(X) \ge \varrho _S[X]\) for any such closed convex set \(S\subset S^m_+\), when the systemic measure \(\varrho _S\) uses the Average Value-at-Risk at level p.

(ii) Consider \(\textsf{vmAVaR}\) defined in (5). It is calculated as one of the Pareto-efficient optimal solutions of the following optimization problem:

$$\begin{aligned} \textsf{vmAVaR}_p(X) = \min \Big \{ v + \frac{1}{p} \mathbb {E}[(X-v)_+]: v\in \textsf{mVaR}_p(X)\Big \}. \end{aligned}$$

A feasible solution of a convex multiobjective optimization problem is Pareto-efficient if and only if it is an optimal solution of a scalarized problem whose objective function is a convex combination of the multiple objectives; that is, it is also optimal for the following problem with some \(c \in S^m_+\):

$$\begin{aligned} c^\top \textsf{vmAVaR}_p(X) = \min \Big \{ c^\top v + \frac{1}{p} \mathbb {E}[c^\top (X-v)_+]: v\in \textsf{mVaR}_p(X)\Big \}. \end{aligned}$$

For any vector \(x\in \mathbb {R}^m\), the following inequality holds:

$$\begin{aligned} c^\top (x-v)_+ = \sum _{i=1}^m c_i \max \{ 0, x_i - v_i \} \ge \max \{0, c^\top (x-v) \}, \end{aligned}$$

due to the convexity of the max function. It follows that:

$$\begin{aligned} c^\top \textsf{vmAVaR}_p(X)= & {} \inf _{v\in \textsf{mVaR}_p(X)} \Big \{ c^\top v + \frac{1}{p} \mathbb {E}[c^\top (X-v)_+] \Big \}\\\ge & {} \inf _{v\in \textsf{mVaR}_p(X)} \Big \{ c^\top v + \frac{1}{p} \mathbb {E}[(c^\top X-c^\top v)_+] \Big \}. \end{aligned}$$

The relation \(P(X \le v) \ge p\) implies that \(P(c^\top X \le c^\top v) \ge p\). Denoting the p-quantile of \(c^\top X\) by \(\eta _X(p; c)\), it follows that \(\eta _X(p;c) \le c^\top v\) for every \(v\in \textsf{mVaR}_p(X)\). Therefore:

$$\begin{aligned} c^\top \textsf{vmAVaR}_p(X)\ge & {} \inf _{v\in \textsf{mVaR}_p(X)} \Big \{ c^\top v + \frac{1}{p} \mathbb {E}[(c^\top X-c^\top v)_+] \Big \} \\\ge & {} \eta _X(p;c) + \frac{1}{p} \mathbb {E}[(c^\top X- \eta _X(p;c))_+]= \textsf{AVaR}_p(c^\top X). \end{aligned}$$

Hence, the scalarization of \(\textsf{vmAVaR}\) results in a larger value than the Average Value-at-Risk of the scalarized random vector; the latter is one of the systemic risk measures proposed in Sect. 3.\(\square \)

We do not pursue set-valued systemic measures of risk further, as their calculation is numerically very expensive.

5 Distributed Two-Stage Risk-Averse Stochastic Optimization

In this section, we consider a two-stage stochastic programming problem formulated for a system of multiple agents cooperating to complete a common task. The systemic risk is associated with the operational risk of all agents, as well as with the successful completion of the system’s task. Our goal is to address the risk optimization using the measures proposed in Sect. 3. This type of problem is typical in robotics, but it is also relevant to the operation of energy systems, where power generation units jointly cover the energy demand in a certain area.

5.1 Two-Stage Stochastic Optimization Problem with Systemic Risk

Let \(x\in \mathbb {R}^{n_1}\) be the first-stage decision variable and \(f: \mathbb {R}^{n_1} \rightarrow \mathbb {R}\) be a continuous function representing the cost associated with x. The deterministic constraints imposed on x are represented by a closed convex set \(\mathcal {X}\subset \mathbb {R}^{n_1}\). The second-stage problem involves uncertainty given by some random data \(\xi \) with N realizations, denoted \(\xi ^s\), \(s=1,\ldots ,N\), and is formulated for a system of m agents. We assume that every agent i has a cost function \(g_i^s(y^s_i, z^s)= \langle q_i^s,y_i^s\rangle + \langle \tilde{q}_i^s,z^s\rangle \), which is associated with its operation and depends on two components: the local decision variable \(y^s_i \in \mathbb {R}^{n_2}\) of agent i and the global decision variable \(z^s \in \mathbb {R}^{n_3}\) pertaining to the state of the whole system. The constraints imposed on \(y^s_i\) and \(z^s\) are represented by closed convex sets \(Y^s_i \subset \mathbb {R}^{n_2}\) and \(\mathcal {G}^s \subset \mathbb {R}^{n_3}\), respectively. We assume that the local decisions \(y_i^s\) and the global decision variable \(z^s\) are coupled by linear constraints using matrices \(A^s_i \in \mathbb {R}^{d \times n_2}\) and \(B^s_i \in \mathbb {R}^{d \times n_3}\), \(i=1,2,\ldots ,m\).

We focus on the evaluation of the risk associated with the local operation of each agent, as well as with the global state of the system. The latter may reflect, for example, the state of completion of a common task. We shall use a class of linear scalarization vectors proposed in Sect. 3.2. Suppose \(S \subset S^m_+\) is a fixed set of scalarizations. We can evaluate the risk of the system as \(\varrho [\max _{c\in S} \sum _{i=1}^m c_i g_i(y_i,z)]\) using a coherent measure of risk \(\varrho [\cdot ]\). The risk-averse two-stage problem with a systemic measure of risk takes on the form:

$$\begin{aligned} \begin{aligned} \min _{x\in \mathcal {X}} ~&~ f(x) + \varrho [Q(x;\xi )], \end{aligned} \end{aligned}$$
(13)

where \(Q(x; \xi )\) has realizations \(Q(x; \xi ^s)\) defined as the optimal value of the second-stage problem for the realization \(\xi ^s\) of the random data. The second-stage problem for any \(s=1,\ldots ,N\) has the form:

$$\begin{aligned} Q(x;\xi ^s) = \min _{y^s, z^s} ~&~ \max _{c\in S} \sum _{i=1}^m c_i g_i^s(y^s_i, z^s) \end{aligned}$$
(14)
$$\begin{aligned} \text {s.t.} ~&~ \sum _{i=1}^m A^s_i y^s_i + \sum _{i=1}^m B^s_i z^s = b^s, \end{aligned}$$
(15)
$$\begin{aligned} ~&~ T^s_i x + W^s_i y^s_i = h^s_i, \quad i=1,2,\ldots ,m, \end{aligned}$$
(16)
$$\begin{aligned} ~&~ z^s \in \mathcal {G}^s,\quad y^s_i \in Y^s_i, \quad i = 1,2,\ldots ,m. \end{aligned}$$
(17)

Without loss of generality, we can assume that \(b^s=0\). For example, we can extend \(z^s\) by an additional component, which is then constrained to equal 1 in the set \(\mathcal {G}^s\), while the matrices \(B^s_i\) are extended by the additional column \(-\frac{1}{m} b^s.\)

Our goal is to develop a method to decompose the global second-stage problem (14)–(17) into m subproblems that can be solved in a distributed way, allowing the agents in the system to operate independently under uncertainty while keeping the exchange of information limited.

Note that the application of the systemic measure of risk via the proposed nonlinear aggregation in the two-stage problem does not fit formulation (14)–(17) and requires a dedicated analysis, which is the subject of separate work.

5.2 Numerical Method

We use decomposition ideas based on the risk-averse multicut methods proposed in [16, 26] and on the multicut methods used in risk-neutral stochastic programming. We iteratively construct a piecewise linear approximation of the optimal value of the second-stage problem and approximate the measure of risk by subgradient inequalities based on the dual representation of coherent risk measures, \(\varrho [Q] = \sup _{\mu \in \mathcal {A}_\varrho } \langle \mu , Q \rangle \). Specifically, at iteration t, we approximate \(\varrho [Q]\) by \(\varrho ^t[Q] = \max _{0 \le \tau \le t-1} \langle Q, \mu ^\tau \rangle \), with \(\mu ^\tau \) being the probability measures from \(\mathcal {A}_\varrho \) calculated as subgradients of \(\varrho [\cdot ]\) at previous iterations. The description of the set \(\mathcal {A}_\varrho \) is available in the literature for many coherent risk measures; see, e.g., [35]. The initial \(\mu ^0\) can be chosen as a nominal probability distribution, and the subgradients \(\mu ^{\tau }\) are updated as shown in [16].

We introduce an auxiliary variable \(\eta \in \mathbb {R}\), which represents a lower approximation of the measure of risk. Let Q be a random variable with realizations \(\vartheta ^s\), which represent lower approximations of the functions \(Q(\cdot ,\xi ^s)\). We assume that lower bounds \(\vartheta ^s_{\min }\) on the optimal value of the second-stage problem are known. Then, we can formulate a master problem of the following form:

$$\begin{aligned} \begin{aligned} \hspace{-3ex} \min _{x, \eta , \vartheta } ~&f(x) + \eta \\ \text {s.t.} ~&\eta \ge \langle \mu ^{\tau }, Q \rangle , \quad \tau = 0,1, \ldots , t-1, \\ ~&\vartheta ^{s} \ge \hat{\vartheta }^{s,\tau } + \langle v^{s,\tau }, x - x^\tau \rangle , \quad \tau = 1, \ldots , t-1, ~ s = 1, \ldots , N,\\ ~&\vartheta ^s \ge \vartheta ^s_{\min }, \quad s=1,\ldots , N, \\ ~&x \in \mathcal {X}. \end{aligned} \end{aligned}$$
(18)

The optimal value \(\hat{\eta }^t\) is the approximate value of \(\varrho [Q(\hat{x}^t; \xi )],\) where \(\hat{x}^t\) is the solution of the master problem at iteration t. The value \(\hat{\vartheta }^{s,\tau }= Q(x^\tau ;\xi ^s)\) is the optimal value of the second-stage problem in scenario s at iteration \(\tau \), and \(v^{s,\tau }\) is the subgradient calculated using the optimal dual variables of the constraints \(T^s_i x + W^s_i y^s_i = h^s_i\), \(i=1,2,\ldots ,m\). If the set \(\mathcal {X}\) is compact, then the convergence of the multicut method follows from the general convergence properties of the cutting-plane method. Moreover, if the set \(\mathcal {A}_\varrho \) is polyhedral, then the method converges in finitely many iterations [26].
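To illustrate the risk cuts \(\eta \ge \langle \mu ^\tau , Q\rangle \) in (18), the sketch below generates subgradients of \(\varrho = \textsf{AVaR}_p\) at given realization vectors. Each stored \(\mu ^\tau \) defines a supporting hyperplane, so the maximum over stored cuts is a lower bound on \(\varrho [Q]\) that is exact at the points where the cuts were generated. The scenario data are hypothetical.

```python
# Sketch of the risk-cut machinery for rho = AVaR_p over N equiprobable
# scenarios: a subgradient mu in A_rho puts mass 1/(N p) on the worst
# p-tail of q, so that <mu, q> = AVaR_p(q).  Stored subgradients give the
# cutting-plane model rho^t[Q] = max_tau <Q, mu^tau> <= rho[Q].

def avar(z, p):
    n = len(z)
    return min(v + sum(max(x - v, 0.0) for x in z) / (n * p) for v in z)

def avar_subgradient(q, p):
    """A maximizer mu of <mu, q> over the dual set of AVaR_p."""
    n = len(q)
    mu, budget = [0.0] * n, p
    for s in sorted(range(n), key=lambda s: -q[s]):   # worst scenarios first
        w = min(1.0 / n, budget)                      # probability mass used
        mu[s] = w / p
        budget -= w
        if budget <= 1e-15:
            break
    return mu

p = 0.5
q1 = [1.0, 2.0, 3.0, 4.0]
mu1 = avar_subgradient(q1, p)
exact = sum(m * x for m, x in zip(mu1, q1))
print(exact, avar(q1, p))   # the cut generated at q1 is tight at q1

q2 = [4.0, 1.0, 1.0, 1.0]   # at a new point the stored cut is a lower bound
assert sum(m * x for m, x in zip(mu1, q2)) <= avar(q2, p) + 1e-12
```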

Consider the optimal value of the second-stage problem \(Q(x;\xi ^s)\) for a fixed first-stage decision variable \(x \in \mathbb {R}^{n_1}\). We deal with the objective by introducing an auxiliary variable \(r^s\in \mathbb {R}\), which will identify the maximum after adding the following constraint for each scenario s

$$\begin{aligned} \sum _{i=1}^m c_i \big (\langle q_i^s,y_i^s\rangle + \langle \tilde{q}_i^s,z^s\rangle \big ) \le r^s, \quad c\in S. \end{aligned}$$
(19)

This is a semi-infinite constraint when the set S is a general closed convex subset of \(S^m_+\). However, if S is polyhedral, for example \(S=S^m_+\), then \(\max _{c\in S} \sum _{i=1}^m c_i g_i^s(y^s_i, z^s)\) is attained at one of the vertices of S, and, hence, (19) reduces to finitely many inequalities. We shall treat this case and shall index the constraints (19) by \(j\in J.\) Additionally, we introduce slack variables \(w^s_j\) to each of the inequality constraints

$$\begin{aligned} \sum _{i=1}^m c_i^j \big (\langle q_i^s,y_i^s\rangle + \langle \tilde{q}_i^s,z^s\rangle \big ) + w^s_j - r^s =0, \quad j\in J. \end{aligned}$$

The need to do so will become apparent later. All slack variables are gathered in a vector \(w^s\in \mathbb {R}^{|J|}.\)
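When \(S=S^m_+\), the vertices are the unit vectors \(e^i\), so the semi-infinite constraint (19) collapses to the m inequalities \(g_i^s(y^s_i,z^s)\le r^s\). The check below is purely illustrative (hypothetical cost values): it confirms numerically that the maximum of a linear form over random points of the simplex never exceeds its maximum over the vertices.

```python
# For a linear function of c, the maximum over the simplex S^m_+ is attained
# at a vertex, i.e., max_{c in S^m_+} sum_i c_i g_i = max_i g_i.  Hence the
# semi-infinite constraint (19) reduces to m inequalities g_i <= r.
import random

def simplex_point(m, rng):
    """A random point of S^m_+ (normalized positive weights)."""
    w = [rng.random() for _ in range(m)]
    total = sum(w)
    return [wi / total for wi in w]

g = [3.0, -1.0, 5.0, 2.0]          # hypothetical agent costs g_i
vertex_max = max(g)                # value at the best vertex e^i
rng = random.Random(0)
interior_max = max(
    sum(ci * gi for ci, gi in zip(simplex_point(len(g), rng), g))
    for _ in range(10000)
)
assert interior_max <= vertex_max + 1e-12
print(vertex_max, interior_max)
```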

We need to decompose the coupling constraints and distribute the common decision variables \(z^s\), \(r^s\) and \(w^s\). We do this by introducing copies of them for every agent i, \(i = 1,2,\ldots ,m\), and enforcing all copies to be equal in order to ensure the uniqueness of the decision variables. Let us stack the copied variables \(z^s_i, r^s_i, w^s_i\) in one column vector denoted \(\mathsf{{z}}^s_i\in \mathbb {R}^{n_3+|J|+1}\) to declutter the notation. Representing the system of agents by a network in every scenario, it is sufficient to create a spanning tree for that network and to associate a constraint with each arc of the spanning tree. For every connected graph with m nodes, a spanning tree exists containing exactly \(m-1\) arcs. We denote the set of arcs of the spanning tree in scenario s by \(\mathbb {A}_s\). Hence, we need the constraints \(\mathsf{{z}}^s_i=\mathsf{{z}}^s_\ell \) for all \((i,\ell )\in \mathbb {A}_s\). Alternatively, for every agent \(i=1,\ldots ,m\), we can identify the set of constraints

$$\begin{aligned} k(i,s)=\big \{k: [A^s_i]_k \ne \textbf{0} \text { or }[B^s_i]_k \ne \textbf{0},\; 1\le k\le d\big \}, \end{aligned}$$

where \([A_i^s]_k\) and \([B_i^s]_k\) denote the k-th row of matrix \(A_i^s\) and \(B_i^s\), respectively, and \(\textbf{0}\) stands for a zero vector of proper dimension. We identify the set of agents associated with agent i in scenario s as follows

$$\begin{aligned} \mathcal {N}^s_i = \big \{\ell : [A^s_\ell ]_k \ne \textbf{0} \text { or } [B^s_\ell ]_k \ne \textbf{0} \text { for some } k\in k(i,s)\big \}. \end{aligned}$$

Then we include constraints \(\mathsf{{z}}^s_i=\mathsf{{z}}^s_\ell \) for all \(i=1,\ldots , m\) and for all \(\ell \in \mathcal {N}_i^s.\) This option results in more constraints than when using the sets \(\mathbb {A}_s\), but it provides more flexibility in the information exchange among agents.
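The sets k(i, s) and \(\mathcal {N}^s_i\) can be computed directly from the sparsity patterns of the matrices \(A^s_i\) and \(B^s_i\). A minimal sketch follows, for a single scenario with small hypothetical matrices.

```python
# Sketch: computing the row sets k(i,s) and the neighbor sets N_i^s from
# the sparsity patterns of A_i^s and B_i^s (one scenario, toy matrices).

def nonzero_rows(*matrices):
    """Indices k of rows that are nonzero in at least one of the matrices."""
    d = len(matrices[0])
    return {k for k in range(d) if any(any(m[k]) for m in matrices)}

def neighbor_sets(A, B):
    m = len(A)
    rows = [nonzero_rows(A[i], B[i]) for i in range(m)]          # k(i,s)
    return [
        {l for l in range(m) if l != i and rows[l] & rows[i]}    # N_i^s
        for i in range(m)
    ]

# Three agents, d = 3 coupling rows: agent 0 shares row 0 with agent 1,
# agent 1 shares row 1 with agent 2; agents 0 and 2 are not directly coupled.
A = [
    [[1, 0], [0, 0], [0, 0]],   # A_0: active in row 0
    [[1, 0], [0, 1], [0, 0]],   # A_1: active in rows 0 and 1
    [[0, 0], [0, 1], [0, 0]],   # A_2: active in row 1
]
B = [[[0]] * 3 for _ in range(3)]   # B_i = 0 here for simplicity
print(neighbor_sets(A, B))
```

With these data, agent 1 must exchange copies with both neighbors, while agents 0 and 2 communicate only with agent 1.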

The problem (14)–(17) with a finite set of scalarizations J takes on the form:

$$\begin{aligned} Q(x;\xi ^s) = \min _{y^s, z^s, r^s, w^s} ~&~ \sum _{i=1}^m r^s_i \end{aligned}$$
(20)
$$\begin{aligned} \text {s.t.} ~&~ \sum _{i=1}^m c_i^j\big (g_i^s(y_i^s,z_i^s) + w^s_{ij}- r^s_i\big )=0, \quad j\in J, \end{aligned}$$
(21)
$$\begin{aligned} ~&~ \sum _{i=1}^m \big (A^s_i y^s_i + B^s_i z_i^s\big ) =0, \end{aligned}$$
(22)
$$\begin{aligned} ~&~ \mathsf{{z}}^s_i = \mathsf{{z}}^s_{\ell }, \quad (i,\ell )\in \mathbb {A}_s, \end{aligned}$$
(23)
$$\begin{aligned} ~&~ y^s_i \in \mathcal {Y}^s_i, z^s_i \in \mathcal {G}^s, \quad i = 1,2,\ldots ,m, \end{aligned}$$
(24)

where \(\mathcal {Y}^s_i=\big \{ y^s_i \in Y^s_i:\; T^s_i x + W^s_i y^s_i = h^s_i\big \},\) \(i=1,\ldots , m\). Note that, for each fixed scenario, problem (20)–(24) is a monotropic optimization problem with a linear objective function and separable constraints. The Lagrangian of problem (20)–(24) decomposes into m components, one associated with each agent. We propose to apply the distributed version of the Augmented Lagrangian Method as described and analyzed in [6]. That method operates with equality constraints, which is the reason for introducing the slack variables.

We associate Lagrange multipliers \(\nu ^s\) with constraint (21), \(\lambda ^s\) with (22) and \(\gamma ^s\) with (23). The global augmented Lagrangian \(\varLambda ^s_{\kappa _0}\) associated with problem (20)–(24) with a penalty parameter \(\kappa _0 > 0\) is defined as

$$\begin{aligned} \begin{aligned}&\varLambda ^s_{\kappa _0} (y^s, z^s, r^s, w^s, \lambda ^s, \gamma ^s, \nu ^s) = \sum _{i=1}^m r^s_i + \sum _{j\in J} \sum _{i=1}^m \nu ^s_j c_i^j \big (g^s_i(y^s_i, z^s_i) +w^s_{ij}- r^s_i\big )\\&\quad + \langle \lambda ^s, \sum _{i=1}^m A^s_i y^s_i +{\sum _{i=1}^m B^s_i z^s_i} \rangle + \sum _{(i,\ell )\in \mathbb {A}_s} \langle \gamma ^s_{i,\ell }, \mathsf{{z}}^s_i -\mathsf{{z}}^s_{\ell }\rangle + \frac{\kappa _0}{2}\!\! \sum _{(i,\ell )\in \mathbb {A}_s}\!\! \Vert \mathsf{{z}}^s_i -\mathsf{{z}}^s_{\ell }\Vert ^2 \\&\quad + \frac{\kappa _0}{2} \bigg \Vert \sum _{i=1}^m A^s_i y^s_i +{\sum _{i=1}^m B^s_i z^s_i}\bigg \Vert ^2 + \frac{\kappa _0}{2} \sum _{j\in J} \Big ( \sum _{i=1}^m c_i^j\big ( g_i^s(y^s_i, z^s_i)+ w^s_{ij}- r^s_i\big ) \Bigg )^2. \end{aligned} \end{aligned}$$

The local augmented Lagrangian \(\varLambda ^{s,i}_{\kappa _0}\) for each \(i = 1,2,\ldots ,m\) is defined as

$$\begin{aligned}{} & {} \varLambda ^{s,i}_{\kappa _0} (y^s_i, \bar{y}^{s}, \mathsf{{z}}^s_i, \bar{\mathsf{{z}}}^{s}, \lambda ^s, \gamma ^s, \nu ^s) = r^s_i + \sum _{j\in J} \nu ^s_j c_i^j \big (g_i^s(y_i^s,z_i^s) + w^s_{ij} -r^s_i\big ) \\{} & {} \quad + \langle \lambda ^s, A^s_i y^s_i + {B^s_i z^s_i} \rangle + \sum _{(i,\ell ), (\ell ',i)\in \mathbb {A}_s}\langle \gamma ^s_{i,\ell } - \gamma ^s_{\ell '\!,i}, \mathsf{{z}}^s_i \rangle \\{} & {} \quad + \frac{\kappa _0}{2} \bigg \Vert A^s_i y^s_i + {B^s_i z^s_i} + \sum _{\begin{array}{c} \ell =1\\ \ell \ne i \end{array}}^m \Big ( A^s_\ell \bar{y}^{s}_\ell + {B^s_\ell \bar{z}^{s}_\ell } \Big ) - b^s \bigg \Vert ^2 + \frac{\kappa _0}{2}\!\! \sum _{(i,\ell )\in \mathbb {A}_s}\!\! \Vert \mathsf{{z}}^s_i -\mathsf{{z}}^s_{\ell }\Vert ^2 \\{} & {} \quad + \frac{\kappa _0}{2} \sum _{j\in J} \Bigg (c_i^j \big (g^s_i(y^s_i, z^s_i) + w^s_{ij}-r^s_i\big ) + \sum _{\begin{array}{c} \ell =1\\ \ell \ne i \end{array}}^m c_\ell ^j \big ( g_\ell ^s(\bar{y}^s_\ell , \bar{z}^s_\ell ) + \bar{w}^s_{\ell j} - \bar{r}^s_\ell \big )\Bigg )^2. \end{aligned}$$

It is shown in [6] that the Accelerated Distributed Augmented Lagrangian (ADAL) method converges to the optimal solution \((\hat{y}^s, \hat{z}^s,\hat{r}^s, \hat{w}^s)\) of the global problem (20)–(24), provided that the feasible sets for each i are nonempty convex compact sets. Hence, we choose sufficiently large compact convex sets \(\mathcal {G}_r^s\) and \(\mathcal {G}_w^s\), so that the requirement \(\mathsf{{z}}^s_i\in \mathcal {G}^s\times \mathcal {G}_r^s\times \mathcal {G}_w^s\) does not affect the optimal values of the variables \(\hat{r}^s\) and \(\hat{w}^s.\)

Once the second-stage problem is solved for every scenario s, we recover the optimal value \(\hat{Q}(x^t;\xi ^s) = \hat{r}^s_1\) (at the optimum all copies \(\hat{r}^s_i\) coincide) and construct objective cuts for every scenario s:

$$\begin{aligned} Q(x; \xi ^s) \ge \hat{Q}(x^t; \xi ^s) + \langle v^{s,t}, x - x^t \rangle , \quad s=1,2,\ldots , N. \end{aligned}$$

Here \(v^{s,t}\) is the subgradient of \(Q(x;\xi ^s)\) at \(x^t\) in scenario s, which can be calculated as \(\sum _{i=1}^m -(T^s_i)^\top \pi ^s_i\), where \(\pi ^s_i\) is the optimal Lagrange multiplier associated with constraint (16) in subproblem i. We assume that the two-stage problem has relatively complete recourse, so that feasibility cuts are not needed. The proposed method for solving the two-stage problem proceeds as follows.

  • Distributed Risk-Averse Multicut Method

  • Step 0. Set \(t = 1\) and identify initial \(\mu ^0 \in \mathcal {A}_\varrho \).

  • Step 1. Solve the master problem (18) and denote its optimal solution by \((x^t, \eta ^t, \vartheta ^t)\).

  • Step 2. For every scenario \(s = 1, \ldots , N\) apply the following method.

    • (a) Set \(k = 1\) and define initial Lagrange multipliers \(\lambda ^{s,1}\), \(\gamma ^{s,1}\), \(\nu ^{s,1}\) and initial primal variables \(y^{s,1}, \mathsf{{z}}^{s,1}\).

    • (b) For all \(i=1,2,\ldots ,m\), calculate \((\hat{y}_i^{s,k}, \hat{\mathsf{{z}}}_i^{s,k})\) as the optimal solution of subproblem i:

      $$\begin{aligned} \min _{y_i^s, \mathsf{{z}}^s_i} ~&~ \varLambda ^{s,i}_{\kappa _0} (y_i^s, y^{s,k}, \mathsf{{z}}^s_i, \mathsf{{z}}^{s,k}, \lambda ^{s,k}, \gamma ^{s,k}, \nu ^{s,k}) \nonumber \\ \text {s.t.} ~&~ y_i^s \in \mathcal {Y}^s_i, ~ \mathsf{{z}}^s_i\in \mathcal {G}^s\times \mathcal {G}_r^s\times \mathcal {G}_w^s. \end{aligned}$$
    • (c) Update the primal variables of subproblem i:

      $$\begin{aligned} y_i^{s, k+1} = y_i^{s,k} + \kappa _s (\hat{y}_i^{s,k} - y_i^{s,k}), \\ \mathsf{{z}}_i^{s, k+1} = \mathsf{{z}}_i^{s,k} + \kappa _s (\hat{\mathsf{{z}}}_i^{s,k} - \mathsf{{z}}_i^{s,k}). \end{aligned}$$
    • (d) If constraints (21), (22), and (23) are not satisfied at \((y_i^{s, k+1}, \mathsf{{z}}_i^{s, k+1})\), then update their respective Lagrange multipliers as follows:

      $$\begin{aligned} \lambda ^{s,k+1}&= \lambda ^{s,k} + \kappa _0 \kappa _s \Big ( \sum _{i=1}^m A_i^s y_i^{s,k+1} + \sum _{i=1}^m B_i^s z_i^{s,k+1} \Big ), \\ \gamma _{i,\ell }^{s,k+1}&= \gamma _{i,\ell }^{s,k} + \kappa _0 \kappa _s (\mathsf{{z}}_i^{s,k+1} - \mathsf{{z}}_\ell ^{s,k+1}), \quad (i,\ell ) \in \mathbb {A}_s,\\ \nu _j^{s,k+1}&= \nu _j^{s,k} + \kappa _0 \kappa _s \sum _{i=1}^m c_i^j\big (g_i^s(y_i^{s,k+1},z_i^{s,k+1}) + w^{s,k+1}_{ij}- r^{s,k+1}_i\big ), \; j\in J. \end{aligned}$$

      Increase k by one and return to Step (b). If the constraints are satisfied, then calculate the following quantities:

      $$\begin{aligned} v^{s,t} = \sum _{i=1}^m (-T_i^s)^\top \pi _i^{s,k}, \quad \hat{\vartheta }^{s,t} = r^{s,k+1}_i, \end{aligned}$$

      where \(\pi _i^{s,k}\) is the optimal Lagrange multiplier associated with the constraint \(T^s_i x + W^s_i y^s_i = h^s_i\) in subproblem i.

  • Step 3. Calculate \(\varrho ^{t}= \varrho [\hat{\vartheta }^{t}]\) and \(\mu ^{t}\in \partial \varrho [\hat{\vartheta }^{t}]\).

  • Step 4. If \(\varrho ^{t}= \eta ^t\), stop; otherwise, increase t by one and go to Step 1.
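The inner loop of Step 2 follows the ADAL pattern: each agent minimizes a local augmented Lagrangian, the primal variables are relaxed with stepsize \(\kappa_s\), and the multipliers are updated on the coupling residual. A minimal self-contained sketch of this pattern (the two-agent quadratic problem, the penalty \(\rho\), and all numerical values are our own illustrative choices, not the subproblems of the paper):

```python
# Minimal ADAL-style iteration on  min (y1-t1)^2/2 + (y2-t2)^2/2  s.t. y1+y2 = b.
# Each agent solves its local augmented-Lagrangian subproblem in closed form,
# then a primal averaging step with kappa < 1/m (here m = 2) and a dual ascent
# step on the coupling residual are applied, mirroring Steps 2(b)-(d).
t = [1.0, 3.0]          # local targets (illustrative)
b = 2.0                 # coupling right-hand side
rho = 1.0               # penalty parameter (plays the role of kappa_0)
kappa = 0.4             # stepsize, satisfying kappa < 1/m = 1/2

y = [0.0, 0.0]
lam = 0.0
for _ in range(300):
    # local solves: argmin_y (y-t_i)^2/2 + lam*y + (rho/2)*(y + y_other - b)^2
    y_hat = [(t[i] - lam + rho * (b - y[1 - i])) / (1.0 + rho) for i in (0, 1)]
    # primal averaging step (as in Step 2c)
    y = [y[i] + kappa * (y_hat[i] - y[i]) for i in (0, 1)]
    # multiplier update on the coupling residual (as in Step 2d)
    lam += rho * kappa * (y[0] + y[1] - b)

# KKT solution of the toy problem: y = (0, 2), lam = 1
assert abs(y[0] - 0.0) < 1e-4 and abs(y[1] - 2.0) < 1e-4 and abs(lam - 1.0) < 1e-4
```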

Theorem 5.1

Assume that problem (13), (20)–(24) has relatively complete recourse and the recourse function \(Q(x,\xi ^s)\) is finite for all \(x\in \mathcal {X}\) and all \(s=1,\ldots ,N\). Let the parameters \(\kappa _s\) satisfy \(0<\kappa _s<1/m\), and let the sets \(\mathcal {X}, \mathcal {Y}^s_i, \mathcal {G}^s, \mathcal {G}_r^s, \mathcal {G}_w^s\) be compact and convex for all \(s=1,\ldots ,N\) and \(i=1,\ldots ,m\). Then the distributed risk-averse multicut method converges to an optimal solution of problem (13), (20)–(24).

Proof

Under the assumptions of the theorem, every subproblem in Step 2(b) of the method is solvable, and assumptions (A1)–(A3) of Theorem 2 in [6] are satisfied. Additionally, the parameters \(\kappa _s\) satisfy the stepsize requirement of that theorem. Hence, for every scenario s, the algorithm in Step 2 either stops at an optimal solution of the dual problem to (20)–(24) or generates a sequence converging to it. Moreover, every sequence of primal variables \((\hat{y}_i^{s,k}, \hat{\mathsf{{z}}}_i^{s,k})\) generated by the method has an accumulation point, and every such point is an optimal solution of problem (20)–(24). Therefore, after solving the second-stage problem for all scenarios, we obtain the optimal value and the necessary subgradients \(v^{s,t}\) of the recourse functions \(Q(x^t;\xi ^s).\) From that point on, the convergence of the entire method follows from the convergence of the risk-averse multicut method in [26]. \(\square \)

Remark 5.1

If the set S is a singleton, then the scalarization can be used directly in the objective of the second stage and constraints (21) as well as the variables \(r_j^s\) and \(w_j^s\) (\(j\in J\)) can be dropped. In that case, the parameter \(\kappa _s\) should satisfy \(0 < \kappa _s \le \big (\max _{1 \le i \le m} \mathcal {N}^s_i +1\big )^{-1}\!.\)

We assume that the first-stage problem makes a strategic decision and is solved by a central authority. If that problem needs to be solved by the collaborating system, this can be accomplished by introducing copies of the first-stage decisions \((x,\eta )\) for each component i. Observe that the terms \(\hat{\vartheta }^{s,\tau } = r_i^{s,k+1}\) (k being the last iteration for solving the second-stage problem of scenario s) and \(\langle v^{s,\tau }, x-x^\tau \rangle \) are separable with respect to the system’s components. Hence, the first two inequality constraints in problem (18) can be distributed by using Lagrange multipliers and applying the ADAL method again.

5.3 Two-Stage Wireless Information Exchange Problem

In this section, we formulate a two-stage information exchange problem and implement the proposed numerical method to solve it. Consider a wireless communication network consisting of J robots; they form a team denoted by \(\mathcal {J}\). The robots collect information about the unknown environment and send the information to a set \(\mathcal {K}= \{ 1, \ldots , K_0 \}\) of active reporting points by multi-hop communication. The communication links between robots and reporting points are subject to the risk of information loss. Therefore, the objective is to choose the optimal set of active reporting points to minimize the risk associated with the amount of lost information and the proportion of the total information that was gathered but has not reached the reporting points.

The first-stage decision variables are binary variables \(z_k \in \{0, 1 \}\) for \(k \in \mathcal {K}\), where \(z_k = 1\) if the k-th location is selected as an active reporting point and \(z_k = 0\) otherwise. We assume that at most K reporting points can be active, where \(1\le K < K_0.\) The chosen set of reporting points should work well for different spatial configurations of the robots. We model the uncertainty of the spatial configuration and the amount of information to be observed by a set \(\mathcal {S}=\{ 1,\ldots , N\}\) of scenarios.

The robots deliver the observed data either directly to the reporting points or to their neighbors, who then relay it to the active reporting points. The following second-stage decision variables are involved in the problem. The variables \(T_{ij}^s\) represent the amount of data sent by node i to node j in scenario \(s\in \mathcal {S}\). The amount of data observed but not sent by robot i in scenario s is denoted by \(y_i^s\). The proportion of all data successfully delivered to the reporting points in scenario s is denoted by \(x^s\). Every robot i gathers data in the amount \(r_i^s\) and can send it to its neighbors within some communication range. The communication links between the nodes are characterized by a function \(R_{ij}^s\) that gives the proportion of information sent by node \(i \in \mathcal {J}\) that is received and correctly decoded by node \( j \in \mathcal {J}\cup \mathcal {K}\); thus, \(R_{ij}^s T_{ij}^s\) is the amount of data received and correctly decoded by node \(j \in \mathcal {J}\cup \mathcal {K}\) in scenario \(s\in \mathcal {S}\). The set of neighbors of node i in scenario \(s \in \mathcal {S}\) consists of the nodes within its communication range: \(\mathcal {N}^s(i) = \{ j \in \mathcal {J}\cup \mathcal {K}: R_{ij}^s > 0 \}\).

We associate a local risk with each robot, concerning the information \(y_i^s\) that is neither communicated to neighbors nor delivered to the reporting points. For every \(i \in \mathcal {J}\), it is calculated as

$$\begin{aligned} y_i^s = r_i^s + \sum _{j \in \mathcal {J}} R_{ji}^s T_{ji}^s - \sum _{j \in \mathcal {J}\cup \mathcal {K}} T_{ij}^s. \end{aligned}$$

The systemic risk addresses the total proportion of data not delivered to the reporting points: \((1-x^s)\), where \(x^s\) is calculated as follows

$$\begin{aligned} x^s \sum _{i \in \mathcal {J}} r_i^s = \sum _{i \in \mathcal {J}}\sum _{k \in \mathcal {K}} R_{ik}^s T_{ik}^s. \end{aligned}$$
(25)

To implement the distributed method for the operation of the robots, we introduce a copy \(x_i^s\) of the total proportion variable for each robot and impose equality among these auxiliary variables. Since nodes can share information only with their neighbors, it suffices to enforce equality of the proportion variables between neighboring nodes. Constraint (25) is then replaced by the following set of constraints

$$\begin{aligned} \begin{aligned} \sum _{i\in \mathcal {J}} x_i^s r_i^s = \sum _{i\in \mathcal {J}} \sum _{k\in \mathcal {K}} R_{ik}^s T_{ik}^s, \\ x_i^s = x_j^s, \quad i \in \mathcal {J}, j \in \mathcal {N}^s(i). \end{aligned} \end{aligned}$$
(26)
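Whether the pairwise equalities in (26) pin down a single common value depends on the connectivity of the communication graph. This can be verified with a standard union-find computation of the equality classes (an illustrative check of ours; the graphs below are hypothetical):

```python
# The second group of constraints in (26) imposes x_i = x_j only along edges
# of the communication graph; by transitivity, a connected graph forces a
# single common value.  Union-find counts the resulting equality classes.

def find(parent, i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]   # path compression
        i = parent[i]
    return i

def equality_classes(n, edges):
    """Number of distinct values permitted by the pairwise equalities."""
    parent = list(range(n))
    for i, j in edges:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
    return len({find(parent, i) for i in range(n)})

# a connected 5-node path graph: all proportion variables are forced equal
assert equality_classes(5, [(0, 1), (1, 2), (2, 3), (3, 4)]) == 1
# a disconnected graph: the equalities hold only within each component
assert equality_classes(5, [(0, 1), (2, 3)]) == 3
```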

If the network is connected, constraint (26) forces all \(x_i^s\) to be equal, ensuring the consistency and uniqueness of \(x_i^s\) for all \(i\in \mathcal {J}\). Then the first-stage optimization problem takes the following form:

$$\begin{aligned} \begin{aligned} \min _{z} ~&~ \varrho [Q(z; \xi )] \\ \text {s.t.} ~&~ \sum _{k \in \mathcal {K}} z_k \le K, \quad z_k \in \{ 0, 1 \}, \quad k \in \mathcal {K}. \end{aligned} \end{aligned}$$
(27)

Here \(Q(z; \xi )\) is a random variable with realizations \(Q(z; \xi ^s)\) for \(s\in \mathcal {S}\) denoting the optimal values of the second-stage problem. The second-stage problem deals with the operation of the robots after the locations of the active reporting points are fixed; it is formulated as follows:

$$\begin{aligned} Q(z;\xi ^s) = \min _{T^s, y^s, x^s}&~ c_1 \sum _{i\in \mathcal {J}} w_i y_i^s + c_2 \sum _{i\in \mathcal {J}} w_i(1-x_i^s) \end{aligned}$$
(28)
$$\begin{aligned} \text {s.t.} ~&~ y_i^s = r_i^s + \sum _{j\in \mathcal {J}} R_{ji}^s T_{ji}^s - \sum _{j\in \mathcal {J}\cup \mathcal {K}}T_{ij}^s, \quad i \in \mathcal {J}, \end{aligned}$$
(29)
$$\begin{aligned} ~&~ \sum _{j \in \mathcal {J}} T_{ji}^s + \sum _{j\in \mathcal {J}\cup \mathcal {K}} T_{ij}^s \le a, \quad i \in \mathcal {J}, \end{aligned}$$
(30)
$$\begin{aligned} ~&~ \sum _{i \in \mathcal {J}} x_i^s r_i^s = \sum _{j \in \mathcal {J}} \sum _{k \in \mathcal {K}} R_{jk}^s T_{jk}^s , \end{aligned}$$
(31)
$$\begin{aligned} ~&~ x_i^s = x_j^s, \quad i \in \mathcal {J}, j \in \mathcal {N}^s(i) , \end{aligned}$$
(32)
$$\begin{aligned} ~&~ T_{ik}^s \le M z_k, \quad i \in \mathcal {J}, k \in \mathcal {K}, \end{aligned}$$
(33)
$$\begin{aligned} ~&~ T_{ij}^s \ge 0, \quad i \in \mathcal {J}, j \in \mathcal {J}\cup \mathcal {K}, \end{aligned}$$
(34)
$$\begin{aligned} ~&~ y_i^s \ge 0,\quad 0 \le x_i^s \le 1, \quad i \in \mathcal {J}. \end{aligned}$$
(35)

Here \(w_i \ge 0\) for \(i \in \mathcal {J}\) in (28) are modeling parameters chosen so that \(\sum _{i\in \mathcal {J}}w_i = 1\), in line with the aggregation approach proposed in Sect. 3.2. Similarly, the modeling parameters \(c_1 > 0\) and \(c_2 > 0\), with \(c_1 + c_2 = 1\), represent the weights associated with the local and systemic sources of risk, respectively. Constraint (30) imposes a limit on the total amount of information exchanged by one robot, where \(a>0\) is a given constant. Constraint (33) ensures that information can be delivered only to active reporting points; the constant \(M > 0\) limits the information that can be sent to a reporting point by one robot in a single hop. The second-stage problem is feasible for any feasible first-stage decision, i.e., the two-stage problem has relatively complete recourse. Hence, \(\varrho [Q(z; \xi )]\) is well-defined and finite for all feasible first-stage decisions z.
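Once z is fixed, the second-stage problem (28)–(35) for a single scenario is a linear program. A minimal sketch (all data values, the two-robot topology, and the single active reporting point are our own illustrative choices; the consensus constraints (32) are collapsed into one variable x, which is valid when the network is connected):

```python
# Toy instance of the second-stage LP (28)-(35): two robots (nodes 0, 1) and
# one active reporting point (node 2).  Robot 0 is out of range of the RP and
# must relay through robot 1.  All data below are illustrative.
import numpy as np
from scipy.optimize import linprog

r = [1.0, 0.5]                 # data gathered by each robot
c1, c2, w = 0.2, 0.8, [0.5, 0.5]
a_cap, M = 5.0, 10.0
# R[i][j]: fraction of data sent i -> j that is received and decoded
R = [[0.0, 1.0, 0.0],          # robot 0 reaches only robot 1
     [1.0, 0.0, 0.8],          # robot 1 reaches robot 0 and the RP (lossy)
     [0.0, 0.0, 0.0]]

# variables: T00 T01 T02 T10 T11 T12 y0 y1 x   (indices 0..8)
obj = np.zeros(9)
obj[6] = obj[7] = c1 * w[0]    # c1 * sum_i w_i * y_i
obj[8] = -c2 * (w[0] + w[1])   # c2 * sum_i w_i * (1 - x), up to the constant c2

A_eq = np.zeros((3, 9)); b_eq = np.zeros(3)
# (29) for robot 0:  y0 + (T00 + T01 + T02) - R[1][0]*T10 = r0
A_eq[0, [0, 1, 2]] = 1.0; A_eq[0, 3] = -R[1][0]; A_eq[0, 6] = 1.0; b_eq[0] = r[0]
# (29) for robot 1:  y1 + (T10 + T11 + T12) - R[0][1]*T01 = r1
A_eq[1, [3, 4, 5]] = 1.0; A_eq[1, 1] = -R[0][1]; A_eq[1, 7] = 1.0; b_eq[1] = r[1]
# (31):  x*(r0 + r1) = R[0][2]*T02 + R[1][2]*T12
A_eq[2, 8] = r[0] + r[1]; A_eq[2, 2] = -R[0][2]; A_eq[2, 5] = -R[1][2]

A_ub = np.zeros((2, 9)); b_ub = np.full(2, a_cap)
# (30): total traffic through each robot is bounded by a_cap
A_ub[0, [1, 2, 3]] = 1.0; A_ub[0, 0] = 2.0   # robot 0 (T00 counted in and out)
A_ub[1, [1, 3, 5]] = 1.0; A_ub[1, 4] = 2.0   # robot 1

# (33)-(35): T >= 0 with links to the RP capped by M*z_k (z = 1),
# y >= 0, and x in [0, 1]
bounds = [(0, None)] * 2 + [(0, M)] + [(0, None)] * 2 + [(0, M)] + \
         [(0, None)] * 2 + [(0, 1)]

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x_opt = res.x[8]   # robot 0 relays its data via robot 1; x = 0.8*1.5/1.5 = 0.8
```

At the optimum, robot 0 forwards its unit of data to robot 1, which sends all 1.5 units to the reporting point over the lossy link, so the delivered proportion is \(x = 0.8\) with no locally retained data.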

5.4 Numerical Results

The performance of the distributed method is tested on a team of robots working on a square map with corner points at relative coordinates (0, 0) and (1.5, 1.5). The spatial distribution of the available data to be gathered follows a normal distribution with expected value \(\mathcal {C} = (0.5, 1.25)\), located in the upper left part of the map. The network consists of 20 robots and 4 potential fixed locations of the reporting points. We generated 100 equally probable scenarios for the spatial configurations of the robots. The rate \(R_{ij}^s\) is a decreasing function of the distance \(\Vert d_{ij}^s\Vert \) between the nodes, given by

$$\begin{aligned} R_{ij}^s = {\left\{ \begin{array}{ll} 1, &{} \text {if } \Vert d_{ij}^s \Vert \le \ell , \\ a \Vert d_{ij}^s \Vert ^3 + b\Vert d_{ij}^s\Vert ^2 + c\Vert d_{ij}^s\Vert + e, &{} \text {if } \ell < \Vert d_{ij}^s\Vert \le u, \\ 0, &{} \text {if } \Vert d_{ij}^s\Vert > u. \end{array}\right. } \end{aligned}$$

We set \(\ell = 0.3\) and \(u = 0.6\), and the values \(a\), \(b\), \(c\), \(e\) are chosen so that \(R_{ij}^s\) is a continuous function. This type of rate function is commonly used in the literature; see, e.g., [19, 36]. The amount of information \(r_i^s\) gathered by robot i in scenario s depends on the robot’s position relative to the expected value \(\mathcal {C}\) given above.
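The cubic coefficients can be obtained from a small linear system. The text requires only continuity; the sketch below additionally enforces zero slope at both breakpoints (an assumption of ours, yielding the standard \(C^1\) "smoothstep" cubic), which determines all four coefficients uniquely:

```python
# Solve for the cubic coefficients (a, b, c, e) of the rate function so that
# R(l) = 1 and R(u) = 0 (continuity), plus -- our additional choice, beyond
# what the text requires -- R'(l) = R'(u) = 0, making the rate function C^1.
import numpy as np

l, u = 0.3, 0.6
A = np.array([
    [l**3,   l**2, l,   1.0],  # R(l)  = 1
    [u**3,   u**2, u,   1.0],  # R(u)  = 0
    [3*l**2, 2*l,  1.0, 0.0],  # R'(l) = 0
    [3*u**2, 2*u,  1.0, 0.0],  # R'(u) = 0
])
a, b, c, e = np.linalg.solve(A, np.array([1.0, 0.0, 0.0, 0.0]))

def rate(d):
    """Piecewise rate R as a function of the distance d between two nodes."""
    if d <= l:
        return 1.0
    if d <= u:
        return a * d**3 + b * d**2 + c * d + e
    return 0.0

# continuity at both breakpoints, and the rate decreases in between
assert abs(rate(l) - 1.0) < 1e-9 and abs(rate(u)) < 1e-9
grid = np.linspace(l, u, 50)
assert all(rate(d1) >= rate(d2) - 1e-12 for d1, d2 in zip(grid, grid[1:]))
```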

It is assumed that the network is connected in all scenarios, that is, every node has at least one neighbor within the communication range, and all nodes are connected to the reporting points through multiple hops. We solve the problem with fixed weights \(w_i = \frac{1}{J}\) for all \(i \in \mathcal {J}\) and different settings of c in (28), using a convex combination of the expected value and \(\textsf{AVaR}_\alpha (\cdot )\) with \(\alpha = 0.1\).

After extensive experiments with different values of the parameters \(\kappa _0\) and \(\kappa _s\), we found that, for this problem setting, \(\kappa _0 \in (0.01, 0.05)\) and \(\kappa _s \in (0.01, 0.2)\) lead to convergence to the centralized solution in all scenarios. Moreover, higher values of \(\kappa _s\) lead to faster convergence in all scenarios, even when these values are outside the theoretical range \((0, \frac{1}{20})\). The problem is solved in both centralized and distributed ways, and the results for \(c=[0.2,~0.8]\) and \(c=[0.5,~0.5]\) in one of the scenarios are shown in Figs. 1 and 2, respectively. As can be seen in Figs. 1b and 2b, the nodes converge to the centralized value of the proportion of information delivered to the reporting points. Note that the higher value of \(\kappa _s\) leads to faster convergence of the variables to their centralized values.

Fig. 1

a Evolution of the sum of local losses \(\sum _{i \in \mathcal {J}} y_i^s\) and proportion constraint (31), and b convergence of the robots’ proportion variables \(x_i^s\) to the centralized solution in scenario s for \(c=[0.2,~0.8]\), \(\kappa _0 = 0.01\), \(\kappa _s = 0.2\)

Fig. 2

a Evolution of the sum of local losses \(\sum _{i \in \mathcal {J}} y_i^s\) and proportion constraint (31) and b convergence of the robots’ proportion variables \(x_i^s\) to the centralized solution in scenario s for \(c=[0.5,~0.5], \kappa _0 = 0.01\), \(\kappa _s = 0.1\)

To compare the two aggregation methods discussed in Sect. 3, we solve the problem for a larger network of 50 robots and 4 reporting points on a \(2 \times 2\) square map over 200 scenarios. First, we aggregate the individual losses of the robots using a linear scalarization and evaluate the risk. We solve the two-stage problem (27), (28)–(35) with fixed scalarization weights \(w_i = \frac{1}{J}\) for all \(i \in \mathcal {J}\) and \(c = [0.8, 0.2]\), using two risk measures: Mean-\(\textsf{AVaR}_\alpha (\cdot )\) for \(\alpha = 0.1, 0.2, 0.3\) and Mean-Upper-Semideviation of order 1. This problem can be solved using a multicut method in a centralized way or in a distributed way, as shown before. Second, we reformulate the two-stage problem as one large-scale optimization problem in which the total risk is evaluated using the nonlinear aggregation methods of Sect. 3.3.

The setup of the communication network problem and the optimal solutions in one of the scenarios for both methods are shown in Fig. 3. Notice that the set of optimal reporting points may differ depending on the risk evaluation method. The risk values of the two methods are aligned with the theoretical comparison: imposing a risk measure on a linear scalarization of the individual losses results in smaller values than aggregating the individual risks. More importantly, more data is delivered to the reporting points if we first aggregate the losses of the robots and then evaluate the risk of the total loss. The distribution of the proportion x of data delivered to the reporting points for the two methods is shown in Fig. 4.

Fig. 3

Network of 50 robots and 4 reporting points in one scenario. The lighter colors indicate higher rates of information generation, and the link’s thickness indicates the rate \(R_{ij}\) between nodes i and j. a Initial spatial configuration of robots (green) and reporting points (blue). The optimal routing decisions between nodes when b we evaluate the risk of total loss or c aggregate the individual risks of the nodes. Blue nodes are selected and red nodes are not selected

Fig. 4

Proportion of information delivered to the reporting points using a Mean-AVaR at \(\alpha = 0.1\) and b Mean-Upper-Semideviation of order 1. c Comparison of the risk values for two aggregation methods

Using the optimal values of the decision variables, we have also calculated \(\textsf{mAVaR}\) and \(\textsf{vmAVaR}\) for comparison with the optimized \(\textsf{AVaR}\). The following formulas were used:

$$\begin{aligned} \textsf{AVaR}_\alpha ({Q})= & {} \inf _{\eta \in \mathbb {R}} \Big \{ \eta + \frac{1}{\alpha } \mathbb {E}\big [ (c^\top {Q} - \eta )_+ \big ] \Big \}, \\ \textsf{mAVaR}_\alpha ({Q})= & {} \mathbb {E}\big [ c^\top Q ~|~ Q \in Z_{1-\alpha } \big ], \\ \textsf{vmAVaR}_\alpha ({Q})= & {} \inf _{v \in \textsf{mVaR}_{1-\alpha }(Q)} \Big \{ c^\top v + \frac{1}{\alpha } \mathbb {E}[ c^\top ({Q} - v)_+ ]\Big \}. \end{aligned}$$
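On a finite, equiprobable scenario set, the infimum in the first formula is attained at a scenario value (the \((1-\alpha)\)-quantile), and \(\textsf{AVaR}_\alpha\) reduces to the average of the worst \(\alpha N\) outcomes when \(\alpha N\) is an integer. A numerical check of this equivalence for the scalar case (the losses below are synthetic, not the values from Table 1):

```python
# Check the variational representation of AVaR on equiprobable scenarios:
#   inf_eta { eta + E[(Q - eta)_+] / alpha }
# equals the mean of the worst alpha*N losses when alpha*N is an integer.
import random

random.seed(0)
N, alpha = 100, 0.1
losses = [random.gauss(0.0, 1.0) for _ in range(N)]   # synthetic scenario losses

def avar_ru(q, alpha):
    """Minimize eta + E[(q - eta)_+]/alpha; the objective is piecewise linear
    and convex in eta, so the infimum is attained at some scenario value."""
    return min(eta + sum(max(x - eta, 0.0) for x in q) / (len(q) * alpha)
               for eta in q)

def avar_tail(q, alpha):
    """Average of the worst alpha*N outcomes (alpha*N assumed integer)."""
    k = round(len(q) * alpha)
    return sum(sorted(q)[-k:]) / k

assert abs(avar_ru(losses, alpha) - avar_tail(losses, alpha)) < 1e-9
```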

The values of \(\textsf{AVaR}\), \(\textsf{mAVaR}\), and \(\textsf{vmAVaR}\) are shown in Table 1. We see that \(\textsf{AVaR}_\alpha (Q)\) results in smaller values than \(\textsf{mAVaR}_\alpha (Q)\) and \(\textsf{vmAVaR}_\alpha (Q)\) at all confidence levels \(\alpha \), as shown theoretically in Sect. 4. Those measures of risk are computationally very demanding and not amenable to the type of decision problems we are considering.

Table 1 Comparison of \(\textsf{AVaR}\), \(\textsf{mAVaR}\) and \(\textsf{vmAVaR}\) values for \(\alpha = 0.1, 0.2, 0.3\)

6 Conclusions

We propose a sound axiomatic approach to measures of risk for distributed systems and construct several classes of non-trivial measures. The important features of the proposed measures are as follows: they conform to the axioms, they can be calculated efficiently, and they are amenable to distributed optimization. These measures are less conservative than most of the other systemic measures of risk. The class of measures proposed in Sect. 3.3 goes beyond the popular ways to aggregate the risks of agents.

We have devised a distributed method for solving the risk-averse two-stage problems with a monotropic structure, which works for any coherent measure of risk. We demonstrate the viability of the proposed framework on a non-trivial two-stage problem involving wireless communication. The numerical experiments confirm the theoretical observations and show the advantage of the proposed approach to risk aggregation in distributed systems.

The approach offers a good balance of robustness to uncertainty, optimality of the loss functions, and efficiency of the numerical computation.