Elicitability and identifiability of set-valued measures of systemic risk

Identification and scoring functions are statistical tools to assess the calibration of risk measure estimates and to compare their performance with other estimates, e.g. in backtesting. A risk measure is called identifiable (elicitable) if it admits a strict identification function (strictly consistent scoring function). We consider measures of systemic risk introduced in Feinstein et al. (SIAM J. Financial Math. 8:672–708, 2017). Since these are set-valued, we work within the theoretical framework of Fissler et al. (preprint, available online at arXiv:1910.07912v2, 2020) for forecast evaluation of set-valued functionals. We construct oriented selective identification functions, which induce a mixture representation of (strictly) consistent scoring functions. Their applicability is demonstrated with a comprehensive simulation study.


Introduction

Systemic risk measures
In the financial mathematics literature, there is great interest in various types of risk and, in particular, in their quantitative measurement. The quantitative assessment of the risk of a particular financial position dates back to Artzner, Delbaen, Eber, and Heath (1999) and has since been discussed from several points of view in many further works; see, e.g., Artzner, Delbaen, and Koch-Medina (2009), Föllmer and Schied (2002), and Föllmer and Weber (2015). For a thorough overview of risk measures we refer the reader to the textbook of Föllmer and Schied (2004).
The financial crisis of 2007–2009 and its aftermath over the last decade have starkly underlined the need to quantitatively assess the risk of an entire financial system rather than merely that of its individual entities. One of the first academic works on systemic risk is the seminal paper by Eisenberg and Noe (2001). The focus of that work, however, lies on modelling the financial system rather than on measuring its systemic risk. Since then, financial mathematicians have developed a rich strand of literature, encompassing different approaches and emphasising various aspects of systemic risk. The model of Eisenberg and Noe (2001) has been generalised in different ways, for instance by considering illiquidity (Rogers & Veraart, 2013) or central clearing (Amini, Filipovic, & Minca, 2015). One strand of literature defines systemic risk measures by applying a scalar risk measure to the distribution of the total profits and losses of all firms in the system (Acharya, Pedersen, Philippon, & Richardson, 2016; Adrian & Brunnermeier, 2016). Recognising the drawbacks of treating the economy as a portfolio, Chen, Iyengar, and Moallemi (2013) introduce an axiomatic approach to measuring systemic risk, further extended by Kromer, Overbeck, and Zilch (2016) and Hoffmann, Meyer-Brandis, and Svindland (2016). The axiomatic approach of Chen et al. (2013) is widely used and amounts to systemic risk measures of the form $\rho(\Lambda(Y))$, where $Y$ is a $d$-dimensional random vector representing the financial system, $\rho$ is a scalar risk measure and $\Lambda\colon \mathbb{R}^d \to \mathbb{R}$ is a nondecreasing aggregation function. However, this approach of aggregating first and then adding a total capital requirement to the system has the drawback that it measures bailout costs rather than capital requirements that prevent a financial crisis. Risk measures of this type are also called insensitive, as they do not take into account the impact of capital regulation on the system.
As an alternative, so-called sensitive systemic risk measures have been introduced by Feinstein et al. (2017); see also Biagini, Fouque, Frittelli, and Meyer-Brandis (2019) and Armenti, Crépey, Drapeau, and Papapantoleon (2018) for related approaches. Here, one first adds the capital requirements to the $d$ financial institutions and then applies an aggregation function. That is, one considers systemic risk measures of the form $R(Y) = \{k \in \mathbb{R}^d \mid \rho(\Lambda(Y + k)) \le 0\}$ (1.1). Thus, the impact of regulation on the system is taken into account.
In this paper, we mainly focus on systemic risk measures of this type, as introduced in Feinstein et al. (2017); see Section 2.1 for more details. $R(Y)$ specifies the set of all capital allocations $k \in \mathbb{R}^d$ such that the new system $Y + k$ is deemed acceptable with respect to $\rho$ after being aggregated via $\Lambda$. As such, $R$ takes an ex ante perspective, prescribing the injections to (and withdrawals from) each financial firm that are adequate to protect the system $Y$ from a crisis, whereas $\rho(\Lambda(Y))$, as described above, can be interpreted as the bailout costs of the system after a systemic event has occurred.

Elicitability and identifiability
The field of quantitative risk management has seen a lively debate about which scalar risk measure is most appropriate in practice; see Embrechts, Puccetti, Rüschendorf, Wang, and Beleraj (2014) and Emmer, Kratz, and Tasche (2015) for detailed academic discussions and Bank for International Settlements (2014) for a regulatory perspective in banking. Besides differences in axiomatic properties of risk measures such as coherence (Artzner et al., 1999) and convexity (Föllmer & Schied, 2002), the debate has also considered more statistical aspects. The two most widely discussed statistical desiderata are robustness in the sense of Hampel (1971) (cf. Cont, Deguest, and Scandolo, 2010; Krätschmer, Schied, and Zähle, 2014) and elicitability.
The term elicitability is due to Osband (1985) and Lambert, Pennock, and Shoham (2008). Using the terminology of mathematical statistics, a real-valued law-invariant risk measure $\rho$ is elicitable if it admits an M-estimator (Huber & Ronchetti, 2009). That is, there is a loss or scoring function $S\colon \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ such that $\int S(\rho(F), y)\,\mathrm{d}F(y) < \int S(x, y)\,\mathrm{d}F(y)$ (1.2) for all $F$ in some class of distribution functions $\mathcal{M}$ and for all $x \neq \rho(F)$. Any scoring function $S$ satisfying (1.2) is called strictly $\mathcal{M}$-consistent for $\rho\colon \mathcal{M} \to \mathbb{R}$. Besides enabling M-estimation, possibly in a regression framework such as quantile regression (Koenker, 2005; Koenker & Bassett, 1978) or expectile regression (Newey & Powell, 1987), strictly consistent scoring functions encourage truthful forecasting. This incentive compatibility opens the way to meaningful forecast comparison (Gneiting, 2011a), which is closely related to comparative backtests in finance (Fissler, Ziegel, & Gneiting, 2016; Nolde & Ziegel, 2017). Ziegel (2016a) showed that expectiles are basically the only elicitable and coherent risk measures. In line with this, the prominent risk measure Value at Risk at level $\alpha \in (0, 1)$ ($\mathrm{VaR}_\alpha$), which corresponds to a quantile under mild conditions, turns out to be elicitable but not coherent. On the other hand, Expected Shortfall at level $\alpha \in (0, 1)$ ($\mathrm{ES}_\alpha$), a tail expectation, is coherent but fails to be elicitable. Interestingly, Fissler and Ziegel (2016) and Acerbi and Szekely (2014) showed that the pair $(\mathrm{VaR}_\alpha, \mathrm{ES}_\alpha)$ is elicitable even though ES does not admit a strictly consistent scoring function on its own. A similar result has recently been established for the triplet consisting of the risk measure Range Value at Risk along with VaR at two different levels, which is elicitable as well (Fissler & Ziegel, 2019a).
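To make the defining inequality (1.2) concrete, the following minimal Python sketch checks numerically that the expected pinball score over a sample is minimised close to the empirical quantile. It is an illustration under stated assumptions and not part of the paper: the function names, the simulated standard normal sample, the grid and the level $\alpha = 0.1$ are hypothetical choices, and the sketch works with the plain $\alpha$-quantile, ignoring the sign convention that links quantiles to $\mathrm{VaR}_\alpha$.

```python
import random

def pinball_score(x, y, alpha):
    """Pinball (quantile) score S(x, y) = (1{y <= x} - alpha) * (x - y)."""
    return ((1.0 if y <= x else 0.0) - alpha) * (x - y)

def expected_score(x, sample, alpha):
    """Empirical counterpart of the expected score in (1.2)."""
    return sum(pinball_score(x, y, alpha) for y in sample) / len(sample)

random.seed(0)
alpha = 0.1                                      # hypothetical level
sample = [random.gauss(0.0, 1.0) for _ in range(2000)]

# Candidate forecasts on a grid over [-3, 3] with step 0.02 (assumed range).
grid = [-3.0 + 0.02 * i for i in range(301)]
best_x = min(grid, key=lambda x: expected_score(x, sample, alpha))

# Empirical alpha-quantile of the same sample, for comparison.
empirical_quantile = sorted(sample)[int(alpha * len(sample))]

print(best_x, empirical_quantile)
```

Because the empirical expected score is convex in the forecast, the grid minimiser can differ from the empirical quantile by roughly one grid step at most, which is what the comparison of the two printed numbers is meant to show.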
Closely related to the notion of elicitability is the concept of identifiability. While the former is useful for forecast comparison or model selection, the latter aims at model and forecast validation or checks for calibration. Again invoking the language of mathematical statistics, a law-invariant real-valued risk measure $\rho$ is identifiable if it is a Z-functional. That is, it admits a moment function or strict $\mathcal{M}$-identification function $V\colon \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ such that $\int V(x, y)\,\mathrm{d}F(y) = 0 \iff x = \rho(F)$ (1.3) for all $F \in \mathcal{M}$ and for all $x \in \mathbb{R}$. Steinwart, Pasin, Williamson, and Zhang (2014) showed that, under appropriate regularity conditions, the identifiability of a real-valued risk measure is equivalent to its elicitability. In line with this, $\mathrm{VaR}_\alpha$ is identifiable under mild regularity conditions, using a simple coverage check, whereas $\mathrm{ES}_\alpha$ fails to have a strict identification function. For a discussion of identifiability and calibration in the context of evaluating risk measures, we refer the reader to Davis (2016) and Nolde and Ziegel (2017).
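The coverage check mentioned above can be sketched in a few lines of Python. This is only an illustration under assumptions: the identification function $V(x, y) = 1\{y \le x\} - \alpha$ is the standard one for the plain $\alpha$-quantile (not the paper's sign convention for VaR), and the simulated standard normal data, the level and all names are hypothetical.

```python
import random
from statistics import NormalDist

def quantile_identification(x, y, alpha):
    """V(x, y) = 1{y <= x} - alpha for the alpha-quantile."""
    return (1.0 if y <= x else 0.0) - alpha

def mean_identification(x, sample, alpha):
    """Empirical counterpart of the integral in (1.3)."""
    return sum(quantile_identification(x, y, alpha) for y in sample) / len(sample)

random.seed(1)
alpha = 0.1                                          # hypothetical level
sample = [random.gauss(0.0, 1.0) for _ in range(5000)]
true_quantile = NormalDist().inv_cdf(alpha)          # quantile of the assumed model

at_truth = mean_identification(true_quantile, sample, alpha)
below = mean_identification(true_quantile - 1.0, sample, alpha)
above = mean_identification(true_quantile + 1.0, sample, alpha)

print(at_truth, below, above)
```

The mean at the correct forecast should be close to zero, while forecasts that are too small or too large yield a clearly negative or positive mean, respectively. The sign pattern is the orientation property that becomes important for the construction of scoring functions.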

Novel contributions and structure of the paper
The aim of this paper is to establish elicitability and identifiability results for systemic risk measures of the form at (1.1). Since these risk measures are set-valued rather than real-valued, we use the strict distinction between selective and exhaustive reports introduced in Fissler et al. (2019a), along with the corresponding notions of elicitability and identifiability. In a nutshell, and translated to the setting of systemic risk measures of the form at (1.1), a selective forecast specifies a single capital allocation that makes the system acceptable. Exhaustive forecasts, on the other hand, are more ambitious, aiming at reporting all adequate capital allocations simultaneously in the form of a set. Consequently, exhaustive scoring or identification functions take sets as their first argument, whereas their selective counterparts work with points as inputs.
The corresponding definitions, along with basic properties and assumptions on systemic risk measures defined at (1.1) and derived quantities such as efficient cash-invariant allocation rules (EARs) (Feinstein et al., 2017), are gathered in Section 2. Section 3 contains our main results, most notably Theorem 3.1, asserting the existence of oriented selective identification functions for $R_0$, and Theorem 3.8, which uses these identification functions to construct strictly consistent exhaustive scoring functions for $R$. Interestingly, these scoring functions arise as an integral construction of elementary scores, exploiting the orientation of the identification function. This can be considered a higher-dimensional analogue of the mixture representation of scoring functions for one-dimensional forecasts established in the seminal paper of Ehm, Gneiting, Jordan, and Krüger (2016). Similarly, this gives rise to the diagnostic tool of Murphy diagrams, facilitating the assessment of forecast dominance; see Subsection 3.2.6. Thanks again to the orientation of the identification functions, we derive order-sensitivity results for these consistent scoring functions (Proposition 3.10). Concerning the EARs mentioned above, Proposition 3.6 establishes strict selective identification functions for EARs, interestingly mapping to a function space. Since systemic risk measures $R$ of the form at (1.1) are translation equivariant in the sense that $R(Y + k) = R(Y) - k$ for all $k \in \mathbb{R}^d$ and, under mild assumptions, homogeneous in that $R(cY) = cR(Y)$ for all $c > 0$ (Lemma 4.2), it makes sense to determine the subclasses of translation invariant or positively homogeneous consistent scoring functions for $R$, which is the content of Section 4.
The elicitability results on $R$ rely on the identifiability and elicitability of the underlying scalar risk measure $\rho$. This spells doom for the elicitability of systemic risk measures induced by ES as a scalar risk measure. Section 5 outlines this issue and establishes a solution to this challenge at the cost of a higher forecast complexity. Similarly to the scalar case, considering a pair of $R$ based on ES together with a VaR-related quantity leads to selective identifiability and exhaustive elicitability results (Proposition 5.1 and Theorem 5.2). The practical applicability of our results is demonstrated in a simulation study, which is the content of Section 6. Employing Diebold-Mariano tests, we examine how well the strictly consistent scores are able to distinguish different forecast performances. We also graphically illustrate the diagnostic tool of Murphy diagrams in a simulation example, utilising a traffic-light approach suggested in Fissler et al. (2016). Section 7 closes the paper with a discussion and an outlook on possible applications of our results and avenues of future research. All proofs and purely technical results are deferred to the Appendix. Results concerning risk measures insensitive with respect to capital allocations, along with some additional graphics of simulation results, are collected in the online supplementary material Fissler, Hlavinová, and Rudloff (2019b).

Measures of systemic risk
We consider set-valued systemic risk measures studied in Feinstein et al. (2017). In particular, we concentrate on law-invariant risk measures $R$ that are induced by some law-invariant scalar risk measure $\rho$. To settle some notation, let $(\Omega, \mathcal{F}, \mathbb{P})$ be an atomless probability space. For some integer Where convenient we will tacitly assume that $\mathcal{Y}^d$ and $\mathcal{Y}$ are closed under translation meaning that We consider some scalar monetary law-invariant risk measure $\rho\colon \mathcal{Y} \to \mathbb{R}$ (Artzner et al., 1999). That means, we can alternatively consider $\rho$ as a map $\rho\colon \mathcal{M} \to \mathbb{R}$ such that for a random variable $X \in \mathcal{Y}$ with distribution $F_X \in \mathcal{M}$ we define $\rho(F_X) := \rho(X)$. We assume that $\rho$ is cash-invariant, that is, $\rho(X + m) = \rho(X) - m$ for all $m \in \mathbb{R}$ and all $X \in \mathcal{Y}$, and monotone, meaning that $X \ge Z$ $\mathbb{P}$-a.s. implies $\rho(X) \le \rho(Z)$ for all $X, Z \in \mathcal{Y}$. We often dispense with the usual normalisation assumption that $\rho(0) = 0$.
Remark 2.1. Often, the scalar risk measure is assumed to map to $\mathbb{R}^* = (-\infty, \infty]$. We could also do so at the cost of a more technical treatment. However, to avoid unnecessary technicalities, we refrain from this and assume throughout the paper that any scalar risk measure attains real values only.
We present the two most natural law-invariant set-valued measures of systemic risk that are based on $\rho$ and $\Lambda$, namely $R(Y) = \{k \in \mathbb{R}^d \mid \rho(\Lambda(Y + k)) \le 0\}$ (2.1) and $R^{\mathrm{ins}}(Y) = \{k \in \mathbb{R}^d \mid \rho(\Lambda(Y) + \bar k) \le 0\}$ (2.2). In (2.2) and later on, we use the shorthand $\bar k := \sum_{i=1}^d k_i$ for some vector $k = (k_1, \ldots, k_d) \in \mathbb{R}^d$. Note the difference between $R$ and $R^{\mathrm{ins}}$. The risk measure $R$ takes an ex ante perspective in the sense that it specifies all capital allocations $k \in \mathbb{R}^d$ that need to be added to the system $Y$ to make the aggregated system $\Lambda(Y + k)$ acceptable under $\rho$. On the other hand, $R^{\mathrm{ins}}$ takes an ex post perspective on quantifying the risk of the system $Y$. That means it first considers the current aggregated system $\Lambda(Y)$ and then specifies the total capital requirement $\bar k$ one needs to add to make the aggregated system acceptable, which amounts to specifying the bailout costs of the aggregated system $\Lambda(Y)$ under $\rho$. In particular, the risk measure $R^{\mathrm{ins}}$ is insensitive to the capital allocation to each financial firm, disregarding possible transaction costs or other dependence structures between the financial firms. This justifies the mnemonic terminology. We would like to remark that both risk measures, $R$ and $R^{\mathrm{ins}}$, can be of interest in applications, given their different perspectives on systemic risk. However, the mathematical treatment and complexity differ considerably. Due to the cash-invariance of $\rho$, we have $\rho(\Lambda(Y) + \bar k) = \rho(\Lambda(Y)) - \bar k$, so that $R^{\mathrm{ins}}$ takes the equivalent form $R^{\mathrm{ins}}(Y) = \{k \in \mathbb{R}^d \mid \bar k \ge \rho(\Lambda(Y))\}$. This means that $R^{\mathrm{ins}}$ is actually a bijection of the scalar risk measure $\rho \circ \Lambda$; cf. Chen et al. (2013). Therefore, one has to evaluate the risk measure $\rho$ only once to determine $R^{\mathrm{ins}}$. In contrast, such an appealing equivalent formulation is generally not available for $R$, unless $\Lambda$ is additive, or is even the sum, in which case $R$ and $R^{\mathrm{ins}}$ coincide. Consequently, in general, one is bound to evaluate $\rho$ infinitely often to compute $R$; see also the discussion in Feinstein et al. (2017). The main focus of this paper is on elicitability and identifiability results for systemic risk measures of the form at (2.1) and (2.2). However, since one can exploit the one-to-one relation between $R^{\mathrm{ins}}$ and $\rho \circ \Lambda$ and make use of the revelation principle (Fissler, 2017; Gneiting, 2011a; Osband, 1985) to establish (exhaustive) elicitability and identifiability results, we do not present results about $R^{\mathrm{ins}}$ in the main body of the paper, but rather defer them to the online supplementary material Fissler et al. (2019b).
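The contrast between the sensitive and the insensitive risk measure can be illustrated with a small membership check. The sketch below is a toy example under assumptions that are not taken from the paper: the scalar risk measure is $\rho(X) = -\mathbb{E}[X]$, the aggregation function is $\Lambda(y) = \sum_i (1 - e^{-y_i})$, and the system consists of $d = 2$ firms with three equally likely scenarios. All names are hypothetical.

```python
import math

# Hypothetical toy system: three equally likely scenarios for d = 2 firms.
SCENARIOS = [(1.0, -1.0), (-0.5, 0.5), (0.0, 0.0)]

def aggregate(y):
    """Assumed aggregation Lambda(y) = sum_i (1 - exp(-y_i)), strictly increasing."""
    return sum(1.0 - math.exp(-yi) for yi in y)

def rho(outcomes):
    """Assumed scalar risk measure rho(X) = -E[X] under equal scenario weights."""
    return -sum(outcomes) / len(outcomes)

def in_R(k, scenarios=SCENARIOS):
    """Sensitive case: is rho(Lambda(Y + k)) <= 0?"""
    shifted = [aggregate([yi + ki for yi, ki in zip(y, k)]) for y in scenarios]
    return rho(shifted) <= 0.0

def in_R_ins(k, scenarios=SCENARIOS):
    """Insensitive case: is rho(Lambda(Y) + sum(k)) <= 0?"""
    total = sum(k)
    return rho([aggregate(y) + total for y in scenarios]) <= 0.0

for k in [(0.5, 0.0), (0.0, 0.5)]:
    print(k, in_R(k), in_R_ins(k))
```

In this toy system, both allocations carry the same total capital and are therefore treated identically by the insensitive measure, whereas the sensitive measure accepts only the allocation that supports the second firm.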
For the sake of completeness, we recall the most important properties of $R$ presented in Feinstein et al. (2017). Since $\rho$ is cash-invariant and $\Lambda$ is increasing, the values of $R$ defined at (2.1) are upper sets. That means for any $Y \in \mathcal{Y}^d$ we have $R(Y) \in \mathcal{P}(\mathbb{R}^d; \mathbb{R}^d_+) := \{B \subseteq \mathbb{R}^d \mid B = B + \mathbb{R}^d_+\}$. Note that both $\mathbb{R}^d$ and $\emptyset$ are elements of $\mathcal{P}(\mathbb{R}^d; \mathbb{R}^d_+)$. Moreover, $R$ defined at (2.1) can attain these values even if the underlying scalar risk measure $\rho$ maps to $\mathbb{R}$ only, e.g., when $\Lambda$ is bounded. While the case $R(Y) = \emptyset$ corresponds to the case that a scalar risk measure of a financial position is $+\infty$, meaning that the system $Y + k$ is deemed risky no matter how much capital is injected, the case $R(Y) = \mathbb{R}^d$ corresponds to $-\infty$ in the scalar case. The latter situation of "cash cows", with the possibility to withdraw any finite amount of money without rendering the position risky, is usually deemed unrealistic and is excluded. Therefore, we shall usually only discuss the former case, but remark that a treatment of the latter would also be possible for most results. The monotonicity carries over to $R$: if $Y \ge Z$ $\mathbb{P}$-a.s., then $R(Y) \supseteq R(Z)$. To shorten the notation, we also introduce further subclasses of $\mathcal{P}(\mathbb{R}^d; \mathbb{R}^d_+)$, where $\mathcal{B}(\mathbb{R}^d)$ denotes the Borel-$\sigma$-algebra.
Definition 2.2.(i) The class of Borel-measurable upper subsets of R d is denoted with We shall regularly make use of the following assumptions.
If $\Lambda\colon \mathbb{R}^d \to \mathbb{R}$ is continuous and $\rho$ satisfies the Fatou property (which means it is lower-semicontinuous), the values of $R$ are closed. Note that the law-invariance of $\rho$ implies the Fatou property (Jouini, Schachermayer, & Touzi, 2006). If moreover the function $\Lambda\colon \mathbb{R}^d \to \mathbb{R}$ is strictly increasing, the second part of Assumption (2) is satisfied as well. Similarly to $R$, we introduce the law-invariant map $R_0(Y) = \{k \in \mathbb{R}^d \mid \rho(\Lambda(Y + k)) = 0\}$ (2.3). Since $\Lambda$ is increasing and $\rho$ is cash-invariant, one then obtains the relation $R(Y) = R_0(Y) + \mathbb{R}^d_+$. That means that the values of $R_0$ determine $R$ completely. Moreover, if $\Lambda$ is strictly increasing, then $R_0(Y)$ can be characterised as the topological boundary of $R(Y)$, which has the interpretation that $R_0(Y)$ contains the efficient capital allocations that make $Y$ acceptable under $R$. That means that in such situations, $R$ and $R_0$ are connected via a one-to-one relation. Again, this means that exhaustive elicitability results for $R$ (Theorem 3.8) carry over to $R_0$ in such situations, invoking the revelation principle.
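A point on the boundary described above can be located numerically by bisection along a fixed direction. The following sketch does this along the diagonal $(1, \ldots, 1)$ for a toy example. All ingredients are assumptions made for illustration and are not taken from the paper: $\rho(X) = -\mathbb{E}[X]$, $\Lambda(y) = \sum_i (1 - e^{-y_i})$, three equally likely scenarios for two firms, and hypothetical function names.

```python
import math

# Hypothetical toy system: three equally likely scenarios for d = 2 firms.
SCENARIOS = [(1.0, -1.0), (-0.5, 0.5), (0.0, 0.0)]

def risk_after_injection(k, scenarios=SCENARIOS):
    """rho(Lambda(Y + k)) for the assumed rho(X) = -E[X] and
    Lambda(y) = sum_i (1 - exp(-y_i))."""
    values = [sum(1.0 - math.exp(-(yi + ki)) for yi, ki in zip(y, k))
              for y in scenarios]
    return -sum(values) / len(values)

def boundary_point_on_diagonal(lo=-5.0, hi=5.0, tol=1e-10):
    """Bisection for t such that rho(Lambda(Y + t * (1, ..., 1))) = 0.
    Assumes the risk is positive at lo and non-positive at hi."""
    d = len(SCENARIOS[0])
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if risk_after_injection((mid,) * d) > 0.0:
            lo = mid    # still unacceptable: inject more capital
        else:
            hi = mid    # acceptable: try a smaller injection
    return 0.5 * (lo + hi)

t_star = boundary_point_on_diagonal()
k_star = (t_star, t_star)
print(k_star, risk_after_injection(k_star))
```

The bisection relies on the risk being decreasing in the injected amount, which holds here because the assumed aggregation function is strictly increasing. Any allocation that dominates `k_star` componentwise is then acceptable, in line with the upper-set property.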
Finally, we introduce an important scalarization of the systemic risk measure $R$, called efficient cash-invariant allocation rule (EAR), as introduced in Feinstein et al. (2017). Under certain circumstances, an EAR can also be considered as a selection of $R$ or, alternatively, of $R_0$. Roughly speaking, for $Y \in \mathcal{Y}^d$, the value of $\mathrm{EAR}(Y)$ gives the capital allocation(s) with minimal weighted costs among the allocations in $R(Y)$. For simplicity, we shall confine attention to EARs with a fixed price or weight vector $w \in \mathbb{R}^d$, that is, $\mathrm{EAR}_w(Y) = \arg\min\{w^\top k \mid k \in R(Y)\}$. As discussed in Feinstein et al. (2017), $\mathrm{EAR}_w(Y)$ is not necessarily a singleton. More precisely, for closed $R(Y)$ it fails to be a singleton if and only if $\partial R(Y)$ contains a line segment that is orthogonal to the price vector $w$.
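The idea of picking the cheapest acceptable allocation can be sketched by a brute-force grid search. The example below is a toy illustration under assumptions that do not come from the paper: $\rho(X) = -\mathbb{E}[X]$, $\Lambda(y) = \sum_i (1 - e^{-y_i})$, three equally likely scenarios for two firms, the price vector $w = (1, 1)$, and a search grid on $[-1, 2]^2$. The closed-form comparison value follows from the first-order conditions of this particular toy problem only.

```python
import math

# Hypothetical toy system and price vector.
SCENARIOS = [(1.0, -1.0), (-0.5, 0.5), (0.0, 0.0)]
WEIGHTS = (1.0, 1.0)

def acceptable(k, scenarios=SCENARIOS):
    """Is rho(Lambda(Y + k)) <= 0 for the assumed rho(X) = -E[X] and
    Lambda(y) = sum_i (1 - exp(-y_i))?"""
    values = [sum(1.0 - math.exp(-(yi + ki)) for yi, ki in zip(y, k))
              for y in scenarios]
    return -sum(values) / len(values) <= 0.0

def cost(k, w=WEIGHTS):
    """Weighted cost w^T k of an allocation."""
    return sum(wi * ki for wi, ki in zip(w, k))

STEP = 0.02
axis = [-1.0 + STEP * i for i in range(151)]     # grid on [-1, 2]
feasible = [(k1, k2) for k1 in axis for k2 in axis if acceptable((k1, k2))]
ear_grid = min(feasible, key=cost)

# Closed form for this toy example only: k_i = log(a_i * sum(w) / (d * w_i)),
# where a_i = E[exp(-Y_i)].
d = len(WEIGHTS)
a = [sum(math.exp(-y[i]) for y in SCENARIOS) / len(SCENARIOS) for i in range(d)]
ear_exact = tuple(math.log(a[i] * sum(WEIGHTS) / (d * WEIGHTS[i])) for i in range(d))

print(ear_grid, cost(ear_grid), ear_exact, cost(ear_exact))
```

The grid search only approximates the minimal cost from above, with an error of at most the grid step times the sum of the weights. In this toy example the boundary of the acceptable region is strictly convex, so the minimiser is unique; a flat boundary piece orthogonal to the price vector would produce ties.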
Since the scalar risk measure $\rho$ is assumed to be law-invariant, the derived quantities $R$, $R^{\mathrm{ins}}$, $R_0$ and $\mathrm{EAR}_w$ are law-invariant as well. Therefore, we shall frequently abuse notation and write $R(F_Y) := R(Y)$ for $Y \in \mathcal{Y}^d$ with distribution $F_Y \in \mathcal{M}^d$, with analogous conventions for the other law-invariant maps.

Elicitability and identifiability of set-valued functionals
We have already mentioned the definitions of elicitability and identifiability for scalar risk measures $\rho\colon \mathcal{M} \to \mathbb{R}$ at (1.2) and (1.3). All other risk measures considered, $R$, $R_0$ and EAR, are set-valued, taking subsets of $\mathbb{R}^d$ as values. Hence, we make use of the theoretical framework for forecast evaluation of set-valued functionals introduced in Fissler et al. (2019a). The main idea is a thorough distinction between two forms of forecasts: a selective mode, where forecasts are single points, and an exhaustive mode, where forecasts are set-valued. Moreover, corresponding notions of identifiability and elicitability are introduced and discussed in a very general setting, with the main result being that a set-valued functional is elicitable either in the selective sense, or in the exhaustive sense, or it is not elicitable at all (Fissler et al., 2019a, Theorem 2.14). To allow for a concise presentation, we confine ourselves to introducing only the notions we need, and we do so directly in terms of $R$ and $R_0$; the case of EAR will be considered separately later. In the sequel, let $\mathcal{A} \subseteq \mathcal{B}(\mathbb{R}^d)$. Moreover, for scoring functions $S\colon \mathcal{A} \times \mathbb{R}^d \to \mathbb{R}$ or identification functions $V\colon \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ we use the shorthands $\bar S(A, F) := \int S(A, y)\,\mathrm{d}F(y)$ and $\bar V(x, F) := \int V(x, y)\,\mathrm{d}F(y)$, $A \in \mathcal{A}$, $x \in \mathbb{R}^d$, and tacitly assume that these integrals exist for all $F \in \mathcal{M}^d$.

Definition 2.4. (i) An exhaustive scoring function $S\colon \mathcal{A} \times \mathbb{R}^d \to \mathbb{R}$ is $\mathcal{M}^d$-consistent for $R\colon \mathcal{M}^d \to \mathcal{A}$ if $\bar S(R(F), F) \le \bar S(A, F)$ for all $A \in \mathcal{A}$ and $F \in \mathcal{M}^d$. The exhaustive score $S$ is strictly $\mathcal{M}^d$-consistent for $R$ if it is consistent and equality implies $A = R(F)$. (ii) The risk measure $R\colon \mathcal{M}^d \to \mathcal{A}$ is exhaustively elicitable if there is a strictly $\mathcal{M}^d$-consistent exhaustive scoring function for $R$.
Note that the strict consistency of an exhaustive scoring function

Main results
We present some of the main results of the paper in this section: identifiability results are gathered in Subsection 3.1 and elicitability results in Subsection 3.2. Theorem 3.1 establishes the selective identifiability of $R_0$. Notably, the main assumption behind Theorem 3.1 and the subsequent results relying on this identifiability is the identifiability of the underlying scalar risk measure $\rho$ in (2.1). In fact, $\rho$ needs to admit an oriented identification function. According to Steinwart et al. (2014), a strict identification function $V\colon \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ for a real-valued risk measure $\rho\colon \mathcal{M} \to \mathbb{R}$ is oriented if $\int V(x, y)\,\mathrm{d}F(y) > 0$ if and only if $x > \rho(F)$, for all $F \in \mathcal{M}$ and $x \in \mathbb{R}$. Invoking Theorem 8 in Steinwart et al. (2014), the existence of an oriented identification function for $\rho$ is equivalent to the elicitability of $\rho$ under mild regularity conditions. Proposition 3.7 establishes that, under certain assumptions on the aggregation function $\Lambda$, the converse also holds. That is, the elicitability of $R$ implies the elicitability of $\rho$.

Identifiability results
Theorem 3.1. Let $\rho\colon \mathcal{M} \to \mathbb{R}$ be identifiable. Then the following assertions hold for (3.2)

Remark 3.2. The orientation of $V_{R_0}$ can be considered as the multivariate counterpart of the orientation of $V_\rho$ with respect to the componentwise order on $\mathbb{R}^d$. Indeed, in both cases, a negative expected identification function corresponds to the case of predicting a capital requirement too small to make the system $Y \in \mathcal{Y}^d$ acceptable with respect to $R$ or the single firm $X \in \mathcal{Y}$ acceptable with respect to $\rho$.
2) and which is such that the expected identification function $\bar V_{R_0}(\cdot, F)$ is continuous for any $F \in \mathcal{M}^d$, then the values of $R$ are closed sets. Fissler (2017, Proposition 3.2.1) states that, under some richness assumptions on the class $\mathcal{M}$, any other strict identification function $V'_\rho$ for $\rho$ is of the form $V'_\rho(x, y) = g(x) V_\rho(x, y)$ (3.3), where $g\colon \mathbb{R} \to \mathbb{R}$ is a non-vanishing function. Moreover, if $V_\rho$ is oriented, then $V'_\rho$ is oriented if and only if the function $g$ in (3.3) is strictly positive; see also Steinwart et al. (2014, Theorem 8). Consequently, starting with such an identification function $V'_\rho$, the resulting (oriented) strict selective identification function $V'_{R_0}$ satisfies $V'_{R_0}(k, y) = g(0) V_{R_0}(k, y)$ (3.4). Hence, the only difference is that one ends up with a scaled version of $V_{R_0}$, where the scaling factor $g(0)$ is positive if both $V_{R_0}$ and $V'_{R_0}$ are oriented.
In a similar spirit to Remark 3.3, one might also wonder whether the (oriented) strict selective identification functions constructed in Theorem 3.1 are the only (oriented) strict selective identification functions for $R_0$. This is definitely not the case since, due to the linearity of the expectation, any function of the form $V'_{R_0}(k, y) = h(k) V_{R_0}(k, y)$ (3.5), with a non-vanishing (respectively positive) function $h\colon \mathbb{R}^d \to \mathbb{R}$, is again an (oriented) strict selective identification function for $R_0$. In particular, the constant $g(0)$ appearing in (3.4) can be incorporated into the function $h$, so that it does not matter which (oriented) strict exhaustive identification function $V_\rho$ we choose to end up with the form at (3.5). The following proposition establishes that basically all selective $\mathcal{M}^d$-identification functions for $R_0$ are of the form at (3.5).
That is why we formulated the proposition in terms of a general action domain A ⊆ R d rather than R d .
(ii) If $\mathcal{M}^d$ is rich enough, and under additional regularity conditions on $V_{R_0}$, one can also establish a pointwise version of (3.6); see Fissler and Ziegel (2016, 2019b) for details.
Finally, we turn our attention to the efficient cash-invariant allocation rules, which offer a way to choose one capital allocation that makes the system acceptable, namely the cheapest one with respect to a weight vector $w$. For any vector $w \in \mathbb{R}^d$, we use the notation $w^\perp := \{x \in \mathbb{R}^d \mid w^\top x = 0\}$ for the orthogonal complement of the subspace spanned by $w$. With $\mathbb{R}^{w^\perp}$ we denote the space of all functions mapping from $w^\perp$ to $\mathbb{R}$. Proposition 3.6. Let $\rho$ be a scalar risk measure, $R$ and $R_0$ as defined in (2.1) and (2.3), and suppose that Assumption (2) holds. Assume that $R_0$ is selectively identifiable with an oriented strict selective If the underlying risk measure $R$ is known to assume convex sets only (e.g. if $\rho$ is convex and $\Lambda$ concave, see Feinstein et al. (2017)), it is even sufficient to evaluate $\bar V_{\mathrm{EAR}_w}(k, F)(x)$, or its empirical counterpart, for $x \in \mathbb{R}^d$ in a neighbourhood of 0, which can also be seen nicely in Figure 3 in Appendix A.1.
In Section 2.4 of Fissler et al. (2019a), different versions of the Convex Level Sets (CxLS) property are introduced and their necessity for the identifiability and elicitability of set-valued functionals is discussed. Since the selective identifiability in Proposition 3.6 deviates from the usual definition, it is worth noting that, under the conditions of Proposition 3.6, $\mathrm{EAR}_w$ satisfies the selective CxLS property. However, we do not see how to establish the selective CxLS* property or, alternatively, the exhaustive CxLS property, which leaves open the question whether and in what sense $\mathrm{EAR}_w$ might be elicitable.
We would like to compare the concept of identifiability introduced in Proposition 3.6 with the discussion of the backtestability of loss value at risk in Section 5 of Bignozzi, Burzoni, and Munari (2018). One can interpret their proposal as using a function-valued identification function, too. Then, their analogue of (3.7) is that the infimum of the function-valued identification function is 0 when the correctly specified forecast is used. Interestingly, this version of identifiability does not imply that the functional under consideration has convex level sets.
We end this section by noting that the identifiability of ρ and the selective identifiability of R 0 are even equivalent if Λ : R d → R possesses a measurable right inverse.
Proposition 3.7. Let $\rho\colon \mathcal{M} \to \mathbb{R}$ be a risk measure, $\Lambda\colon \mathbb{R}^d \to \mathbb{R}$ a surjective aggregation function, and $R_0\colon \mathcal{M}^d \to 2^{\mathbb{R}^d}$ as defined in (2.3). Assume that there exists a measurable right inverse $\eta\colon \mathbb{R} \to \mathbb{R}^d$ such that $\Lambda \circ \eta = \mathrm{id}_{\mathbb{R}}$, that $\mathcal{Y}$ is closed under translations, and that for any $X \in \mathcal{Y}$, $\eta(X)$ belongs to $\mathcal{Y}^d$. Then $\rho$ is (selectively) identifiable if and only if $R_0$ is selectively identifiable.

Elicitability results and mixture representation
In the seminal paper Ehm et al. (2016) it is shown that, subject to regularity conditions, any non-negative scoring function $S\colon \mathbb{R} \times \mathbb{R} \to [0, \infty]$ which is consistent for the $\alpha$-quantile (the $\tau$-expectile) can be written as a mixture or Choquet representation $S(x, y) = \int S_\theta(x, y)\,\mathrm{d}H(\theta)$ (3.8), where $H$ is a non-negative measure on $\mathcal{B}(\mathbb{R})$ and $S_\theta$, $\theta \in \mathbb{R}$, are non-negative elementary scoring functions for the $\alpha$-quantile (the $\tau$-expectile). In particular, S θ take the form where $V$ is an oriented identification function for the $\alpha$-quantile (the $\tau$-expectile). The score at (3.8) is strictly consistent if and only if the measure $H$ is strictly positive, that is, it puts positive mass on any non-empty open set. Ziegel (2016b) and Dawid (2016) argued that this construction also works for more general one-dimensional functionals besides expectiles and quantiles which admit an oriented identification function; cf. Jordan, Mühlemann, and Ziegel (2019). Steinwart et al. (2014) showed that, for one-dimensional functionals satisfying certain regularity conditions, the existence of an oriented identification function is equivalent to the elicitability of the functional. While the orientation of the identification function immediately gives rise to the consistency of the elementary scores, and thus of the mixtures at (3.8), the question whether all scoring functions for a certain functional are necessarily of the form at (3.8) can typically only be answered by invoking Osband's principle (Fissler & Ziegel, 2016; Osband, 1985), hence assuming smoothness and regularity conditions.
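The mixture idea can be checked numerically in the quantile case. The sketch below uses the elementary quantile score in the form $(1\{y < x\} - \alpha)(1\{\theta < x\} - 1\{\theta < y\})$, which is how it is commonly stated following Ehm et al. (2016); this specific form, the choice of the Lebesgue measure as mixing measure, the integration range and all names are assumptions made for illustration. Under these assumptions, integrating the elementary scores over $\theta$ should recover the familiar pinball score.

```python
def elementary_quantile_score(theta, x, y, alpha):
    """Elementary score (1{y < x} - alpha) * (1{theta < x} - 1{theta < y})."""
    left = (1.0 if y < x else 0.0) - alpha
    right = (1.0 if theta < x else 0.0) - (1.0 if theta < y else 0.0)
    return left * right

def pinball_score(x, y, alpha):
    """Pinball score (1{y < x} - alpha) * (x - y)."""
    return ((1.0 if y < x else 0.0) - alpha) * (x - y)

def lebesgue_mixture(x, y, alpha, lo=-5.0, hi=5.0, step=0.001):
    """Riemann-sum approximation of the integral of the elementary scores
    over theta in [lo, hi], i.e. a mixture with respect to Lebesgue measure."""
    n = int(round((hi - lo) / step))
    return step * sum(elementary_quantile_score(lo + i * step, x, y, alpha)
                      for i in range(n))

alpha = 0.1                    # hypothetical level
x, y = 1.3, -0.4               # hypothetical forecast and observation
mixture_value = lebesgue_mixture(x, y, alpha)
pinball_value = pinball_score(x, y, alpha)
print(mixture_value, pinball_value)
```

Other mixing measures yield other consistent scores for the same quantile. Plotting the average elementary score of a forecast against $\theta$ gives the kind of Murphy diagram referred to in the text.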
Our construction of strictly consistent exhaustive scoring functions for the systemic risk measures $R$ also exploits the key result about the existence of oriented strict selective identification functions for $R_0$ and is similar in nature to the approach described above. For any $y \in \mathbb{R}^d$, we shall use the notation $R(y) := R(\delta_y)$, where $\delta_y$ denotes the point measure at $y$.
Note that the condition at (3.10) is a weak form of orientation. However, it does not imply that $V_{R_0}$ is an identification function for $R_0$. In the one-dimensional setting, such a situation can occur in practice if the underlying risk measure is Value at Risk and the distributions are not continuous, in which case the corresponding quantile identification function may fail to attain 0 in expectation.
Even though we defer the formal proof of Theorem 3.8 to Appendix A.1, we would still like to sketch and illustrate the idea, taking into account that Theorem 3.8 constitutes one of the main results of the paper.The key observation is the identity Then, one uses the weak orientation of V R 0 given at (3.10) to conclude that the first integral on the right hand side of (3.13) is ≥ 0 while the second integral is ≤ 0. A graphic illustration of the situation is provided in Figure 1.
It is in order to make some comments about the scoring functions constructed in Theorem 3.8.

Comparison with one-dimensional case
The similarity of the mixture representation at (3.12) and (3.8) is obvious.With a closer look, one can also see the similarities on the level of the elementary scores given at (3.11) and (3.9).Indeed, (3.9) can be re-written as The form of R(y) can be described explicitly in the following lemma, where we use the fact that ρ(ρ(0 where the scalar risk measure ρ is decreasing and cash-invariant, and the aggregation function Λ : R d → R is increasing.Then, for each y ∈ R d it holds that Accounting for the sign convention that the negative of a quantile or expectile are a scalar risk measure, one can see that the elementary scores at (3.11) essentially boil down to the ones at (3.9) for dimension d = 1.

Integrability
The non-negativity of the elementary scores at (3.11) guarantees that the integral at (3.12) always exists. However, as stated in part (iii) of Theorem 3.8, these scores are only strictly consistent if $\bar S_{R,\pi}(R(F), F) < \infty$, which raises the question of when the integral at (3.12) is finite. A sufficient condition for the latter is that

Normalisation
By construction, the elementary scores at (3.11), and therefore the scores at (3.12), are non-negative. It is well known that if a scoring function $S(x, y)$ is (strictly) $\mathcal{M}$-consistent for some functional $T$, then for $\lambda > 0$ and some $\mathcal{M}$-integrable function $a\colon \mathsf{O} \to \mathbb{R}$, the score $S'(x, y) = \lambda S(x, y) + a(y)$ is also (strictly) $\mathcal{M}$-consistent for $T$. Following Gneiting and Raftery (2007), we say that $S$ and $S'$ are equivalent. Therefore, if $\mathcal{M}$ contains all point measures, $S'$ is strictly $\mathcal{M}$-consistent for $T$ and $y \mapsto S'(T(\delta_y), y)$ is $\mathcal{M}$-integrable, the score $S(x, y) = S'(x, y) - S'(T(\delta_y), y)$ is non-negative by construction. However, relaxing the normalisation condition that a score be non-negative can sometimes also help to relax integrability conditions on the scoring function. A standard example is the squared loss $S(x, y) = (x - y)^2$, which is non-negative, consistent for the mean relative to any class of distributions with a finite first moment, and strictly consistent for the mean relative to any class of distributions with a finite second moment. On the other hand, the equivalent score $S'(x, y) = x^2 - 2xy$ maps to $\mathbb{R}$, but is strictly consistent for the mean relative to any class of distributions with a finite first moment.
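The equivalence of the two scores for the mean can be verified on a tiny example. The sketch below uses a hypothetical three-point sample and a hypothetical grid of candidate forecasts; it checks that the squared loss and the score $x^2 - 2xy$ have the same minimiser and that their expected values differ by the forecast-independent term $\mathbb{E}[Y^2]$.

```python
def squared_loss(x, y):
    """Non-negative score S(x, y) = (x - y)^2."""
    return (x - y) ** 2

def shifted_score(x, y):
    """Equivalent score S'(x, y) = x^2 - 2xy, which may be negative."""
    return x * x - 2.0 * x * y

def expected(score, x, sample):
    """Average score of forecast x over the sample."""
    return sum(score(x, y) for y in sample) / len(sample)

sample = [1.0, 2.0, 6.0]                        # hypothetical observations, mean 3.0
grid = [0.5 * i for i in range(13)]             # candidate forecasts 0.0, 0.5, ..., 6.0

best_squared = min(grid, key=lambda x: expected(squared_loss, x, sample))
best_shifted = min(grid, key=lambda x: expected(shifted_score, x, sample))

# The difference of the expected scores does not depend on the forecast x.
differences = [expected(squared_loss, x, sample) - expected(shifted_score, x, sample)
               for x in grid]
print(best_squared, best_shifted, differences[0])
```

Since the two expected scores differ only by a constant in the forecast, any ranking of competing forecasts is the same under both scores. Only the integrability requirement on the observations changes.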
In that light, it might be interesting to consider scores S′_{R,k} which are equivalent to the elementary scores at (3.11). A natural choice might be S Dawid (2016). This leads to an alternative mixture representation akin to (3.12) of the form
However, since the integrand S′_{R,k}(A, y) may attain both positive and negative values, one needs to impose that its negative part is π-integrable in order to guarantee the existence of the integral.

Characterisation of all consistent scoring functions
There is evidence that, under appropriate regularity conditions, all consistent scoring functions for the risk measure R are equivalent to a score of the form given at (3.12). That is, modulo equivalence, the choice of the consistent scoring function boils down to the choice of the measure π.
Firstly, note that Proposition 3.4 implies that it does not matter which oriented strict M^d-identification function V_{R_0} we actually start with. Indeed, if Ṽ_{R_0} were another such identification function, then Ṽ_{R_0}(k, y) = h(k)V_{R_0}(k, y) for some positive function h. But this solely amounts to a change of measure, since Ṽ_{R_0}(k, y) π̃(dk) = V_{R_0}(k, y) π(dk), where π̃ has the density 1/h with respect to π. Secondly, the class of scoring functions of the form (3.12) is convex, which is a necessary condition (Gneiting, 2011a). Thirdly, as observed above, the mixture representation at (3.12) is the natural extension of the one-dimensional case. As remarked, for the one-dimensional case, one can typically establish this sort of necessary condition only by invoking Osband's principle. Since Osband's principle relies on a first-order-condition argument, it has only been established under smoothness conditions and for the finite-dimensional case. We suspect that it is possible to generalise it to the infinite-dimensional setting of predicting upper sets in R^d. Possible approaches might work by borrowing ideas from the calculus of variations or by considering increments of scores rather than derivatives. However, the technical treatment of these approaches is beyond the scope of the paper at hand, and we defer it to future research.

Order-sensitivity
It is known that, under weak assumptions on ρ, all strictly consistent scoring functions S for ρ are order-sensitive or accuracy-rewarding; see (Nau, 1985, Proposition 3), (Lambert, 2013, Proposition 2), (Bellini & Bignozzi, 2015, Proposition 3.4). In the scalar setting, this property means that x_1 ≤ x_2 ≤ ρ(F) or ρ(F) ≤ x_2 ≤ x_1 implies that S(x_1, F) ≥ S(x_2, F). While one gets this useful property essentially 'for free' in the scalar case, asking for order-sensitivity in a multivariate setting is a lot more involved; see Fissler and Ziegel (2019c). One of the main questions in the multivariate setting is which order relation to use. In the present situation, where our exhaustive action domain consists of closed upper subsets of R^d, the canonical (partial) order relation is the subset relation. That means the canonical analogue of order-sensitivity in our setting is that for any distribution
The following proposition establishes that this notion of order-sensitivity is fulfilled by all scoring functions introduced in Theorem 3.8(ii). The proof basically exploits the orientation of the underlying identification function V_{R_0}, which is a similar argument to the one given in Steinwart et al. (2014).
Proposition 3.10. Let the assumptions of Theorem 3.8(ii) prevail. Then, the scoring function S_{R,π} defined at (3.12) is M^d-order-sensitive for R in the sense that for all
Under the assumptions of Theorem 3.8(iii), if S̄_{R,π}(B, F) < ∞ and one of the inclusions A ⊆ B or A ⊇ B on the left-hand side of (3.14) is strict, then the inequality on the right-hand side is also strict.

Forecast dominance and Murphy diagrams
The notion of (strict) consistency implies that, in expectation, a correctly specified forecast will score at most as high as (strictly less than) any misspecified forecast. On the level of the prediction space setting (Gneiting & Ranjan, 2013; Strähl & Ziegel, 2017), Holzmann and Eulert (2014) showed that for two ideal forecasts, the one measurable with respect to a strictly larger information set is preferred under any strictly consistent scoring function; cf. Tsyplakov (2014). Patton (2019)
for all consistent scoring functions S_{R,π} of the form at (3.12) where π is a σ-finite non-negative measure on B(R^d).
Note that the expectations are taken over the joint distribution of the forecasts and the observation. Since the scores S_{R,π} at (3.12) are parametrised by the class of non-negative σ-additive measures on B(R^d), it is not practical to check forecast dominance directly from the definition. To this end, the following corollary is helpful. The proof is straightforward and therefore omitted.
Corollary 3.12. Let Y ∈ Y^d and let A, B be two (stochastic) forecasts for some systemic risk measure R of the form at (2.1), taking values in
Corollary 3.12 opens the way to an immediate multivariate analogue of the Murphy diagrams considered in Ehm et al. (2016). That is, if A is a P(R^d; R^d_+)-valued forecast of a systemic risk measure R and Y is the corresponding R^d-valued observation of a financial system, we can consider the map R^d ∋ k ↦ E[S_{R,k}(A, Y)] (3.15) as a diagnostic tool. For an empirical setting with forecasts A_1, ..., A_N ∈ P(R^d; R^d_+) and observations Y_1, ..., Y_N ∈ R^d, (3.15) takes the form R^d ∋ k ↦ (1/N) Σ_{t=1}^N S_{R,k}(A_t, Y_t). (3.16)
We illustrate the usage of Murphy diagrams in a simulation study presented in Subsection 6.2.
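An empirical Murphy diagram amounts to averaging elementary scores over time for each k on a grid. The following is a minimal sketch under several loudly stated assumptions: the elementary score is assumed to equal V_{R_0}(k, y) on R(y) \ A, −V_{R_0}(k, y) on A \ R(y) and zero elsewhere; ρ = VaR_α with the identification function α − 1{Λ(k + y) ≤ 0}; and, to obtain closed-form ideal forecasts, a hypothetical linear aggregation Λ(x) = x_1 + x_2 is swapped in for the aggregation function of the simulation study. All function and variable names are illustrative.

```python
import numpy as np
from statistics import NormalDist

ALPHA = 0.05
Z_ALPHA = NormalDist().inv_cdf(ALPHA)  # standard normal alpha-quantile

def aggregate(x):
    # Hypothetical linear aggregation Lambda(x) = x_1 + x_2 (a simplification,
    # not the aggregation function used in the simulation study).
    return x.sum(axis=-1)

def elementary_scores(thresholds, grid, y, alpha=ALPHA):
    """N x G matrix of (assumed) elementary scores S_{R,k}(A_t, y_t).

    Forecast A_t is the half-space {k : k_1 + k_2 >= thresholds[t]}.
    """
    lam = aggregate(grid[None, :, :] + y[:, None, :])        # Lambda(k + y_t)
    v = alpha - (lam <= 0.0).astype(float)                    # identification function
    in_r = lam > 0.0                                          # k in R(y_t), up to the boundary
    in_a = grid.sum(axis=1)[None, :] >= thresholds[:, None]   # k in A_t
    # Assumed form: V on R(y) \ A, -V on A \ R(y), zero elsewhere.
    return np.where(in_r & ~in_a, v, 0.0) - np.where(~in_r & in_a, v, 0.0)

rng = np.random.default_rng(2)
n = 2000
cov_mu = np.array([[1.0, 0.5], [0.5, 1.0]])
mu = rng.multivariate_normal(np.zeros(2), cov_mu, size=n)
y = mu + rng.standard_normal((n, 2))

# Under the linear Lambda, R(N_2(m, C)) = {k : k_1 + k_2 >= -sum(m) - sd * z_alpha},
# where sd is the standard deviation of Y_1 + Y_2.
anne = -mu.sum(axis=1) - np.sqrt(2.0) * Z_ALPHA   # conditional on mu_t
bob = np.full(n, -np.sqrt(5.0) * Z_ALPHA)         # unconditional forecast

axis = np.linspace(-5.0, 5.0, 41)
grid = np.array([(a, b) for a in axis for b in axis])

murphy_anne = elementary_scores(anne, grid, y).mean(axis=0)
murphy_bob = elementary_scores(bob, grid, y).mean(axis=0)
diff = (murphy_bob - murphy_anne).reshape(41, 41)   # difference of Murphy diagrams
```

The array `diff` can be passed to any heat-map routine; averaged over the grid, the informed forecaster attains the lower empirical score in this sketch.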

Homogeneous and translation invariant scoring functions
Recall that the systemic risk measure R is cash-invariant, or translation equivariant,
Fissler and Ziegel (2019c) have argued that it is a reasonable requirement for a scoring function that the ordering of different sequences of forecasts in terms of their realised scores be invariant under transformations to which the functional of interest is equivariant. Therefore, we discuss translation invariance and positive homogeneity of scores (or score differences) for R. We start by gathering some elementary definitions.
(ii) A scalar risk measure ρ : Y → R is called positively homogeneous if ρ(cX) = cρ(X) for all c > 0 and for all X ∈ Y.
(iii) An exhaustive scoring function S : ) and for all c > 0.
With these definitions in mind, we can now state the following results.
(i) V_{R_0} is translation invariant.
(ii) Assume that M^d is convex and that for any x ∈ R^d there are F_1, F_2 ∈ M^d such that V̄_{R_0}(x, F_1) > 0 and V̄_{R_0}(x, F_2) < 0. Then for any translation invariant strict
for all x ∈ R^d and for all F ∈ M^d.
(iii) If V_ρ(0, ·) : R → R is positively homogeneous of degree a ∈ R and Λ is positively homogeneous of degree b ∈ R, then V_{R_0} is positively homogeneous of degree ab.
Proposition 4.5. Let Assumption (2) hold and assume that ρ is identifiable with an oriented strict M-identification function V_ρ.
(i) Let L^d be the d-dimensional Lebesgue measure and S_{R,k} be an elementary score of the form at (3.11) with identification function V_{R_0}(k, y) = V_ρ(0, Λ(y + k)). Then the scoring function S_{R,L^d}(A, y) = ∫_{R^d} S_{R,k}(A, y) L^d(dk) (4.1) is translation invariant and M^d-consistent for R.
(ii) Any finite M^d-consistent scoring function S for R of the form at (3.12) is translation invariant only if S(A, y) = γ S_{R,L^d}(A, y) at (4.1) for some γ ≥ 0.
Note that for all examples of ρ and Λ we are aware of, the score S_{R,L^d}(A, y) at (4.1) is finite if the symmetric difference A △ R(y) is bounded, and only if it has finite Lebesgue measure. Hence, Proposition 4.5(ii) implies that the only finite translation invariant consistent score is the 0-score.
Proposition 4.6. Let Assumption (2) hold and suppose that V_{R_0} is an oriented strict selective M^d-identification function for R_0 which is positively homogeneous of degree a ∈ R.
(i) Let π be a non-negative σ-finite positively homogeneous measure of degree b ∈ R and S_{R,k} be an elementary score of the form at (3.11) with identification function V_{R_0}. Then the scoring function S_{R,π}(A, y) = ∫_{R^d} S_{R,k}(A, y) π(dk) (4.2) is positively homogeneous of degree a + b.
(ii) Any finite M^d-consistent scoring function S for R of the form at (3.12) is positively homogeneous of degree a + b only if S(A, y) = γ S_{R,π}(A, y) at (4.2) for some γ ≥ 0 and for some non-negative σ-finite positively homogeneous measure π of degree b ∈ R.
Remark 4.7. For many measures π and sets A, the score S_{R,π}(A, y) defined at (3.12) might not be finite, which diminishes its practical statistical applicability in the context of forecast comparison. More to the point, score differences involving S_{R,π}(A, y) will not be finite or might not even be defined at all. To overcome this issue we suggest working with the following convention for score differences: S_{R,π}(A, y) − S_{R,π}(B, y) := ∫_{R^d} (S_{R,k}(A, y) − S_{R,k}(B, y)) π(dk), (4.3) where S_{R,k} are the elementary scores defined at (3.11), which assume finite values only. Indeed, the integral at (4.3) might exist and might even be finite, even if S_{R,π}(A, y) or S_{R,π}(B, y) are ∞. This can be particularly helpful when working with translation invariant or positively homogeneous scores.

Elicitability of systemic risk measures based on Expected Shortfall
The two most common scalar risk measures in quantitative risk management are Value at Risk (VaR_α) and Expected Shortfall (ES_α) at some level α ∈ (0, 1). Both are law-invariant scalar risk measures, so that we can define them directly as functionals on appropriate classes of distributions. For a probability distribution function F and α ∈ (0, 1) we define
The last decade has seen quite a lively debate about which scalar risk measure is best to use in practice, a debate that has mainly focused on the dichotomy of VaR_α and ES_α; see Embrechts et al. (2014) and Emmer et al. (2015) for a comprehensive academic discussion and Bank for International Settlements (2014) for a regulatory perspective. VaR_α is robust in the sense of Hampel (1971), but ignores losses beyond the level α. Moreover, Cont et al. (2010) showed that robustness and coherence are mutually exclusive, implying that VaR_α fails to be coherent. On the other hand, ES_α is a coherent, and thus non-robust, risk measure. As a tail expectation, it takes into account the losses beyond the level α by definition. Another layer of the joust between the two risk measures is their backtestability (Acerbi & Szekely, 2014, 2017). While the identifiability of a risk measure is important, but not necessary, for traditional backtesting, comparative backtesting relies on the elicitability of the risk measure at hand; see Fissler et al. (2016) and Nolde and Ziegel (2017).
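For concreteness, both risk measures can be estimated from a sample. The following is a minimal sketch, assuming the sign convention that positive values are gains, that VaR_α is the negative of the α-quantile, and that ES_α is the negative of the average outcome in the lower α-tail (which agrees with the tail-expectation interpretation for continuous distributions); the function names `var_alpha` and `es_alpha` are illustrative.

```python
import numpy as np
from statistics import NormalDist

def var_alpha(x, alpha):
    # Assumed convention: VaR is the negative of the empirical alpha-quantile
    return -np.quantile(x, alpha)

def es_alpha(x, alpha):
    # Assumed convention: ES is the negative mean of the outcomes at or below
    # the empirical alpha-quantile (lower-tail expectation)
    q = np.quantile(x, alpha)
    return -x[x <= q].mean()

rng = np.random.default_rng(3)
x = rng.standard_normal(200_000)      # illustrative sample of gains and losses
alpha = 0.05

z = NormalDist().inv_cdf(alpha)
print("VaR estimate:", var_alpha(x, alpha), "exact:", -z)
print("ES estimate: ", es_alpha(x, alpha), "exact:", NormalDist().pdf(z) / alpha)
```

For the standard normal distribution the exact values are −z_α and φ(z_α)/α, which the estimates approach as the sample size grows; ES_α is never smaller than VaR_α.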
As the negative of a selection of the α-quantile, VaR_α is elicitable on any class of distributions with a unique α-quantile. In stark contrast, Gneiting (2011a) demonstrated that ES_α generally does not satisfy the CxLS property, which rules out its elicitability; cf. Weber (2006).
Recall that Theorem 3.1 and Theorem 3.8 establish identifiability and elicitability results for systemic risk measures based on a scalar risk measure ρ which is identifiable and therefore, under weak regularity assumptions, elicitable; see Steinwart et al. (2014). Moreover, Proposition 3.7 establishes that, under weak regularity conditions, the identifiability / elicitability of ρ is also necessary for the identifiability and elicitability of the systemic risk measure based on ρ. Therefore, for some aggregation function Λ :
generally fails to be elicitable. On the other hand, for scalar risk measures, Fissler and Ziegel (2016) established that the pair (VaR_α, ES_α) is elicitable under weak regularity conditions; cf. Acerbi and Szekely (2014). This might trigger the suspicion that the pair R VaRα , R ESα mapping to the product space
Frongillo and Kash (2015). Therefore, we shall consider the function-valued functional (5.2)

Identifiability results
To simplify the exposition of the results, we shall make the following assumption about the class M.
Assumption (3).All distribution functions in M are continuous and strictly increasing.
Note that this assumption also imposes implicit restrictions on the class M^d since we assume that for any Y with distribution in M^d, the random variable (5.4)

Elicitability results
We introduce the following regularity assumption on T VaRα defined at (5.2).

Assumption (4). The functional T
With a standard argument one can verify that Assumption (3) together with the continuity of Λ implies Assumption (4). In order to present the following theorem more compactly, let us introduce S_{α,g}(x, y) = (1{y ≤ x} − α)(g(x) − g(y)) for any increasing function g : R → R. Recall that S_{α,g} is a non-negative consistent selective scoring function for the α-quantile. Moreover, if g is strictly increasing, S_{α,g} is a strictly consistent selective scoring function for the α-quantile relative to any class M of distributions such that g is M-integrable; see Gneiting (2011b).
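The consistency of S_{α,g} for the α-quantile can be illustrated numerically. The following is a minimal sketch, assuming a simulated standard normal sample and a finite grid of candidate forecasts; the helper names and the two choices of g (the identity and the arctangent) are illustrative. For either strictly increasing g, the average score is minimised next to the empirical α-quantile.

```python
import numpy as np

def quantile_score(x, y, alpha, g=lambda z: z):
    """S_{alpha,g}(x, y) = (1{y <= x} - alpha) * (g(x) - g(y)) for increasing g."""
    y = np.asarray(y, dtype=float)
    return ((y <= x).astype(float) - alpha) * (g(x) - g(y))

rng = np.random.default_rng(4)
y = rng.standard_normal(20_000)        # illustrative observations
alpha = 0.1
grid = np.linspace(-3.0, 3.0, 601)     # candidate point forecasts

def best_forecast(g):
    # Grid point with the smallest average score under the given g
    mean_scores = [quantile_score(x, y, alpha, g).mean() for x in grid]
    return grid[int(np.argmin(mean_scores))]

x_linear = best_forecast(lambda z: z)  # g(z) = z gives the pinball loss
x_arctan = best_forecast(np.arctan)    # a bounded, strictly increasing g
q_hat = np.quantile(y, alpha)          # empirical alpha-quantile
print(x_linear, x_arctan, q_hat)
```

The bounded choice g = arctan is of interest because it makes the score integrable without any moment condition on the distribution.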
is a non-negative M^d-consistent exhaustive scoring function for the functional
where for each k ∈ R^d the function g_k : R → R is non-decreasing and S_k is given at (5.5), is a non-negative M^d-consistent exhaustive scoring function for
3), and (4) hold, if g_k is strictly increasing for all k ∈ R^d and if π_1, π_2 are strictly positive, then the restriction of S_{π_1,π_2} defined at (5.6)
Theorem 5.2(ii) suggests that there is again the possibility to consider Murphy diagrams to assess the quality of forecasts for (T VaRα , R ESα ) simultaneously over all scoring functions given at (5.6). However, a direct implementation would amount to defining them on the 2d-dimensional Euclidean space. If one further decomposes the functions g_k in the spirit of Ehm et al. (2016), one would even end up with a map defined on R × R^d × R^d. However, arguing along the lines of Ziegel, Krüger, Jordan, and Fasciati (2019), the measure π_1 only accounts for forecast accuracy in the Value at Risk component. Therefore, if interest focuses on the Expected Shortfall component, it makes sense to set π_1 = 0 to facilitate the analysis. This implies that one can consider the Murphy diagram with the elementary scores S_k given at (5.5). The empirical formulation in the spirit of (3.16) is straightforward.

Examples and simulations
6.1. Consistency of the exhaustive scoring function for R
In this subsection, we shall demonstrate the discrimination ability of the consistent exhaustive scoring functions constructed in Theorem 3.8 via a simulation study. We shall do so in the context of the prediction space setting introduced in Gneiting and Ranjan (2013). That means we explicitly model the information sets of each forecaster.
For the sake of simplicity and following Gneiting, Balabdaoui, and Raftery (2007) and Fissler and Ziegel (2019a), we choose to consider prediction-observation sequences that are independent and identically distributed over time. Despite this simplification, there is still a variety of parameters to consider in the simulation study: (i) the dimension of the financial system d; (ii) the distribution of the observations Y_t; (iii) the aggregation function Λ; (iv) the scalar risk measure ρ; (v) the competing forecasts A_t and B_t, along with their joint distributions with Y_t; (vi) the measure π (and thus the scoring function S_{R,π}); (vii) the time horizon N.
We confine ourselves to the following choices of these parameters.
(i)-(iii) We work with two different combinations of Y t and Λ.In both cases, we work with a system with d = 5 participants.
(a) The vector Y_t models the gains and losses of the participants in the system. At any time point t, Y_t = µ_t + ε_t where µ_t follows a 5-dimensional normal distribution with mean 0, correlations 0.5 and variances 1, and ε_t follows a 5-dimensional standard normal distribution. Moreover, µ_t and ε_t are independent for all t. Thus, conditionally on µ_t, Y_t has distribution N_5(µ_t, I_5), whereas unconditionally, Y_t ∼ N_5(0, Σ) with (Σ)_{ij} = 0.5 for i, j = 1, ..., 5, i ≠ j, and 2 otherwise. The aggregation function Λ_1 follows Amini et al. (2015), and we set β = 0.75. This way, both gains and losses influence the value of the aggregation function; however, the losses have a higher weight. Here and in what follows, x^+ and x^− denote the positive and negative parts of x, such that x^+ = max(0, x) and x^− = − min(0, x).

(b) We consider an extended model of Eisenberg and Noe (2001); see Feinstein et al. (2017). The participants have liabilities towards each other: L_{ij,t} represents the nominal liability of participant i towards participant j at time point t, i, j = 1, ..., 5. Moreover, each participant i owes an amount L_{is,t} to society at time point t. To simplify the simulations and shorten the computing time, we assume that the liabilities matrix is deterministic and constant in time, so that we can write L_{is} instead of L_{is,t}. Moreover, we denote by L_s the sum of all payments promised to society, i.e., L_s = Σ_{i=1}^d L_{is}. The vector Y_t represents the endowments of the participants at time point t. As suggested in Eisenberg and Noe (2001), if some of the endowments are negative, we introduce a so-called sink node and interpret the negative endowments as liabilities towards this node. The value of the aggregation function Λ_2 corresponds to the sum of all payments society obtains in the clearing process as described in Eisenberg and Noe (2001). To simulate the endowments of the participants Y_t, we assume that Y_{it} = (µ_{it} + ε_{it})^2 for i = 1, ..., 5 with µ_t and ε_t specified in (a). We construct the system in the following way:
• The probability of a participant owing to another participant is 0.8. If there is a liability from i to j, its nominal value is 2.
• In addition, each participant owes 2 to society.
(iv) In setting (a), we consider the scalar risk measure VaR_α, α ∈ (0, 1), defined at (5.1), and its expectile-based version defined as EVaR_τ(X) = −e_τ(X), τ ∈ (0, 1), where e_τ satisfies the equation τ E[(X − e_τ)^+] = (1 − τ) E[(X − e_τ)^−] (Newey & Powell, 1987). For the interpretation of expectile-based risk measures in finance we refer to Bellini and Di Bernardino (2017), and to Ehm et al. (2016) for a novel economic angle on expectiles. In case (b), however, the aggregation function Λ_2 takes non-negative values only and therefore any financial system would be deemed acceptable when working with ρ = VaR_α or ρ = EVaR_τ. Following Feinstein et al. (2017), we overcome this issue by considering the shifted risk measure ρ̃(X) = ρ(X) + 0.9 L_s, where ρ = VaR_α or ρ = EVaR_τ, thus considering the system acceptable if VaR_α or EVaR_τ of the amount that society obtains from the nodes is at most −0.9 L_s. Using the standard identification functions for VaR_α and EVaR_τ (Gneiting, 2011a), the selective identification functions for R_0 are the following:
• for ρ(X) = VaR_α(X) + a: V_{R_0}(k, y) = α − 1{Λ(k + y) − a ≤ 0}; (6.1)
• for ρ(X) = EVaR_τ(X) + a:
(v) We consider two ideal forecasters with different information sets: Anne has access to µ_t and uses the correct conditional distribution of Y_t given µ_t for her predictions. That is, she issues A_t = R(N_5(µ_t, I_5)) = R(N_5(0_5, I_5)) − µ_t in case (a) and A_t = R(N_5(µ_t, I_5)^2) in case (b) for each t = 1, ..., N. Here, we use the notation N_d(m, Σ)^2 for the distribution of a random variable Y = X^2 where X ∼ N_d(m, Σ).
Bob is uninformed and issues the climatological forecast. That is, he uses the correct unconditional distribution of Y_t for his forecasts. Therefore, he constantly predicts B_t = R(N_5(0_5, Σ)) in case (a) and B_t = R(N_5(0_5, Σ)^2) in case (b).
(vi) We choose π to be a 5-dimensional Gaussian measure with mean m ∈ R^5 and covariance I_5. To enhance the discrimination ability of the score S_{R,π}, we aim at choosing m close to the boundary of R(Y_t). Here we work with m = 2 · 1, where 1 denotes the vector of ones, as this value appears to be fairly close to the (deterministic) forecasts of Bob in all four cases. This choice of π turns out to be beneficial with respect to the integrability considerations and renders our scores finite. Indeed, since V_{R_0} for ρ = VaR_α + a is bounded, it is π ⊗ F-integrable for any finite measure π. In the case of ρ = EVaR_τ + a, more considerations are necessary. From the construction of Λ_2 it is clear that it is a bounded function; in particular, its values lie in the interval [0, Σ_{i=1}^d L_{is}]. This in turn implies that the identification function V_{R_0} is bounded. Therefore V_{R_0} is π ⊗ F-integrable for any finite measure π. Finally, since Λ_1 only grows linearly and both π and Y_t are Gaussian, the integrability is also guaranteed in this case.
(vii) We work with a sample size of N = 250, a good proxy for the number of working (and trading) days in a year.
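The data-generating process of setting (a) and the Monte Carlo draw from π described in the list above can be sketched as follows. This is a minimal illustration, assuming NumPy's random generator and an arbitrary seed and sample size; it only simulates the observations and the draws from π and checks the stated unconditional covariance Σ, without computing any risk measure. All variable names are illustrative.

```python
import numpy as np

d = 5
rng = np.random.default_rng(1)

# Covariance of mu_t: unit variances and pairwise correlations 0.5
cov_mu = np.full((d, d), 0.5) + 0.5 * np.eye(d)

n = 200_000                                   # illustrative sample size
mu = rng.multivariate_normal(np.zeros(d), cov_mu, size=n)
eps = rng.standard_normal((n, d))             # independent standard normal noise
y = mu + eps                                  # Y_t = mu_t + eps_t

sigma = cov_mu + np.eye(d)                    # 2 on the diagonal, 0.5 elsewhere
sigma_hat = np.cov(y, rowvar=False)           # empirical unconditional covariance

# Monte Carlo draw from pi, a Gaussian measure with mean 2 * (1, ..., 1) and covariance I_5
k_draws = rng.multivariate_normal(2.0 * np.ones(d), np.eye(d), size=1000)
print(np.round(sigma_hat, 2))
```

Anne's information set corresponds to the array `mu`, whereas Bob only knows the unconditional covariance `sigma`.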
To compare Anne's with Bob's forecast performance, we employ the classical Diebold-Mariano test (Diebold & Mariano, 1995) based on the scoring functions S_{R,π} of the form at (3.12) arising from our choice of π and the identification functions introduced in (6.1) and (6.2). We repeat the experiment 1 000 times for setting (a) and 100 times for setting (b) since, due to the presence of clearing, the computation time tends to be quite lengthy in setting (b). We approximate π with a Monte Carlo draw of size 100 000. The computations are performed with the statistics software R, and in particular its Rcpp package, which allows parts of the code to be written in C++ to enhance the computational speed.
We consider tests with two different one-sided null hypotheses. The null hypothesis H_0 : E[S_{R,π}(A_t, Y_t)] ≥ E[S_{R,π}(B_t, Y_t)], or in short H_0 : A ⪰ B, means that Bob has a better forecast performance than Anne, evaluated in terms of S_{R,π}. On the contrary, H_0 : A ⪯ B stands for H_0 : E[S_{R,π}(A_t, Y_t)] ≤ E[S_{R,π}(B_t, Y_t)], asserting that Anne's forecasts are superior to Bob's in terms of S_{R,π}. In Table 1 we report the relative frequencies of the rejections for the respective null hypotheses. Invoking the sensitivity of consistent scoring functions with respect to increasing information sets established in Holzmann and Eulert (2014), we expect that Anne's forecasts are deemed superior to Bob's predictions. And in fact, the null A ⪯ B is never rejected for either scenario, while A ⪰ B is rejected in between 74% and 100% of all experiments over the various scenarios. In particular, with rejection rates for H_0 : A ⪰ B between 0.94 and 1, we observe that the discrimination ability between Bob and Anne is considerably higher for model (a) as opposed to (b), where we obtain rejection rates ranging from 0.74 to 0.90. This might be due to the fact that Λ_1 is unbounded whereas Λ_2 only takes values between 0 and L_s, which might translate into a smaller influence of the predictive distributions upon which the forecasts are based. Moreover, both in case (a) and (b), the number of instances when Anne's forecasts are preferred over Bob's is higher for ρ = EVaR_τ + a than for ρ = VaR_α + a.
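The two one-sided tests can be sketched as follows. This is a minimal illustration of a Diebold-Mariano-type test, assuming independent and identically distributed score differences (so that no autocorrelation correction of the variance is applied) and a normal approximation of the test statistic; the function `dm_test`, its return values and the synthetic scores are illustrative and do not reproduce the implementation used for Table 1.

```python
import math
import numpy as np
from statistics import NormalDist

def dm_test(score_a, score_b):
    """One-sided Diebold-Mariano-type tests for iid score differences.

    Returns the statistic and the p-values for the two nulls
    'A scores at least as high as B' and 'A scores at most as high as B'.
    """
    diff = np.asarray(score_a, dtype=float) - np.asarray(score_b, dtype=float)
    n = diff.size
    stat = diff.mean() / (diff.std(ddof=1) / math.sqrt(n))
    # Null E[S(A)] >= E[S(B)] (B at least as good): reject for very negative stat
    p_null_a_worse = NormalDist().cdf(stat)
    # Null E[S(A)] <= E[S(B)] (A at least as good): reject for very positive stat
    p_null_a_better = 1.0 - NormalDist().cdf(stat)
    return stat, p_null_a_worse, p_null_a_better

# Synthetic realised scores (purely illustrative): forecaster A scores lower on average
t = np.arange(250)
scores_a = np.linspace(0.0, 1.0, 250)
scores_b = scores_a + 0.2 + 0.1 * np.sin(t)

stat, p_a_worse, p_a_better = dm_test(scores_a, scores_b)
print(stat, p_a_worse, p_a_better)
```

Rejecting the first null at level 0.05 corresponds to deeming A significantly superior to B; repeating this over independent replications yields rejection frequencies of the kind reported above.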

Murphy diagrams
In this subsection, we illustrate the use of Murphy diagrams, following Corollary 3.12.
To allow for graphical illustrations, we reduce the dimension to d = 2, translating case (a) of subsection 6.1 to d = 2. In particular, we have Y_t = µ_t + ε_t where µ_t follows a 2-dimensional normal distribution with mean 0, variances 1 and correlations 0.5, and ε_t follows a 2-dimensional standard normal distribution. As the scalar risk measure ρ we only consider VaR_{0.05} and we use the aggregation function Λ_1 : R^2 → R, i.e., Λ_1(x) = 0.25(x_1^+ + x_2^+) − 0.75(x_1^− + x_2^−).
Besides focused Anne and climatological Bob introduced above, both using their respective information sets ideally, we also consider Celia. Just like Anne, Celia has access to µ_t, resulting in the same information set. However, she misinterprets it and issues sign-reversed forecasts C_t assuming that
In the left panel of Figure 2 we illustrate the differences of empirical Murphy diagrams k ↦ ŝ_{250,f_1}(k) − ŝ_{250,f_2}(k), k ∈ [−5, 5]^2, where f_{1t}, f_{2t} stand for one of the three considered forecasts, A_t, B_t or C_t. In each pairwise comparison, we choose f_{1t} to be inferior to f_{2t} such that we expect a non-negative difference of the corresponding Murphy diagrams. Indeed, only in the comparison between Bob and Anne are there some k where ŝ_{250,f_1}(k) − ŝ_{250,f_2}(k) < 0.
For the remaining regions and situations, the Murphy diagrams behave consistently with our expectations. For all three pairwise comparisons, one can nicely recognise the region where the respective two forecasts differ, resulting in a positive score difference depicted in blue. This bluish region seems to correspond to a blurred version of the boundary of the considered risk measure. Interestingly, while these regions illustrating positive score differences are similar in shape and location for the two pairs involving Celia, this region seems to be slightly translated to the upper right corner in the comparison between Anne and Bob. Quite intuitively, the magnitude of the score difference, with a maximum of approximately 0.05, is smaller in the joust between the two ideal forecasts issued by Bob and Anne than in the situations involving the sign-reversed Celia, where the maximal difference between the Murphy diagrams is larger than 0.15. We have performed this experiment several times and observed that the stylised facts are qualitatively stable. For transparency reasons, we have depicted the first experiment performed, but we report some more experiments in Fissler et al. (2019b).
In the right panel of Figure 2 we depict the results of pointwise comparative backtests using the traffic-light illustration suggested in Fissler et al. (2016), which is akin to the three-zone approach of the Bank for International Settlements (2013, pp. 103-108). That is, we perform a Diebold-Mariano test using the elementary score S_{R,k} for each k in a grid of [−5, 5]^2. This means that we would like to see whether the superiority of the forecasts f_{2t} is recognised at a significance level of 0.05, deploying the two possible one-sided null hypotheses H_0^+ : f_1 ⪯ f_2 and H_0^− : f_1 ⪰ f_2, using the notation introduced in the previous subsection. If for a certain k the null H_0^+ is rejected, deeming f_2 significantly superior to f_1, we colour the corresponding k in green. Similarly, if the null H_0^− is rejected, considering f_1 to be superior to f_2, we illustrate k in red. For all k in the yellow region, neither of the two nulls is rejected, meaning that the procedure is indecisive at the significance level 0.05. Finally, the grey area corresponds to those points where the score difference is constantly zero for all t = 1, ..., N. Due to the vanishing variance, a Diebold-Mariano test is not possible there. But clearly, this still means that the two forecasts are equally good in that region.
The specific results nicely correspond to the situations obtained in the left panel of Figure 2. For all three pairwise comparisons and for k close to the four corners of the area [−5, 5]^2, the score differences identically vanish, resulting in a grey colouration. Again, in all three cases, there is a "continuous" behaviour in that the grey region adjoins a yellow stripe before turning into a fairly broad green stripe. For the comparisons involving Celia, who clearly uses a predictive distribution inferior to both Anne's and Bob's, it is reassuring that a substantial region is coloured in green. In this region, the procedure is decisive, deeming Celia significantly inferior to Anne and to Bob. Moreover, for this particular simulation, there is no red region. The situation comparing the two ideal forecasters Anne and Bob is somewhat more involved. While most of the previous observations also apply to that situation, there is a small red stripe close to the upper right corner. For k in that region and for this particular simulation, this means that Bob's forecasts outperform Anne's. While this observation is somewhat unexpected, it reflects the finite-sample nature of the simulation, rendering such outcomes possible. Having a look at some more experiments, the results of which are again reported in Fissler et al. (2019b), shows that this red region is not stable over different simulations (which would clearly violate the sensitivity of consistent scoring functions with respect to increasing information sets established in Holzmann and Eulert (2014)), but it moves and occasionally also vanishes (on the region [−5, 5]^2 considered). Interestingly, in all events with a red region present, this red region was still roughly located in a similar area.
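The colouring rule of the traffic-light illustration can be sketched in a few lines. This is a minimal illustration, assuming independent score differences over time, a normal approximation of the pointwise test statistic, and an input matrix whose entry (t, g) is the elementary score difference between f_1 and f_2 at time t and grid point k_g; the function name `traffic_light` and the synthetic input are illustrative.

```python
import math
import numpy as np
from statistics import NormalDist

def traffic_light(diff, level=0.05):
    """Colour each grid point from an N x G matrix of elementary score differences.

    diff[t, g] is the score of f1 minus the score of f2 at time t and grid point g.
    Columns with zero variance but non-zero constant value are left 'yellow'.
    """
    n, g = diff.shape
    mean = diff.mean(axis=0)
    sd = diff.std(axis=0, ddof=1)
    z = NormalDist().inv_cdf(1.0 - level)      # one-sided critical value
    testable = sd > 0
    stat = np.zeros(g)
    stat[testable] = mean[testable] / (sd[testable] / math.sqrt(n))
    colours = np.full(g, "yellow", dtype=object)
    colours[testable & (stat > z)] = "green"   # f2 significantly superior to f1
    colours[testable & (stat < -z)] = "red"    # f1 significantly superior to f2
    colours[np.all(diff == 0, axis=0)] = "grey"  # identical scores, no test possible
    return colours

# Synthetic score differences for four grid points (purely illustrative)
t = np.arange(200)
demo = np.column_stack([
    np.zeros(200),                    # identical scores
    0.2 + 0.1 * np.sin(t),            # f1 scores higher: f2 superior
    -0.2 - 0.1 * np.sin(t),           # f1 scores lower: f1 superior
    np.where(t % 2 == 0, 1.0, -1.0),  # no systematic difference
])
print(list(traffic_light(demo)))
```

Applied to the score differences on a grid of [−5, 5]^2, the returned labels can be reshaped to the grid dimensions and plotted as a coloured map.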

Discussion
As mentioned in the introduction, the aim and main contribution of this paper consists of establishing selective identifiability results in Theorem 3.1 and exhaustive elicitability results in Theorem 3.8 for systemic risk measures sensitive with respect to capital allocations. Notably, the construction of consistent exhaustive scoring functions relies on a
has the same distribution as the observations, via
where S R : is a strictly consistent exhaustive scoring function for R. Under suitable conditions, the M-estimator R̂(Y) at (7.1) is consistent for R(Y). However, computationally, the optimisation problem at (7.1) might be rather expensive, if feasible at all. The reason is that one needs to optimise over the collection of all closed upper sets of R^d.
Regression. A closely connected concept to the notion of M-estimation is regression, where it is possible to bypass the complication of optimising over a collection of sets. Consider a time series (X_t, Y_t)_{t∈N}. Sticking to the usual denomination, let Y_t denote the response variable, taking values in R^d, and let X_t be a p-dimensional vector of regressors. The regressors might consist of quantities which seem relevant to the systemic risk of the financial system. Examples include macroeconomic quantities such as GDP, unemployment, inflation, net investments etc. Let Θ ⊆ R^q be a parameter space and let M : R^p × Θ → F(R^d; R^d_+) be a parametric model taking values in the collection of closed upper subsets of R^d. Suppose the model is correctly specified in that there exists a unique parameter θ_0 ∈ Θ such that R(F_{Y_t|X_t}) = M(X_t, θ_0) P-a.s. for all t ∈ N. (7.2)
Here, R : is a law-invariant risk measure of the form at (2.1) satisfying the conditions of Theorem 3.8(iii). Further, suppose that the (regular version of the) conditional distribution F_{Y_t|X_t} of Y_t given X_t is an element of M^d_0 almost surely, where we use the notation of Theorem 3.8. Note that the time series does not need to be strongly stationary, but only the conditional distribution F_{Y_t|X_t} needs to satisfy the 'semi-parametric stationarity condition' specified via (7.2). Let S_{R,π} be a strictly M^d_0-consistent exhaustive scoring function for R. Then, under certain mixing and integrability assumptions specified in (White, 2001, Corollary 3.48) one obtains the following Law of Large Numbers
for all θ ∈ Θ. It is essentially a uniform version (in the parameter θ) of this Law of Large Numbers result which yields the consistency of the empirical estimator for θ_0; see Huber and Ronchetti (2009); Nolde and Ziegel (2017); van der Vaart (1998) for details. The advantage of this regression approach in comparison to M-estimation is that the optimisation at (7.3) needs to be performed over a subset Θ of R^q only (which is often assumed to be compact). This makes the result computationally a lot more feasible than the optimisation procedure over a collection of upper sets. In other words, M-estimation can be considered as a special instance of regression where the regressor X_t is constant and where Θ corresponds to F(R^d; R^d_+).
Besides the usual practical challenge of constructing reasonable parametric models M to model the systemic risk of a financial system Y_t given regressors X_t, we see some interesting theoretical problems related to this regression framework. While, under correct model specification given at (7.2), any strictly consistent scoring function S_{R,π} induces a consistent estimator θ̂_N at (7.3), the estimator will generally depend on the choice of S_{R,π} (or π) in finite samples. Moreover, the efficiency of the estimator θ̂_N, expressed in terms of the asymptotic variance of √N (θ̂_N − θ_0), will depend on the choice of S_{R,π}, suggesting an interesting optimality criterion for S_{R,π}. A very modern and interesting approach circumventing this issue is to perform regression simultaneously with respect to the class of all consistent scoring functions (or a reasonably large subclass), which is explored in the recent paper Jordan et al. (2019). To perform this efficiently, the mixture representation of scoring functions in terms of elementary scores might prove beneficial. We defer this interesting problem to future research.

Acknowledgement
We would like to express our sincere gratitude to Tilmann Gneiting and Johanna Ziegel for insightful discussions and persistent encouragement. We are indebted to Timo Dimitriadis and to Peter Barančok, who provided helpful comments in the context of equivariant scores, to Yuan Li for his careful proofreading of an earlier version of this paper, and to Lukáš Šablica for helpful programming advice on the simulation part of this project. Tobias Fissler gratefully acknowledges financial support from Imperial College London via his Chapman Fellowship and the hospitality of the Institute for Statistics and Mathematics at Vienna University of Economics and Business during several research visits in which main parts of the project were jointly developed.

Thus $\rho$ is identifiable with a strict selective $\mathcal{M}$-identification function $V_\rho : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, $V_\rho(s, x) = V_{R_0}(0, \eta(x + s))$.

A. Appendix
For the proof of Theorem 3.8, we need the following lemma.
Lemma A.1. Let $A_1, A_2 \in \mathcal{F}(\mathbb{R}^d; \mathbb{R}^d_+)$. Then the symmetric difference $A_1 \triangle A_2 = (A_1 \setminus A_2) \cup (A_2 \setminus A_1)$ is empty if and only if its interior, $\operatorname{int}(A_1 \triangle A_2)$, is empty.
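Proof of Lemma A.1. The 'only if' direction is trivial, so it suffices to show that a non-empty symmetric difference has non-empty interior. Assume that there is an $x \in A_1 \triangle A_2$. Without loss of generality, we can assume that $x \in A_1 \setminus A_2$. If $x \in \operatorname{int}(A_1 \setminus A_2)$, we are done. Hence, let $x \in (A_1 \setminus A_2) \setminus \operatorname{int}(A_1 \setminus A_2)$, which implies that $x \in \partial(A_1 \setminus A_2)$, where $\partial(A_1 \setminus A_2)$ denotes the boundary of $A_1 \setminus A_2$. By the definition of the boundary, for all $\varepsilon > 0$ it holds that $B_\varepsilon(x) \cap A_1 \neq \emptyset$, where $B_\varepsilon(x)$ is the open ball with centre $x$ and radius $\varepsilon$. Assume that for all $\varepsilon > 0$ we have $B_\varepsilon(x) \cap A_2 \neq \emptyset$. Then $x \in \bar{A}_2 = A_2$, since $A_2$ is closed, which contradicts $x \in A_1 \setminus A_2$. Hence there exists an $\varepsilon_0 > 0$ such that $B_{\varepsilon_0}(x) \cap A_2 = \emptyset$. Moreover, since $A_1$ is an upper set, $x + \mathbb{R}^d_{++}$ is a non-empty open subset of $A_1$. Furthermore, $B_{\varepsilon_0}(x) \cap (x + \mathbb{R}^d_{++})$ is a non-empty open subset of $A_1$ which is disjoint from $A_2$. This means that $\operatorname{int}(A_1 \setminus A_2) \neq \emptyset$.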
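A one-dimensional example, added here purely for illustration, indicates why the closedness of the sets matters for Lemma A.1. Take $d = 1$ and compare a closed upper set with a non-closed one:

\[
A_1 = [0, \infty), \qquad A_2 = (0, \infty), \qquad A_1 \triangle A_2 = \{0\}, \qquad \operatorname{int}(A_1 \triangle A_2) = \emptyset .
\]

Both sets are upper sets with respect to $\mathbb{R}_+$, but $A_2$ is not closed, so $A_2 \notin \mathcal{F}(\mathbb{R}; \mathbb{R}_+)$. The symmetric difference is non-empty although its interior is empty, so the equivalence of the lemma fails once closedness is dropped.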
where the first integral is strictly positive and the second one is non-negative (and strictly positive if and only if $\pi_2(A \cap K) > 0$).
be some subclass of $d$-dimensional random vectors. From a risk management perspective, a random vector $Y = (Y_1, \ldots, Y_d) \in \mathcal{Y}^d$ represents the respective gains and losses of a system of $d$ financial firms. That is, positive values of the component $Y_i$ represent gains of firm $i$ and negative values correspond to losses. Let $\mathcal{M}^d$ be the class of probability distributions of elements of $\mathcal{Y}^d$. Let $\Lambda : \mathbb{R}^d \to \mathbb{R}$ be an aggregation function, meaning that it is non-decreasing with respect to the componentwise order. An aggregation function is typically, but not necessarily, assumed to be continuous or even concave. We introduce $\mathcal{Y} \subseteq L^0(\Omega; \mathbb{R})$ where $\{\Lambda(Y) \mid Y \in \mathcal{Y}^d\} \subseteq \mathcal{Y}$ and let $\mathcal{M}$ be the class of distributions of elements of $\mathcal{Y}$.
and where for any two sets $A, B \subseteq \mathbb{R}^d$, $A + B := \{a + b \mid a \in A, b \in B\}$ denotes the usual Minkowski sum. Mutatis mutandis, the same is true for $R^{\mathrm{ins}}$. Following the notation of Feinstein et al. (2017), we denote the collection of upper sets in $\mathbb{R}^d$ with ordering cone $\mathbb{R}^d_+$ as

Figure 1: A graphical illustration of Equation (3.13) for dimension $d = 2$. Suppose the red region corresponds to the correctly specified risk measure $R(F)$ and the blue region corresponds to some misspecified forecast $A$. The score difference $S_{R,\pi}(A, F) - S_{R,\pi}(R(F), F)$ is an integral of $V_{R_0}(\cdot, F)$ over $R(F) \setminus A$ (the red-only region), plus an integral of $-V_{R_0}(\cdot, F)$ over $A \setminus R(F)$ (the blue-only region).
and for all $k \in \mathbb{R}^d$. Moreover, if $\rho$ and $\Lambda$ are positively homogeneous, so is $R$ in the sense that $R(cY) = cR(Y)$ for all $Y \in \mathcal{Y}^d$ and $c > 0$; see Lemma 4.2.

Patton (2011), Nolde and Ziegel (2017) and Fissler

Definition 4.1 (Homogeneity, translation invariance). (i) We call a function $f : \mathbb{R}^d \to \mathbb{R}$ positively homogeneous of degree $b \in \mathbb{R}$ if $f(cx) = c^b f(x)$ for all $c > 0$ and for all $x \in \mathbb{R}^d$.
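The homogeneity claim $R(cY) = cR(Y)$ can be checked in one line. The following derivation is a sketch under the assumption that $R$ has the form $R(Y) = \{k \in \mathbb{R}^d \mid \rho(\Lambda(Y + k)) \le 0\}$, that $\rho$ is positively homogeneous (of degree 1) and that $\Lambda$ is positively homogeneous of degree $b$. Substituting $k = ck'$ with $c > 0$ gives

\[
R(cY) = \{ck' \mid \rho(\Lambda(c(Y + k'))) \le 0\}
      = \{ck' \mid c^b \rho(\Lambda(Y + k')) \le 0\}
      = \{ck' \mid \rho(\Lambda(Y + k')) \le 0\}
      = cR(Y),
\]

where the second equality uses $\Lambda(c(Y + k')) = c^b \Lambda(Y + k')$ together with the positive homogeneity of $\rho$ applied to the factor $c^b > 0$, and the third equality uses that multiplying by $c^b > 0$ does not change the sign.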

Lemma 4.2. If $\rho$ is a positively homogeneous scalar risk measure and $\Lambda$ is positively homogeneous of any degree $b \in \mathbb{R}$, then $R$ as defined at (2.1) is positively homogeneous, i.e. for all $c > 0$ and $Y \in \mathcal{Y}^d$, $R(cY) = cR(Y)$.

Lemma 4.3. Assume that $\rho : \mathcal{M} \to \mathbb{R}$ admits a strict $\mathcal{M}$-identification function $V_\rho : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$. Then the following holds for $V_{R_0} : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ defined at (3.1):

, in general, fails to have the exhaustive CxLS property for $d \ge 2$, ruling out its exhaustive elicitability. Therefore, we need slightly more information than $R^{\mathrm{VaR}_\alpha}(Y) = \{k \in \mathbb{R}^d \mid \mathrm{VaR}_\alpha(\Lambda(Y + k)) \le 0\}$ in the other component to render the pair involving $R^{\mathrm{ES}_\alpha}$ elicitable. Note that $R^{\mathrm{VaR}_\alpha}(Y)$ only encodes information about the sign of $\mathrm{VaR}_\alpha(\Lambda(Y + k))$ for each $k \in \mathbb{R}^d$. Apart from $k$ in the boundary of $R^{\mathrm{VaR}_\alpha}(Y)$, we know nothing about the actual size of $\mathrm{VaR}_\alpha(\Lambda(Y + k))$. However, the positive result about the elicitability of the pair $(\mathrm{VaR}_\alpha, \mathrm{ES}_\alpha)$ actually exploits the fact that for the scoring function $S_\alpha(x, y) = -(\mathbb{1}\{y \le -x\} - \alpha)x/\alpha - \mathbb{1}\{y \le -x\}y/\alpha$, $x, y \in \mathbb{R}$, $\mathrm{VaR}_\alpha(F)$ is the minimiser of the expected score while $\mathrm{ES}_\alpha(F)$ is its minimum; see
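The role of the scoring function $S_\alpha$ can be checked numerically on a sample. The following minimal sketch assumes the sign convention in which $\mathrm{VaR}_\alpha$ is the negative of the lower $\alpha$-quantile and $\mathrm{ES}_\alpha$ is the negative of the average of the worst $\alpha$-fraction of outcomes; the standard normal sample, the sample size and all names are illustrative assumptions. It minimises the empirical mean of $S_\alpha(\cdot, y)$ over candidate values and compares the minimum with the empirical expected shortfall.

```python
import random


def score(x, y, alpha):
    """S_alpha(x, y) as stated in the text, with indicator 1{y <= -x}."""
    hit = 1.0 if y <= -x else 0.0
    return -(hit - alpha) * x / alpha - hit * y / alpha


def mean_score(x, sample, alpha):
    """Empirical expected score of the forecast x."""
    return sum(score(x, y, alpha) for y in sample) / len(sample)


random.seed(1)
alpha = 0.05
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]  # assumed P&L sample

ordered = sorted(sample)
m = round(alpha * len(sample))          # number of worst outcomes in the tail

# Empirical VaR and ES under the assumed sign convention.
var_emp = -ordered[m - 1]
es_emp = -sum(ordered[:m]) / m

# Minimise the empirical mean score over the candidates x = -y_i.
candidates = [-y for y in ordered]
best_x = min(candidates, key=lambda x: mean_score(x, sample, alpha))
best_val = mean_score(best_x, sample, alpha)

print("empirical VaR:", var_emp, "score minimiser:", best_x)
print("empirical ES: ", es_emp, "minimal mean score:", best_val)
```

On a sample, the mean score is flat between two adjacent order statistics, so the minimiser need not be unique; the minimal value, however, agrees with the empirical expected shortfall up to floating-point error.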

Figure 2: Left panel: Differences of empirical Murphy diagrams $\hat{s}_{250, f_1}(k) - \hat{s}_{250, f_2}(k)$ from (3.16) versus $k \in \mathbb{R}^2$. Right panel: Three-zone traffic light illustration of pointwise comparative backtests following Fissler et al. (2016). The green area corresponds to the region where the null hypothesis $H_0^+$ comparing $f_1$ and $f_2$ is rejected, and the red one is where $H_0^-$ is rejected, each at level 0.05. Yellow means that neither $H_0^+$ nor $H_0^-$ is rejected. In the grey region, the two Murphy diagrams coincide.

Figure 3: A graphical illustration of the proof of Proposition 3.6 for dimension $d = 2$. Suppose the blue region corresponds to the correctly specified risk measure $R(F)$. In the left picture, $\mathrm{EAR}_w(F)$ is a singleton, containing only point A. Point B corresponds to case (i), whereas points C and D are examples of case (ii) for points that are not in $\mathrm{EAR}_w(F)$. In the right picture, $\mathrm{EAR}_w(F) = \emptyset$. For any $k \in \mathbb{R}^d$ there is some $x \in w^\perp$ such that $V_{R_0}(k + x, F) > 0$.

Proof of Proposition 3.7. The 'only if' part is a special case of Theorem 3.1. For the 'if' part, assume $V_{R_0} : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ is a strict selective $\mathcal{M}^d$-identification function for $R_0$. For any $Y \in \mathcal{Y}^d$ it holds that $\mathbb{E}[V_{R_0}(0, Y)] = 0 \iff 0 \in R_0(Y) \iff \rho(\Lambda(Y)) = 0$. Then we obtain that for any $s \in \mathbb{R}$ and any $X \in \mathcal{Y}$, $\rho(X) = s \iff \rho(X + s) = 0 \iff \mathbb{E}[V_{R_0}(0, \eta(X + s))] = 0$.

Table 1: Ratios of rejections of the null hypotheses at significance level 0.05. Columns: $\mathrm{VaR}_{0.01}$, $\mathrm{VaR}_{0.05}$, $\mathrm{EVaR}_{0.01}$, $\mathrm{EVaR}_{0.05}$.