Regression-type analysis for multivariate extreme values

This paper devises a regression-type model for the situation where both the response and covariates are extreme. The proposed approach is designed for the setting where the response and covariates are modeled as multivariate extreme values, and thus, contrary to standard regression methods, it takes into account the key fact that the limiting distribution of suitably standardized componentwise maxima is an extreme value copula. An important target in the proposed framework is the regression manifold, which consists of a family of regression lines obeying the latter asymptotic result. To learn about the proposed model from data, we employ a Bernstein polynomial prior on the space of angular densities, which leads to an induced prior on the space of regression manifolds. Numerical studies suggest a good performance of the proposed methods, and a real-data illustration from finance reveals interesting aspects of the conditional risk of extreme losses in two leading international stock markets.
Supplementary Information The online version contains supplementary material available at 10.1007/s10687-022-00446-6.

sample behavior of the posterior given W_1, . . . , W_k independent and identically distributed observations sampled from an angular density h_0. This means that the asymptotic setup examined here should be regarded as a major simplification, in the sense that in practice the margins have to be estimated and the angular density is a limiting object. Still, we believe that, in a similar spirit to Sabourin and Naveau, the asymptotic analysis provided below already reveals some interesting insights on the large-sample behavior, under a Bayesian setup, of the methods devised herein.
In what follows, the prior probability mass function for J is denoted by p(J), whereas p(π_α : α ∈ F) is the prior density for the m − d free parameters, which is assumed to be induced by the prior density for all m parameters, the latter being denoted by p(π_α : |α| = J).
Theorem 1 Suppose that H_0 ∈ H. In addition, suppose that p(J) > 0 for all J ∈ N, and that g_J(u) := p(π_α : α ∈ F) ∝ p(π_α : |α| = J) is positive over ∆_{m−d} and is the density of a distribution that is absolutely continuous with respect to Lebesgue measure in m − d dimensions. Then, the random Bernstein angular density is weakly consistent at H_0.
Proof The strategy of the proof is similar to that of Petrone and Wasserman (2002). First note that (1.2) follows from the uniform approximation property of multivariate Bernstein polynomials (Barrientos et al. 2015, Section 4.1) and the assumption that h_0(w) is bounded away from zero; indeed, together these imply that there exists J_0 ∈ N such that b{w; J, π_J(h_0)} is bounded and bounded away from zero for any J ≥ J_0, and thus |log[h_0(w) / b{w; J, π_J(h_0)}]| < M, for any J ≥ J_0. The upshot of (1.2) is that for any ε > 0 there exists J_0 such that K(h_0, b_0) < ε, with b_0(·) := b{·; J_0, π_{J_0}(h_0)}. Next, we show that for any ε > 0 there exists δ > 0 such that (1.3) holds. To see this, note first that dir_d(w; α) ≤ (J − 1) × · · · × (J − d + 1), for all w and all α ∈ N^d such that |α| = J ≥ d. Thus, there exists δ > 0 sufficiently small such that b*_0(w) is bounded and bounded away from zero for all w ∈ ∆_d and all π_{J_0} ∈ N_δ. This proves (1.3). Combining these facts yields the Kullback–Leibler support condition, and the final result follows from Schwartz's theorem.

⊓ ⊔
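To make the object b{·; J, π} appearing in the proof concrete, the sketch below (a minimal hypothetical implementation, not the authors' code) evaluates a bivariate Bernstein angular density as a finite mixture of Dirichlet (here Beta) densities, and checks numerically that it integrates to one whenever the weights π_α sum to one.

```python
import math

def dirichlet2_pdf(w, a1, a2):
    """Dirichlet density on the 2-simplex, i.e. a Beta(a1, a2) density at w."""
    const = math.gamma(a1 + a2) / (math.gamma(a1) * math.gamma(a2))
    return const * w ** (a1 - 1) * (1 - w) ** (a2 - 1)

def bernstein_density(w, J, pi):
    """Bernstein density b(w; J, pi) = sum over a1 + a2 = J of pi[a1] * dir(w; a1, a2)."""
    return sum(pi[a1] * dirichlet2_pdf(w, a1, J - a1) for a1 in range(1, J))

J = 6
pi = {a1: 1.0 / (J - 1) for a1 in range(1, J)}  # uniform weights over the J - 1 components

# midpoint Riemann-sum check that the mixture integrates to ~1 over [0, 1]
n = 20_000
integral = sum(bernstein_density((i + 0.5) / n, J, pi) for i in range(n)) / n
print(round(integral, 4))
```

Since each Dirichlet component integrates to one, any convex combination of them does too, which is why the prior can be placed directly on the weights.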
Theorem 1 warrants some remarks. The assumption that h_0 is bounded away from zero (i.e., inf_{w∈∆_d} h_0(w) > 0) made in Theorem 1 can be easily relaxed using an argument similar to that of Petrone and Wasserman (2002, pp. 84–85). The proof of Theorem 1 uses the fact that dir_d(w; α) ≤ (J − 1) × · · · × (J − d + 1), for all w and all α ∈ N^d such that |α| = J ≥ d. This inequality trivially extends a related claim made by Petrone and Wasserman (2002) for the Beta density, and for completeness we include here a proof of this result. Our proof resorts to the following Stirling double inequality,
e^{(3/2){1−log(3/2)}} n^{n+1/2} e^{−n} < n! < e n^{n+1/2} e^{−n}, (1.4)
which can be found in Feller (1967, Eq. (9.5)), and which holds for any n ∈ N.
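The double inequality (1.4) is easy to probe numerically; the snippet below (a standalone check, not part of the authors' code) confirms it over a range of n. (At n = 1 the upper bound holds with equality, so the check starts at n = 2.)

```python
import math

def stirling_bounds(n):
    """Lower and upper bounds on n! from the Stirling double inequality (1.4)."""
    core = n ** (n + 0.5) * math.exp(-n)
    lower = math.exp(1.5 * (1 - math.log(1.5))) * core
    upper = math.e * core
    return lower, upper

# verify the strict double inequality for n = 2, ..., 20
for n in range(2, 21):
    lo, hi = stirling_bounds(n)
    assert lo < math.factorial(n) < hi
print("inequality (1.4) holds for n = 2, ..., 20")
```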
Lemma 1 The density of the Dirichlet distribution obeys the inequality dir_d(w; α) ≤ (J − 1) × · · · × (J − d + 1), for all w ∈ ∆_d and all α ∈ N^d such that |α| = J ≥ d.
Proof Suppose first that α ∈ N^d \ {1_d}. By evaluating the density of the Dirichlet distribution at its mode, it follows that dir_d(w; α) ≤ a b_d(α), with b_d(α) := (J − 1) × · · · × (J − d + 1). Next, we show that a ≤ 1, from which the final result follows. Note first that (1.5) holds, where the first inequality follows from the Stirling inequality (1.4), and where the second follows from bounding f(α). The fact that a ≤ 1 now follows by observing that g(d) in (1.5) is decreasing, and hence a ≤ g(d) ≤ g(2) ≈ 0.65 ≤ 1. So far we have assumed that α ∈ N^d \ {1_d}; to finish the proof we need to consider the remaining cases for α. If α = 1_d, then b_d(α) = (d − 1)!, and hence dir_d(w; 1_d) = (d − 1)! ≤ b_d(α), as required. Finally, for the last case, suppose without loss of generality that α_1 = 1 and that α_j ≥ 1 for j = 2, . . . , d.

⊓ ⊔
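The bound in Lemma 1 can also be checked numerically. The sketch below (an illustrative check under the lemma's assumptions, not the authors' code) evaluates the Dirichlet density dir_d(w; α) on a grid over the simplex for several integer parameter vectors and confirms that it never exceeds (J − 1) × · · · × (J − d + 1).

```python
import math
from itertools import product

def dirichlet_pdf(w, alpha):
    """Dirichlet density at w (the last coordinate is implicit: w_d = 1 - sum(w))."""
    ws = list(w) + [1.0 - sum(w)]
    J = sum(alpha)
    const = math.gamma(J) / math.prod(math.gamma(a) for a in alpha)
    return const * math.prod(x ** (a - 1) for x, a in zip(ws, alpha))

def lemma1_bound(alpha):
    """(J - 1) x ... x (J - d + 1), with J = |alpha| and d = len(alpha)."""
    J, d = sum(alpha), len(alpha)
    return math.prod(J - k for k in range(1, d))

# grid check on the 3-dimensional simplex for a few integer alphas with |alpha| = J >= d
for alpha in [(1, 1, 4), (2, 2, 2), (3, 1, 2), (5, 1, 1)]:
    bound = lemma1_bound(alpha)
    for i, j in product(range(1, 40), repeat=2):
        if i + j < 40:
            assert dirichlet_pdf((i / 40, j / 40), alpha) <= bound
print("Lemma 1 bound verified on the grid")
```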
2 Details on the Lambert W function
The Lambert W function is used in the paper to derive the regression manifold for the logistic model (see Example 1 and Appendix D), and thus we offer here some details on it. Formally, the Lambert W function is a set of functions representing the inverse relation of the function f(z) = z e^z for complex z. Since we deal only with positive real-valued z, the equation w e^w = z has exactly one solution w = W(z), with W being the principal branch of the Lambert W function. A useful property of this function is that for any constant a > 0 one has W(a) e^{W(a)} = a, or equivalently, e^{W(a)} = a / W(a).

Simulation under a discrete angular measure
In this section we report the results of a further simulation scenario under a discrete angular measure. Here, we use the same MCMC configuration, prior specification, and simulation study setup as in Section 4 of the paper. The scenario considered next is based on the max factor model of Einmahl et al. (2012, Example 2). Specifically, let Z_1, . . . , Z_r be a sequence of independent unit Fréchet random variables, and define the bivariate random vector (X_1, X_2) via X_j = max_{i=1,...,r} a_{ij} Z_i, for j = 1, 2, where a_{ij} ≥ 0, for all i, j, and a_{i1} + a_{i2} > 0, for i = 1, . . . , r. The associated angular measure is then discrete with r atoms, whose locations and masses are determined by the coefficients a_{ij}.
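The max factor construction just described can be simulated directly; the following sketch (illustrative code only, with r and the a_ij drawn here for demonstration rather than taken from the study) draws unit Fréchet factors by inverse transform and forms the componentwise maxima.

```python
import math
import random

def rfrechet(rng):
    """Unit Frechet draw via inverse transform: Z = -1 / log(U), U ~ Uniform(0, 1)."""
    u = max(rng.random(), 1e-300)  # guard against u == 0
    return -1.0 / math.log(u)

def max_factor_sample(a, rng):
    """One draw of (X_1, X_2), with X_j = max_i a[i][j - 1] * Z_i for unit Frechet Z_i."""
    z = [rfrechet(rng) for _ in a]
    x1 = max(row[0] * zi for row, zi in zip(a, z))
    x2 = max(row[1] * zi for row, zi in zip(a, z))
    return x1, x2

rng = random.Random(1)
r = 10
# a_ij fixed by sampling once from a standard exponential, mirroring the study setup
a = [(rng.expovariate(1.0), rng.expovariate(1.0)) for _ in range(r)]
sample = [max_factor_sample(a, rng) for _ in range(5000)]
print(all(x1 > 0 and x2 > 0 for x1, x2 in sample))
```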
In both scenarios, the a_ij were fixed by sampling once from a standard exponential distribution, and the Monte Carlo simulation was then conducted given those fixed values of a_ij. Figure 3.1 displays the outcome of a one-shot experiment along with the results from a Monte Carlo study for r = 10 and r = 50. As can be seen from the latter chart, the fitted angular measure recovers the true target reasonably well in both cases, and, as expected, the performance improves as r increases. Hence, despite the fact that our prior is defined on the space of continuous angular measures, the results indicate a satisfactory performance even in the case of a discrete angular measure.

Supporting Monte Carlo experiments
This section reports a number of supplementary Monte Carlo experiments. We have repeated the Monte Carlo simulation study from the paper, thresholding the radial component at its 95% quantile rather than at the 98% quantile. The fits reported in Figure 3.4 are reasonably in line with those from Figure 4.3 in the paper. In addition, we have repeated the simulation study from the paper for n = 10 000, rather than n = 5 000. Figure 3.2 depicts a moderate improvement in the fits when n = 10 000, in line with the expected frequentist behavior of the proposed Bayesian methodologies. Finally, we have also re-executed the simulation study from the paper using the approach of Guan (2016) for choosing J, rather than that of Hanson et al. (2017).
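For concreteness, the radial thresholding step used throughout these experiments can be sketched as follows (a generic bivariate illustration with made-up toy data, assuming observations already on a common heavy-tailed scale): compute the pseudo-radius, keep the points whose radius exceeds its empirical 95% quantile, and map them to pseudo-angles on the simplex.

```python
def threshold_angles(points, q=0.95):
    """Keep pseudo-angles w = x1 / (x1 + x2) of the points whose pseudo-radius
    r = x1 + x2 exceeds the empirical q-quantile of the radii."""
    radii = sorted(x1 + x2 for x1, x2 in points)
    cutoff = radii[int(q * len(radii))]  # simple empirical quantile
    return [x1 / (x1 + x2) for x1, x2 in points if x1 + x2 > cutoff]

points = [(i + 1.0, (i % 7) + 1.0) for i in range(200)]  # toy data
angles = threshold_angles(points)
print(len(angles), all(0.0 <= w <= 1.0 for w in angles))
```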
In contrast to the approach of Hanson et al. (2017), which only requires fitting the model once, the approach of Guan (2016) requires fitting the Bernstein polynomial model several times, over a sequence of values of J, and then sets the optimal J as the changepoint of the log-likelihood ratio over a set of consecutive model degrees. Figure 3.3 should be compared with Figure 4.3 in the paper; it shows that our strategy for choosing J has a comparable, if not superior, performance relative to the rule of thumb of Guan (2016).
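A schematic version of such a changepoint rule (our loose reading of the idea, not Guan's actual estimator) is to fit the model for consecutive degrees, record the log-likelihoods, and pick the degree after which the successive gains level off:

```python
def choose_degree(loglik):
    """Given log-likelihoods for consecutive degrees J = 2, 3, ..., return the
    degree after which the successive gains fall below a fraction of the
    largest gain (a crude changepoint proxy)."""
    gains = [b - a for a, b in zip(loglik, loglik[1:])]
    threshold = 0.05 * max(gains)
    for idx, g in enumerate(gains):
        if g < threshold:
            return idx + 2  # degrees start at J = 2
    return len(loglik) + 1

# toy log-likelihood profile: sharp improvement up to J = 5, then a plateau
loglik = [-500.0, -420.0, -380.0, -360.0, -359.0, -358.5, -358.2]
print(choose_degree(loglik))
```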

Induced prior for p-covariate setting
In this section we report two one-shot numerical experiments aimed at illustrating the approach in Section 3.2 of the paper, which induces a prior on the space of all regression manifolds by resorting to Bernstein polynomials and an approximation of a multivariate GEV density due to Cooley et al. (2012). For the numerical experiments in this supplementary material, we test our model on a trivariate logistic extreme value distribution with dependence parameter α = 0.1 ('strongly' dependent extremes) for the case p = 2, i.e. with the trivariate GEV distribution G(x_1, x_2, x_3) = exp{−(x_1^{−1/α} + x_2^{−1/α} + x_3^{−1/α})^α}, for x_1, x_2, x_3 > 0. We generate two samples of sizes n = 10 000 and n = 20 000 which, after thresholding at the 95% empirical quantile of the pseudo-radius, yield k = 500 and k = 1 000 data points to fit the model. Here, we use a similar prior specification and MCMC setup as in Section 4.1 of the paper. Figure 3.5 indicates that the proposed estimator of the angular density captures the dependence between extremes reasonably well by concentrating around the barycenter of the simplex, though in a less pronounced form than the true density. As can be seen from Figure 3.6, the resulting fitted regression lines resemble the true ones, L_q, and increasing the sample size improves the fit, as the lateral surfaces of the estimates become more slanted, for q ∈ {0.3, 0.5, 0.7}.
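As a sanity check on the trivariate logistic distribution used above, the snippet below (illustrative only, using the standard logistic form with unit Fréchet margins) evaluates the joint distribution function and confirms that each margin is approximately unit Fréchet, i.e. G(x, ∞, ∞) ≈ exp(−1/x).

```python
import math

def logistic_gev3(x1, x2, x3, alpha=0.1):
    """Trivariate logistic GEV with unit Frechet margins and dependence alpha."""
    s = x1 ** (-1.0 / alpha) + x2 ** (-1.0 / alpha) + x3 ** (-1.0 / alpha)
    return math.exp(-(s ** alpha))

BIG = 1e12  # stands in for +infinity
for x in (0.5, 1.0, 3.0):
    assert abs(logistic_gev3(x, BIG, BIG) - math.exp(-1.0 / x)) < 1e-9
print("margins are (approximately) unit Frechet")
```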

Comparing exact and limiting regression manifold for logistic model
Here we illustrate how the exact and limiting regression manifolds for the logistic model compare; see Appendix C for details on their derivation. As can be seen from Figures 4.1-4.2, the linearly approximated regression manifold derived in Appendix C.1 of the paper offers a sensible approximation to the true regression manifold for large values of x.

Further empirical analysis
In this section we present results of testing for multivariate regular variation (MRV), applying the methods of Einmahl et al. (2021) to the negative log-returns of the NYSE and NASDAQ. As can be seen from Figure 5.1, at the 5% significance level there is no evidence to reject that the pair (NYSE, NASDAQ) follows an MRV distribution, for a broad range of thresholds. In addition, we also present below the reverse analysis to that in Section 5 of the paper; that is, here NASDAQ is the response, whereas NYSE is taken as the covariate. Figure 5.3 is thus the equivalent of Figure 5.2 in the paper for the reverse analysis; the same applies to Table 1, which is the reverse-analysis equivalent of Table 1 in the paper. Interpretations follow along the same lines as in Section 5 of the paper.
To supplement the analysis of extremal asymmetry from the paper, we have also fitted, using our Bernstein polynomial prior, the coefficient of extremal asymmetry (Semadeni 2020), which is defined in terms of the Pickands dependence function A, that is, A(t) = 1 − t + 2 ∫_0^t H(w) dw, for t ∈ [0, 1]. The obtained coefficient is 0.226, which confirms the extremal asymmetry foreseen in Figure 5.1 in the paper.
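The identity A(t) = 1 − t + 2∫_0^t H(w) dw can be illustrated numerically; below (a toy example, not the fitted model) we take the uniform angular measure with cdf H(w) = w, for which the identity gives A(t) = 1 − t + t², and verify the usual Pickands constraints max(t, 1 − t) ≤ A(t) ≤ 1 with A(0) = A(1) = 1.

```python
def pickands_from_H(t, H, n=20_000):
    """A(t) = 1 - t + 2 * integral_0^t H(w) dw, by a midpoint Riemann sum."""
    integral = sum(H(t * (i + 0.5) / n) for i in range(n)) * t / n
    return 1.0 - t + 2.0 * integral

H = lambda w: w  # cdf of the uniform angular measure on [0, 1]
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    a = pickands_from_H(t, H)
    assert abs(a - (1.0 - t + t * t)) < 1e-6           # closed form 1 - t + t^2
    assert max(t, 1.0 - t) - 1e-9 <= a <= 1.0 + 1e-9   # Pickands constraints
print("Pickands identity verified for H(w) = w")
```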
We further examine how our methods for learning about the regression manifold perform in terms of the quantile verification score (QVS) of Bentzien and Friederichs (2014). Loosely speaking, the QVS is an expected quantile score defined through the check loss function of Koenker (2005), ρ_τ(u) = uτ I(u ≥ 0) + u(τ − 1) I(u < 0), and it essentially measures how good a quantile forecast provided by a model is (the smaller the value, the better). To calculate the QVS we split the data into training (first two-thirds of observations) and test (last third of observations) sets. We learn about the regression manifold by thresholding the training data at the 95% quantile and running an MCMC chain of length 10 000 with a burn-in of 4 000, other settings being the same. Figure 5.4 depicts the QVS evaluated over a grid of quantiles on the unit interval and indicates that the exact approach outperforms the approximate one, where the approximate approach results from combining our random Bernstein polynomial prior with the approximation of Cooley et al. (2012, Proposition 1).
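The check loss and the resulting quantile score are simple to implement; the sketch below (generic code, not the authors' implementation) computes the average ρ_τ loss of a constant quantile forecast on a toy test set.

```python
def check_loss(u, tau):
    """Koenker's check loss: rho_tau(u) = u * tau if u >= 0, else u * (tau - 1)."""
    return u * tau if u >= 0 else u * (tau - 1.0)

def quantile_score(y_test, forecast, tau):
    """Average check loss of a constant quantile forecast (smaller is better)."""
    return sum(check_loss(y - forecast, tau) for y in y_test) / len(y_test)

y = [1.0, 2.0, 3.0, 4.0, 5.0]
# the empirical median (tau = 0.5) should score no worse than a far-off forecast
assert quantile_score(y, 3.0, 0.5) <= quantile_score(y, 10.0, 0.5)
print(round(quantile_score(y, 3.0, 0.5), 3))
```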