1 Introduction

Establishing the initial capital that insurance companies must hold to protect themselves from unexpected events over a fixed period is a primary goal of insurance regulation. The computation of this initial capital relies on a risk measure. For instance, the Solvency II directive (see Directive 2009/138/EC), which regulates the solvency capital requirement of insurance companies in the European Union, adopts the Value-at-Risk (VaR) as a risk measure, even though, as evidenced by Artzner et al. (1999), it is not a coherent risk measure because it does not fulfil the sub-additivity property. Unlike VaR, the Conditional Value-at-Risk (CVaR), or Expected Shortfall, adopted by the Swiss Solvency Test (see Federal Office of Private Insurance (FOPI) 2004), is a coherent risk measure and, as shown by Artzner et al. (1999) and Acerbi and Tasche (2002), it is more reliable than VaR for evaluating the risk of financial positions or computing capital requirements. Supporting this evidence, Dhaene et al. (2006) provided a comprehensive analysis of the theoretical properties of well-known risk measures that can be applied in the context of solvency capital requirements. They gave special attention to the class of distortion risk measures, which includes the CVaR as a particular case, and investigated the relationship between these risk measures and theories of choice under risk.

When connecting minimum capital requirements to risk measures, the standard approach consists of investing the minimum capital in a single security, often a risk-free asset. However, Balbás (2008) showed the non-optimality of such a strategy in many important cases, and Farkas et al. (2015) laid out new results for risk measures when investing in multiple eligible assets. Assuming instead that the minimum capital is invested in a portfolio of assets leads to optimization problems that employ both the initial capital and the portfolio weights as decision variables. In this sense, Mankai and Bruneau (2012) proposed an optimization problem with a CVaR constraint that maximizes the expected return on risk-adjusted capital and uses both the initial capital and the portfolio weights as decision variables. Asimit et al. (2015) developed an optimization problem that minimizes the initial capital and optimally allocates it among the available financial assets under a ruin probability constraint. Along this research line, Asanga et al. (2014) defined three optimization problems representing a dynamic improvement of the approach developed in Asimit et al. (2015). The first problem uses a ruin probability constraint as in Asimit et al. (2015), the second a CVaR constraint, and the third an expected policyholder deficit constraint. All three problems include a portfolio performance constraint defined as a lower bound on the expected return on capital (ROC). Kaucic and Daris (2015) proposed an alternative approach to multi-objective portfolio optimization problems with chance constraints and applied this framework to an EU-based non-life insurance company that aims to minimize the risk of a discrepancy between assets and liabilities. The authors adopt shareholders’ capital and investment weights as decision variables.

In the present paper, we focus on the optimization problem with the CVaR constraint of Asanga et al. (2014). The standard approach for solving the problem relies on a Monte Carlo approximation that leads to a linear programming (LP) formulation (see Rockafellar and Uryasev 2000, 2002; Krokhmal et al. 2002), whose solution becomes computationally burdensome when the number of simulations is large. To tackle this issue, Asanga et al. (2014) proposed a semiparametric formulation and showed its convexity when the insurer’s liability is lognormally distributed. Extending this result, we find that the CVaR optimization problem in the semiparametric form is convex for any integrable liability distribution. Furthermore, when the insurer’s liability has a continuous distribution, the function defining the CVaR constraint is continuously differentiable, which allows us to obtain its gradient. To solve the considered problem, we use the Kelley-Cheney-Goldstein (KCG) algorithm and show its convergence to an optimal solution.

We apply the CVaR optimization problem assuming three different liability distributions: a lognormal distribution, a gamma distribution, and a mixture of Erlang distributions with a common scale parameter. The lognormal and gamma distributions are unimodal distributions commonly used to describe insurance data. However, insurance data often present a multimodal shape and fatter tails than lognormal or gamma distributions. Hence, mixtures of Erlang distributions with a common scale parameter are more suitable for capturing such features (see, e.g., Lee and Lin 2010; Willmot and Lin 2011; Lee and Lin 2012; Verbelen et al. 2015). This class of distributions is dense in the space of positive continuous distributions (see Tijms 1994) and hence allows us to approximate any positive continuous distribution. Moreover, we can easily compute risk measures, such as VaR and CVaR, and aggregate risks because mixtures of Erlang distributions with a common scale parameter are closed under convolution and compounding (see Lee and Lin 2010, and the references therein for further properties of this class of distributions).

To conduct the numerical experiments, we use the insurer’s liability data set danishuni from the R package CASdatasets of Dutang and Charpentier (2020), comprising 2167 fire losses from January 1980 to December 1990. We adjust this data set using U.S. inflation to reflect losses from January 2010 to December 2020. Regarding the assets, we use daily log-returns of the S&P 500 index and two exchange-traded funds that track the investment results of U.S. treasury and corporate bond indices. The numerical experiments show that the choice of the liability distribution plays a crucial role, with the most marked differences arising between the mixture distribution and the other two distributions. Indeed, since the mixture distribution describes the right tail of the empirical distribution of the insurer’s liability better than the other two distributions, it implies higher capital requirements. The differences among the three liability distributions depend on the available liability data set. If the empirical distributions are unimodal with a moderately heavy right tail, the differences among the capital requirements computed with the three liability distributions are less evident. On the contrary, when the empirical distributions are multimodal, or their right tail is fat, mixtures of Erlang distributions with a common scale parameter are better suited to capturing the data distribution accurately, and marked differences in the computed capital requirements arise in comparison to the other two distributions. This is a crucial aspect to investigate since, as evidenced by Lee and Lin (2010), actuaries often face heavy-tailed and/or irregular data.

The paper makes the following contributions. Firstly, we prove that the optimization problem in the semiparametric form is convex for any integrable insurer’s liability and that the function defining the CVaR constraint under the semiparametric formulation is continuously differentiable when the insurer’s liability has a continuous distribution. Secondly, we show how to implement the KCG algorithm to solve the optimization problem in the semiparametric form, and we prove its convergence to an optimal solution. Lastly, we are, to the best of our knowledge, the first to use the mixture of Erlang distributions to describe the insurer's liability in the context of capital requirement computation.

The rest of the article is organized as follows. In Sect. 2, after recalling the optimization problem with CVaR constraint and portfolio performance constraint of Asanga et al. (2014), we show how to obtain the semiparametric formulation. Furthermore, we derive some theoretical results regarding the optimization problem in the semiparametric form. We conclude Sect. 2 by reporting the KCG algorithm for solving the optimization problem in the semiparametric form and giving a proposition that states the algorithm convergence. In Sect. 3, we explain the method for generating asset log-return scenarios that we use in the semiparametric formulation of the optimization problem and show the form of the CVaR constraint in relation to the three distributions used to model the insurer’s liability. The empirical results with an in-sample and out-of-sample analysis are shown in Sect. 4. Finally, in Sect. 5, we conclude the article.

2 Optimization with CVaR constraint

Given a probability space \(\left( \Omega ,\mathcal {F},\mathbb {P}\right)\) and a set \(\mathcal {T}=\{0,1,\ldots ,T\}\) of trading dates, we consider a financial market made up of n assets with gross returns over the period \([t,t+1]\) being the components of the random vector \({\textbf {R}}_{t+1}=(R_{1,t+1},\ldots ,R_{n,t+1})'\). We denote by \(\mathcal {F}_t\) the historical information on the asset gross return evolution up to time t, that is, \(\mathcal {F}_t=\sigma ({\textbf {R}}_{1},\ldots ,{\textbf {R}}_{t})\), and set \(\mathbb {P}_t(\cdot )=\mathbb {P}(\cdot \vert \mathcal {F}_t)\) and \(E_t[\cdot ]=E[\cdot \vert \mathcal {F}_t]\).

The optimization problem at a generic time \(t\in \mathcal {T}\) involves a non-life insurance company with a one-period setting \([t,t+\tau ]\), where \(\tau\) is an integer representing the solvency horizon, \(\tau \le T-t\). We denote by \(p_t\) the premium collected from policyholders and available for investment at time t and suppose that the insurer provides a regulatory initial capital of size \(c_t\), i.e., \(p_t+c_t\) is the total amount that the insurer invests in t. The vector \({\textbf {x}}_t=(x_{1,t},\ldots ,x_{n,t})'\) contains the portfolio weights at time t, and it satisfies the budget constraint \(\sum _{i=1}^nx_{i,t}=1\) with no short sales allowed, i.e., \(x_{i,t}\ge 0\), for \(i=1,\ldots ,n\).

The univariate random variable (r.v.) \(Y_{t,t+\tau }\) represents the insurer’s liability over the period \([t,t+\tau ]\), and we assume that the payment of \(Y_{t,t+\tau }\) occurs in \(t+\tau\). Then, the insurer’s net loss over the solvency horizon is

$$\begin{aligned} L_{t,t+\tau }=Y_{t,t+\tau }-(p_t+c_t){\textbf {R}}_{t,t+\tau }'{} {\textbf {x}}_t, \end{aligned}$$

where \({\textbf {R}}_{t,t+\tau }\) denotes the gross return vector over \([t,t+\tau ]\) with the ith component equal to \(\prod _{\iota =1}^\tau R_{i,t+\iota }\).

The optimization problem has the portfolio weights \({\textbf {x}}_t\) and the capital requirement \(c_t\) as decision variables and minimizes \(c_t\) subject to two key constraints. One constraint is the solvency constraint that imposes a non-positive CVaR for \(L_{t,t+\tau }\). Given the confidence level \(\alpha \in (0,1)\), the CVaR of \(L_{t,t+\tau }\) in t is defined as

$$\begin{aligned} \text {CVaR}_t^\alpha (L_{t,t+\tau })=\frac{1}{1-\alpha }\int _\alpha ^1\text {VaR}_t^\beta (L_{t,t+\tau })\,d\beta , \end{aligned}$$

where

$$\begin{aligned} \text {VaR}_t^\beta (L_{t,t+\tau })=\inf \left\{ l\in \mathbb {R}:\mathbb {P}_t(L_{t,t+\tau }\le l)\ge \beta \right\} \end{aligned}$$

is the VaR of \(L_{t,t+\tau }\) in t at level \(\beta\). This definition of CVaR corresponds to that given in Kaas et al. (2008), where the name Tail-Value-at-Risk is used instead of CVaR. The CVaR of \(L_{t,t+\tau }\) in t can also be expressed as

$$\begin{aligned} \text {CVaR}_t^\alpha (L_{t,t+\tau })=\text {VaR}_t^\alpha (L_{t,t+\tau })+\frac{1}{1-\alpha }E_t\left[ \left( L_{t,t+\tau }-\text {VaR}_t^\alpha (L_{t,t+\tau })\right) _{+}\right] , \end{aligned}$$

where the positive part is defined as \((x)_+=\max (x,0)\). Moreover, the following characterization holds:

$$\begin{aligned} \text {CVaR}_t^\alpha (L_{t,t+\tau })=\inf _{s\in \mathbb {R}}\left\{ s+\frac{1}{1-\alpha } E_t\left[ \left( L_{t,t+\tau }-s\right) _{+}\right] \right\} . \end{aligned}$$
(1)
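As a quick numerical illustration of characterization (1), the following R sketch (with toy inputs, not the data of Sect. 4) compares the tail-average expression of the CVaR with the minimization form:

```r
# Toy check of characterization (1): tail-average CVaR versus the
# minimization form, on a simulated net-loss sample (illustrative inputs).
set.seed(1)
alpha <- 0.99
L <- rlnorm(1e5, meanlog = 0, sdlog = 1) - 1        # stand-in for L_{t,t+tau}

var_hat   <- quantile(L, alpha, names = FALSE)      # VaR at level alpha
cvar_tail <- var_hat + mean(pmax(L - var_hat, 0)) / (1 - alpha)

obj      <- function(s) s + mean(pmax(L - s, 0)) / (1 - alpha)
cvar_min <- optimize(obj, interval = range(L))$objective

c(cvar_tail, cvar_min)   # the two values agree up to sampling error
```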

The second constraint is a portfolio performance constraint defined as

$$\begin{aligned} E_t[\text {ROC}_{t,t+\tau }]\ge \gamma , \end{aligned}$$

where

$$\begin{aligned} \text {ROC}_{t,t+\tau }=-\frac{L_{t,t+\tau }}{c_t} \end{aligned}$$

is the return on capital over the period \([t,t+\tau ]\) and \(\gamma\) is the associated lower bound. Hence, the optimization problem with CVaR solvency constraint is given by

$$\begin{aligned} \min _{c_t,{\textbf {x}}_t}&\quad c_t \end{aligned}$$
(2)
$$\begin{aligned} \quad \text {s.t.}&\quad \text {CVaR}_t^\alpha (L_{t,t+\tau })\le 0, \end{aligned}$$
(3)
$$\begin{aligned}&\quad E_t[\text {ROC}_{t,t+\tau }]\ge \gamma , \end{aligned}$$
(4)
$$\begin{aligned}&\quad {\textbf {1}}'{} {\textbf {x}}_t=1,\;{\textbf {x}}_t\ge {\textbf {0}},\;c_t\ge 0, \end{aligned}$$
(5)

where \({\textbf {0}}\) and \({\textbf {1}}\) are vectors of size n with all the elements equal to 0 and 1, respectively, and \({\textbf {x}}_t\ge {\textbf {0}}\) means that the inequality must hold element-by-element. By equation (1), Problem (2)–(5) has the equivalent formulation

$$\begin{aligned} \min _{s,c_t,{\textbf {x}}_t}&\quad c_t \end{aligned}$$
(6)
$$\begin{aligned} \quad \text {s.t.}&\quad s+\frac{1}{1-\alpha }E_t\left[ \left( L_{t,t+\tau }-s\right) _{+}\right] \le 0, \end{aligned}$$
(7)
$$\begin{aligned}&\quad E_t[\text {ROC}_{t,t+\tau }]\ge \gamma , \end{aligned}$$
(8)
$$\begin{aligned}&\quad {\textbf {1}}'{} {\textbf {x}}_t=1,\;{\textbf {x}}_t\ge {\textbf {0}},\;c_t\ge 0, \end{aligned}$$
(9)

and the traditional solution method relies upon approximating the conditional expectation in the CVaR constraint with a Monte Carlo-type estimator that leads to an LP problem. Indeed, if we generate m scenarios for \(Y_{t,t+\tau }\) and \({\textbf {R}}_{t,t+\tau }\), namely \(Y_{t,t+\tau }(j)\) and \({\textbf {R}}_{t,t+\tau }(j)\) with \(j=1,\ldots ,m\), conditional on \(\mathcal {F}_t\), and set \({\textbf {z}}_t=(p_t+c_t){\textbf {x}}_t\), Problem (6)–(9) becomes

$$\begin{aligned} \min _{s,c_t,{\textbf {z}}_t}&\quad c_t \nonumber \\ \quad \text {s.t.}&\quad s+\frac{1}{m(1-\alpha )}\sum _{j=1}^{m}\left( Y_{t,t+\tau }(j)-{\textbf {R}}_{t,t+\tau }'(j){\textbf {z}}_t-s\right) _{+}\le 0, \nonumber \\&\quad \frac{1}{m}\sum _{j=1}^m\left( {\textbf {R}}_{t,t+\tau }'(j){\textbf {z}}_t-Y_{t,t+\tau }(j)\right) \ge \gamma c_t, \nonumber \\&\quad {\textbf {1}}'{} {\textbf {z}}_t=p_t+c_t,\;{\textbf {z}}_t\ge {\textbf {0}},\;c_t\ge 0, \nonumber \end{aligned}$$

and it can be linearised by introducing the non-negative variables \({\textbf {y}}=(y_1,\ldots ,y_m)'\) as

$$\begin{aligned} \min _{s,c_t,{\textbf {z}}_t}&\quad c_t \nonumber \\ \quad \text {s.t.}&\quad s+\frac{1}{m(1-\alpha )}\sum _{j=1}^{m}y_j\le 0, \nonumber \\&\quad y_j\ge Y_{t,t+\tau }(j)-{\textbf {R}}_{t,t+\tau }'(j){\textbf {z}}_t-s,\quad j=1,\ldots ,m, \nonumber \\&\quad \frac{1}{m}\sum _{j=1}^m\left( {\textbf {R}}_{t,t+\tau }'(j){\textbf {z}}_t-Y_{t,t+\tau }(j)\right) \ge \gamma c_t \nonumber \\&\quad {\textbf {y}}\ge {\textbf {0}},\;{\textbf {1}}'{} {\textbf {z}}_t=p_t+c_t,\;{\textbf {z}}_t\ge {\textbf {0}},\;c_t\ge 0. \nonumber \end{aligned}$$
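For moderate m, the LP above can be solved directly. The following R sketch assembles it with the Rglpk package (an assumption made for illustration; the implementation in Sect. 4 relies on glpkAPI instead), with placeholder inputs Y (liability scenarios), Rmat (gross-return scenarios), p (premium), and gamma:

```r
library(Rglpk)

# LP formulation above with variables (s, c_t, z_1..z_n, y_1..y_m).
solve_capital_lp <- function(Y, Rmat, p, alpha = 0.99, gamma = 0.05) {
  m <- length(Y); n <- ncol(Rmat)
  obj <- c(0, 1, rep(0, n), rep(0, m))                     # minimize c_t
  A <- rbind(
    c(1, 0, rep(0, n), rep(1 / (m * (1 - alpha)), m)),     # CVaR constraint
    cbind(1, 0, Rmat, diag(m)),                            # y_j + R_j'z + s >= Y_j
    c(0, -gamma, colMeans(Rmat), rep(0, m)),               # ROC constraint
    c(0, -1, rep(1, n), rep(0, m))                         # budget constraint
  )
  dir <- c("<=", rep(">=", m), ">=", "==")
  rhs <- c(0, Y, mean(Y), p)
  bnd <- list(lower = list(ind = 1L, val = -Inf))          # s is a free variable
  Rglpk_solve_LP(obj, A, dir, rhs, bounds = bnd, max = FALSE)
}
```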

When m is too large, solving this LP formulation becomes computationally burdensome, and alternative approaches are necessary. For instance, Asanga et al. (2014) proposed the following semiparametric reformulation

$$\begin{aligned} \min _{s,c_t,{\textbf {z}}_t}&\quad c_t \end{aligned}$$
(10)
$$\begin{aligned} \quad \text {s.t.}&\quad s+\frac{1}{m(1-\alpha )}\sum _{j=1}^{m}E_t^{(j)}\left[ \left( Y_{t,t+\tau }-{\textbf {R}}_{t,t+\tau }'(j){\textbf {z}}_t-s\right) _{+}\right] \le 0, \end{aligned}$$
(11)
$$\begin{aligned}&\quad \frac{1}{m}\sum _{j=1}^m{\textbf {R}}_{t,t+\tau }'(j){\textbf {z}}_t-\mu _{t,t+\tau }\ge \gamma c_t, \end{aligned}$$
(12)
$$\begin{aligned}&\quad {\textbf {1}}'{} {\textbf {z}}_t=p_t+c_t,\;{\textbf {z}}_t\ge {\textbf {0}},\;c_t\ge 0, \end{aligned}$$
(13)

where \(E_t^{(j)}[\cdot ]=E_t[\cdot \vert \mathcal {F}_t\bigvee \sigma \left( \{{\textbf {R}}_{t,t+\tau }={\textbf {R}}_{t,t+\tau }(j)\}\right) ]\) and \(\mu _{t,t+\tau }=E_t[Y_{t,t+\tau }]\). The symbol \(\bigvee\) denotes the smallest \(\sigma\)-algebra containing the two \(\sigma\)-algebras \(\mathcal {F}_t\) and \(\sigma \left( \{{\textbf {R}}_{t,t+\tau }={\textbf {R}}_{t,t+\tau }(j)\}\right)\). By defining the function

$$\begin{aligned} g(s,{\textbf {z}}_t)=s+\frac{1}{m(1-\alpha )}\sum _{j=1}^{m}E_t^{(j)}\left[ \left( Y_{t,t+\tau }-{\textbf {R}}_{t,t+\tau }'(j){\textbf {z}}_t-s\right) _{+}\right] , \end{aligned}$$
(14)

the CVaR constraint (11) can be equivalently written as \(g(s,{\textbf {z}}_t)\le 0\). Problem (10)–(13) is convex because \(g(s,{\textbf {z}}_t)\) is a convex function. Indeed, for the convexity of \(g(s,{\textbf {z}}_t)\), it is sufficient to prove that the function

$$\begin{aligned} E_t^{(j)}\left[ \left( Y_{t,t+\tau }-{\textbf {R}}_{t,t+\tau }'(j){\textbf {z}}_t-s\right) _{+}\right] \end{aligned}$$
(15)

is convex in the variables s and \({\textbf {z}}_t\). Since the positive part is a convex function, we can easily verify that the function

$$\begin{aligned} h^{(j)}(l)=E_t^{(j)}\left[ \left( Y_{t,t+\tau }-l\right) _{+}\right] \end{aligned}$$

is convex too. Then, the convexity of (15) follows because it is the composition of a convex function with an affine transformation of the variables (see, e.g., Theorem 5.7 in Rockafellar 1970).
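Explicitly, for \(u\in [0,1]\) and \(l_1,l_2\in \mathbb {R}\), the convexity of \(h^{(j)}\) follows from the convexity of the positive part and the monotonicity and linearity of the conditional expectation:

$$\begin{aligned} h^{(j)}(u l_1+(1-u)l_2)&=E_t^{(j)}\left[ \left( u(Y_{t,t+\tau }-l_1)+(1-u)(Y_{t,t+\tau }-l_2)\right) _{+}\right] \\&\le u\,E_t^{(j)}\left[ \left( Y_{t,t+\tau }-l_1\right) _{+}\right] +(1-u)\,E_t^{(j)}\left[ \left( Y_{t,t+\tau }-l_2\right) _{+}\right] =u\,h^{(j)}(l_1)+(1-u)\,h^{(j)}(l_2). \end{aligned}$$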

To solve Problem (10)–(13), we use an iterative algorithm that, at each iteration, requires the gradient, or a subgradient, of the current solution for the CVaR constraint. Proposition 1 states that the CVaR constraint is continuously differentiable and provides the gradient of \(g(s,{\textbf {z}}_t)\) if \(Y_{t,t+\tau }\) has a continuous distribution.

Proposition 1

When the r.v. \(Y_{t,t+\tau }\) is continuous with cumulative distribution function \(F_{Y_{t,t+\tau }}\), the function \(g(s,{\textbf {z}}_t)\) is continuously differentiable with gradient

$$\begin{aligned} \nabla g(s,{\textbf {z}}_t) =\begin{bmatrix} 1+\frac{1}{m(1-\alpha )}\sum _{j=1}^mw(j)\\ \frac{1}{m(1-\alpha )}\sum _{j=1}^m\left[ w(j)\prod _{k=1}^\tau R_{1,t+k}(j)\right] \\ \vdots \\ \frac{1}{m(1-\alpha )}\sum _{j=1}^m\left[ w(j)\prod _{k=1}^\tau R_{n,t+k}(j)\right] \end{bmatrix}, \end{aligned}$$

where

$$\begin{aligned} w(j)=-1+F_{Y_{t,t+\tau }}({\textbf {R}}_{t,t+\tau }'(j){\textbf {z}}_t+s), \end{aligned}$$

for \(j=1,\ldots ,m\).

Proof

Since \(Y_{t,t+\tau }\) is a continuous r.v., Lemma 1 of Rockafellar and Uryasev (2000) ensures the differentiability of \(h^{(j)}(l)=E_t^{(j)}\left[ \left( Y_{t,t+\tau }-l\right) _{+}\right]\) with derivative

$$\begin{aligned} \frac{d}{dl}h^{(j)}(l)=-1+F_{Y_{t,t+\tau }}(l). \end{aligned}$$

Then, the statement of the proposition about the gradient of the function \(g(s,{\textbf {z}}_t)\) is just a consequence of the classical chain rule of multivariate calculus. The components of \(\nabla g(s,{\textbf {z}}_t)\) are all continuous functions; hence, the function \(g(s,{\textbf {z}}_t)\) is continuously differentiable.

In general, for any integrable r.v. \(Y_{t,t+\tau }\), the function \(g(s,{\textbf {z}}_t)\) admits the subdifferential

$$\begin{aligned} \partial g(s,{\textbf {z}}_t) =\begin{bmatrix} 1+\frac{1}{m(1-\alpha )}\sum _{j=1}^mw(j)\\ \frac{1}{m(1-\alpha )}\sum _{j=1}^m\left[ w(j)\prod _{k=1}^\tau R_{1,t+k}(j)\right] \\ \vdots \\ \frac{1}{m(1-\alpha )}\sum _{j=1}^m\left[ w(j)\prod _{k=1}^\tau R_{n,t+k}(j)\right] \end{bmatrix}, \end{aligned}$$

where w(j) now represents the subdifferential of the function \(h^{(j)}(l)=E_t^{(j)}\left[ \left( Y_{t,t+\tau }-l\right) _{+}\right]\) evaluated at point \({\textbf {R}}_{t,t+\tau }'(j){\textbf {z}}_t+s\). Specifically, w(j) is the interval [ab] where

$$\begin{aligned} a=-1+\mathbb {P}(Y_{t,t+\tau }<{\textbf {R}}_{t,t+\tau }'(j){\textbf {z}}_t+s)\quad \text {and}\quad b=-1+\mathbb {P}(Y_{t,t+\tau }\le {\textbf {R}}_{t,t+\tau }'(j){\textbf {z}}_t+s). \end{aligned}$$

When the distribution of \(Y_{t,t+\tau }\) does not have a probability mass at point \({\textbf {R}}_{t,t+\tau }'(j){\textbf {z}}_t+s\), the subdifferential \(\partial g(s,{\textbf {z}}_t)\) reduces to the gradient \(\nabla g(s,{\textbf {z}}_t)\) with the expression specified in Proposition 1.
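For illustration, the function \(g(s,{\textbf {z}}_t)\) and the gradient of Proposition 1 can be coded generically for a continuous liability distribution. In the R sketch below, cdf and h stand for the cumulative distribution function of \(Y_{t,t+\tau }\) and the stop-loss transform \(E_t[(Y_{t,t+\tau }-l)_+]\) (closed forms for three choices of the liability distribution are given in Sect. 3); it is a sketch, not the paper's implementation:

```r
# g(s, z) of Eq. (14) and the gradient of Proposition 1; cdf and h are the
# liability cdf and the stop-loss transform h(l) = E[(Y - l)_+].
make_g <- function(Rmat, cdf, h, alpha = 0.99) {
  m <- nrow(Rmat)
  list(
    g = function(s, z) {
      l <- as.vector(Rmat %*% z) + s                   # l_j = R_j'z + s
      s + sum(h(l)) / (m * (1 - alpha))
    },
    grad = function(s, z) {
      w <- cdf(as.vector(Rmat %*% z) + s) - 1          # w(j) = F(R_j'z + s) - 1
      c(1 + sum(w) / (m * (1 - alpha)),                # partial derivative in s
        as.vector(t(Rmat) %*% w) / (m * (1 - alpha))) # partial derivatives in z
    }
  )
}
```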

To solve Problem (10)–(13), we propose applying the KCG algorithm (see Kelley 1960). We assume that \((s,{\textbf {z}}_t)\in R=\{(s',{\textbf {z}}'):|s'|\le \lambda ,0\le z'_{1}\le \lambda ,\ldots ,0\le z'_{n}\le \lambda \}\) holds, with \(\lambda >0\) large enough, for each feasible point \((s,c_t,{\textbf {z}}_t)\). Then, the KCG algorithm for Problem (10)–(13) is Algorithm 1.

Algorithm 1 The KCG algorithm for Problem (10)–(13)
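In outline, the algorithm starts from the relaxed Problem (16)–(19), i.e., Problem (10)–(13) without the CVaR constraint and restricted to the box R, and, as long as \(g(s^k,{\textbf {z}}_t^k)>\epsilon\), appends the linear cut \(g(s^k,{\textbf {z}}_t^k)+(\xi ^k)'((s,{\textbf {z}}_t)-(s^k,{\textbf {z}}_t^k))\le 0\), obtaining Problem (20)–(24). A compact R sketch of this cutting-plane loop, reusing the make_g pair above and the Rglpk package (illustrative, not the authors' glpkAPI code), reads:

```r
# Cutting-plane (KCG) loop for Problem (10)-(13); Y_mean = E_t[Y_{t,t+tau}].
kcg_solve <- function(Y_mean, Rmat, p, g, grad, gamma = 0.05,
                      lambda = 1000, eps = 1e-11, max_iter = 500) {
  n   <- ncol(Rmat)
  obj <- c(0, 1, rep(0, n))                    # variables (s, c_t, z); minimize c_t
  A   <- rbind(c(0, -gamma, colMeans(Rmat)),   # ROC: mean_j R_j'z - gamma*c_t >= E[Y]
               c(0, -1,     rep(1, n)))        # budget: 1'z - c_t = p
  dir <- c(">=", "==")
  rhs <- c(Y_mean, p)
  bnd <- list(lower = list(ind = 1L, val = -lambda),        # |s| <= lambda
              upper = list(ind = c(1L, 2L + seq_len(n)),
                           val = rep(lambda, n + 1L)))      # 0 <= z_i <= lambda

  for (k in seq_len(max_iter)) {
    sol <- Rglpk::Rglpk_solve_LP(obj, A, dir, rhs, bounds = bnd, max = FALSE)
    s_k <- sol$solution[1]; z_k <- sol$solution[-(1:2)]
    g_k <- g(s_k, z_k)
    if (g_k <= eps)                            # CVaR constraint satisfied: stop
      return(list(c_t = sol$solution[2], s = s_k, z = z_k, iterations = k))
    xi  <- grad(s_k, z_k)                      # cut: xi'(s,z) <= xi'(s_k,z_k) - g_k
    A   <- rbind(A, c(xi[1], 0, xi[-1]))
    dir <- c(dir, "<=")
    rhs <- c(rhs, sum(xi * c(s_k, z_k)) - g_k)
  }
  stop("KCG did not reach tolerance within max_iter iterations")
}
```

Each iteration solves an LP whose number of rows grows by one, which is typically far cheaper than the full LP formulation carrying m auxiliary variables.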

Proposition 2 states the convergence of the KCG algorithm for Problem (10)–(13).

Proposition 2

Suppose that the feasible set \(\mathcal {G}\) of Problem (10)–(13) is nonempty and that there exists a positive number \(\lambda\) such that \((s,{\textbf {z}}_t)\in R=\{(s',{\textbf {z}}'):|s'|\le \lambda ,0\le z'_{1}\le \lambda ,\ldots ,0\le z'_{n}\le \lambda \}\) for each \((s,c_t,{\textbf {z}}_t)\in \mathcal {G}\). Furthermore, suppose that there exists a positive number K such that

$$\begin{aligned} \sup \{||\xi ||:\xi \in \partial g(s,{\textbf {z}}_t),\;(s,{\textbf {z}}_t)\in R\}\le K. \end{aligned}$$

Let \((s^0,c_t^0,{\textbf {z}}_t^0)\) be an optimal solution to Problem (16)–(19) and \((s^{k+1},c_t^{k+1},{\textbf {z}}_t^{k+1})\) an optimal solution to Problem (20)–(24). Then, the sequence \(\{(s^k,c_t^k,{\textbf {z}}_t^k)\}\) contains a subsequence converging to an optimal solution of Problem (10)–(13).

Proof

Let \(S_0\) denote the feasible set of Problem (16)–(19) and \(S_{k+1}\) the feasible set of Problem (20)–(24), then

$$\begin{aligned} \mathcal {G}=S_0\cap G, \end{aligned}$$

where \(G=\{(s,c_t,{\textbf {z}}_t):g(s,{\textbf {z}}_t)\le 0\}\), and

$$\begin{aligned} \mathcal {G}\subset S_{k+1},\quad \forall k\ge 0, \end{aligned}$$
(16)

because, by definition of subgradient (see, e.g., Section 23 in Rockafellar 1970),

$$\begin{aligned} g(s,{\textbf {z}}_t)\ge g(s^k,{\textbf {z}}_t^k)+(\xi ^k)'\left( \begin{bmatrix} s\\ {\textbf {z}}_t \end{bmatrix}-\begin{bmatrix} s^k\\ {\textbf {z}}_t^k \end{bmatrix}\right) ,\quad \forall (s,{\textbf {z}}_t)\in R, \end{aligned}$$

where \((s^k,{\textbf {z}}_t^k)\in R\) and \(\xi ^k\in \partial g(s^k,{\textbf {z}}_t^k)\).

We split the sequence \(\{(s^k,c_t^k,{\textbf {z}}_t^k)\}\) into the two sequences \(\{c_t^k\}\) and \(\{(s^k,{\textbf {z}}_t^k)\}\). Since \(S_{k-1}\subset S_{k-2}\subset \cdots \subset S_0\), and the decision variable \(c_t\) corresponds to the objective function in Problem (10)–(13), the sequence \(\{c^k_t\}\) is monotone non-decreasing. Moreover, the sequence \(\{c_t^k\}\) is bounded since

$$\begin{aligned} 0\le c_t^k={\textbf {1}}'{} {\textbf {z}}_t^k-p_t\le n\lambda -p_t,\;\forall k, \end{aligned}$$

and we conclude that \(\{c^k_t\}\) converges to a point \(c^*\). Hence, if \(\{(s^k,{\textbf {z}}^k_t)\}\) contains a subsequence \(\{(s^{k_j},{\textbf {z}}^{k_j}_t)\}\) converging to a point \((s^*,{\textbf {z}}^*)\in G'=\{(s,{\textbf {z}}_t):g(s,{\textbf {z}}_t)\le 0\}\), then \(\{(s^{k_j},c^{k_j}_t,{\textbf {z}}^{k_j}_t)\}\) converges to \((s^*,c^*,{\textbf {z}}^*)\), which solves Problem (10)–(13). Indeed, the point \((s^*,c^*,{\textbf {z}}^*)\) belongs to \(\mathcal {G}\) because \((s^*,c^*,{\textbf {z}}^*)\in G\) and it is the limit of points in the compact, hence closed, set \(S_0\), so that \((s^*,c^*,{\textbf {z}}^*)\in S_0\). Moreover, there cannot exist another point \((s',c',{\textbf {z}}')\in \mathcal {G}\) with \(c'<c^*\) because, otherwise, by the convergence of the monotone non-decreasing sequence \(\{c_t^{k_j}\}\) to \(c^*\), there would exist an index i such that \(c_t^{k_j}>c'\) for each \(j>i\), thus contradicting the relation in (16), since \(c_t^{k_j}\) minimizes \(c_t\) over \(S_{k_j}\supset \mathcal {G}\).

Suppose now that \(\{(s^k,{\textbf {z}}^k_t)\}\) does not have a subsequence converging to a point in \(G'\). Then, there exists an \(\alpha >0\), independent of k, such that

$$\begin{aligned} g(s^h,{\textbf {z}}_t^h)\ge \alpha , \end{aligned}$$

for \(h=0,1,\ldots ,k\). If \((s^{k+1},c_t^{k+1},{\textbf {z}}_t^{k+1})\) solves Problem (20)–(24), then

$$\begin{aligned} g(s^h,{\textbf {z}}_t^h)+(\xi ^h)'\left( \begin{bmatrix} s^{k+1}\\ {\textbf {z}}_t^{k+1} \end{bmatrix}-\begin{bmatrix} s^h\\ {\textbf {z}}_t^h \end{bmatrix}\right) \le 0, \end{aligned}$$

for \(h=0,1,\ldots ,k\). From the last two relations and the Cauchy-Schwarz inequality, it follows that

$$\begin{aligned} \alpha \le g(s^h,{\textbf {z}}_t^h)\le (\xi ^h)'\left( \begin{bmatrix} s^{h}\\ {\textbf {z}}_t^{h} \end{bmatrix}-\begin{bmatrix} s^{k+1}\\ {\textbf {z}}_t^{k+1} \end{bmatrix}\right) \le K\left| \left| (s^{h},{\textbf {z}}_t^{h})-(s^{k+1},{\textbf {z}}_t^{k+1})\right| \right| , \end{aligned}$$

for \(h=0,1,\ldots ,k\). Hence, for every subsequence \(\{k_j\}\) of indices, we have

$$\begin{aligned} \left| \left| (s^{k_j},{\textbf {z}}_t^{k_j})-(s^{k_i},{\textbf {z}}_t^{k_i})\right| \right| \ge \frac{\alpha }{K},\quad i<j, \end{aligned}$$

that is, \(\{(s^k,{\textbf {z}}^k_t)\}\) has no Cauchy subsequence, which contradicts the fact that the bounded sequence \(\{(s^k,{\textbf {z}}^k_t)\}\subset R\) must contain a convergent, and hence Cauchy, subsequence by the Bolzano-Weierstrass theorem.

3 Generation of asset log-return scenarios and liability modelling

To generate asset log-return scenarios, we use a moment-matching method. Given a sample of daily log-returns, we compute sample means, standard deviations, skewness, kurtosis, and correlations denoted by \(\mu\), \(\sigma\), \(\delta\), \(\kappa\), and \(\rho\), respectively. Assuming independent and stationary increments, we use the following formulae for sample statistics referring to a different time scale:

$$\begin{aligned} \mu _\tau&=\tau \,\mu , \end{aligned}$$
(26)
$$\begin{aligned} \sigma _\tau&=\sqrt{\tau }\,\sigma , \end{aligned}$$
(27)
$$\begin{aligned} \delta _\tau&=\frac{\delta }{\sqrt{\tau }}, \end{aligned}$$
(28)
$$\begin{aligned} \kappa _\tau&=3\frac{\tau -1}{\tau }+\frac{\kappa }{\tau }, \end{aligned}$$
(29)
$$\begin{aligned} \rho _\tau&=\rho , \end{aligned}$$
(30)

where \(\tau\) is the number of days in the new time scale. Given these target moments and correlations, to generate scenarios of log-returns over the period \([t,t+\tau ]\), we use the moment-matching method of Høyland et al. (2003), which ensures the matching of the first four moments of the marginal distributions and of all the correlations. If \({\textbf {V}}_{t,t+\tau }(j)\) denotes the jth log-return scenario, then \({\textbf {R}}_{t,t+\tau }(j)=\exp ({\textbf {V}}_{t,t+\tau }(j))\) is the jth gross return scenario, where the exponential function is applied element-by-element. To highlight the differences with respect to the approach of Asanga et al. (2014), in Appendix B we present an analysis where the generation of monthly log-returns is carried out with a multivariate generalized autoregressive conditional heteroskedastic (MV-GARCH) model, in particular, with the dynamic conditional correlation (DCC) model introduced by Engle (2002).
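As a concrete illustration, the scaling formulae (26)–(30) can be computed from a matrix of daily log-returns as in the minimal R helper below, where sample skewness and kurtosis are obtained from central moments:

```r
# Scaling of daily sample statistics to a tau-day horizon, formulae (26)-(30).
scale_stats <- function(V, tau = 21) {
  # V: T x n matrix of daily log-returns
  cm   <- function(x, k) mean((x - mean(x))^k)      # central moment of order k
  mu   <- colMeans(V)
  sig  <- apply(V, 2, sd)
  skew <- apply(V, 2, function(x) cm(x, 3) / cm(x, 2)^1.5)
  kurt <- apply(V, 2, function(x) cm(x, 4) / cm(x, 2)^2)
  list(mu    = tau * mu,                            # (26)
       sigma = sqrt(tau) * sig,                     # (27)
       skew  = skew / sqrt(tau),                    # (28)
       kurt  = 3 * (tau - 1) / tau + kurt / tau,    # (29)
       rho   = cor(V))                              # (30)
}
```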

We model historical insurance data through three different continuous distributions: the lognormal distribution, the gamma distribution, and a mixture of Erlang distributions with a common scale parameter. The first two distributions are commonly employed to fit historical data, but they are unimodal, whereas empirical distributions often show multimodal behaviour and fatter tails. In this sense, mixtures of Erlang distributions with a common scale parameter are more appropriate for capturing the shape of empirical distributions. Moreover, Tijms (1994) proved that this class of distributions is dense in the space of positive continuous distributions and can be used to approximate any positive continuous distribution.

We report in Appendix A the definitions of the three probability distributions applied to describe the insurer’s liability. Here, we express the function g of Eq. (14) as

$$\begin{aligned} g(s,{\textbf {z}}_t)=s+\frac{1}{m(1-\alpha )}\sum _{j=1}^mh\left( {\textbf {R}}_{t,t+\tau }'(j){\textbf {z}}_t+s\right) \end{aligned}$$

and give the expression of \(h(\cdot )\) corresponding to each liability distribution (an R implementation of these formulas is sketched after the list):

  • If \(Y_{t,t+\tau }\) is lognormally distributed with parameters \(\mu \in \mathbb {R}\) and \(\sigma >0\), then

    $$\begin{aligned} \begin{aligned} h(l)=&\left\{ \begin{aligned}&\exp \left( \mu +\frac{\sigma ^2}{2}\right) -l,&\text {if }l\le 0,\\&\exp \left( \mu +\frac{\sigma ^2}{2}\right) \Phi \left( \frac{\mu -\ln (l)+\sigma ^2}{\sigma }\right) -l\Phi \left( \frac{\mu -\ln (l)}{\sigma }\right) ,&\text {if }l>0, \end{aligned}\right. \end{aligned} \end{aligned}$$

    where \(\Phi (\cdot )\) denotes the cumulative distribution function of a standard normal distribution;

  • If \(Y_{t,t+\tau }\) is gamma distributed with shape parameter \(\beta >0\) and scale parameter \(\theta >0\), then

    $$\begin{aligned} \begin{aligned} h(l)=&\left\{ \begin{aligned}&\beta \theta -l,&\text {if }l\le 0,\\&\beta \theta \left[ 1-F_G(l;\beta +1,\theta )\right] -l\left[ 1-F_G(l;\beta ,\theta )\right] ,&\text {if }l>0, \end{aligned}\right. \end{aligned} \end{aligned}$$

    where \(F_G(\cdot ;\beta ,\theta )\) denotes the cumulative distribution function of a gamma r.v. with shape parameter \(\beta\) and scale parameter \(\theta\);

  • If \(Y_{t,t+\tau }\) is a mixture of \(\varpi\) Erlang distributions with shape parameters \(\beta _1,\ldots ,\beta _{\varpi }\) and a common scale parameter \(\theta >0\), then

    $$\begin{aligned} \begin{aligned} h(l)=&\left\{ \begin{aligned}&\theta \sum _{i=1}^\varpi \alpha _i \beta _i-l,&\text {if }l\le 0,\\&\sum _{i=1}^\varpi \alpha _i\left\{ \beta _i\theta \left[ 1-F_G(l;\beta _i+1,\theta )\right] -l\left[ 1-F_G(l;\beta _i,\theta )\right] \right\} ,&\text {if }l>0, \end{aligned}\right. \end{aligned} \end{aligned}$$

    where \(\alpha _1,\ldots ,\alpha _{\varpi }\) are the mixture weights, and \(F_G(\cdot ;\beta ,\theta )\) denotes, as before, the cumulative distribution function of a gamma distribution with shape parameter \(\beta\) and scale parameter \(\theta\).
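As anticipated, the three stop-loss transforms translate directly into R. The functions below are a sketch mirroring the formulas above, vectorized in l, with a guard so that the logarithm is never evaluated at non-positive arguments:

```r
# Stop-loss transforms h(l) = E[(Y - l)_+] for the three liability models.
h_lnorm <- function(l, mu, sigma) {
  EY <- exp(mu + sigma^2 / 2)
  ifelse(l <= 0, EY - l,
         EY * pnorm((mu - log(pmax(l, 1e-300)) + sigma^2) / sigma) -
           l * pnorm((mu - log(pmax(l, 1e-300))) / sigma))
}

h_gamma <- function(l, beta, theta) {
  ifelse(l <= 0, beta * theta - l,
         beta * theta * pgamma(l, shape = beta + 1, scale = theta, lower.tail = FALSE) -
           l * pgamma(l, shape = beta, scale = theta, lower.tail = FALSE))
}

h_erlang_mix <- function(l, w, beta, theta) {
  # w: mixture weights; beta: integer shape parameters; theta: common scale
  sapply(l, function(li) {
    if (li <= 0) return(theta * sum(w * beta) - li)
    sum(w * (beta * theta * pgamma(li, shape = beta + 1, scale = theta, lower.tail = FALSE) -
             li * pgamma(li, shape = beta, scale = theta, lower.tail = FALSE)))
  })
}
```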

4 Empirical analysis

In this section, we present an empirical analysis to show the differences that emerge when applying the optimization problem with CVaR constraint of Sect. 2 under the three liability distributions considered in Sect. 3. To generate asset log-returns, as explained in Sect. 3, we use the moment-matching method of Høyland et al. (2003). Similarly to Asanga et al. (2014), we perform both an efficient frontier analysis and an out-of-sample analysis. The additional analysis provided in Appendix B is an out-of-sample analysis with a DCC-GARCH model that allows us to investigate whether a model with time-varying conditional volatilities and correlations (like the one used in Asanga et al. 2014) entails significant differences in the capital requirements and portfolio allocations with respect to the moment-matching method of Høyland et al. (2003).

4.1 Data description

We consider portfolios composed of the S&P 500 index, the iShares Barclays 1-3 Year Treasury Bond ETF (SHY), and the iShares iBoxx $ Investment Grade Corporate Bond ETF (LQD).

Table 1 Descriptive statistics about daily log-returns from January 2010 to December 2020 for the assets S&P 500, SHY, and LQD
Table 2 Descriptive statistics about the data set danishuni of the R package CASdatasets of Dutang and Charpentier (2020) after adjustments yielding monthly losses in millions of U.S. dollars for the period January 2010 - December 2020

We download daily adjusted closing prices from January 2010 to December 2020, for a total of 2768 observations, using the function getSymbols of the R package quantmod of Ryan and Ulrich (2022), with Yahoo! Finance as the source database. Table 1 reports descriptive statistics about the daily log-returns. We observe that the S&P 500 has negative skewness and that LQD has the largest kurtosis. We split the computed daily log-returns into two samples, Sample A and Sample B. Sample A contains the observations from January 2010 to December 2015, that is, over the first six years, and we use it for the efficient frontier analysis. Sample B contains the remaining observations over the last five years, from January 2016 to December 2020, and we use it for the out-of-sample analysis.
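The download step can be reproduced along the following lines (a sketch; the exact date endpoints and the handling of missing values may differ from ours):

```r
library(quantmod)

# S&P 500 index and the two bond ETFs; Yahoo! Finance as source database
getSymbols(c("^GSPC", "SHY", "LQD"), src = "yahoo",
           from = "2010-01-01", to = "2020-12-31")

# Daily log-returns from the adjusted closing prices
prices <- merge(Ad(GSPC), Ad(SHY), Ad(LQD))
logret <- na.omit(diff(log(prices)))
```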

Regarding the insurer’s liability, we use the data set danishuni of the R package CASdatasets of Dutang and Charpentier (2020), which comprises 2167 fire losses in millions of Danish krone from January 1980 to December 1990 adjusted for inflation to reflect year-1985 values. We convert these values into millions of U.S. dollars according to the exchange rate registered on 31 December 1985, which is 0.11198, aggregate the losses by month, and apply the annual inflation index so that the first monthly loss reflects the January 2010 value, the second monthly loss the February 2010 value, and so on, up to the last monthly loss, which reflects the December 2020 value. Like the asset log-returns, we split these monthly losses into two samples. Sample \(A'\) consists of monthly losses from January 2010 to December 2015, whereas Sample \(B'\) consists of monthly losses from January 2016 to December 2020. Table 2 reports descriptive statistics for the data set danishuni after the adjustments detailed above.
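A sketch of these adjustments is given below, assuming the Date and Loss columns of danishuni; infl_factor denotes a hypothetical vector of 132 month-by-month inflation factors mapping the 1980-1990 months onto 2010-2020 price levels:

```r
library(CASdatasets)
data(danishuni)   # 2167 fire losses, millions of DKK, Jan 1980 - Dec 1990

dkk_to_usd <- 0.11198                     # exchange rate on 31 December 1985
loss_usd   <- danishuni$Loss * dkk_to_usd # millions of U.S. dollars

# Aggregate the losses by month (132 months in total)
month   <- format(danishuni$Date, "%Y-%m")
monthly <- tapply(loss_usd, month, sum)

# infl_factor (hypothetical, length 132): factor taking the i-th monthly
# loss to the price level of the corresponding month in 2010-2020
# monthly_adj <- monthly * infl_factor
```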

Table 3 Parameter estimates of the three distributions taken into account for Sample \(A'\). The losses in Sample \(A'\) are adjusted to reflect year-2015 values. Standard errors are reported in parentheses below the estimates, and p-values in parentheses below the KS test statistics
Fig. 1 Histogram of the log-transformed data for Sample \(A'\) with the addition of the theoretical curves. The losses in Sample \(A'\) are adjusted to reflect year-2015 values

The choice of a liability data set that spans a period different from that of the asset log-returns and quite far in time is mainly due to the lack of data sets made available by insurance companies. Our adjustments to the data set danishuni assume independence between liabilities and log-returns. The representativeness of the adjusted data set could be limited since there is no evidence that more recent liability data sets have similar characteristics, but we think that working with simulated losses would be less realistic than using liabilities originating from observed data. The period January 2010 - December 2020, chosen to compute the daily log-returns, was a period of growth for capital markets if we exclude year 2010 (when the sovereign debt crisis in Europe was still ongoing), Black Monday 2011 (which refers to August 8, 2011, when U.S. and global stock markets crashed in the aftermath of the credit rating downgrade of the U.S. sovereign debt by Standard and Poor’s), and year 2020 (when the COVID-19 outbreak started). Over such a period, portfolios designed to minimize capital requirements could have significant allocations in assets with higher returns. As a consequence, optimal portfolios are generally composed of the S&P 500 index and the asset LQD, and only in a few cases do they also include the less risky asset SHY.

4.2 Parameter estimation for the liability distributions

Fig. 2 Q-Q plots of the three theoretical distributions for Sample \(A'\). The losses in Sample \(A'\) are adjusted to reflect year-2015 values

We use the maximum likelihood estimator (MLE) for the three theoretical distributions considered. While applying the MLE to the lognormal and gamma distributions is a standard task, it is less straightforward for a mixture of Erlang distributions with a common scale parameter. To this aim, we use the approach proposed by Lee and Lin (2010), who developed a modified expectation-maximization (EM) algorithm tailored to the class of mixtures of Erlang distributions with a common scale parameter. Their procedure consists of three parts: the standard EM algorithm, a parameter initialisation using the approximation of Tijms (1994), and an adjustment and diagnosis of the parameters. Table 3 reports the estimation results for Sample \(A'\) and, in particular, shows, for each distribution, the estimates with the corresponding standard errors in parentheses, the log-likelihood value at the estimated vector of parameters, the Bayesian information criterion (BIC), and the Kolmogorov-Smirnov (KS) test statistic. Before applying the MLE to Sample \(A'\), we use the annual inflation index to obtain losses reflecting monetary amounts for year 2015. The p-values of the computed KS statistics are in parentheses, and none of them implies the rejection of any of the three distributions. The modified EM algorithm of Lee and Lin (2010) returns a mixture of only two Erlang distributions for Sample \(A'\) after adjusting it to amounts reflecting year-2015 values. The gamma distribution fits the sample worse than the other two distributions since it has the highest KS statistic, the largest negative log-likelihood, and the largest BIC. For the lognormal distribution and the mixture of two Erlang distributions, we have contrasting results: the lognormal distribution has a slightly smaller KS statistic but a larger negative log-likelihood. However, the lognormal distribution has a lower BIC than the selected mixture of Erlang distributions because the former has only two parameters and the latter has five. To investigate further the goodness-of-fit of the three distributions and examine the differences in fitting the sample, in Fig. 1 we display the histogram of the log-transformed data with the three theoretical curves. We see that the mixture of two Erlang distributions is the only one assigning appreciable density to values as large as the maximum loss in the sample: the last rectangle on the right side of the histogram corresponds to the highest loss, and only the green curve, corresponding to the mixture of two Erlang distributions, takes this rectangle into account. The Q-Q plots in Fig. 2 confirm this result since the mixture of two Erlang distributions displays right-tail quantiles close to the empirical quantile corresponding to the highest loss in the sample.
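For the lognormal and gamma distributions, the fits and goodness-of-fit statistics can be obtained with standard tools, as in the sketch below; the Erlang-mixture fit requires the modified EM algorithm of Lee and Lin (2010), for which fit_erlang_mix is a hypothetical placeholder:

```r
library(fitdistrplus)

# MLE fits on the adjusted monthly losses (monthly_adj from the sketch above)
fit_ln <- fitdist(monthly_adj, "lnorm")
fit_ga <- fitdist(monthly_adj, "gamma")

# Log-likelihood, BIC, and Kolmogorov-Smirnov statistics for both fits
gofstat(list(fit_ln, fit_ga), fitnames = c("lognormal", "gamma"))

# The Erlang-mixture fit requires the modified EM algorithm of Lee and Lin
# (2010); fit_erlang_mix is a hypothetical helper implementing their
# three-part procedure (EM, Tijms-type initialisation, parameter adjustment).
# fit_mx <- fit_erlang_mix(monthly_adj)
```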

4.3 CVaR constraint optimization

In this section, we solve the optimization Problem (10)–(13) by Algorithm 1. The algorithm has been implemented in the language and environment for statistical computing R, and it uses the function simplex of the R package glpkAPI of Gelius-Dietrich (2021) to solve the LP problem at each iteration. The confidence level for the CVaR is \(\alpha =99\%\), the standard value imposed by the Swiss Solvency Test. In the implementation of Algorithm 1, we set \(\lambda =1000\) and \(\epsilon =10^{-11}\). The solvency horizon is \(\tau =21\) trading days since the observed losses are on a monthly basis. Similarly to Asanga et al. (2014), we apply the following strategy when solving Problem (10)–(13) (an R sketch of the full pipeline follows the list):

  1.

    Compute the sample statistics needed for the moment-matching method of Høyland et al. (2003) and estimate the liability parameters for the three theoretical distributions as explained in Sect. 4.2.

  2.

    Apply the expected premium principle to calculate the insurance premium: \(p_t=(1+\eta )\,E[Y_{t,t+\tau }]\), where the relative security loading factor \(\eta\) equals 0.1.

  3.

    Generate \(m=10000\) scenarios \({\textbf {R}}_{t,t+\tau }(j)\), \(j=1,\ldots ,m\), for the asset gross returns by using the moment-matching method of Høyland et al. (2003) to simulate asset log-returns with a monthly time scale and taking the exponential of the simulated log-returns.

  4.

    Solve the optimization problem under the three theoretical distributions for the insurer’s liability and find the optimal required capital \(c_t^*\) and optimal portfolio allocations \(x_{i,t}^*\), \(i=1,2,3\).
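Tying the pieces together, Steps 1-4 can be sketched as follows for the lognormal liability, where V_daily (the daily log-return matrix), mu_hat and sigma_hat (the fitted lognormal parameters) are assumed inputs, gen_scenarios is a hypothetical stand-in for the Høyland et al. (2003) generator, and scale_stats, h_lnorm, make_g, and kcg_solve are the sketches introduced earlier:

```r
# End-to-end sketch of Steps 1-4 for the lognormal liability (hypothetical
# inputs and helpers as described in the text above).
stats_tau <- scale_stats(V_daily, tau = 21)        # Step 1 (asset side)

eta <- 0.1                                         # relative security loading
EY  <- exp(mu_hat + sigma_hat^2 / 2)               # lognormal mean E[Y]
p_t <- (1 + eta) * EY                              # Step 2: expected premium principle

V_sim <- gen_scenarios(stats_tau, m = 10000)       # Step 3 (hypothetical generator)
Rmat  <- exp(V_sim)                                # gross-return scenarios

gg  <- make_g(Rmat,                                # Step 4: solve via the KCG sketch
              cdf = function(l) plnorm(l, mu_hat, sigma_hat),
              h   = function(l) h_lnorm(l, mu_hat, sigma_hat))
sol <- kcg_solve(Y_mean = EY, Rmat = Rmat, p = p_t, g = gg$g, grad = gg$grad)
x_opt <- sol$z / (p_t + sol$c_t)                   # back out portfolio weights x*
```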

4.3.1 Efficient frontier analysis

We build efficient frontiers by applying the above strategy for different lower levels, \(\gamma\), of the expected ROC. To obtain the solution corresponding to the minimum value of \(\gamma\), we solve Problem (10)–(13) without the portfolio performance constraint. The sample statistics used to generate asset log-returns with the moment-matching method of Høyland et al. (2003) are displayed in Table 4, and they are computed by applying formulae (26)–(30) to daily log-returns in Sample A with \(\tau =21\). The liability parameters are estimated by using Sample \(A'\) and by adjusting the losses in order to reflect year-2015 values. These estimates are reported in Table 3.

Table 4 Sample statistics computed with formulae (26)–(30) using daily log-returns in Sample A and \(\tau =21\)
Fig. 3 Efficient frontiers built under the three theoretical distributions for Sample \(A'\). The losses in Sample \(A'\) are adjusted to reflect year-2015 values

Fig. 4 Optimal portfolio allocations under the three theoretical distributions for Sample \(A'\). The losses in Sample \(A'\) are adjusted to reflect year-2015 values

We analyse the behaviour of the optimal required capital \(c_t^*\) and optimal portfolio allocations \(x_{i,t}^*\), \(i=1,2,3\), under the three distributions. In Fig. 3, we plot the three efficient frontiers and observe that the optimal required capital \(c_t^*\) differs among the three distributions. In particular, the mixture of two Erlang distributions implies levels of \(c_t^*\) much higher than those of the other two distributions. This result is reasonable because, contrary to the other two distributions, the mixture of two Erlang distributions captures the losses close to the sample maximum. The liability distribution also has an impact on the expected ROC: under the mixture of two Erlang distributions, the expected ROC values are smaller than those obtained under the other two distributions. To complete this efficient frontier analysis, we plot in Fig. 4 the portfolio allocations for the three liability distributions. We see that none of the three distributions allocates to the asset SHY; the available capital is invested in the other two assets. Specifically, we notice that, as the expected ROC level increases, the S&P 500 allocations increase and the LQD allocations decrease.

Fig. 5 Optimal total investments \(p_{t+(\psi -1)\tau }+c_{t+(\psi -1)\tau }^*\), \(\psi =1,\ldots ,\Psi\)

Fig. 6 Optimal asset allocations

4.3.2 Out-of-sample analysis

We perform an out-of-sample analysis that relies on a rolling-window approach for the asset log-returns and an expanding-window approach for the insurer’s losses when performing each new optimization. In this way, we execute an out-of-sample analysis that differs from Asanga et al. (2014), who apply a rolling-window approach also for the insurer’s liabilities. We do not exclude any observed liability when estimating loss-distribution parameters because large losses are rare, and a rolling-window approach could underestimate their reappearance. In Appendix C, we perform an out-of-sample analysis that uses a rolling-window approach for the insurer’s liabilities. This further analysis confirms that the underestimation of large losses is a concrete risk when a rolling-window approach is applied.

The rolling-window length corresponds to six years of daily observations. Using Samples A and \(A'\), we first compute the optimal solutions \((c_t^*,{\textbf {x}}_t^*)\) for the period \([t,t+\tau ]\) by applying Steps 1-4 detailed above. Then, we build a new sample of asset log-returns by eliminating the observations of the first month in Sample A and including those of the first month in Sample B. In this way, we carry out a monthly portfolio rebalancing. Regarding the insurer’s liabilities, we build the new sample by keeping all the observations in Sample \(A'\) and including the first observation from Sample \(B'\). Furthermore, we adjust the losses of this new sample to reflect values of the year of the last loss, which is 2016 in this case. Given the new samples of asset log-returns and losses, we recompute the new optimal solutions \((c_{t+\tau }^*,{\textbf {x}}_{t+\tau }^*)\) for the next period by applying Steps 1-4. Repeating the sampling and optimization procedures up to the end of Samples B and \(B'\), we end up with the collection of optimal solutions \((c_{t+(\psi -1)\tau }^*,{\textbf {x}}_{t+(\psi -1)\tau }^*)\), \(\psi =1,\ldots ,\Psi\), where \(\Psi\) is the length of Sample \(B'\); the loop is sketched in code below. To avoid optimization problems with no feasible solution, we do not set a lower bound for the expected ROC. Figures 5 and 6 and Table 5 display the results.
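A compact R sketch of the backtest loop follows; roll_window, adjust_inflation, and run_steps are hypothetical helpers wrapping the rolling six-year window on daily log-returns, the reflation of losses to the latest year, and Steps 1-4 of Sect. 4.3, respectively:

```r
# Rolling window for assets, expanding window for losses (hypothetical helpers).
Psi <- 60                                          # months in Samples B and B'
out <- vector("list", Psi)
for (psi in seq_len(Psi)) {
  ret_win  <- roll_window(daily_ret, psi)          # rolling window (assets)
  loss_win <- monthly_loss[seq_len(72 + psi - 1)]  # expanding window (losses)
  out[[psi]] <- run_steps(ret_win, adjust_inflation(loss_win, psi))
}
```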

Figure 5 shows how the optimal total investment \(p_t+c_t^*\) evolves over time. The lognormal and gamma distributions have a very similar evolution, even though the lognormal distribution usually requires, on any investment date, a larger optimal investment than the gamma distribution. The mixture of Erlang distributions displays an evolution that differs from those of the other two distributions: the optimal total investment is much higher on each investment date, and its variation between two consecutive months is more pronounced, even though it moves in the same direction shown by the graphs of the other two distributions. Figure 6 shows the evolution of the optimal allocations \(x_{i,t}^*\), \(i=1,2,3\). For the lognormal and gamma distributions, we observe that the optimal portfolios invest only in the two assets S&P 500 and LQD, with a few exceptions in 2020: there is one date where the lognormal distribution invests a small percentage of the optimal total investment in the asset SHY and a few dates where the gamma distribution allocates money to the asset SHY. The mixture distribution never invests in the asset SHY, and there is a long period of time, from July 2016 to July 2019, with investments only in the asset S&P 500. These results contrast with those of Asanga et al. (2014), who instead found, in their out-of-sample analysis, significant allocations in the asset SHY. To explain this difference, we observe that we work with prices over a different period, use a different loss data set, and generate asset log-returns with a different model. We believe that what makes the real difference between the two experiments is the way asset log-returns are generated. Asanga et al. (2014) modelled asset log-returns with three MV-GARCH models, ignoring the mean effect present in the models. On the contrary, we use a moment-matching method that does not ignore the asset means. This dissimilarity in modelling asset means is likely the reason for such different results. To confirm this explanation, we performed some experiments (results are available on request) in which we generated asset log-returns with a DCC-GARCH model under the assumption of zero mean for each asset, and we obtained that most of the optimal capital was allocated in the asset SHY.

Table 5 reports some statistics about the realized insurer’s wealth at the end of each month from January 2016 to December 2020 under the three liability distributions. That is, the table gives summary statistics for the differences

$$\begin{aligned} {\textbf {r}}'_\psi {\textbf {z}}_{\psi -1}^*-y_\psi ,\quad \psi =1,\ldots ,\Psi , \end{aligned}$$

where \({\textbf {r}}_\psi\) is the vector of the realized asset gross returns in the \(\psi\)th month, \(y_{\psi }\) is the realized insurer’s liability in the \(\psi\)th month, and \({\textbf {z}}_{\psi -1}^*\) is the vector containing the optimal amounts invested in each asset at the beginning of the \(\psi\)th month. It is worth highlighting that only the mixture distribution ensures a positive insurer’s wealth in each month. Indeed, for the other two distributions, there are two dates where the insurer’s wealth is negative and the insurer faces bankruptcy.

Table 5 Statistics about the realized insurer’s wealth at the end of each month from January 2016 to December 2020

5 Conclusions

In this article, we extend the optimization problem with CVaR constraint of Asanga et al. (2014) to any integrable liability distribution. To solve the problem, we propose applying the Kelley-Cheney-Goldstein algorithm, and we prove its convergence. We adopt three distributions for modelling the insurer’s liability: the lognormal distribution, the gamma distribution, and a mixture of Erlang distributions with a common scale parameter. The results show that the choice of the liability distribution has a strong impact on the optimal capital and asset allocations. Specifically, the mixture distribution offers higher protection on our loss data set since, contrary to the other two distributions, the insurer’s wealth never becomes negative at the end of a month in the out-of-sample analysis. Clearly, this higher protection comes at the cost of a greater optimal capital. The optimal allocations in our experiments usually do not include the asset SHY, which is the least risky asset because it is a fund investing in Treasury bonds with maturities from 1 to 3 years. When working with the mixture distribution, we often obtain that the optimal investment should be only in the S&P 500 index. Our results differ from those of Asanga et al. (2014), who found significant allocations in the asset SHY. This difference is probably due to the mean effect in modelling asset log-returns, which the analysis in Asanga et al. (2014) ignores, whereas our empirical application does not. As discussed in Sect. 4.1, the almost total absence of the asset SHY in the optimal portfolios is mainly due to the chosen period from January 2010 to December 2020, which was a period of growth for capital markets. Hence, optimal portfolios minimizing the capital requirement tend to contain significant allocations in the assets with higher returns. Different periods characterized by little growth and market turbulence, like the one used by Asanga et al. (2014), would likely return optimal portfolios with allocations in the less risky assets.

Future developments of the optimization problem proposed in this article could concern its application with other liability distributions and asset log-return models. For instance, it may be applied in the case of huge losses when the insurer’s liability is modelled with a generalized Pareto distribution, a distribution grounded in Extreme Value Theory (EVT) (see, e.g., Embrechts et al. 2013). Regarding the asset log-return models, it would be interesting to consider, as future research, alternative models like those proposed by Mudry and Paraschiv (2016) and Koliai (2016), which use the EVT to model the marginal distributions of the returns and copula functions to capture the dependence structures. Finally, other developments could go towards the definition of multistage models that allow the asset allocations to change at intermediate dates.