Optimal investment strategies for pension funds with regulation-conform dynamic pension payment management in the absence of guarantees

Lichtenstern, Andreas; Zagst, Rudi

doi:10.1007/s13385-021-00298-7

Original Research Paper
Open access
Published: 29 October 2021

Volume 12, pages 647–700, (2022)
European Actuarial Journal

Abstract

In this article we consider the post-retirement phase optimization problem for a specific pension product in Germany that comes without guarantees. The continuous-time optimization problem has two special features: first, a product-specific pension adjustment mechanism based on a certain capital coverage ratio stipulates compulsory pension adjustments if the pension fund is underfunded or significantly overfunded. Second, because retirees are averse to pension reductions, we distribute the total wealth between an investment portfolio and a buffer portfolio to lower the probability of future pension shortenings. The target functional to be maximized is the client’s expected accumulated utility from the stochastic future pension cash flows. The optimization outcome is the optimal investment strategy in the proposed model. Due to the inherent complexity of the continuous-time framework, the discrete-time version of the optimization problem is considered and solved via the Bellman principle. In addition, for computational reasons, a policy function iteration algorithm is introduced to find a stationary solution to the problem efficiently. A numerical case study on optimization and simulation completes the work by highlighting the benefits of the proposed model.

1 Introduction

In this article, we study a pension-insurance-related optimization problem that aims to maximize the expected utility of a client’s future stochastic pension cash flows within a specific pension adjustment and investment model. This model covers a certain pension adjustment mechanism, where pension guarantees are disregarded, but the pension needs to be adjusted (either reduced or increased) if the pension fund becomes underfunded or significantly overfunded, and a buffer rule to smooth the pension development over time and to reduce the probability of pension shortenings. The solution to the problem is given in the form of the optimal investment strategy in the proposed system.

The above is motivated by the need for a suitable pension product that allows for higher expected returns on the investments, particularly when interest rates are low or even negative. In the recent low-interest-rate environment, traditional pension funds, which allocate a high proportion of their wealth to defensive assets such as government bonds due to the promised guarantees, can only offer a relatively small expected return on the investments. As a result, the pension fund wealth of a client grows at a rather small rate and consequently the future pension payments will be quite low. Generally, clients desire a stable evolution of their reported wealth (and their pension) at a high expected return and with a limited downside. Therefore, alternative strategies without guarantees but with a certain downside protection can provide a significant contribution.

For this reason, we consider a certain pension product^{Footnote 1} that comes with a buffer and a pension adjustment mechanism to enhance expected returns. The product allows company pension schemes to only make contribution-related promises but forbids performance-related guarantees. To allow for a performance- or return-seeking characteristic, the product comes with no pension cash flow guarantee at all. The product generally consists of two phases: the pre-retirement or accumulation phase and the post-retirement or decumulation phase. We focus on the wealth decumulation phase in what follows. This phase can be regarded as a modification of a defined benefit (DB) plan, where the pensions stay constant as long as the wealth remains inside a pre-defined corridor. As this new pension product is currently in a development stage in Germany and is being built up, we study the impact of the associated model. As every investor has an individual risk appetite or risk attitude, professional decision making under uncertainty needs to consider an adequate modeling of a certain risk-reward tradeoff. A more risk-averse investor generally prefers a portfolio with a lower risk in terms of some risk measure, coming at the cost of smaller returns on average. The general question arises how the pension fund’s wealth is to be invested such that the benefits for the clients are maximized. This particular question about a scientifically founded investment strategy for a Nahles–Rente pension product is addressed in this work. Thus, we contribute with the following: First, a mathematical model (for a single client and an age-grouped cohort) is built up that incorporates a certain buffer rule and a pension adjustment mechanism. Moreover, the optimal investment strategy is derived by maximizing the expected utility of the stochastic pension cash flows. 
Afterwards, a numerical optimization and simulation case study is carried out to illustrate the optimal control and analyze its characteristics. Based on the case study, under the assumption of a positive interest rate, we find that the introduction of the proposed buffer system significantly reduces the probabilities for pension reductions and leads to a certain tradeoff between the initial pension level at retirement time and performance: A more pronounced buffer system is connected to a smaller initial pension level, but at the same time leads to a superior performance of the pension evolution. We conclude that our proposed model leads to a sophisticated optimal dynamic asset allocation policy that provides remarkable benefits to clients and represents a meaningful alternative to risk-averse clients. For a proposal and some discussion of an alternative model formulation that designs a pension product without guarantees we refer to [6], where the modeling is related to but differs from our approach.

The remainder is organized as follows: Sect. 2 introduces the considered financial market model (classical Black–Scholes model with constant parameters) that consists of a riskless and multiple risky assets. Sect. 3 models the continuous-time mathematical framework for the decumulation phase under a constant force of mortality. The resulting portfolio selection problem (single-client and cohort version) is stated in Sect. 4 and is solved in discrete time. Although the problem is finally solved in a discrete-time framework, we first introduce the continuous-time setup as we would like to embed the problem into the standard portfolio selection problems that deal with continuous-time capital market and decision models. For implementation reasons, Sect. 5 provides an approximate solution to the original problem in the form of a stationary solution. To be able to solve the complex problem, we particularly impose the following assumptions and simplifications: The applied constant force of mortality implies an exponentially distributed remaining uncertain lifetime. For algorithmic reasons, the planning or investment horizon of the system is set to infinity to obtain an optimal stationary solution. This solution can be used as an approximation for a finite planning horizon when the survival probability of exceeding this horizon is sufficiently small. Additionally, separability of time and pension in the pension utility function is assumed jointly with an exponential time-dependence. Moreover, certain (mostly equidistant) discretization grids are utilized when the problem is addressed in discrete time. An extensive numerical case study visualizes the optimal asset allocation strategy and highlights its benefits in Sect. 6. Finally, Sect. 7 concludes.

2 The financial market model

Let $T < \infty $ denote the initial time of the post-retirement phase, the retirement entry time in most cases. Further let ${\tilde{T}}$ denote the end of the investment period and let $(\Omega , {\mathcal {F}}, {\mathbb {F}} = \left( {\mathcal {F}}_{t}\right) _{t \in [T,{\tilde{T}}]}, {\mathbb {P}})$ be a filtered complete probability space that satisfies the usual conditions and let ${W = \left( W(t)\right) _{t \in [T,{\tilde{T}}]}}$, $W(t) = (W_{1}(t), \ldots , W_{N}(t))'$, $N \in {\mathbb {N}} $, denote a standard N-dimensional Brownian motion. $\Omega $ is the sample space, ${\mathbb {P}}$ denotes the real-world probability measure and ${\mathcal {F}}_{t}$ the natural filtration generated by W(s), ${T \le s \le t}$, that is augmented by all the null sets. By this we introduce uncertainty into the considered continuous-time financial market model that is frictionless and consists of $N+1$ continuously traded assets: one risk-less asset $P_{0}$ and N risky assets $P_{i}$, $i = 1, \ldots , N$. The price of the risk-less asset is subject to the equation

$$\begin{aligned} dP_{0}(t) = r P_{0}(t)dt,\quad P_{0}(T) = 1,\ T \le t \le {\tilde{T}}, \end{aligned}$$

(2.1)

where $r \ge 0$ is the constant risk-less interest rate. The remaining N assets, usually referred to as asset classes, are subject to the stochastic differential equations

$$\begin{aligned} dP_{i}(t)&= P_{i}(t) \left( \mu _{i} dt + \sigma _{i} dW(t)\right) = P_{i}(t) \left( \mu _{i} dt + \sum _{j = 1}^{N} \sigma _{ij} dW_{j}(t)\right) , \\&\quad P_{i}(T) = p_{i} > 0, \end{aligned}$$

(2.2)

where ${\mu = \left( \mu _{1},\ldots ,\mu _{N}\right) ' \in {\mathbb {R}}_{+}^{N}}$ with ${\mu - r {\mathbf {1}} > {\mathbf {0}}}$ is the constant drift and $\sigma _{i} = (\sigma _{i1}, \ldots , \sigma _{iN}) \in {\mathbb {R}}_{+}^{1 \times N}$ denotes the constant volatility vector of assets $i = 1, \ldots , N$. Here, $x'$ stands for the transpose of some vector x, ${\mathbf {1}} := (1, \ldots , 1)'$ and ${\mathbf {0}} := (0, \ldots , 0)'$. The volatility matrix is defined as ${\sigma = \left( \sigma _{ij}\right) _{i,j = 1,\ldots ,N}}$ with corresponding covariance matrix ${\Sigma = \sigma \sigma '}$ of the log-returns which is assumed to be strongly positive definite, i.e. there exists $K > 0$ such that ${\mathbb {P}}$-a.s. it is ${x' \Sigma x \ge K x' x}$, ${\forall x \in {\mathbb {R}}^{N}}$. Moreover, within this framework ${\gamma = \sigma ^{-1} (\mu - r {\mathbf {1}})}$ denotes the market price of risk. In accordance with [13] there exists a unique risk-neutral probability measure ${\mathbb {Q}}$ within the above market dynamics. Additionally, the financial market is complete which enables us to determine the present value of stochastic cash flows as expected discounted payments under the measure ${\mathbb {Q}}$. The associated pricing kernel or state price deflator, which we denote by ${\tilde{Z}}(t)$, is defined as

$$\begin{aligned} {\tilde{Z}}(t) := e^{- \left( r + \frac{1}{2}\Vert \gamma \Vert ^{2}\right) t - \gamma ' W(t)} \end{aligned}$$

(2.3)

and can be used for the valuation of cash flow streams under the real-world probability measure ${\mathbb {P}}$. The dynamics of the pricing kernel is

$$\begin{aligned} d {\tilde{Z}}(t) = - {\tilde{Z}}(t) \left( r dt + \gamma ' dW(t)\right) ,\ {\tilde{Z}}(0) = 1. \end{aligned}$$

(2.4)
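As a rough numerical illustration of Eq. (2.3), the following sketch evaluates the pricing kernel for the special case of a single risky asset ($N = 1$), where $\gamma = (\mu - r)/\sigma$, and checks by Monte Carlo that ${\mathbb {E}}[{\tilde{Z}}(t)] = e^{-rt}$. The function names, the parameter values in the usage note and the restriction to $N = 1$ are assumptions made for illustration only and are not part of the original model description.

```python
import math
import random


def market_price_of_risk(mu: float, sigma: float, r: float) -> float:
    """gamma = sigma^{-1} (mu - r) for a single risky asset (N = 1)."""
    return (mu - r) / sigma


def pricing_kernel(r: float, gamma: float, t: float, w_t: float) -> float:
    """Eq. (2.3) for N = 1: Z(t) = exp(-(r + gamma^2 / 2) t - gamma W(t))."""
    return math.exp(-(r + 0.5 * gamma**2) * t - gamma * w_t)


def mean_pricing_kernel(r, gamma, t, n_paths=100_000, seed=0):
    """Monte Carlo estimate of E[Z(t)], using W(t) ~ N(0, t)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        w_t = rng.gauss(0.0, math.sqrt(t))
        total += pricing_kernel(r, gamma, t, w_t)
    return total / n_paths
```

For example, with the hypothetical values `mu = 0.06`, `sigma = 0.2` and `r = 0.01`, the market price of risk is 0.25, and `mean_pricing_kernel(0.01, 0.25, 1.0)` should lie close to `math.exp(-0.01)` up to sampling error.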

Further, let $\varphi = (\varphi _{0}, {\hat{\varphi }})'$, ${{\hat{\varphi }} = (\varphi _{1}, \ldots , \varphi _{N})'}$ denote a trading strategy that is assumed to be ${\mathcal {F}}_{t}$-progressively measurable, self-financing with ${\mathbb {P}}\left( \int _{0}^{T} |\varphi _{0}(t)| + \Vert {\hat{\varphi }}(t)\Vert ^{2} dt < \infty \right) = 1$. $\varphi _{i}(t)$ represents the number of individual shares of asset i held by the investor at time t. Analogously, we denote the ${\mathcal {F}}_{t}$-progressively measurable and self-financing relative portfolio process with ${\pi = (\pi _{0}, {\hat{\pi }}')'}$, ${{\hat{\pi }} = (\pi _{1}, \ldots , \pi _{N})'}$, where ${\hat{\pi }}$ represents the risky relative investment and ${\pi _{0}(t) = 1-{\hat{\pi }}(t)'{\mathbf {1}}}$ the risk-less relative investment. In general, $\pi _{i}(t)$ denotes the proportion of wealth allocated to asset i at time t and is related to $\varphi _{i}(t)$ through

$$\begin{aligned} \begin{aligned} \pi _{i}(t) :=\left\{ \begin{array}{ll} \frac{\varphi _{i}(t) P_{i}(t)}{V(t)} ,&{}\quad \text {if } V(t) \ne 0, \\ 0,&{}\quad \text {if } V(t) = 0, \end{array}\right. \end{aligned} \end{aligned}$$

(2.5)

where V(t) denotes the corresponding wealth at time t. The wealth V(t), and in particular its characterizing dynamics dV(t), are to be defined in the upcoming section.

3 The decumulation phase mathematical model

In the following we present and explain the mathematical modeling of the pension plan dynamics associated with the decumulation phase. At first we consider a single client, but later relax the framework to a cohort model where customers are grouped by their age. Remember that time T denotes the initial time where the post-retirement pension fund is started. The total individual wealth of client i in cohort j at time $t \ge T$ is denoted by $V_{ij}^{\text {(total)}}(t)$, the individual continuously-withdrawn pension payment rate by $P_{ij}(t) \ge 0$ and the time-t present value of all outstanding future pension payments to this specific client under a constant pension development assumption by $E_{ij}(t)$. The latter can be expressed as

$$\begin{aligned} E_{ij}(t) := {\mathbb {E}} \left[ \int _t^{\tau _{ij}^{x}(T)} \frac{{\tilde{Z}}(s)}{{\tilde{Z}}(t)} P_{ij}(t) ds \bigg | {\mathcal {F}}_{t}, \tau _{ij}^{x}(T) \ge t\right] , \end{aligned}$$

(3.1)

where $\tau _{ij}^x(T)$ denotes the uncertain total lifetime of client i in cohort j who is aged x at time T. Throughout the paper we consider a constant mortality rate $\lambda _{x} = \lambda _{x(ij)} > 0$. Therefore, the survival probability of a client aged x at time T to survive from time T until time $t > T$ is given by ${\mathbb {P}}(\tau _{ij}^x(T) \ge t | \tau _{ij}^x(T) \ge T) = e^{- \lambda _{x} (t-T)}$, $\lambda _{x} > 0$. Moreover, we assume $\tau _{ij}^x(T)$ (uncertain total lifetime) to be independent of the filtration ${\mathbb {F}}$. Within this model, we have for $s \ge t \ge T$:

$$\begin{aligned} \begin{aligned} {\mathbb {P}}\left( \tau _{ij}^{x}(T) \ge s \bigg | \tau _{ij}^{x}(T) \ge t\right)&= \frac{{\mathbb {P}}\left( \tau _{ij}^{x}(T) \ge s, \tau _{ij}^{x}(T) \ge t\right) }{{\mathbb {P}}\left( \tau _{ij}^{x}(T) \ge t\right) } = \frac{{\mathbb {P}}\left( \tau _{ij}^{x}(T) \ge s\right) }{{\mathbb {P}}\left( \tau _{ij}^{x}(T) \ge t\right) } \\&= \frac{e^{- \lambda _{x} (s - T)}}{e^{- \lambda _{x} (t - T)}} = e^{- \lambda _{x} (s - t)}. \end{aligned} \end{aligned}$$

(3.2)

Together with a given and thus known $P_{ij}(t)$ at time t, applying Fubini and using that $\tau _{ij}^x(T)$ is independent of ${\mathbb {F}}$, $E_{ij}(t)$ becomes

$$\begin{aligned} \begin{aligned} E_{ij}(t)&{\mathop {=}\limits ^{(3.1)}} {\mathbb {E}} \left[ \int _t^{\tau _{ij}^{x}(T)} \frac{{\tilde{Z}}(s)}{{\tilde{Z}}(t)} P_{ij}(t) ds \bigg | {\mathcal {F}}_{t}, \tau _{ij}^{x}(T) \ge t\right] \\&= P_{ij}(t) \int _t^{\infty } {\mathbb {E}} \left[ \frac{{\tilde{Z}}(s)}{{\tilde{Z}}(t)} {\mathbbm {1}}_{\tau _{ij}^{x}(T) \ge s} \bigg | {\mathcal {F}}_{t}, \tau _{ij}^{x}(T) \ge t\right] ds \\&= P_{ij}(t) \int _t^{\infty } {\mathbb {E}} \left[ \frac{{\tilde{Z}}(s)}{{\tilde{Z}}(t)} \bigg | {\mathcal {F}}_{t}, \tau _{ij}^{x}(T) \ge t\right] {\mathbb {P}}\left( \tau _{ij}^{x}(T) \ge s \bigg | {\mathcal {F}}_{t}, \tau _{ij}^{x}(T) \ge t\right) ds \\&= P_{ij}(t) \int _t^{\infty } {\mathbb {E}} \left[ \frac{{\tilde{Z}}(s)}{{\tilde{Z}}(t)} \bigg | {\mathcal {F}}_{t}\right] {\mathbb {P}}\left( \tau _{ij}^{x}(T) \ge s \bigg | \tau _{ij}^{x}(T) \ge t\right) ds \\&{\mathop {=}\limits ^{(3.2)}} P_{ij}(t) \int _t^{\infty } e^{- r (s-t)} e^{- \lambda _{x} (s - t)} ds = P_{ij}(t) \frac{e^{- (r + \lambda _{x}) (s-t)}}{- (r + \lambda _{x})} \bigg |_{s = t}^{s = \infty } \\ &= \frac{P_{ij}(t)}{r + \lambda _{x}}. \end{aligned} \end{aligned}$$

(3.3)
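The closed form in Eq. (3.3) can be cross-checked numerically. The sketch below, with hypothetical function names and a truncated integration horizon chosen as an assumption, integrates the discount factor $e^{-r(s-t)}$ against the survival probability of Eq. (3.2) by the midpoint rule and compares the result with $P_{ij}(t)/(r + \lambda _{x})$.

```python
import math


def survival_probability(lam: float, t: float, s: float) -> float:
    """Eq. (3.2): P(tau >= s | tau >= t) = exp(-lam (s - t)) for s >= t."""
    return math.exp(-lam * (s - t))


def annuity_value_closed_form(p: float, r: float, lam: float) -> float:
    """Eq. (3.3): E = P / (r + lam)."""
    return p / (r + lam)


def annuity_value_numeric(p, r, lam, horizon=400.0, steps=400_000):
    """Midpoint-rule approximation of P * int_0^horizon e^{-(r + lam) u} du.

    The infinite upper limit is truncated at `horizon` (an assumption);
    the neglected tail is of order e^{-(r + lam) * horizon}.
    """
    du = horizon / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * du
        total += math.exp(-r * u) * survival_probability(lam, 0.0, u) * du
    return p * total
```

With the illustrative values `p = 1200`, `r = 0.01` and `lam = 0.05`, both functions give a value of approximately 20000, i.e. the pension rate divided by $r + \lambda _{x} = 0.06$.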

$E_{ij}(t)$ can be regarded as a perpetual annuity. Moreover, the pension rate $P_{ij}(t)$ is adjusted such that a certain capital coverage ratio

$$\begin{aligned} CCR_{ij}^{\text {(total)}}(t) := \frac{V_{ij}^{\text {(total)}}(t)}{E_{ij}(t)} \end{aligned}$$

(3.4)

is met. Particularly, regulations of BaFin force

$$\begin{aligned} CCR_{ij}^{\text {(total)}}(t) \in [100 \%, 125 \%],\quad \forall t \ge T. \end{aligned}$$

(3.5)

Generally for all $t \ge T$, the total wealth $V_{ij}^{\text {(total)}}(t)$ that belongs to client i in cohort j is internally divided into an investment portfolio $V_{ij}^{\text {(inv)}}(t)$ (portfolio mix of riskless and risky assets) and a buffer portfolio $V_{ij}^{\text {(buffer)}}(t)$ (deposit account with zero interest rate^{Footnote 2}) such that

$$\begin{aligned} V_{ij}^{\text {(total)}}(t) = V_{ij}^{\text {(inv)}}(t) + V_{ij}^{\text {(buffer)}}(t). \end{aligned}$$

(3.6)

Let us define

$$\begin{aligned} CCR_{ij}^{\text {(inv)}}(t) := \frac{V_{ij}^{\text {(inv)}}(t)}{E_{ij}(t)},\ CCR_{ij}^{\text {(buffer)}}(t) := \frac{V_{ij}^{\text {(buffer)}}(t)}{E_{ij}(t)}. \end{aligned}$$

(3.7)

This immediately implies the relationship

$$\begin{aligned} \begin{aligned} CCR_{ij}^{\text {(total)}}(t)&{\mathop {=}\limits ^{(3.4)}} \frac{V_{ij}^{\text {(total)}}(t)}{E_{ij}(t)} {\mathop {=}\limits ^{(3.6)}} \frac{V_{ij}^{\text {(inv)}}(t)}{E_{ij}(t)} + \frac{V_{ij}^{\text {(buffer)}}(t)}{E_{ij}(t)} {\mathop {=}\limits ^{(3.7)}} CCR_{ij}^{\text {(inv)}}(t) + CCR_{ij}^{\text {(buffer)}}(t). \end{aligned} \end{aligned}$$

(3.8)

We propose the following structure:

$$\begin{aligned} V_{ij}^{\text {(buffer)}}(t) &:= \alpha \left( V_{ij}^{\text {(total)}}(t) - E_{ij}(t)\right) \end{aligned}$$

(3.9)

for some $\alpha \in [0,1]$. The remainder builds the investment portfolio

$$\begin{aligned} \begin{aligned} V_{ij}^{\text {(inv)}}(t)&{\mathop {=}\limits ^{(3.6)}} V_{ij}^{\text {(total)}}(t) - V_{ij}^{\text {(buffer)}}(t) {\mathop {=}\limits ^{(3.9)}} V_{ij}^{\text {(total)}}(t) - \alpha \left( V_{ij}^{\text {(total)}}(t) - E_{ij}(t)\right) \\&= \alpha E_{ij}(t) + (1 - \alpha ) V_{ij}^{\text {(total)}}(t) = E_{ij}(t) + (1 - \alpha ) \left( V_{ij}^{\text {(total)}}(t) - E_{ij}(t)\right) . \end{aligned} \end{aligned}$$

(3.10)

Thus, we define the buffer balance to be the proportion $\alpha $ of the cushion or surplus $V_{ij}^{\text {(total)}}(t) - E_{ij}(t)$; the remaining funds flow into the investment portfolio. Further, we would like to control the capital coverage ratio for the investment portfolio such that all pension payments can be made by the investment portfolio under normal circumstances, where the buffer account can help out in bad scenarios. For this purpose, let us denote^{Footnote 3} by ${\bar{p}} \in [100\%, 125\%]$ the value for $CCR_{ij}^{\text {(inv)}}(t_{n})$ after readjustment at some re-set time $t_{n}$ where the pre-readjustment value at time $t_{n}$ falls outside the corridor $[100\%, 125\%]$. For instance, one could set ${\bar{p}} = 112.5 \%$ to the center of the corridor. Note that Eq. (3.9) leads to

$$\begin{aligned} \begin{aligned} CCR_{ij}^{\text {(total)}}(t)&{\mathop {=}\limits ^{(3.8)}} CCR_{ij}^{\text {(inv)}}(t) + CCR_{ij}^{\text {(buffer)}}(t) {\mathop {=}\limits ^{(3.7)}} CCR_{ij}^{\text {(inv)}}(t) + \frac{V_{ij}^{\text {(buffer)}}(t)}{E_{ij}(t)} \\&{\mathop {=}\limits ^{(3.9)}} CCR_{ij}^{\text {(inv)}}(t) + \alpha \left( CCR_{ij}^{\text {(total)}}(t) - 1\right) \end{aligned} \end{aligned}$$

(3.11)

for all $t \ge T$, which can be reformulated to

$$\begin{aligned} CCR_{ij}^{\text {(total)}}(t) = \frac{CCR_{ij}^{\text {(inv)}}(t) - \alpha }{1 - \alpha }. \end{aligned}$$

(3.12)
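The wealth split of Eqs. (3.9) and (3.10) and the resulting coverage ratios can be traced with a few lines of code. The following is a minimal sketch; the function names and the numbers in the usage note are hypothetical and serve only to illustrate the relations (3.8) and (3.12).

```python
def split_wealth(v_total: float, e: float, alpha: float):
    """Return (v_inv, v_buffer) following Eqs. (3.9) and (3.10)."""
    v_buffer = alpha * (v_total - e)   # share alpha of the surplus
    v_inv = v_total - v_buffer         # remainder is invested
    return v_inv, v_buffer


def coverage_ratios(v_total: float, e: float, alpha: float):
    """Return (ccr_total, ccr_inv, ccr_buffer) as in Eqs. (3.4) and (3.7)."""
    v_inv, v_buffer = split_wealth(v_total, e, alpha)
    return v_total / e, v_inv / e, v_buffer / e
```

For instance, a total wealth of 120, liabilities of 100 and $\alpha = 0.4$ give a buffer of 8 and an investment portfolio of 112, so that the coverage ratios are 120 %, 112 % and 8 %. Inserting 112 % into Eq. (3.12) yields $(1.12 - 0.4)/(1 - 0.4) = 1.2$, as expected.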

We would like to stress that the parameters $\alpha $ and ${\bar{p}}$ are exogenously given and time- as well as client-independent. In the following we propose and describe a certain adjustment mechanism for the pension rate and the buffer system and demonstrate that it actually satisfies Eq. (3.9) for all adjustment times (called $t_{n}$) as well as all non-adjustment times ($t \ne t_{n}$), i.e. for all $t \ge T$.

3.1 System at re-adjustment times

As already mentioned, whenever the corridor is exceeded at some time $t_{n} \ge T$, the pension rate $P_{ij}(t_{n})$ needs to be adjusted (either reduced or increased) such that $CCR_{ij}^{\text {(total)}}(t_{n}) \in [100 \%, 125 \%]$. The fund keeps the pension rates constant between the re-adjustment times $t_{n}$, $n \in {\mathbb {N}}$, which are defined by

$$\begin{aligned} t_{n} := \inf \left\{ t \in (t_{n-1}, \tau _{ij}^{x}(T)]: CCR_{ij}^{\text {(total)}}(t) \notin [100 \%, 125 \%] \bigg | \tau _{ij}^{x}(T) \ge t_{n-1}\right\} . \end{aligned}$$

(3.13)

For the sake of convenience we define $t_{0} := T$ as first (re-)adjustment time. At time $t_{n}$ the system gets re-adjusted such that $CCR_{ij}^{\text {(inv)}}(t_{n})$ becomes ${\bar{p}}$:

$$\begin{aligned} \begin{aligned} {\bar{p}} := CCR_{ij}^{\text {(inv)}}(t_{n})&{\mathop {=}\limits ^{(3.7)}} {} \frac{V_{ij}^{\text {(inv)}}(t_{n})}{E_{ij}(t_{n})} {\mathop {=}\limits ^{(3.12)}} \alpha + (1 - \alpha ) CCR_{ij}^{\text {(total)}}(t_{n}). \end{aligned} \end{aligned}$$

(3.14)

In view of Eq. (3.12) this is equivalent to

$$\begin{aligned} CCR_{ij}^{\text {(total)}}(t_{n}) = \frac{{\bar{p}} - \alpha }{1 - \alpha }. \end{aligned}$$

(3.15)

Moreover, from Eq. (3.15) it follows

$$\begin{aligned} \begin{aligned} \frac{{\bar{p}} - \alpha }{1 - \alpha }&= CCR_{ij}^{\text {(total)}}(t_{n}) {\mathop {=}\limits ^{(3.4)}} \frac{V_{ij}^{\text {(total)}}(t_{n})}{E_{ij}(t_{n})} \Leftrightarrow E_{ij}(t_{n}) = \frac{(1 - \alpha ) V_{ij}^{\text {(total)}}(t_{n})}{{\bar{p}} - \alpha } \\&{\mathop {\Leftrightarrow }\limits ^{(3.3)}} {} P_{ij}(t_{n}) = \frac{1 - \alpha }{{\bar{p}} - \alpha } (r + \lambda _{x}) V_{ij}^{\text {(total)}}(t_{n}). \end{aligned} \end{aligned}$$

(3.16)

This means that when ${\bar{p}}$ and $V_{ij}^{\text {(total)}}(t_{n})$ are known at time $t_{n}$, then the selection of $\alpha $ determines the adjusted pension rate $P_{ij}(t_{n})$ at re-set time $t_{n}$. Moreover, a higher value of ${\bar{p}}$, everything else staying constant, implies a smaller adjusted pension rate $P_{ij}(t_{n})$. Hence, at every re-adjustment time $t_{n}$ (especially at time $t_{0} = T$), the adjusted pension rate $P_{ij}(t_{n})$ is selected according to Eq. (3.16) such that $CCR_{ij}^{\text {(total)}}(t_{n}) = \frac{{\bar{p}} - \alpha }{1 - \alpha }$ (and $CCR_{ij}^{\text {(inv)}}(t_{n}) = {\bar{p}}$), and it further holds^{Footnote 4}

$$\begin{aligned} P_{ij}(t) \equiv P_{ij}(t_{n}) \quad \forall t \in [t_{n}, t_{n+1}). \end{aligned}$$

(3.17)

Finally, we obtain

$$\begin{aligned} \begin{aligned} CCR_{ij}^{\text {(buffer)}}(t_{n})&{\mathop {=}\limits ^{(3.8)}} {} CCR_{ij}^{\text {(total)}}(t_{n}) - CCR_{ij}^{\text {(inv)}}(t_{n}) {\mathop {=}\limits ^{(3.15),(3.14)}} \frac{{\bar{p}} - \alpha }{1 - \alpha } - {\bar{p}} = \alpha \frac{{\bar{p}} - 1}{1 - \alpha }. \end{aligned} \end{aligned}$$

(3.18)

Notice that as we define ${\bar{p}} := CCR_{ij}^{\text {(inv)}}(t_{n})$ in Eq. (3.14) to coincide for any customer (${\bar{p}}$ independent of ij) and to be time-independent, so do $CCR_{ij}^{\text {(total)}}(t_{n})$ and $CCR_{ij}^{\text {(buffer)}}(t_{n})$ which we learn from Eqs. (3.15) and (3.18). As $CCR_{ij}^{\text {(total)}}(t_{n}) \in [100 \%, 125 \%]$ is required, i.e. it has to stay inside the boundaries, we must have ${\frac{{\bar{p}} - \alpha }{1 - \alpha } \in [100 \%, 125 \%]}$. For economic reasons, suppose ${{\bar{p}} \in [100 \%, 125 \%]}$ and $\alpha \in [0,1]$. Therefore we have the regulatory condition

$$\begin{aligned} \alpha \in \left[ 0 \%, \frac{125 \% - {\bar{p}}}{125 \% - 100 \%}\right] \end{aligned}$$

(3.19)

on the variable $\alpha $. In particular, when ${\bar{p}} = 112.5 \%$, then $\alpha $ can be selected out of the interval ${\left[ 0 \%, 50 \%\right] }$.
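The re-adjustment rule can be summarized in two small helper functions. This is a sketch under the assumption $\alpha < 1$ (so that ${\bar{p}} - \alpha > 0$); the function names, default corridor arguments and numerical inputs are illustrative and do not appear in the original text.

```python
def max_alpha(p_bar: float, lower: float = 1.0, upper: float = 1.25) -> float:
    """Upper bound on alpha from Eq. (3.19) for the corridor [lower, upper]."""
    return (upper - p_bar) / (upper - lower)


def readjusted_pension_rate(v_total, p_bar, alpha, r, lam):
    """Eq. (3.16): P(t_n) = (1 - alpha) / (p_bar - alpha) * (r + lam) * V(t_n)."""
    return (1.0 - alpha) / (p_bar - alpha) * (r + lam) * v_total
```

As an example with assumed values, `max_alpha(1.125)` returns 0.5, in line with the interval stated above. For a total wealth of 100000, ${\bar{p}} = 112.5\%$, $\alpha = 0.25$, $r = 0.01$ and $\lambda _{x} = 0.05$, the adjusted pension rate is about 5142.86 per year, and the implied total coverage ratio $V_{ij}^{\text {(total)}}(t_{n}) / E_{ij}(t_{n})$ equals $(1.125 - 0.25)/(1 - 0.25) \approx 116.7\%$, consistent with Eq. (3.15).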

3.2 Dynamics between re-adjustment times

So far we described the framework and mechanism at the re-adjustment times $t_{n}$. For all $t \ge T$, $t \ne t_{n}$, we propose the following buffer rate mechanism (rate of change of the buffer balance) that drives $V_{ij}^{\text {(buffer)}}(t)$, where we implicitly assume that the buffer account is a simple account that pays no interest (see Footnote 2):

$$\begin{aligned} dV_{ij}^{\text {(buffer)}}(t) := c_{ij}^{\text {(buffer)}}(t) dt := \alpha d\left( V_{ij}^{\text {(total)}}(t) - E_{ij}(t)\right) \end{aligned}$$

(3.20)

This means that between any two re-adjustment times, the buffer portfolio $V_{ij}^{\text {(buffer)}}(t)$ evolves according to the changes in the surplus $V_{ij}^{\text {(total)}}(t) - E_{ij}(t)$. Especially in a situation where the change in the total wealth $V_{ij}^{\text {(total)}}(t)$ and the change in the liabilities $E_{ij}(t)$ coincide, the buffer portfolio remains constant. Eq. (3.20) leads to

$$\begin{aligned} V_{ij}^{\text {(buffer)}}(t)&= V_{ij}^{\text {(buffer)}}(T) + \int _{T}^{t} dV_{ij}^{\text {(buffer)}}(s) \\&{\mathop {=}\limits ^{(3.20)}} \alpha \left( V_{ij}^{\text {(total)}}(T) - E_{ij}(T)\right) + \int _{T}^{t} \alpha d\left( V_{ij}^{\text {(total)}}(s) - E_{ij}(s)\right) \\&= \alpha \left( V_{ij}^{\text {(total)}}(T) - E_{ij}(T)\right) + \alpha \left( V_{ij}^{\text {(total)}}(t) - E_{ij}(t)\right) \\&\quad - \alpha \left( V_{ij}^{\text {(total)}}(T) - E_{ij}(T)\right) \\&= \alpha \left( V_{ij}^{\text {(total)}}(t) - E_{ij}(t)\right) \end{aligned}$$

(3.21)

for all $t \ge T$, i.e. Eq. (3.9) is verified for all $t = t_{n}$ as well as $t \ne t_{n}$. Hence it turns out that the proportional distribution of the total wealth $V_{ij}^{\text {(total)}}(t)$ to the investment and buffer portfolio is identical for all times $t \ge T$. The formula for the pension rate at $t \ne t_{n}$ was already shown in Eq. (3.17). A very beneficial feature of this proposed buffer account and process is the following relation for times $t \ne t_{n}$:

$$\begin{aligned} CCR_{ij}^{\text {(total)}}(t) \searrow 100 \%\ \Leftrightarrow \ V_{ij}^{\text {(total)}}(t) \searrow E_{ij}(t)\ \Leftrightarrow \ V_{ij}^{\text {(buffer)}}(t) \searrow 0. \end{aligned}$$

(3.22)

Therefore, a downward adjustment of the pension rate ($CCR_{ij}^{\text {(total)}}(t)$ falls below $100 \%$) coincides with a zero value in the buffer account.

Additionally, for times $t \ne t_{n}$, the dynamics of the investment portfolio, which serves the pension outflows and the buffer rate (positive or negative), follow the stochastic differential equation of a classical portfolio that is invested in the capital market:

$$\begin{aligned} \begin{aligned} dV_{ij}^{\text {(inv)}}(t)&= V_{ij}^{\text {(inv)}}(t) \left[ \left( r + {\hat{\pi }}^{\text {(inv)}}(t)' (\mu -r {\mathbf {1}})\right) dt + {\hat{\pi }}^{\text {(inv)}}(t)' \sigma dW(t)\right] \\&\quad - P_{ij}(t) dt - c_{ij}^{\text {(buffer)}}(t) dt \end{aligned} \end{aligned}$$

(3.23)

The first component of the formula coincides with a pure classical investment part, where ${\hat{\pi }}^{\text {(inv)}}(t)$ denotes the risky relative investment that corresponds to the investment portfolio with wealth $V_{ij}^{\text {(inv)}}(t)$, plus two additional components in the form of a pension rate outflow $- P_{ij}(t) dt$ and a buffer rate inflow or outflow $- c_{ij}^{\text {(buffer)}}(t) dt$. Furthermore, bringing the dynamics of $V_{ij}^{\text {(inv)}}(t)$ and $V_{ij}^{\text {(buffer)}}(t)$ together, gives

$$\begin{aligned} dV_{ij}^{\text {(total)}}(t)&{\mathop {=}\limits ^{(3.6)}}dV_{ij}^{\text {(inv)}}(t) + dV_{ij}^{\text {(buffer)}}(t) \\&{\mathop {=}\limits ^{(3.23), (3.20)}}V_{ij}^{\text {(inv)}}(t) \left[ \left( r + {\hat{\pi }}^{\text {(inv)}}(t)' (\mu -r {\mathbf {1}})\right) dt + {\hat{\pi }}^{\text {(inv)}}(t)' \sigma dW(t)\right] \\&\quad - P_{ij}(t) dt - c_{ij}^{\text {(buffer)}}(t) dt + c_{ij}^{\text {(buffer)}}(t) dt \\&{\mathop {=}\limits ^{(3.10)}}\left[ E_{ij}(t) + (1 - \alpha ) \left( V_{ij}^{\text {(total)}}(t) - E_{ij}(t)\right) \right] \\&\quad \times \left[ \left( r + {\hat{\pi }}^{\text {(inv)}}(t)' (\mu -r {\mathbf {1}})\right) dt + {\hat{\pi }}^{\text {(inv)}}(t)' \sigma dW(t)\right] - P_{ij}(t) dt. \end{aligned}$$

(3.24)

As the wealth of the buffer portfolio is not invested in the capital market, the total time-t risky exposure is given by ${\hat{\pi }}^{\text {(inv)}}(t) V_{ij}^{\text {(inv)}}(t)$ which determines the following relative risky investment ${\hat{\pi }}^{\text {(total)}}(t)$ of the total wealth $V_{ij}^{\text {(total)}}(t)$ depending on the capital coverage ratio:

$$\begin{aligned} \begin{aligned} {\hat{\pi }}^{\text {(total)}}(t) V_{ij}^{\text {(total)}}(t)&= {\hat{\pi }}^{\text {(inv)}}(t) V_{ij}^{\text {(inv)}}(t) \Leftrightarrow {} {\hat{\pi }}^{\text {(total)}}(t) = \frac{V_{ij}^{\text {(inv)}}(t)}{V_{ij}^{\text {(total)}}(t)} {\hat{\pi }}^{\text {(inv)}}(t)\\&{\mathop {=}\limits ^{(3.10)}} \frac{(1 - \alpha ) V_{ij}^{\text {(total)}}(t) + \alpha E_{ij}(t)}{V_{ij}^{\text {(total)}}(t)} {\hat{\pi }}^{\text {(inv)}}(t) \\&{\mathop {=}\limits ^{(3.4)}} \frac{(1 - \alpha ) CCR_{ij}^{\text {(total)}}(t) + \alpha }{CCR_{ij}^{\text {(total)}}(t)} {\hat{\pi }}^{\text {(inv)}}(t). \end{aligned} \end{aligned}$$

(3.25)

Since $CCR_{ij}^{\text {(total)}}(t) \in [100 \%, 125 \%]$ by regulation, we obtain

$$\begin{aligned} {\hat{\pi }}^{\text {(total)}}(t) = \frac{(1 - \alpha ) CCR_{ij}^{\text {(total)}}(t) + \alpha }{CCR_{ij}^{\text {(total)}}(t)} {\hat{\pi }}^{\text {(inv)}}(t) \in \left[ \frac{(1 - \alpha ) 1.25 + \alpha }{1.25}, 1\right] \cdot {\hat{\pi }}^{\text {(inv)}}(t). \end{aligned}$$

(3.26)

It follows immediately that ${\hat{\pi }}^{\text {(total)}}(t) \le {\hat{\pi }}^{\text {(inv)}}(t)$, hence the buffer indeed dampens the relative risky investment for the total portfolio. To prevent leverage for a given $\alpha $, one has to restrict $({\hat{\pi }}^{\text {(inv)}}(t))' {\mathbf {1}} \le 1$ resp. $({\hat{\pi }}^{\text {(total)}}(t))' {\mathbf {1}} \le 1$. To exclude short-selling, one needs to enforce ${\hat{\pi }}^{\text {(total)}}(t) \ge {\mathbf {0}}$ resp. ${\hat{\pi }}^{\text {(inv)}}(t) \ge {\mathbf {0}}$.
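The dampening effect in Eqs. (3.25) and (3.26) can be made concrete with a one-line function. The sketch below treats a single risky asset; the function name and the inputs in the usage note are assumptions for illustration.

```python
def total_risky_fraction(pi_inv: float, ccr_total: float, alpha: float) -> float:
    """Eq. (3.25) for one risky asset: risky share of the total wealth."""
    return ((1.0 - alpha) * ccr_total + alpha) / ccr_total * pi_inv
```

With a risky fraction of 60 % in the investment portfolio and $\alpha = 0.4$, the risky share of total wealth is 60 % at a coverage ratio of 100 % (the buffer is empty) and falls to 55.2 % at a coverage ratio of 125 %, which corresponds to the lower end of the interval in Eq. (3.26).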

Finally, the discretized version of the stochastic differential equation (3.24) of the total wealth is given by

$$\begin{aligned} \begin{aligned} V_{ij}^{\text {(total)}}(t + \Delta ) &= V_{ij}^{\text {(total)}}(t) + \left[ E_{ij}(t) + (1 - \alpha ) \left( V_{ij}^{\text {(total)}}(t) - E_{ij}(t)\right) \right] \\&\quad \times \left[ \left( r + {\hat{\pi }}^{\text {(inv)}}(t)' (\mu -r {\mathbf {1}})\right) \Delta + {\hat{\pi }}^{\text {(inv)}}(t)' \sigma \sqrt{\Delta } Z\right] - P_{ij}(t) \Delta , \end{aligned} \end{aligned}$$

(3.27)

where $Z \sim {\mathcal {N}}(0,1)$ is an N-dimensional vector of independent standard normal random variables. Moreover, based on Eqs. (3.16) and (3.17), the discrete-time version of the pension rate development is

$$\begin{aligned} P_{ij}(t + \Delta ) = {\left\{ \begin{array}{ll} P_{ij}(t),&{}\quad \text {if } \frac{V_{ij}^{\text {(total)}}(t + \Delta )}{\frac{P_{ij}(t)}{r + \lambda _{x}}} \in [100 \%, 125 \%] \\ \frac{1 - \alpha }{{\bar{p}} - \alpha } (r + \lambda _{x}) V_{ij}^{\text {(total)}}(t + \Delta ),&{}\quad \text {otherwise.} \end{array}\right. } \end{aligned}$$

(3.28)

Equation (3.28) states that if the total wealth performed very well in the preceding period, so that the capital coverage ratio exceeds $125 \%$, the pension rate for the next period is raised. Conversely, if the performance of the total wealth in the preceding period was very poor, so that the ratio falls below $100 \%$, the pension rate for the upcoming period is reduced. Finally, the pension rate remains unchanged if the ratio stays within the lower and upper boundary.
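A minimal sketch of one step of the recursion (3.27)–(3.28) for a single risky asset ($N = 1$) may help to see the mechanism. The function name `wealth_pension_step` and all parameter values are assumptions made for illustration; they are not taken from the case study. The entitlement is computed as $P/(r + \lambda _{x})$, as it appears in the denominator of (3.28).

```python
import math


def wealth_pension_step(v, p, pi_inv, z, *, r, mu, sigma, lam, alpha, p_bar, dt):
    """One step of (3.27)-(3.28) for one risky asset; z is a standard normal draw."""
    e = p / (r + lam)                              # entitlement P / (r + lambda_x)
    v_inv = e + (1.0 - alpha) * (v - e)            # investment portfolio wealth
    ret = (r + pi_inv * (mu - r)) * dt + pi_inv * sigma * math.sqrt(dt) * z
    v_next = v + v_inv * ret - p * dt              # Eq. (3.27)
    if 1.0 <= v_next / e <= 1.25:                  # inside the corridor: no change
        p_next = p
    else:                                          # outside: reset, Eq. (3.28)
        p_next = (1.0 - alpha) / (p_bar - alpha) * (r + lam) * v_next
    return v_next, p_next


if __name__ == "__main__":
    # Hypothetical parameters for illustration only.
    par = dict(r=0.01, mu=0.05, sigma=0.2, lam=0.05, alpha=0.3, p_bar=1.1, dt=1.0)
    e0 = 6.0 / (par["r"] + par["lam"])
    v0 = (par["p_bar"] - par["alpha"]) / (1.0 - par["alpha"]) * e0
    for shock in (-2.0, 0.0, 2.0):
        print(shock, wealth_pension_step(v0, 6.0, 0.5, shock, **par))
```

After a reset, the ratio of the new wealth to the new entitlement equals $\frac{{\bar{p}} - \alpha }{1 - \alpha }$ again, independently of the size of the shock.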

4 The decumulation phase portfolio selection problem

4.1 Continuous-time optimization problem

The fund’s target is to maximize the client’s expected accumulated utility coming from the stochastic future pension cash flows. The buffer portfolio is established to reduce the probability of undesired pension shortenings and thus to keep the pension more stable. The risk-return tradeoff in the optimization depends on the type of applied utility function U. Since no bequest payments are considered, the continuous-time portfolio selection problem for an initial wealth $V_{ij}^{\text {(total)}}(T) = v_{0}$ and planning horizon ${\tilde{T}} \in (T, \infty ]$ is given by

$$\begin{aligned} \begin{aligned}&{\mathcal {V}}(v_{0}, c_{ij}^{\text {(buffer)}}) = \sup _{\pi ^{\text {(inv)}} \in \Lambda } {\mathcal {J}}(\pi ^{\text {(inv)}};v_{0}, c_{ij}^{\text {(buffer)}}) \\&\text {s.t. } {\left\{ \begin{array}{ll} {\mathcal {J}}(\pi ^{\text {(inv)}};v_{0}, c_{ij}^{\text {(buffer)}}) = {\mathbb {E}}\left[ \int _{T}^{{\tilde{T}}} e^{- \lambda _{x} (t-T)} U(t, P_{ij}(t)) dt\right] , \\ {\mathbb {E}}\left[ \int _{T}^{{\tilde{T}}} e^{- \lambda _{x} (t-T)} \frac{{\tilde{Z}}(t)}{{\tilde{Z}}(T)} P_{ij}(t)dt\right] \le v_{0}. \end{array}\right. } \end{aligned} \end{aligned}$$

(4.1)

The dynamics of $P_{ij}(t)$ are given by Eqs. (3.16)–(3.17). The set $\Lambda $ contains all admissible strategies $\pi ^{\text {(inv)}}$. A personal discount rate can be absorbed into U. Later we select U to be an increasing concave utility function, which means that the client prefers a larger pension rate $P_{ij}(t)$, but an increase in the pension rate leads to less additional satisfaction the larger the pension rate already is. The objective function that is to be maximized in Problem (4.1) arises from

$$\begin{aligned} \begin{aligned} {\mathcal {J}}(\pi ^{\text {(inv)}};v_{0}, c_{ij}^{\text {(buffer)}})&:= {\mathbb {E}}\left[ \int _{T}^{\tau _{ij}^x(T) \wedge {\tilde{T}}} U(t, P_{ij}(t)) dt\right] \\&= {\mathbb {E}}\left[ \int _{T}^{{\tilde{T}}} U(t, P_{ij}(t)) {\mathbbm {1}}_{\tau _{ij}^x(T) \ge t} dt\right] \\&= \int _{T}^{{\tilde{T}}} {\mathbb {E}}\left[ U(t, P_{ij}(t)) \bigg | \tau _{ij}^x(T) \ge t\right] {\mathbb {P}}\left( \tau _{ij}^x(T) \ge t\right) dt \\&= \int _{T}^{{\tilde{T}}} {\mathbb {E}}\left[ U(t, P_{ij}(t))\right] {\mathbb {P}}\left( \tau _{ij}^x(T) \ge t\right) dt \\&= \int _{T}^{{\tilde{T}}} e^{- \lambda _{x} (t-T)} {\mathbb {E}}\left[ U(t, P_{ij}(t))\right] dt \\&= {\mathbb {E}}\left[ \int _{T}^{{\tilde{T}}} e^{- \lambda _{x} (t-T)} U(t, P_{ij}(t)) dt\right] . \end{aligned} \end{aligned}$$

(4.2)

Similarly, the budget constraint in Problem (4.1) arises from

$$\begin{aligned} \begin{aligned} v_{0}&\ge {\mathbb {E}}\left[ \int _{T}^{\tau _{ij}^x(T) \wedge {\tilde{T}}} \frac{{\tilde{Z}}(t)}{{\tilde{Z}}(T)} P_{ij}(t) dt\right] = {\mathbb {E}}\left[ \int _{T}^{{\tilde{T}}} \frac{{\tilde{Z}}(t)}{{\tilde{Z}}(T)} P_{ij}(t) {\mathbbm {1}}_{\tau _{ij}^x(T) \ge t} dt\right] \\&= {\mathbb {E}}\left[ \int _{T}^{{\tilde{T}}} e^{- \lambda _{x} (t-T)} \frac{{\tilde{Z}}(t)}{{\tilde{Z}}(T)} P_{ij}(t) dt\right] . \end{aligned} \end{aligned}$$

(4.3)

Throughout, let the intertemporal utility function U(t, p) admit the following form:

$$\begin{aligned} U(t,p) := e^{- \beta (t-T)} {\tilde{U}}(p), \end{aligned}$$

(4.4)

where ${\tilde{U}}$ is a strictly increasing and concave utility function and $\beta \ge 0$ denotes the subjective discount rate with utility discount factor $e^{- \beta (t-T)}$.

4.2 Discrete-time dynamic optimization

In what follows we aim to solve Problem (4.1). Due to the nature and complexity of the scheme (especially the pension rate adjustment mechanism) coming from the regulatory requirements, we consider the discrete-time version of Problem (4.1) from now on and apply discrete-time dynamic optimization methods. We first translate Problem (4.1) into the corresponding discrete-time problem. To this end, we divide the investment period $[T, {\tilde{T}}]$ into an equidistant grid with a distance of $\Delta > 0$ between neighboring grid points

$$\begin{aligned} t^{(k)} := T + \Delta \cdot k,\quad k = 0, \ldots , N_{\Delta } \end{aligned}$$

(4.5)

with $N_{\Delta } := \frac{{\tilde{T}} - T}{\Delta }$, such that $t^{(0)} = T$ and $t^{(N_{\Delta })} = {\tilde{T}}$ with $t^{(k+1)} - t^{(k)} \equiv \Delta $. We assume $N_{\Delta } = \frac{{\tilde{T}} - T}{\Delta } {\mathop {\in }\limits ^{!}} {\mathbb {N}}$, which holds for instance if ${\tilde{T}} - T \in {\mathbb {N}}$ is a whole number of years and $\Delta \in \{1, \frac{1}{2}, \frac{1}{4}, \frac{1}{12}, \frac{1}{52}, \frac{1}{250}, \ldots \}$, i.e. pension rate adjustments and rebalancing of the portfolio take place annually, semi-annually, quarterly, monthly, weekly, daily, etc. The decision variable $\pi ^{\text {(inv)}}(t^{(k)})$ is applied on the entire interval $[t^{(k)}, t^{(k+1)}) = [t^{(k)}, t^{(k)} + \Delta )$ and is updated again at time $t^{(k+1)} = t^{(k)} + \Delta $; the same holds for $P_{ij}(t)$. Within the discrete framework, the objective function ${\mathcal {J}}(\pi ^{\text {(inv)}};v_{0}, c_{ij}^{\text {(buffer)}})$ that is to be maximized translates to

$$\begin{aligned} \begin{aligned} {\mathcal {J}}(\pi ^{\text {(inv)}};v_{0}, c_{ij}^{\text {(buffer)}})&= {\mathbb {E}}\left[ \int _{T}^{{\tilde{T}}} e^{- \lambda _{x} (t-T)} U(t, P_{ij}(t)) dt\right] \\&= {\mathbb {E}}\left[ \int _{T}^{{\tilde{T}}} e^{- (\lambda _{x} + \beta ) (t-T)} {\tilde{U}}(P_{ij}(t)) dt\right] \\&= {\mathbb {E}}\left[ \sum _{k = 0}^{N_{\Delta } - 1} \int _{t^{(k)}}^{t^{(k+1)}} e^{- (\lambda _{x} + \beta ) (t-T)} {\tilde{U}}(P_{ij}(t)) dt\right] \\ {\mathop {=}\limits ^{P_{ij}(t) \equiv P_{ij}(t^{(k)}) \text { on } [t^{(k)}, t^{(k+1)})}} {}&{\mathbb {E}}\left[ \sum _{k = 0}^{N_{\Delta } - 1} \int _{t^{(k)}}^{t^{(k+1)}} e^{- (\lambda _{x} + \beta ) (t-T)} {\tilde{U}}(P_{ij}(t^{(k)})) dt\right] . \end{aligned} \end{aligned}$$

(4.6)
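The grid (4.5) underlying this sum, together with the integrality requirement on $N_{\Delta }$, can be sketched as follows. The helper `time_grid` is hypothetical; exact rational arithmetic is used here only so that step sizes such as $\frac{1}{12}$ do not suffer from floating-point rounding, which is a design choice of this sketch and not part of the model.

```python
from fractions import Fraction


def time_grid(t_start, t_end, delta):
    """Return t^(k) = T + k * Delta for k = 0, ..., N_Delta as exact fractions."""
    t_start, t_end, delta = Fraction(t_start), Fraction(t_end), Fraction(delta)
    if delta <= 0 or t_end <= t_start:
        raise ValueError("need Delta > 0 and a horizon after the start time")
    n_steps = (t_end - t_start) / delta
    if n_steps.denominator != 1:
        raise ValueError("(horizon - start) / Delta must be a natural number")
    return [t_start + k * delta for k in range(int(n_steps) + 1)]


if __name__ == "__main__":
    # Illustrative values: 30 years of monthly decision periods.
    grid = time_grid(67, 97, Fraction(1, 12))
    print(len(grid) - 1, grid[0], grid[1], grid[-1])
```

Passing `delta` as a `Fraction` (or an integer) rather than a float keeps the integrality check exact.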

To simplify notation, let us define for ${k \in \{0, \ldots , N_{\Delta }\}}$:

$$\begin{aligned} \begin{aligned} V_{(k)}& := V_{ij}^{\text {(total)}}(t^{(k)}), \\ P_{(k)} &:= P_{ij}(t^{(k)}), \\ S_{(k)} &:= \left( V_{(k)}, P_{(k)}\right) , \\ a_{k} &:= {\hat{\pi }}^{\text {(inv)}}(t^{(k)}) = {\hat{\pi }}^{\text {(inv)}}(t^{(k)}; S_{(k)}), \\ {\mathcal {F}}_{k} &:= {\mathcal {F}}_{t^{(k)}}. \end{aligned} \end{aligned}$$

(4.7)

$S_{(k)}$ denotes the two-dimensional state with $S_{(k)} \in {\mathbb {R}}_{+}^{2}$. $a_{k}$ is the action (or control variable) for period $[t^{(k)}, t^{(k+1)})$. It is the risky relative investment strategy of the investment portfolio, with $a_{k} \in {\mathbb {A}}$, where ${\mathbb {A}} := \left\{ a \in [0,1]^{N}:\ a' {\mathbf {1}} \le 1\right\} $ denotes the set of all possible portfolio weights at a given time point. The definition of ${\mathbb {A}}$ ensures the following: For a vector $a \in {\mathbb {A}}$, $a \ge {\mathbf {0}}$ prevents short-selling of a risky asset and $a' {\mathbf {1}} \le 1$ rules out leverage. In the case of a single asset class ($N = 1$), the set reduces to ${\mathbb {A}} = [0,1]$. Moreover, ${\mathcal {F}}_{k}$ contains all the information accumulated from time $t = 0$ to time $t = t^{(k)}$, which particularly includes the information $\left( V_{ij}^{\text {(total)}}(t^{(k)}), P_{ij}(t^{(k)})\right) = \left( V_{(k)}, P_{(k)}\right) = S_{(k)}$. The optimization problem in discrete time then reads

$$\begin{aligned} \begin{aligned}&{\mathcal {V}}(v_{0}, c_{ij}^{\text {(buffer)}}) = \sup _{a_{0}, \ldots , a_{N_{\Delta } - 1} \in {\mathbb {A}}} {\mathcal {J}}(\pi ^{\text {(inv)}};v_{0}, c_{ij}^{\text {(buffer)}}) \\&\text {s.t. } {\mathcal {J}}(\pi ^{\text {(inv)}};v_{0}, c_{ij}^{\text {(buffer)}}) = {\mathbb {E}}\left[ \sum _{k = 0}^{N_{\Delta } - 1} \int _{t^{(k)}}^{t^{(k+1)}} e^{- (\lambda _{x} + \beta ) (t-T)} {\tilde{U}}(P_{(k)}) dt\right] . \end{aligned} \end{aligned}$$

(4.8)

We now address the stochastic control problem in (4.8). We assume a Markov model, i.e. the objects at time $t^{(k+1)}$ depend only on the respective objects at time $t^{(k)}$ but not on all preceding times $t^{(0)}, \ldots , t^{(k-1)}$. Hence, the information $S_{(k)}$ at time $t^{(k)}$ is sufficient, ${\mathcal {F}}_{k}$ contains additional but unnecessary information. In view of the dynamic programming principle (or Bellman’s principle) (cf. [2, 4, 5, 14], or [9] for an application), we consider the following time-t problem (for convenience let $t = t^{(k)}$ for some $k \in \{0, \ldots , N_{\Delta } - 1\}$):

$$\begin{aligned} \begin{aligned}&{\mathcal {V}}_{k}(S_{(k)}; c_{ij}^{\text {(buffer)}}) = \sup _{a_{k}, \ldots , a_{N_{\Delta } - 1} \in {\mathbb {A}}} {\mathcal {J}}_{k}(a;S_{(k)}, c_{ij}^{\text {(buffer)}}) \\&\text {s.t. } {\mathcal {J}}_{k}(a;S_{(k)}, c_{ij}^{\text {(buffer)}}) = {\mathbb {E}}\left[ \sum _{i = k}^{N_{\Delta } - 1} \int _{t^{(i)}}^{t^{(i+1)}} e^{- (\lambda _{x} + \beta ) (u - t^{(k)})} {\tilde{U}}(P_{(i)}) du \bigg | S_{(k)}\right] \end{aligned} \end{aligned}$$

(4.9)

and

$$\begin{aligned} {\mathcal {J}}_{N_{\Delta }}(a;S_{(N_{\Delta })}, c_{ij}^{\text {(buffer)}}) = 0. \end{aligned}$$

(4.10)

As we have a Markov model, we search for the optimal asset allocation decision rule $a_{k}^{\star } = {\hat{\pi }}^{\star \text {(inv)}}(t^{(k)}) = {\hat{\pi }}^{\star \text {(inv)}}(S_{(k)})$ at every time $t^{(k)}$. Note

$$\begin{aligned} {\mathcal {J}}_{0}(a;S_{(0)}, c_{ij}^{\text {(buffer)}}) = {\mathcal {J}}(\pi ^{\text {(inv)}};v_{0}, c_{ij}^{\text {(buffer)}}), \end{aligned}$$

(4.11)

where $S_{(0)} = (V_{(0)}, P_{(0)}) {\mathop {=}\limits ^{(3.16)}} (v_{0}, \frac{1 - \alpha }{{\bar{p}} - \alpha } (r + \lambda _{x}) v_{0} )$. In order to write the Bellman equation associated with Problem (4.9), the definition of the state transition function comes next. Let $Z \sim {\mathcal {N}}(0,1)$ be a multi-dimensional vector of independent standard normal random variables of dimension N (= number of risky assets). Z represents the stochastic part of the fund return in period $[t^{(k)}, t^{(k+1)})$ (independent in every period), i.e. Z is the risk driver or risk factor that drives the fund’s performance besides the deterministic drift part. According to Eqs. (3.27) and (3.28), the transition function $T_{B}$ for $S_{(k)} \mapsto S_{(k+1)}$ is

$$\begin{aligned} \begin{aligned}&T_{B}: {\mathbb {R}}_{+}^{2} \times {\mathbb {A} }\times {\mathbb {R}}\rightarrow {\mathbb {R}}_{+}^{2}, \\&\left( S_{(k)}, a_{k}, Z\right) \mapsto S_{(k+1)} \\&\quad = T_{B}(S_{(k)}, a_{k}, Z) = \left( \begin{matrix} V_{(k+1)}(S_{(k)}, a_{k}, Z) \\ P_{(k+1)}(S_{(k)}, a_{k}, Z) \end{matrix}\right) = \left( \begin{matrix} T_{B}^{(V)}(S_{(k)}, a_{k}, Z) \\ T_{B}^{(P)}(S_{(k)}, a_{k}, Z) \end{matrix}\right) , \end{aligned} \end{aligned}$$

(4.12)

where

$$\begin{aligned} \begin{aligned} V_{(k+1)}&= T_{B}^{(V)}(S_{(k)}, a_{k}, Z) \\&{\mathop {=}\limits ^{(3.27)}} V_{(k)} + \left[ E_{ij}(t^{(k)} | S_{(k)}) + (1 - \alpha ) \left( V_{(k)} - E_{ij}(t^{(k)} | S_{(k)})\right) \right] \\&\quad \times \left[ \left( r + a_{k}' (\mu -r {\mathbf {1}})\right) \Delta + a_{k}' \sigma \sqrt{\Delta } Z\right] - P_{(k)} \Delta \end{aligned} \end{aligned}$$

(4.13)

and

$$\begin{aligned} P_{(k+1)}&= T_{B}^{(P)}(S_{(k)}, a_{k}, Z) \\&{\mathop {=}\limits ^{(3.28)}} {\left\{ \begin{array}{ll} P_{(k)},&{}\quad \text { if } \frac{T_{B}^{(V)}(S_{(k)}, a_{k}, Z)}{E_{ij}(t^{(k+1)} | S_{(k)})} \in [100 \%, 125 \%] \\ \frac{1 - \alpha }{{\bar{p}} - \alpha } (r + \lambda _{x}) T_{B}^{(V)}(S_{(k)}, a_{k}, Z),&{}\quad \text {otherwise} \end{array}\right. } \end{aligned}$$

(4.14)

with

$$\begin{aligned} \begin{aligned} E_{ij}(t^{(j)} | S_{(i)}) = \frac{P_{(i)}}{r + \lambda _{x}} \end{aligned} \end{aligned}$$

(4.15)

for $j \ge i$. We further have

$$\begin{aligned} V_{ij}^{\text {(buffer)}}(t^{(k)})&= \alpha \left( V_{(k)} - E_{ij}(t^{(k)} | S_{(k)})\right) , \\ V_{ij}^{\text {(inv)}}(t^{(k)})&= V_{(k)} - V_{ij}^{\text {(buffer)}}(t^{(k)}) \\&= E_{ij}(t^{(k)} | S_{(k)}) + (1 - \alpha ) \left( V_{(k)} - E_{ij}(t^{(k)} | S_{(k)})\right) , \\ CCR_{ij}^{\text {(total)}}(t^{(k)})&= \frac{V_{(k)}}{E_{ij}(t^{(k)} | S_{(k)})}, \\ {\hat{\pi }}^{\text {(total)}}(t^{(k)})&{\mathop {=}\limits ^{(3.25)}}\frac{V_{ij}^{\text {(inv)}}(t^{(k)})}{V_{(k)}} a_{k} \\&= \frac{V_{(k)} - V_{ij}^{\text {(buffer)}}(t^{(k)})}{V_{(k)}} a_{k} {\mathop {=}\limits ^{(3.25)}} \frac{(1 - \alpha ) CCR_{ij}^{\text {(total)}}(t^{(k)}) + \alpha }{CCR_{ij}^{\text {(total)}}(t^{(k)})} a_{k}. \end{aligned}$$

(4.16)

4.3 Bellman equation

The definition of the transition function enables us to introduce the associated Bellman equation

$$\begin{aligned} \begin{aligned} k&= N_{\Delta }:\ {\mathcal {V}}_{N_{\Delta }}(S_{(N_{\Delta })}; c_{ij}^{\text {(buffer)}}) = 0, \\ k&\in \{N_{\Delta } - 1, \ldots , 0\}:\ {\mathcal {V}}_{k}(S_{(k)}; c_{ij}^{\text {(buffer)}}) = r_{k}(S_{(k)}) \\&\quad + e^{- (\lambda _{x} + \beta ) \Delta } \sup _{a_{k} \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ {\mathcal {V}}_{k+1}(S_{(k+1)}; c_{ij}^{\text {(buffer)}}) \bigg | S_{(k)}\right] \right\} , \end{aligned} \end{aligned}$$

(4.17)

with $S_{(k+1)} = T_{B}(S_{(k)}, a_{k}, Z)$. The formula for $k \in \{N_{\Delta } - 1, \ldots , 0\}$ follows from inserting the one-period or one-stage reward function $r_{k}(S_{(k)}, a_{k}) \equiv r_{k}(S_{(k)})$ into the original equation

$$\begin{aligned} \begin{aligned} {\mathcal {V}}_{k}(S_{(k)}; c_{ij}^{\text {(buffer)}}) = \sup _{a_{k} \in {\mathbb {A}}} \left\{ r_{k}(S_{(k)}, a_{k}) + e^{- (\lambda _{x} + \beta ) (t^{(k+1)} - t^{(k)})} {\mathbb {E}}\left[ {\mathcal {V}}_{k+1}(S_{(k+1)}; c_{ij}^{\text {(buffer)}}) \bigg | S_{(k)}\right] \right\} . \end{aligned} \end{aligned}$$

(4.18)

The one-period reward function describes the contribution or reward to the client’s satisfaction in the period $[t^{(k)}, t^{(k+1)})$ linked to the pension $P_{(k)}$ that is paid out in $[t^{(k)}, t^{(k+1)})$ independently of the action or applied relative risky investment strategy $a_{k} = {\hat{\pi }}^{\text {(inv)}}(t^{(k)})$. The value of $r_{k}(S_{(k)})$ is already known at time $t^{(k)}$, i.e. it is deterministic and independent of the decision $a_{k}$, which is why it can be taken out of the supremum in (4.17). We moreover obtain

$$\begin{aligned} \begin{aligned} r_{k}(S_{(k)}, a_{k}) \equiv r_{k}(S_{(k)}) &= \int _{t^{(k)}}^{t^{(k+1)}} e^{- (\lambda _{x} + \beta ) (u - t^{(k)})} {\tilde{U}}(P_{(k)}) du = e^{(\lambda _{x} + \beta ) t^{(k)}} {\tilde{U}}(P_{(k)})\\&\quad \times \int _{t^{(k)}}^{t^{(k+1)}} e^{- (\lambda _{x} + \beta ) u} du \\ &= e^{(\lambda _{x} + \beta ) t^{(k)}} {\tilde{U}}(P_{(k)}) \left[ \frac{e^{- (\lambda _{x} + \beta ) u}}{- (\lambda _{x} + \beta )} \bigg |_{u = t^{(k)}}^{u = t^{(k+1)}}\right] \\ &= e^{(\lambda _{x} + \beta ) t^{(k)}} {\tilde{U}}(P_{(k)}) \left[ \frac{e^{- (\lambda _{x} + \beta ) t^{(k+1)}} - e^{- (\lambda _{x} + \beta ) t^{(k)}}}{- (\lambda _{x} + \beta )}\right] \\ &= {\tilde{U}}(P_{(k)}) \left[ \frac{e^{- (\lambda _{x} + \beta ) (t^{(k+1)} - t^{(k)})} - 1}{- (\lambda _{x} + \beta )}\right] \\ &= \frac{1}{\lambda _{x} + \beta } \left( 1 - e^{- (\lambda _{x} + \beta ) (t^{(k+1)} - t^{(k)})}\right) {\tilde{U}}(P_{(k)}) \\ &= \frac{1}{\lambda _{x} + \beta } \left( 1 - e^{- (\lambda _{x} + \beta ) \Delta }\right) {\tilde{U}}(P_{(k)}). \end{aligned} \end{aligned}$$

(4.19)
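The closed form in (4.19) can be checked numerically against the defining integral. The sketch below is illustrative only: the function names and parameter values are hypothetical, and the midpoint rule is merely a convenient way to approximate the integral.

```python
import math


def one_period_reward(utility_value, lam, beta, dt):
    """Closed form (4.19): (1 - exp(-(lam + beta) * dt)) / (lam + beta) * U(P)."""
    rate = lam + beta
    if rate <= 0.0:
        raise ValueError("this sketch assumes lam + beta > 0")
    return -math.expm1(-rate * dt) / rate * utility_value


def reward_by_quadrature(utility_value, lam, beta, dt, n=10_000):
    """Midpoint-rule approximation of the integral on the left of (4.19)."""
    rate, h = lam + beta, dt / n
    total = sum(math.exp(-rate * (i + 0.5) * h) for i in range(n))
    return total * h * utility_value


if __name__ == "__main__":
    # Hypothetical inputs: utility value 2.0, quarterly period.
    print(one_period_reward(2.0, 0.05, 0.02, 0.25))
    print(reward_by_quadrature(2.0, 0.05, 0.02, 0.25))
```

Since the discount factor is at most one on the period, the reward for a positive utility value is always below `utility_value * dt`.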

The original discrete-time dynamic optimization problem (4.8) can then be solved by backward induction on the Bellman equation (4.17). The optimal decision rule or policy $a_{k}^{\star } = {\hat{\pi }}^{\star \text {(inv)}}(S_{(k)})$ needs to be determined in every step and for every possible state $S_{(k)}$, backwards in time. From this, we further obtain the optimal total risky relative portfolio process ${\hat{\pi }}^{\star \text {(total)}} = {\hat{\pi }}^{\text {(total)}}(a_{k}^{\star })$ through Eq. (4.16).
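The backward induction can be sketched in a few lines for one risky asset. Everything numerical here is an assumption made for illustration: the parameter values, the coarse grids for wealth, pension and actions, the square-root utility, the three-point Gauss–Hermite rule for the expectation, and the nearest-neighbor projection of the next state onto the grid. It is a toy instance of (4.17), not the implementation used in the case study.

```python
import math

# Hypothetical parameters and grids (illustration only).
R, MU, SIGMA, LAM, BETA = 0.01, 0.05, 0.20, 0.05, 0.02
ALPHA, P_BAR, DT, N_STEPS = 0.3, 1.1, 1.0, 5
ACTIONS = [0.0, 0.25, 0.5, 0.75, 1.0]               # coarse grid for A = [0, 1]
NODES = [(-math.sqrt(3.0), 1 / 6), (0.0, 2 / 3), (math.sqrt(3.0), 1 / 6)]
V_GRID = [20.0 * i for i in range(1, 16)]           # wealth levels
P_GRID = [1.0 * i for i in range(1, 16)]            # pension levels


def utility(p):
    return math.sqrt(p)                             # placeholder concave utility


def transition(v, p, a, z):
    """State transition (4.13)-(4.14) for one risky asset."""
    e = p / (R + LAM)
    v_inv = e + (1.0 - ALPHA) * (v - e)
    ret = (R + a * (MU - R)) * DT + a * SIGMA * math.sqrt(DT) * z
    v_next = v + v_inv * ret - p * DT
    if 1.0 <= v_next / e <= 1.25:
        return v_next, p
    return v_next, (1.0 - ALPHA) / (P_BAR - ALPHA) * (R + LAM) * v_next


def nearest(grid, x):
    """Index of the grid point closest to x (states off the grid are clamped)."""
    return min(range(len(grid)), key=lambda i: abs(grid[i] - x))


def backward_induction():
    """Return value functions V_0..V_N and policies a_0..a_{N-1} on the grid."""
    rate = LAM + BETA
    reward_factor = (1.0 - math.exp(-rate * DT)) / rate    # Eq. (4.19)
    discount = math.exp(-rate * DT)
    value = [[0.0] * len(P_GRID) for _ in V_GRID]          # terminal condition
    values, policies = [value], []
    for _ in range(N_STEPS):
        new_value, policy = [], []
        for v in V_GRID:
            row_value, row_action = [], []
            for p in P_GRID:
                def expected(a):
                    total = 0.0
                    for z, w in NODES:
                        v1, p1 = transition(v, p, a, z)
                        total += w * value[nearest(V_GRID, v1)][nearest(P_GRID, p1)]
                    return total
                best = max(ACTIONS, key=expected)
                row_value.append(reward_factor * utility(p) + discount * expected(best))
                row_action.append(best)
            new_value.append(row_value)
            policy.append(row_action)
        value = new_value
        values.insert(0, value)
        policies.insert(0, policy)
    return values, policies


if __name__ == "__main__":
    values, policies = backward_induction()
    i, j = nearest(V_GRID, 114.0), nearest(P_GRID, 6.0)
    print(values[0][i][j], policies[0][i][j])
```

The nested loops make the cost visible: every additional period repeats the optimization over all grid states, so the effort grows linearly in the number of periods and with the product of the grid sizes.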

4.4 Extension to a single-cohort model

In this section we briefly describe one way to extend and aggregate the single-client model described so far into a single-cohort model, where one cohort covers all clients of (roughly) the same age. Let us consider a cohort of clients grouped by age ($x = x(j)$ years old at time T) that has $m_{j}$ members. We manage the total cohort portfolio and the pension collectively and thus define

$$\begin{aligned} P_{j}(t) := \sum _{i = 1}^{m_j} P_{ij}(t) \end{aligned}$$

(4.20)

to be the sum of all pension payments $P_{ij}(t)$ connected to all members i in cohort j at time t. Since we consider one cohort, there are no intertemporal inflows into the model. We assume that there is no bequest paid out in the case of a cohort member’s death. Further we re-interpret the mortality model: The survival probability

$$\begin{aligned} {\mathbb {P}}\left( \tau _{ij}^{x(j)}(T) \ge s \bigg | \tau _{ij}^{x(j)}(T) \ge t\right) = e^{- \lambda _{x(j)} (s - t)},\quad s \ge t, \end{aligned}$$

(4.21)

for a single client is now regarded as the average relative survival frequency of the cohort, i.e. we assume $e^{- \lambda _{x(j)} (s - t)}$ to be the average proportion of clients in cohort j that survive from time t to time s. This comes from the following observation: Let $\tau _{ij}^{x(j)}(T)$ denote the uncertain remaining lifetime of client i in cohort j which is identically distributed among all clients in one cohort. Then the uncertain proportion of survivors from time t to s in the cohort is described by the random variable $\frac{\sum _{i = 1}^{m_j} {\mathbbm {1}}_{\{\tau _{ij}^{x(j)}(T) \ge s | \tau _{ij}^{x(j)}(T) \ge t\}}}{m_j}$. Its expectation is

$$\begin{aligned} \begin{aligned} {\mathbb {E}}\left[ \frac{\sum _{i = 1}^{m_j} {\mathbbm {1}}_{\{\tau _{ij}^{x(j)}(T) \ge s | \tau _{ij}^{x(j)}(T) \ge t\}}}{m_j}\right]& =\frac{\sum _{i = 1}^{m_j} {\mathbb {E}}\left[{ \mathbbm {1}}_{\{\tau _{ij}^{x(j)}(T) \ge s | \tau _{ij}^{x(j)}(T) \ge t\}}\right] }{m_j} \\ &= \frac{\sum _{i = 1}^{m_j} {\mathbb {P}}(\tau _{ij}^{x(j)}(T) \ge s | \tau _{ij}^{x(j)}(T) \ge t)}{m_j} \\ &{\mathop {=}\limits ^{\tau _{ij}^{x(j)}(T) \text { identically distributed } \forall i \in \{1, \ldots , m_j\}}} \frac{\sum _{i = 1}^{m_j} {\mathbb {P}}(\tau _{1j}^{x(j)}(T) \ge s | \tau _{1j}^{x(j)}(T) \ge t)}{m_j} \\ &= \frac{m_j {\mathbb {P}}(\tau _{1j}^{x(j)}(T) \ge s | \tau _{1j}^{x(j)}(T) \ge t)}{m_j} \\ &= {\mathbb {P}}(\tau _{1j}^{x(j)}(T) \ge s | \tau _{1j}^{x(j)}(T) \ge t) = e^{- \lambda _{x(j)} (s - t)}. \end{aligned} \end{aligned}$$

(4.22)

In other words, the average cohort proportion of surviving clients equals the survival probability of a single client in this cohort. Moreover, since the number of customers in cohort j decreases continuously in time due to deaths of cohort members, the average pension cash flow $P_{j}(t)$ needs to be adjusted to

$$\begin{aligned} P_{j}(s) := e^{- \lambda _{x(j)} (s - t)} P_{j}(t),\ s \ge t, \end{aligned}$$

(4.23)

assuming that all single-client pensions remain constant, and only those connected to a client’s death are removed. We define $P_{(k)} := P_{j}(t^{(k)})$ in the state $S_{(k)} = \left( V_{(k)}, P_{(k)}\right) $, where $V_{(k)}$ denotes the total collective wealth of cohort j. For this reason, we have to modify the transition function $T_{B}^{(P)}$ for the pension $P_{(k)}$ as follows:

$$\begin{aligned} P_{(k+1)}&= T_{B}^{(P)}(S_{(k)}, a_{k}, Z) \\&= {\left\{ \begin{array}{ll} e^{- \lambda _{x(j)} (t^{(k+1)} - t^{(k)})} P_{(k)},&{} \quad \text { if } \frac{T_{B}^{(V)}(S_{(k)}, a_{k}, Z)}{E_{j}(t^{(k+1)} | S_{(k)})} \in [100 \%, 125 \%] \\ \frac{1 - \alpha }{{\bar{p}} - \alpha } (r + \lambda _{x(j)}) T_{B}^{(V)}(S_{(k)}, a_{k}, Z), &{} \quad \text { otherwise} \end{array}\right. } \end{aligned}$$

(4.24)

with

$$\begin{aligned} E_{j}(t^{(l)} | S_{(i)}) = \frac{e^{- \lambda _{x(j)} (t^{(l)} - t^{(i)})} P_{(i)}}{r + \lambda _{x(j)}} \end{aligned}$$

(4.25)

for $l \ge i$. Following earlier definitions we further introduce the collective cohort-specific functionals

$$\begin{aligned} \begin{aligned} E_{j}(t)&:= \sum _{i=1}^{m_j} E_{ij}(t) {\mathop {=}\limits ^{(3.3)}} \sum _{i=1}^{m_j} \frac{P_{ij}(t)}{r + \lambda _{x}} {\mathop {=}\limits ^{(4.20)}} \frac{P_{j}(t)}{r + \lambda _{x}}, \\ V_{j}^{\text {(total)}}(t)&:= \sum _{i=1}^{m_j} V_{ij}^{\text {(total)}}(t), \\ V_{j}^{\text {(buffer)}}(t)&:= \sum _{i=1}^{m_j} V_{ij}^{\text {(buffer)}}(t) {\mathop {=}\limits ^{(3.21)}} \sum _{i=1}^{m_j} \alpha \left( V_{ij}^{\text {(total)}}(t) - E_{ij}(t)\right) \\&=\alpha \left( V_{j}^{\text {(total)}}(t) - E_{j}(t)\right) , \\ V_{j}^{\text {(inv)}}(t)&:= \sum _{i=1}^{m_j} V_{ij}^{\text {(inv)}}(t) {\mathop {=}\limits ^{(3.6)}} \sum _{i=1}^{m_j} \left( V_{ij}^{\text {(total)}}(t) - V_{ij}^{\text {(buffer)}}(t)\right) \\&= V_{j}^{\text {(total)}}(t) - V_{j}^{\text {(buffer)}}(t) \\&= E_{j}(t) + (1 - \alpha ) \left( V_{j}^{\text {(total)}}(t) - E_{j}(t)\right) , \\ CCR_{j}^{\text {(total)}}(t)&:= \frac{V_{j}^{\text {(total)}}(t)}{E_{j}(t)},\ CCR_{j}^{\text {(inv)}}(t) := \frac{V_{j}^{\text {(inv)}}(t)}{E_{j}(t)},\ CCR_{j}^{\text {(buffer)}}(t) := \frac{V_{j}^{\text {(buffer)}}(t)}{E_{j}(t)}. \end{aligned} \end{aligned}$$

(4.26)

Note that the properties $CCR_{ij}^{\text {(inv)}}(t_{n}) \equiv {\bar{p}}$ and $CCR_{ij}^{\text {(total)}}(t_{n}) \equiv \frac{{\bar{p}} - \alpha }{1 - \alpha }$ at the re-adjustment times $t_{n}$ are passed to the collective objects

$$\begin{aligned} \begin{aligned} CCR_{j}^{\text {(inv)}}(t) &= \frac{V_{j}^{\text {(inv)}}(t)}{E_{j}(t)} = \frac{\sum _{i=1}^{m_j} V_{ij}^{\text {(inv)}}(t)}{\sum _{i=1}^{m_j} E_{ij}(t)} {\mathop {=}\limits ^{(3.14)}} \frac{\sum _{i=1}^{m_j} {\bar{p}} E_{ij}(t)}{\sum _{i=1}^{m_j} E_{ij}(t)} = {\bar{p}} = CCR_{ij}^{\text {(inv)}}(t), \\ CCR_{j}^{\text {(total)}}(t)& = \frac{V_{j}^{\text {(total)}}(t)}{E_{j}(t)} = \frac{\sum _{i=1}^{m_j} V_{ij}^{\text {(total)}}(t)}{\sum _{i=1}^{m_j} E_{ij}(t)} {\mathop {=}\limits ^{(3.16)}} \frac{\sum _{i=1}^{m_j} \frac{{\bar{p}} - \alpha }{1 - \alpha } E_{ij}(t)}{\sum _{i=1}^{m_j} E_{ij}(t)} \\ &=\frac{{\bar{p}} - \alpha }{1 - \alpha } = CCR_{ij}^{\text {(total)}}(t). \end{aligned} \end{aligned}$$

(4.27)

Hence, the regulatory constraint $CCR_{ij}^{\text {(total)}}(t) \in [100 \%, 125 \%]$ for any single customer is satisfied iff the cohort-specific constraint $CCR_{j}^{\text {(total)}}(t) \in [100 \%, 125 \%]$ on the collective fund is satisfied. In summary, under the proposed framework, both collective ratios $CCR_{j}^{\text {(total)}}(t)$ and $CCR_{j}^{\text {(inv)}}(t)$ coincide with the individual ratios $CCR_{ij}^{\text {(total)}}(t)$ and $CCR_{ij}^{\text {(inv)}}(t)$, respectively.

According to the definition of the transition function in Eq. (4.24), if $CCR_{j}^{\text {(total)}}(t^{(k+1)})$ stays inside its pre-defined corridor, the collective cohort pension $P_{(k+1)}$ at time $t^{(k+1)}$ decreases with rate $\lambda _{x(j)}$ (on average) due to the deaths of cohort members. At the same time, the individual pensions of clients that survived until time $t^{(k+1)}$ remain untouched, i.e. stable. Thus, $P_{(k+1)} = e^{- \lambda _{x(j)} (t^{(k+1)} - t^{(k)})} P_{(k)}$ indicates a stable, constant individual pension $P_{ij}(t^{(k+1)}) = P_{ij}(t^{(k)})$ for those clients in the cohort that are still alive at time $t^{(k+1)}$. Using this notation, the Bellman equation in (4.17) and the one-period reward function in (4.19) remain the same.^{Footnote 5} Finally, due to this definition, $E_{j}$ decreases in time (death of cohort members). As there are no bequest payments, this implies that $CCR_{j}^{\text {(total)}}$ is more likely to cross the $125 \%$ border and less likely to fall below the $100 \%$ border compared to the single-client model if the same investment strategy is applied.
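The cohort version of the pension transition, Eqs. (4.24)–(4.25), can be sketched as follows. The function name `cohort_pension_step` and the numerical values are hypothetical; the wealth at the end of the period is taken as given.

```python
import math


def cohort_pension_step(v_next, p, *, r, lam, alpha, p_bar, dt):
    """Collective pension P_(k+1) from Eq. (4.24), given end-of-period wealth."""
    survival = math.exp(-lam * dt)                 # average share of survivors
    e_next = survival * p / (r + lam)              # E_j(t^(k+1) | S_(k)), Eq. (4.25)
    if 1.0 <= v_next / e_next <= 1.25:
        return survival * p                        # survivors keep their pension
    return (1.0 - alpha) / (p_bar - alpha) * (r + lam) * v_next


if __name__ == "__main__":
    # Hypothetical parameters for illustration only.
    par = dict(r=0.01, lam=0.05, alpha=0.3, p_bar=1.1, dt=1.0)
    for wealth in (90.0, 105.0, 120.0):
        print(wealth, cohort_pension_step(wealth, 6.0, **par))
```

With these illustrative numbers, an end-of-period wealth of 120 corresponds to a ratio of 120 % relative to the single-client entitlement of 100, but to roughly 126 % relative to the reduced cohort entitlement, so the upper border is crossed only in the cohort model.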

Remark 1

It is remarkable that the probabilities of future reductions of individual customer pensions in the cohort model are smaller than those in the single-client model, whereas the probabilities of future pension enhancements in the cohort model are larger than in the single-client model, if the same investment strategy is applied. The economic reason is that the wealth of a client in the cohort who died in the previous period remains in the collective portfolio and is not paid out to heirs, while the cohort-related collective pension declines. Therefore, the survivors in the cohort benefit from the death of a cohort member.

5 A stationary solution

It was already shown that the discrete-time Problem (4.8) can be solved by backward induction on the Bellman equation (4.17). However, this procedure has notable shortcomings: First, it has to be performed for every single client (or cohort) with a different initial state $S_{(0)} = (V_{(0)}, P_{(0)})$. Second, the computational effort increases dramatically if a long time horizon ${\tilde{T}}$ is considered. Both points considerably increase the computation time. To find a computationally efficient solution for an arbitrary planning horizon, an arbitrary number of decision periods and an arbitrary initial state (customer), we present an approximate stationary solution next, where the solution to the finite-horizon problem is approximated by the solution to the infinite-horizon problem. The stationary solution depends on the specific state only, but not on the time point, which makes this approach practicable and simplifies application and implementation for a wide range of customers with different states.

5.1 The infinite-time horizon problem

Let ${\tilde{T}} = \infty $ hold. The idea is that $a_{k}^{\star }(S) \equiv a^{\star }(S) = {\hat{\pi }}^{\star \text {(inv)}}(S)$ for all states S, i.e. the optimal asset allocation decision depends on the current state and is independent of time; we seek a stationary solution. This leads to the infinite-horizon discrete-time optimization problem

$$\begin{aligned} \begin{aligned}&{\mathcal {V}}(S) = {\mathcal {V}}(S; c_{ij}^{\text {(buffer)}}) = \sup _{a(S) \in {\mathbb {A}}} {\mathcal {J}}(a(S);S, c_{ij}^{\text {(buffer)}}) \\&\text {s.t. } {\mathcal {J}}(a(S);S, c_{ij}^{\text {(buffer)}}) = {\mathbb {E}}\left[ \sum _{i = 0}^{\infty } \int _{t^{(i)}}^{t^{(i+1)}} e^{- (\lambda _{x} + \beta ) (u - T)} {\tilde{U}}(P(t^{(i)})) du\right] \end{aligned} \end{aligned}$$

(5.1)

with state $S = (V,P)$ and stochastic pension $P = P(t^{(i)})$ at time $t^{(i)}$. Due to ${\tilde{T}} = \infty $, the corresponding Bellman equation to this problem is as follows:

$$\begin{aligned} \begin{aligned} {\mathcal {V}}(S) &= \sup _{a(S) \in {\mathbb {A}}} \left\{ r(S, a(S)) + e^{- (\lambda _{x} + \beta ) \Delta } {\mathbb {E}}\left[ {\mathcal {V}}(T_{B}(S, a(S), Z)) \bigg | S\right] \right\} \\ &= r(S) + e^{- (\lambda _{x} + \beta ) \Delta } \sup _{a(S) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ {\mathcal {V}}(T_{B}(S, a(S), Z)) \bigg | S\right] \right\} , \end{aligned} \end{aligned}$$

(5.2)

where the last equality holds due to $r(S, a(S)) \equiv r(S)$ for the one-period reward. It is obvious that the problem is independent of time and falls into the class of stationary, infinite-horizon Markovian decision problems, also known as Markovian dynamic programming (MDP) problems. The transition function $T_{B}$ for $S \mapsto T_{B}(S, a, Z)$ is given by

$$\begin{aligned} \begin{aligned}&T_{B}: {\mathbb {R}}_{+}^{2} \times {\mathbb {A}} \times {\mathbb {R}}\rightarrow {\mathbb {R}}_{+}^{2},\ \left( S, a, Z\right) \mapsto T_{B}(S, a, Z) = \left( \begin{matrix} T_{B}^{(V)}(S, a, Z) \\ T_{B}^{(P)}(S, a, Z) \end{matrix}\right) , \end{aligned} \end{aligned}$$

(5.3)

where $S = (V,P)$ and

$$\begin{aligned} T_{B}^{(V)}(S, a, Z)&= V + \left[ E(S) + (1 - \alpha ) \left( V - E(S)\right) \right] \\&\quad \times \left[ \left( r + a' (\mu -r {\mathbf {1}})\right) \Delta + a' \sigma \sqrt{\Delta } Z\right] - P \Delta \end{aligned}$$

(5.4)

and^{Footnote 6}

$$\begin{aligned} T_{B}^{(P)}(S, a, Z) = {\left\{ \begin{array}{ll} P, &{} \quad \text { if } \frac{T_{B}^{(V)}(S, a, Z)}{E(S)} \in [100 \%, 125 \%] \\ \frac{1 - \alpha }{{\bar{p}} - \alpha } (r + \lambda _{x}) T_{B}^{(V)}(S, a, Z),&{} \quad \text { otherwise} \end{array}\right. } \end{aligned}$$

(5.5)

with

$$\begin{aligned} E(S) = \frac{P}{r + \lambda _{x}}. \end{aligned}$$

(5.6)

The gap in the expected utility between the finite- and the infinite-horizon model, i.e. ${\tilde{T}} < \infty $ vs. ${\tilde{T}} = \infty $, is usually very small, while the infinite-horizon formulation simplifies calculations considerably. In particular, if the survival probability (from time T to time ${\tilde{T}} < \infty $, for instance ${\tilde{T}} = 120$) is close to zero, the error stays small, so that the approximation becomes reliable and the approach can be justified from a mathematical perspective. In more detail, the crucial object is the absolute error in the infinite-horizon problem (5.1) compared to the finite-horizon problem (4.8):

$$\begin{aligned} \begin{aligned} error_{{\tilde{T}}, \infty }^{abs}&:= \left| {\mathbb {E}}\left[ \sum _{i = 0}^{\infty } \int _{t^{(i)}}^{t^{(i+1)}} e^{- (\lambda _{x} + \beta ) (u - T)} {\tilde{U}}(P(t^{(i)})) du\right] \right. \\&\quad \left. - {\mathbb {E}}\left[ \sum _{i = 0}^{N_{\Delta } - 1} \int _{t^{(i)}}^{t^{(i+1)}} e^{- (\lambda _{x} + \beta ) (u-T)} {\tilde{U}}(P(t^{(i)})) du\right] \right| \\&= \left| {\mathbb {E}}\left[ \sum _{i = N_{\Delta }}^{\infty } \int _{t^{(i)}}^{t^{(i+1)}} e^{- (\lambda _{x} + \beta ) (u - T)} {\tilde{U}}(P(t^{(i)})) du\right] \right| \\&= \left| {\mathbb {E}}\left[ \int _{{\tilde{T}}}^{\infty } e^{- (\lambda _{x} + \beta ) (u - T)} {\tilde{U}}(P(u)) du\right] \right| . \end{aligned} \end{aligned}$$

(5.9)

The relative error is then defined as

$$\begin{aligned} \begin{aligned} error_{{\tilde{T}}, \infty }^{rel}&:= \frac{\left| {\mathbb {E}}\left[ \int _{{\tilde{T}}}^{\infty } e^{- (\lambda _{x} + \beta ) (u - T)} {\tilde{U}}(P(u)) du\right] \right| }{\left| {\mathbb {E}}\left[ \int _{T}^{\infty } e^{- (\lambda _{x} + \beta ) (u - T)} {\tilde{U}}(P(u)) du\right] \right| } \\&= \frac{error_{{\tilde{T}}, \infty }^{abs}}{\left| {\mathbb {E}}\left[ \int _{T}^{\infty } e^{- (\lambda _{x} + \beta ) (u - T)} {\tilde{U}}(P(u)) du\right] \right| }. \end{aligned} \end{aligned}$$

(5.10)

The approximation is more reliable if the relative error is small. Thus, if one desires to use the solution to the infinite-horizon problem as an approximation for the solution to the finite-horizon problem, one needs to ensure that $error_{{\tilde{T}}, \infty }^{rel}$ is sufficiently small to justify the approach.^{Footnote 7}
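To get a feeling for the size of the relative error (5.10), consider the simplifying assumption (made here for illustration only) that the pension, and hence ${\tilde{U}}(P(u))$, is constant and non-zero. Both integrals in (5.10) can then be evaluated in closed form and the relative error reduces to $e^{- (\lambda _{x} + \beta ) ({\tilde{T}} - T)}$. The sketch below evaluates this expression; function names and parameter values are hypothetical.

```python
import math


def relative_error_constant_pension(lam, beta, horizon):
    """Relative error (5.10) when the utility of the pension is constant.

    horizon is the length of the planning period, i.e. T_tilde - T in years.
    """
    return math.exp(-(lam + beta) * horizon)


def horizon_for_tolerance(lam, beta, tol):
    """Shortest planning period for which the error above is at most tol."""
    return -math.log(tol) / (lam + beta)


if __name__ == "__main__":
    # Hypothetical rates: lambda_x = 5 %, beta = 2 %.
    for years in (20, 40, 60):
        print(years, relative_error_constant_pension(0.05, 0.02, years))
    print(horizon_for_tolerance(0.05, 0.02, 0.01))
```

Under this assumption the error decays exponentially in the length of the planning period, at rate $\lambda _{x} + \beta $; for a stochastic pension the expression only serves as a rough orientation.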

From now on we consider utility functions with hyperbolic absolute risk aversion (HARA) as intertemporal pension utility function:

$$\begin{aligned} {\tilde{U}}(p)&:= {\hat{a}} \frac{1-b}{b} \left( \frac{1}{1-b} (p-F)\right) ^{b},\ U(t,p) = e^{- \beta (t-T)} {\tilde{U}}(p) \\&= e^{- \beta (t-T)} {\hat{a}} \frac{1-b}{b} \left( \frac{1}{1-b} (p-F)\right) ^{b} \end{aligned}$$

(5.11)

with coefficient of risk aversion $b < 1$, $b \ne 0$ and ${\hat{a}} > 0$, $p > F$ with $F \ge 0$. This utility function is increasing and strictly concave in the argument p and provides a floor F. For the one-period reward r(S) in the Bellman equation (5.2) the choice of a HARA utility function leads to

$$\begin{aligned} r(S) = r((V,P)) \equiv r(P) {\mathop {=}\limits ^{(4.19)}} \frac{1}{\lambda _{x} + \beta } \left( 1 - e^{- (\lambda _{x} + \beta ) \Delta }\right) {\hat{a}} \frac{1-b}{b} \left( \frac{1}{1-b} (P-F)\right) ^{b}. \end{aligned}$$

(5.12)
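Equations (5.11) and (5.12) can be sketched in a few lines of Python. The function and argument names (`hara_utility`, `one_period_reward`, `a_hat`, `floor`) and the numerical values are illustrative assumptions, not part of the original model description.

```python
import math


def hara_utility(p, a_hat, b, floor):
    """HARA utility of Eq. (5.11); requires p > floor, b < 1, b != 0, a_hat > 0."""
    if not (p > floor and b < 1 and b != 0 and a_hat > 0):
        raise ValueError("parameters outside the admissible HARA range")
    return a_hat * (1 - b) / b * ((p - floor) / (1 - b)) ** b


def one_period_reward(p, a_hat, b, floor, lam, beta, delta):
    """One-period reward r(P) of Eq. (5.12): discount integral times utility."""
    rate = lam + beta
    annuity = (1 - math.exp(-rate * delta)) / rate
    return annuity * hara_utility(p, a_hat, b, floor)


# Illustrative values only (hypothetical pension p and floor).
u = hara_utility(p=30.0, a_hat=1.0, b=-1.0, floor=26.0)
r = one_period_reward(p=30.0, a_hat=1.0, b=-1.0, floor=26.0,
                      lam=0.0118, beta=0.03, delta=1.0)
print(u, r)
```

For $b < 0$ the utility is negative and bounded above by zero, so the reward is negative as well; only differences in reward matter for the optimization.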

For ease of exposition, we place the following assumption on the utility function that is to hold from now on.

Assumption 1

Let us consider HARA utility ${\tilde{U}}(P)$ (parameterization in Eq. (5.11)) for $P_{min} \le P \le P_{max}$ with^{Footnote 8} $F< P_{min}< P_{max} < \infty $.

Assumption 1 introduces lower and upper bounds for the pension payment that is to be paid out. We then have to adjust the transition function $T_{B}^{(P)}$ for the pension to become

$$\begin{aligned} T_{B}^{(P)}(S, a, Z) = {\left\{ \begin{array}{ll} P , \text { if } \frac{T_{B}^{(V)}(S, a, Z)}{E(S)} \in [100 \%, 125 \%] \\ \max \left\{ \min \left\{ \frac{1 - \alpha }{{\bar{p}} - \alpha } (r + \lambda _{x}) T_{B}^{(V)}(S, a, Z), P_{max}\right\} , P_{min}\right\} , \text { otherwise} \end{array}\right. } \end{aligned}$$

(5.13)

in the single-client model.^{Footnote 9} Based on Assumption 1, it can immediately be concluded that ${\tilde{U}}(p)$ is bounded. Furthermore, since the algorithm proposed later performs the optimization over a finite set of actions a(S) (a discretization grid for ${\mathbb {A}}$ with $a(S) \in {\mathbb {A}}$), we impose the following assumption from now on.

Assumption 2

${\mathbb {A}}$ has a finite number of elements.

For instance, one could assume $1 \%$-point steps in ${\mathbb {A}} = \left\{ a \in {\mathbb {A}}_{1}^{N}:\ a' {\mathbf {1}} \le 1\right\} $, ${\mathbb {A}}_{1} := \{0 \%, 1 \%, \ldots , 99 \%, 100 \%\}$, with ${\mathbb {A}} \equiv {\mathbb {A}}_{1}$ in the situation of one risky asset class ($N = 1$). Further note that every function $g : {\mathbb {A}} \rightarrow {\mathbb {R}}$ attains its maximum on the finite set ${\mathbb {A}}$. Hence, from Assumption 2 it immediately follows that the supremum over $a(S) \in {\mathbb {A}}$ turns into its maximum:

$$\begin{aligned} {\mathcal {V}}(S)&{\mathop {=}\limits ^{(5.2)}} r(S) + e^{- (\lambda _{x} + \beta ) \Delta } \sup _{a(S) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ {\mathcal {V}}(T_{B}(S, a(S), Z)) \bigg | S\right] \right\} \\&{\mathop {=}\limits ^{\text {Assumption 2}}} r(S) + e^{- (\lambda _{x} + \beta ) \Delta } \max _{a(S) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ {\mathcal {V}}(T_{B}(S, a(S), Z)) \bigg | S\right] \right\} . \end{aligned}$$

(5.15)

Thus, the maximum is attained. In what follows we present an approach to solve the infinite-horizon problem (5.1) under Assumptions 1 and 2 via the Bellman equation (5.2).
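The bounded pension transition of Eq. (5.13) can be sketched as follows. This is a minimal illustration: the function name `next_pension` is hypothetical, and the quantity $E(S)$ from the text is simply passed in as a number (`e_s`) rather than computed.

```python
def next_pension(p, v_next, e_s, alpha, p_bar, r, lam, p_min, p_max):
    """Pension component of the transition, Eq. (5.13).

    p      : current pension P
    v_next : next-period total wealth T_B^(V)(S, a, Z)
    e_s    : E(S) as used in the text (supplied by the caller)
    """
    ccr = v_next / e_s
    if 1.00 <= ccr <= 1.25:
        return p  # inside the corridor: pension stays unchanged
    # outside the corridor: re-set the pension and clip it to [p_min, p_max]
    reset = (1 - alpha) / (p_bar - alpha) * (r + lam) * v_next
    return max(min(reset, p_max), p_min)


# Illustrative numbers only.
print(next_pension(200.0, 11_000.0, 10_000.0, 0.0, 1.125, 0.01, 0.0118, 50.0, 1_000.0))
print(next_pension(200.0, 9_000.0, 10_000.0, 0.0, 1.125, 0.01, 0.0118, 50.0, 1_000.0))
```

The clipping by `p_min` and `p_max` is what keeps the pension within the bounds of Assumption 1 and hence keeps the utility bounded.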

5.2 Definition of the grid

Since the algorithm for solving the Bellman equation for every possible state S will work on grids of the relevant objects, we introduce the respective grid definitions:

1.
Discretization of the state space $S = (V,P)$: We build up the state space grid following the three steps below:
(a)
  One-dimensional total wealth grid V: $V_{min}$ (minimal value for V), $V_{max}$ (maximal value for V), $n_{V}$ (number of values for V with equidistant distances):
  $$\begin{aligned} V^{(i)} \in \text {Grid}(V) := \{V_{min}, \ldots , V_{max}\},\ i = 1, \ldots , n_{V}, \end{aligned}$$
  (5.16)
  with cardinality $n_{V}$.
(b)
  One-dimensional capital coverage ratio grid CCR: Let $\text {Grid}(CCR)$ be the grid of the capital coverage ratio that ranges from $CCR_{min} = 100 \%$ to $CCR_{max} = 125 \%$ with $n_{CCR}$ number of values in the equidistant grid, for instance $\text {Grid}(CCR) = \{1, 1.01, 1.02, \ldots , 1.23, 1.24, 1.25\}$.
(c)
  Two-dimensional state space S: For every $V^{(i)} \in \text {Grid}(V)$ and every $CCR^{(j)} \in \text {Grid}(CCR)$, the pair $(V^{(i)}, P^{(ij)}) \in \text {Grid}(S)$ lies in $\text {Grid}(S)$ for the state space S, where
  $$\begin{aligned} P^{(ij)} := \frac{V^{(i)}}{CCR^{(j)} \frac{1}{r + \lambda _{x}}} \Leftrightarrow CCR^{(j)} = \frac{V^{(i)}}{P^{(ij)} \frac{1}{r + \lambda _{x}}}. \end{aligned}$$
  (5.17)
  Thus, the size of $\text {Grid}(S)$ is $n_{S} = n_{V} \cdot n_{CCR}$. Notice that in view of Assumption 1 it must hold $F < \min _{i,j} P^{(ij)}$ with $\min _{i,j} P^{(ij)} = \frac{\min _{i} V^{(i)}}{\max _{j} CCR^{(j)} \frac{1}{r + \lambda _{x}}} \ge \frac{V_{min}}{1.25 \frac{1}{r + \lambda _{x}}}$ as well as $\max _{i,j} P^{(ij)} < \infty $ with $\max _{i,j} P^{(ij)} = \frac{\max _{i} V^{(i)}}{\min _{j} CCR^{(j)} \frac{1}{r + \lambda _{x}}} \le \frac{V_{max}}{\frac{1}{r + \lambda _{x}}}$.
2.
Discretization of the risk driver Z: We assume $N = 1$ from now on whenever it comes to implementation, i.e. the financial market consists of a single risky asset class that can be interpreted as a mutual fund. The stochastic return or shock is discretized by the following equidistant partition of the probability space [0, 1] for the risk factor $Z \in {\mathbb {R}}= (- \infty , \infty )$: Let $q \in (0,1)$; for instance $q = 5 \%$. The corresponding cumulative probabilities are
$$\begin{aligned} q^{(i)} := q^{(0)} + \Delta ^{(q)} \cdot i,\ i = 0, \ldots , N_{q} \end{aligned}$$
(5.18)
with $\Delta ^{(q)} := q$, $N_{q} := \frac{1 - q}{q} {\mathop {\in }\limits ^{!}} {\mathbb {N}}$ because then $\sum _{i = 0}^{N_{q}} q = (1 + N_{q}) q = 1$. For instance, one could set $q^{(0)} = 5 \%$, $\Delta ^{(q)} = 5 \%$ ($N_{q} = 19$), then $q^{(i)} = 5 \%, 10 \%, \ldots , 95 \%, 100 \%$. The corresponding values or representatives for Z with probability ${\mathbb {P}}(Z = z(q^{(i)})) = q$ and quantile probabilities $q^{(i)}$ are obtained by
$$\begin{aligned} \begin{aligned} z(q^{(0)})& := N^{-1}\left( \frac{0 + q^{(0)}}{2}\right) , \\ z(q^{(i)}) &:= N^{-1}\left( \frac{q^{(i-1)} + q^{(i)}}{2}\right) ,\ i = 1, \ldots , N_{q}-1, \\ z(q^{(N_{q})}) &:= N^{-1}\left( \frac{q^{(N_{q}-1)} + 1}{2}\right). \end{aligned} \end{aligned}$$
(5.19)
With this definition, the z values are more densely concentrated around zero, with larger step sizes for large positive and negative values. Then
$$\begin{aligned} Z \in \text {Grid}(Z) := \{z(q^{(0)}), \ldots , z(q^{(N_{q})})\} = \{Z_{1}, \ldots , Z_{n_{Z}}\} \end{aligned}$$
(5.20)
with cardinality $n_{Z} := N_{q} + 1$ and probabilities q, i.e. $Z_{j} = z(q^{(j-1)})$, $j = 1, \ldots , n_{Z}$.
3.
Discretization of the investment decision $a \in {\mathbb {A}}$: Lastly, we discretize the decision set for the control variable a. Since $a \in {\mathbb {A}} $ with ${\mathbb {A}} = [0,1]$, we split the interval ${\mathbb {A}} = [0,1]$ into a grid with equidistant distances and representatives
$$\begin{aligned} a_{(i)} := a_{(0)} + \Delta ^{(a)} \cdot i,\ i = 0, \ldots , N_{a}, \end{aligned}$$
(5.21)
with $N_{a} := \frac{a_{(N_{a})} - a_{(0)}}{\Delta ^{(a)}} {\mathop {\in }\limits ^{!}} {\mathbb {N}}$. It is natural to select $a_{(0)} = 0$ and $a_{(N_{a})} = 1$, or apply lower and upper bound constraints on the relative risky investment if present. Thus, if for instance $\Delta ^{(a)} = 1 \%$, a can take any integer percentage value, i.e. $a \in \left\{ a_{(0)}, \ldots , a_{(N_{a})}\right\} = \left\{ 0 \%, 1\%, 2 \%, \ldots , 98 \%, 99 \%, 100 \%\right\} $. Therefore,
$$\begin{aligned} a \in \text {Grid}(a) := \{a_{(0)}, \ldots , a_{(N_{a})}\} \end{aligned}$$
(5.22)
with cardinality $n_{a} := N_{a} + 1$. It is clear that the discretization $\text {Grid}(a)$ for ${\mathbb {A}}$ fulfills Assumption 2 on ${\mathbb {A}}$.
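The discretization of the risk driver Z can be sketched as follows. The sketch assumes that each representative is the standard normal quantile at the midpoint of its probability bin of length q; the function name `z_grid` is hypothetical.

```python
from statistics import NormalDist


def z_grid(q):
    """Representatives of Z for n_Z = 1/q equally likely probability bins.

    Each representative is the standard normal quantile at the midpoint of
    its bin [i*q, (i+1)*q] (an assumed reading of the quantile construction).
    """
    n_z = round(1 / q)
    if abs(n_z * q - 1) > 1e-12:
        raise ValueError("1/q must be an integer")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i + 0.5) * q) for i in range(n_z)]


grid = z_grid(0.05)  # 20 representatives, each with probability 5%
print(len(grid), grid[0], grid[-1])
```

Because the bins are equally likely, every representative carries the same probability q, and the grid is symmetric around zero.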

We would like to mention that the construction of $\text {Grid}(S)$ is very efficient since it consists of admissible (V, P)-pairs only and rules out non-admissible (V, P)-pairs; admissible pairs fulfill ${V}/{\frac{P}{r + \lambda _{x}}} \in [100 \%, 125 \%]$. Furthermore, by construction we ensure that the CCR values are uniformly spread over the entire corridor $[100 \%, 125 \%]$. If $T_{B}(S, a, Z) \not \in \text {Grid}(S)$, we replace it by the node of the state space grid with the smallest sum of squared relative distances to $T_{B}(S, a, Z)$; this nearest-node rule serves as our method of interpolation between grid points.
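The state grid of Eqs. (5.16) and (5.17) and the nearest-node rule can be sketched as follows. The function names are hypothetical, and measuring the distances relative to the target state (rather than to the grid node) is an assumption of this sketch.

```python
def build_state_grid(v_min, v_max, n_v, ccr_values, r, lam):
    """Admissible (V, P) pairs of Eqs. (5.16)-(5.17), one per (V, CCR) node."""
    step = (v_max - v_min) / (n_v - 1)
    v_grid = [v_min + i * step for i in range(n_v)]
    return [(v, v * (r + lam) / ccr) for v in v_grid for ccr in ccr_values]


def nearest_node(state, grid):
    """Index of the node with the smallest sum of squared relative distances.

    Distances are taken relative to the target state (an assumption).
    """
    v, p = state

    def dist(node):
        return ((node[0] - v) / v) ** 2 + ((node[1] - p) / p) ** 2

    return min(range(len(grid)), key=lambda m: dist(grid[m]))


ccr_values = [1 + 0.01 * j for j in range(26)]  # 100%, 101%, ..., 125%
grid = build_state_grid(2_000.0, 50_000.0, 5, ccr_values, r=0.01, lam=0.0118)
print(len(grid), nearest_node((14_100.0, 270.0), grid))
```

A linear scan over all nodes is used here for clarity; for large grids a spatial index or a vectorized search would be the natural replacement.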

5.3 Stationary grid solution

The definition of the grids allows us to rewrite the expectation in the Bellman equation for any state $S^{(l)} \in \text {Grid}(S)$:

$$\begin{aligned} \begin{aligned} {\mathcal {V}}(S^{(l)}) &= r(S^{(l)}) + e^{- (\lambda _{x} + \beta ) \Delta } \max _{a(S^{(l)}) \in \text {Grid}(a)} \left\{ \sum _{j = 1}^{n_{Z}} \underbrace{{\mathbb {P}}(Z = Z_{j})}_{\equiv q} {\mathcal {V}}(T_{B}(S^{(l)}, a(S^{(l)}), Z_{j})) \right\} \\ &= r(S^{(l)}) + e^{- (\lambda _{x} + \beta ) \Delta } q \max _{a(S^{(l)}) \in \text {Grid}(a)} \left\{ \sum _{j = 1}^{n_{Z}} {\mathcal {V}}(T_{B}(S^{(l)}, a(S^{(l)}), Z_{j}))\right\} . \end{aligned} \end{aligned}$$

(5.23)

The optimal policy $a^{\star }(S^{(l)})$ for state $S^{(l)}$ is the maximizer

$$\begin{aligned} a^{\star }(S^{(l)}) := {{\,{\text{arg max}}\,}}_{a \in \text {Grid}(a)} \left\{ \sum _{j = 1}^{n_{Z}} {\mathcal {V}}(T_{B}(S^{(l)}, a, Z_{j})) \right\} . \end{aligned}$$

(5.24)

In the following we aim to solve the above Bellman equation for every $S^{(l)} \in \text {Grid}(S)$ and thereby determine the optimal decisions or policies $a^{\star }(S^{(l)})$ for all states in the grid. We now treat every $a = a(S^{(l)}) \in \text {Grid}(a)$ as if it were the maximizer of the Bellman equation, and select the optimal $a^{\star }(S^{(l)})$ at the very end by choosing the one that maximizes ${\mathcal {V}}(S^{(l)})$. Hence, consider

$$\begin{aligned} {\mathcal {V}}(S^{(l)}) &= r(S^{(l)}) + e^{- (\lambda _{x} + \beta ) \Delta } q \sum _{j = 1}^{n_{Z}} {\mathcal {V}}(T_{B}(S^{(l)}, a(S^{(l)}), Z_{j})) \end{aligned}$$

(5.25)

for all $S^{(l)} \in \text {Grid}(S)$ and all $a(S^{(l)}) \in \text {Grid}(a)$ which is a linear system of equations since $T_{B}(S^{(l)}, a(S^{(l)}), Z_{j}) \in \text {Grid}(S)$ according to the applied interpolation rule if not already in the grid.

Given a specific $a(S^{(l)}) = a_{(i(l))} \in \text {Grid}(a)$ for some $i(l) \in \{0, \ldots , N_{a}\}$, we solve this linear system in ${\mathcal {V}}(S^{(l)})$ for all $l = 1, \ldots , n_{S}$ by rewriting the right-hand sum using matrix notation with $S = (S^{(1)}, \ldots , S^{(n_{S})})'$ the vector that consists of all state grid points in $\text {Grid}(S)$. Define the matrix $Q \in {\mathbb {N}}^{n_{S} \times n_{S}}$ by

$$\begin{aligned} Q := Q_{1} Q_{2} \end{aligned}$$

(5.26)

with block matrix $Q_{1} \in \{0,1\}^{n_{S} \times (n_{S} \cdot n_{Z})}$ such that

$$\begin{aligned} Q_{1} := \left( \underbrace{\begin{matrix} I_{n_{S}} \cdots I_{n_{S}} \cdots I_{n_{S}} \end{matrix}}_{n_{Z} \text { times } I_{n_{S}}}\right) , \end{aligned}$$

(5.27)

where $I_{n_{S}} \in \{0,1\}^{n_{S} \times n_{S}}$ is the identity matrix with dimension $n_{S}$. Furthermore, $Q_{2} \in \{0,1\}^{(n_{S} \cdot n_{Z}) \times n_{S}}$ is defined as a block matrix such that

$$\begin{aligned} Q_{2} := \left( \begin{matrix} A_{1} \\ \vdots \\ A_{j} \\ \vdots \\ A_{n_{Z}} \end{matrix}\right) \end{aligned}$$

(5.28)

with $A_{j} \in \{0,1\}^{n_{S} \times n_{S}}$ defined by

$$\begin{aligned} (A_{j})_{lm} := {\left\{ \begin{array}{ll} 1, \text { if } T_{B}(S^{(l)}, a_{(i(l))}, Z_{j}) = S^{(m)} \\ 0, \text { otherwise.} \end{array}\right. } \end{aligned}$$

(5.29)

Then it follows

$$\begin{aligned} Q_{l \cdot }& =\left( Q_{1} Q_{2}\right) _{l \cdot } = \left( Q_{1}\right) _{l \cdot } Q_{2} = \left( \begin{matrix} I_{n_{S}} \cdots I_{n_{S}} \cdots I_{n_{S}} \end{matrix}\right) _{l \cdot } \left( \begin{matrix} A_{1} \\ \vdots \\ A_{j} \\ \vdots \\ A_{n_{Z}} \end{matrix}\right) =: d_{l} \end{aligned}$$

(5.30)

with $d_{l} \in {\mathbb {N}}^{1 \times n_{S}}$ such that $(d_{l})_{m}$ is the number of shocks $Z_{j}$, $j = 1, \ldots , n_{Z}$, that lead to a transition from $S^{(l)}$ to $S^{(m)}$. Consequently, it follows for $Q {\mathcal {V}}(S) \in {\mathbb {R}}^{n_{S}}$, where ${\mathcal {V}}(S) = ({\mathcal {V}}(S^{(1)}), \ldots , {\mathcal {V}}(S^{(n_{S})}))' \in {\mathbb {R}}^{n_{S}}$:

$$\begin{aligned} \left( Q {\mathcal {V}}(S)\right) _{l} &= Q_{l \cdot } {\mathcal {V}}(S) = d_{l} {\mathcal {V}}(S) = \sum _{m = 1}^{n_{S}} (d_{l})_{m} {\mathcal {V}}(S^{(m)}) = \sum _{j = 1}^{n_{Z}} {\mathcal {V}}(T_{B}(S^{(l)}, a_{(i(l))}, Z_{j})). \end{aligned}$$

(5.31)
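The construction of $Q(a)$ as a matrix of transition counts, and the identity (5.31), can be checked on a toy example. The transition rule `toy_next_index`, the policy and the value vector below are purely hypothetical; only the counting logic mirrors Eqs. (5.29) to (5.31).

```python
def transition_counts(next_index, policy, n_states, n_shocks):
    """Return Q(a) as dense rows: Q[l][m] = #{j : T_B(S_l, a_l, Z_j) = S_m}."""
    q_mat = [[0] * n_states for _ in range(n_states)]
    for l in range(n_states):
        for j in range(n_shocks):
            m = next_index(l, policy[l], j)
            q_mat[l][m] += 1  # contributes the (l, m) entry of A_j
    return q_mat


def toy_next_index(l, a, j):
    """Hypothetical transition on 4 states, 3 shocks, actions in {0, 1}."""
    return min(max(l + (j - 1) * (1 + a), 0), 3)


policy = [0, 1, 1, 0]
Q = transition_counts(toy_next_index, policy, n_states=4, n_shocks=3)
values = [1.0, 2.0, 4.0, 8.0]

lhs = [sum(Q[l][m] * values[m] for m in range(4)) for l in range(4)]
rhs = [sum(values[toy_next_index(l, policy[l], j)] for j in range(3))
       for l in range(4)]
assert lhs == rhs                        # Eq. (5.31)
assert all(sum(row) == 3 for row in Q)   # every row sums to n_Z
```

Since every row of $Q(a)$ sums to $n_{Z}$, the matrix $q\, Q(a)$ with $q = 1/n_{Z}$ has unit row sums and acts as a transition probability matrix.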

Note $Q = Q(a)$ with $a = (a_{(i(1))}, \ldots , a_{(i(n_{S}))})'$, and therefore,

$$\begin{aligned} {\mathcal {V}}(S^{(l)})&{\mathop {=}\limits ^{(5.25)}} {} r(S^{(l)}) + e^{- (\lambda _{x} + \beta ) \Delta } q \sum _{j = 1}^{n_{Z}} {\mathcal {V}}(T_{B}(S^{(l)}, a_{(i(l))}, Z_{j})) \\&= r(S^{(l)}) + e^{- (\lambda _{x} + \beta ) \Delta } q \left( Q(a) {\mathcal {V}}(S)\right) _{l} \end{aligned}$$

(5.32)

which allows us to rewrite the linear system in the value function in matrix-vector form:

$$\begin{aligned} \begin{aligned} {\mathcal {V}}(S) = r(S) + e^{- (\lambda _{x} + \beta ) \Delta } q Q(a) {\mathcal {V}}(S) \Leftrightarrow \left( I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a)\right) {\mathcal {V}}(S) = r(S), \end{aligned} \end{aligned}$$

(5.33)

where $r(S) = (r(S^{(1)}), \ldots , r(S^{(n_{S})}))' \in {\mathbb {R}}^{n_{S}}$. This is a linear equation system in ${\mathcal {V}}(S)$ and can easily be solved; theoretically the solution reads

$$\begin{aligned} {\mathcal {V}}(S) = \left( I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a)\right) ^{-1} r(S). \end{aligned}$$

(5.34)

Notice that existence of the inverse of $I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a)$ is proven in Appendix 1.

Equation (5.33) shows that the problem of solving the Bellman equation can be reduced to finding the fixed point of the Bellman operator $\Gamma $:

$$\begin{aligned} {\mathcal {V}}(S) = \Gamma ({\mathcal {V}}(S)) := r(S) + e^{- (\lambda _{x} + \beta ) \Delta } q \max _{a \in \text {Grid}(a)} \left\{ Q(a) {\mathcal {V}}(S)\right\} . \end{aligned}$$

(5.35)

Now we would like to draw attention to the fact that $Q = Q(a)$ with $a = (a_{(i(1))}, \ldots , a_{(i(n_{S}))})'$: For this reason, we need to repeat the above for all possible combinations $i(l) \in \{0, \ldots , N_{a}\}$, $l = 1, \ldots , n_{S}$, and finally select the combination $a^{\star }(S) = (a^{\star }(S^{(1)}), \ldots , a^{\star }(S^{(n_{S})}))'$ that maximizes ${\mathcal {V}}(S)$ across all $a_{(i)}$ combinations in $\text {Grid}(a)$. The total number of combinations equals $n_{a}^{n_{S}}$. Thus, we have to calculate $n_{a}^{n_{S}}$ times an $n_{S} \times n_{S}$ transition matrix Q(a) (plus additionally the inverse of $I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a)$). If for instance we consider $n_{V} = 1,000$ grid points for V and $n_{CCR} = 26$ for CCR,^{Footnote 10} then $n_{S} = 26,000$. If additionally the allocation grid is divided into steps of $5 \%$, i.e. $n_{a} = 21$, we would have to calculate $21^{26,000} \approx 10^{34,378}$ times a $26,000 \times 26,000$ matrix, which is computationally infeasible. To overcome this computational problem, we present an alternative by using a policy function iteration algorithm in the following.
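The order of magnitude of the number of policy combinations can be verified with a short calculation. The sketch works with the base-10 logarithm rather than with the number itself; the variable names are illustrative.

```python
import math

n_v, n_ccr, n_a = 1_000, 26, 21
n_s = n_v * n_ccr  # size of the state grid

# log10 of the number of policy combinations n_a ** n_s
log10_combinations = n_s * math.log10(n_a)
print(n_s, log10_combinations)
```

The logarithm lies between 34,377 and 34,378, which is consistent with the order of magnitude stated above.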

5.4 Policy function iteration: the algorithm

The algorithm is a tailored version of Howard’s improvement algorithm, and iterates the policy a(S) until it converges towards its optimal value. For further readings on the policy iteration we refer to [3, 4, 12, 16,17,18] and [20]. Notice that the one-period total discount factor to this problem is $0< e^{- (\lambda _{x} + \beta ) \Delta } < 1$ and is a composite of the one-period utility discount factor $e^{- \beta \Delta }$ and the mortality discount factor $e^{- \lambda _{x} \Delta }$. For n periods, the total discount factor is $e^{- (\lambda _{x} + \beta ) n \Delta }$ and converges to zero as $n \rightarrow \infty $. In what follows we describe the policy function iterating mechanism:

Let $a^{(i)} = a^{(i)}(S)$ denote the decision value at iteration step i. Let $n_{iter}$ denote the number of iterations until the algorithm stops. The terminal $a^{(n_{iter})} = a^{(n_{iter})}(S)$ is regarded as the optimal final decision variable $a^{\star } = a^{\star }(S)$. Inside the algorithm we repeat the policy improvement and policy evaluation until a sufficient, prescribed level of convergence or solution tolerance is achieved.

1.
$i = 0$:
(a)
  Select initially $a^{(0)}(S) \in \text {Grid}(a)$ for all states in the grid, for instance $a^{(0)}(S^{(l)}) := 0 \in \text {Grid}(a)$ $\forall l \in \{1, \ldots , n_{S}\}$.
(b)
  Define initially ${\mathcal {V}}(S) = \left( I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a^{(0)}(S))\right) ^{-1} r(S)$ for all states in the grid according to Eq. (5.34).
(c)
  Select a convergence criterion $\epsilon > 0$.
2.
Iteration $i = 1, 2, \ldots $:
1. 1.
  Policy improvement: For all $S^{(l)} \in \text {Grid}(S)$, $l = 1, \ldots , n_{S}$, find a new policy rule $a^{(i)}(S^{(l)}) \in \text {Grid}(a)$, such that^{Footnote 11}
  $$\begin{aligned} \begin{aligned} a^{(i)}(S^{(l)}) := {{\,{\text{arg max}}\,}}_{a \in \text {Grid}(a)} \left\{ \sum _{j = 1}^{n_{Z}} {\mathcal {V}}^{(i-1)}(T_{B}(S^{(l)}, a, Z_{j}))\right\} \end{aligned} \end{aligned}$$
  (5.36)
  with ${\mathcal {V}}^{(i-1)}(T_{B}(S^{(l)}, a, Z_{j})) = {\mathcal {V}}^{(i-1)}(S^{(m)})$ given from the previous iteration step, where $S^{(m)}$ is determined by the applied interpolation rule if $T_{B}(S^{(l)}, a, Z_{j})$ is not already in the grid.
2. 2.
  Policy evaluation: Having determined $a^{(i)}(S^{(l)})$ for each $S^{(l)} \in \text {Grid}(S)$, $a^{(i)}(S) = (a^{(i)}(S^{(1)}), \ldots , a^{(i)}(S^{(n_{S})}))'$, we update the value function according to Eq. (5.34):
  $$\begin{aligned} {\mathcal {V}}^{(i)}(S) = \left( I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a^{(i)}(S))\right) ^{-1} r(S). \end{aligned}$$
  (5.37)
3.
Check the convergence criterion: If $\max _{S \in \text {Grid}(S)}\left\{ \left| a^{(i)}(S) - a^{(i-1)}(S)\right| \right\} \le \epsilon $, then stop and set $n_{iter} := i$ and $a^{\star }(S) := a^{(n_{iter})}(S)$.^{Footnote 12} Otherwise, repeat Step 2. for iteration $i+1$.

When the algorithm stops after $n_{iter}$ iterations, i.e. when the convergence criterion after iteration step $n_{iter}$ is met, the stationary solution to the problem is defined as $a^{\star } = a^{\star }(S) = a^{(n_{iter})}(S)$ for all grid states.
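The steps above can be sketched in plain Python on a small toy problem. Everything specific to the toy (five states, three shocks, two actions, the reward vector and the transition rule `toy_next_index`) is a hypothetical stand-in for the pension model, and the tie rule in the improvement step is an addition of this sketch to guard against floating-point noise.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(m[i][c]))
        m[c], m[piv] = m[piv], m[c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            for k in range(c, n + 1):
                m[i][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def evaluate(policy, reward, next_index, n_z, disc):
    """Policy evaluation, Eq. (5.34): solve (I - disc * q * Q(a)) V = r."""
    n_s, q = len(reward), 1.0 / n_z
    lhs = [[float(l == m) for m in range(n_s)] for l in range(n_s)]
    for l in range(n_s):
        for j in range(n_z):
            lhs[l][next_index(l, policy[l], j)] -= disc * q
    return solve(lhs, reward)


def policy_iteration(reward, next_index, actions, n_z, disc, eps=1e-9, max_iter=100):
    n_s = len(reward)
    policy = [actions[0]] * n_s                                # step 1(a)
    values = evaluate(policy, reward, next_index, n_z, disc)   # step 1(b)
    for it in range(1, max_iter + 1):
        new_policy = []
        for l in range(n_s):                                   # improvement, Eq. (5.36)
            def score(a):
                return sum(values[next_index(l, a, j)] for j in range(n_z))
            best = max(actions, key=score)
            # tie rule (sketch only): keep the old action unless strictly better
            keep_old = score(best) <= score(policy[l]) + 1e-12
            new_policy.append(policy[l] if keep_old else best)
        values = evaluate(new_policy, reward, next_index, n_z, disc)  # Eq. (5.37)
        change = max(abs(x - y) for x, y in zip(new_policy, policy))
        policy = new_policy
        if change <= eps:                                      # step 3
            return policy, values, it
    raise RuntimeError("no convergence within max_iter iterations")


def toy_next_index(l, a, j):
    """Hypothetical transition: a = 0 stays put, a = 1 moves by -1, +1 or +2."""
    return l if a == 0 else min(max(l + (-1, 1, 2)[j], 0), 4)


reward = [-1.0 / (l + 1) for l in range(5)]
policy, values, n_iter = policy_iteration(reward, toy_next_index, [0, 1], 3, 0.95)
print(policy, n_iter)
```

The dense solver keeps the sketch self-contained; for grids of realistic size a sparse linear solver would be the natural choice.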

Before we derive the optimal allocations in a case study next, we briefly summarize the benefits that are associated with this policy function iteration procedure:

One can determine and thereafter use the optimal strategy for all states independently of time; thus the iterative approach as a very elegant method enhances the speed and efficiency of the numerical optimization.
The optimal control is independent of the initial state. Therefore, the derived optimal controls can be used for different initial states. One only has to make sure that the considered initial state lies approximately in the center of the grid, such that a sufficient number of grid nodes lie above and below the starting state. Otherwise, the state process could remain at the edge of the grid (due to the applied interpolation rule), which would lead to a suboptimal strategy.

In addition, we provide a comment on the speed of convergence of the algorithm. As already mentioned before, the total discount factor equals $e^{- (\lambda _{x} + \beta ) n \Delta }$ for n periods. In the case study in Sect. 6 we will use $\beta = 3 \%$ and $\lambda _{x} = 1.18 \%$. Then the convergence factor has an approximate size of $e^{- (\lambda _{x} + \beta ) \Delta } = 0.9591$ after one iteration (step size $\Delta = 1$ year), $e^{- (\lambda _{x} + \beta ) 10 \Delta } = 0.6584$ after ten iterations and $e^{- (\lambda _{x} + \beta ) 100 \Delta } = 0.0153$ after one hundred iterations. This shows that a rather low number of iterations is necessary. In particular, [20] further argue that policy iteration commonly converges to its stationary solution after a small number of iterations.
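The quoted discount factors can be reproduced directly from the stated parameters. The variable names are illustrative.

```python
import math

lam, beta, delta = 0.0118, 0.03, 1.0  # mortality rate, utility discount rate, step size

# total discount factor exp(-(lam + beta) * n * delta) after n periods
factors = {n: math.exp(-(lam + beta) * n * delta) for n in (1, 10, 100)}
for n, factor in factors.items():
    print(n, round(factor, 4))
```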

Every iteration requires the calculation of $Q(a^{(i)}(S))$ plus the calculation of the inverse of $I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a^{(i)}(S))$ which are both matrices of dimension $n_{S} \times n_{S}$; in total the algorithm requires the calculation of $n_{iter}$ times an $n_{S} \times n_{S}$ matrix (plus an inverse). This is usually dramatically faster than calculating $n_{a}^{n_{S}}$ times an $n_{S} \times n_{S}$ matrix (plus an inverse) in the previous section on the stationary grid solution. Furthermore, one can use and exploit the property of $Q, Q_{1}, Q_{2}$ to be very sparse matrices; in Matlab the functions sparse(m,n) and speye(n) generate the required sparse matrices which saves memory. Additionally the command $x = A\backslash b$ is recommended for solving systems of linear equations of the form $A x = b$ efficiently.

5.5 Policy function iteration: theoretical foundation

The infinite-horizon problem (5.1) is a stationary, infinite-horizon Markovian dynamic programming (MDP) problem in line with the definition in [20]. We now theoretically justify our policy iteration approach for solving Problem (5.1) under Assumptions 1 and 2, where we used that the value function is a fixed point. It is necessary to prove the existence and optimality of a unique fixed point for our policy function iteration algorithm and monotone convergence to such a solution. First, we prove existence of a unique fixed point and optimality of the stationary solution. In what follows we denote by S the state space and by $s \in S$ a certain state. Further, X is the set of functions that map from S to ${\overline{\mathbb {R}}} := {\mathbb {R}}\cup \{\pm \infty \}$, i.e. $X = \{f : S \rightarrow {\overline{\mathbb {R}}}\}$, or a suitable subset of this set. In accordance with [21] and [22] the notion of a contraction mapping and a fixed point can be found in Appendix 1 with additional background details. [20] further comment that MDP problems are mathematically equivalent to computing the fixed point to the Bellman equation

$$\begin{aligned} {\mathcal {V}} = \Gamma {\mathcal {V}} \end{aligned}$$

(5.38)

with Bellman operator of interest $\Gamma $ (defined in line with Eqs. (5.2) and (5.35))

$$\begin{aligned} \left( \Gamma f\right) (s) := r(s) + {\overline{\beta }} \sup _{a(s) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ f(T_{B}(s, a(s), Z)) \bigg | s\right] \right\} \end{aligned}$$

(5.39)

for some function f, where ${\overline{\beta }} := e^{- (\lambda _{x} + \beta ) \Delta } \in (0,1)$. We now follow the line of [19] and prove the existence of a unique fixed point of the Bellman operator $\Gamma $ and optimality of the stationary policy for the infinite-horizon problem, where a stationary policy is formally defined as a sequence $a^{\infty } := \{a,a,a, \ldots \}$ for some decision rule $a = \left( a(s)\right) _{s \in S} \in {\mathbb {A}}$ ([10, 23]). Let

$$\begin{aligned} X := \left\{ f : S \rightarrow {\overline{\mathbb {R}}}:\ f \text { measurable},\ \left\| f\right\| _{\infty } < \infty \right\} . \end{aligned}$$

(5.40)

We show that the Bellman operator $\Gamma $ is a contraction mapping on X equipped with the sup norm $d := \left\| \cdot \right\| _{\infty }$, i.e. on the metric space (X, d).

Theorem 3

Let Assumptions 1 and 2 be fulfilled. Then, $\Gamma $ is a contraction mapping on X with modulus ${\overline{\beta }} \in (0,1)$.

Proof

The proof can be found in Appendix 1. $\square $

We now come to the main result in [19] about existence of a unique fixed point of $\Gamma $ and optimality of the stationary policy. The respective theorem is Theorem 8 in Appendix 1, where we copied the relevant statements from [19] and omitted the unnecessary conditions. Using this result, we infer the following outcome:

Theorem 4

Let Assumptions 1 and 2 hold true. Further, let X be defined according to Eq. (5.40) and d be the uniform metric. Then the value function ${\mathcal {V}}$ is the unique fixed point of $\Gamma $ in X and the stationary policy $(a_{{\mathcal {V}}})^{\infty }$ is the optimal solution to the infinite-horizon discrete-time optimization problem.

Proof

The proof can be found in Appendix 1. $\square $

From Theorem 4 we infer that, under Assumptions 1 and 2, there exists a fixed point of $\Gamma $ in X and if one can find a fixed point of $\Gamma $ in X, this fixed point is unique and coincides with the value function ${\mathcal {V}}$. Moreover, the stationary policy $(a_{{\mathcal {V}}})^{\infty }$ is the optimal solution to the infinite-horizon discrete-time optimization problem.

In view of the previously presented policy function iteration algorithm, it remains to show that this algorithm indeed converges to a fixed point ${\mathcal {V}}^{(i)} \rightarrow {\mathcal {V}}$ with corresponding optimal stationary policy $a^{(i)} \rightarrow (a_{{\mathcal {V}}})^{\infty }$. [20] explain that the policy function iteration algorithm can be shown to generate a sequence with ${\mathcal {V}}^{(i+1)} \ge {\mathcal {V}}^{(i)}$ under fairly general conditions. In our setup where the state space S and the values for the risk driver Z come from finite sets or grids, we have the following general monotonicity result for the iterated value function:

Theorem 5

(Monotonicity of ${\mathcal {V}}^{(i)}$) The iteration in the policy function algorithm leads to a monotone increasing sequence $\left( {\mathcal {V}}^{(i)}\right) _{i = 0,1,2,\ldots }$:

$$\begin{aligned} {\mathcal {V}}^{(i+1)}(S) - {\mathcal {V}}^{(i)}(S) \ge {\mathbf {0}}. \end{aligned}$$

(5.41)

Proof

The proof can be found in Appendix 1. $\square $

In view of Theorem 5, which tells us that the iterated value function does not cycle, we conclude the following:

Theorem 6

(Convergence of the policy function iteration algorithm) Let us consider a finite state space $\text {Grid}(S)$ and a finite action space $\text {Grid}(a)$. Then the policy function iteration algorithm converges to the true fixed point for the contraction $\Gamma $, which is the optimal value function ${\mathcal {V}}$ of the problem, within a finite number of iteration steps.

The argument is clear due to monotonicity in Theorem 5 and the finite cardinality of the state and action space, see also [20]. Further readings on convergence results can also be found in [17, 18]. In summary, we have proven that, under Assumptions 1 and 2, our problem admits a unique fixed point solution and our presented policy function iteration algorithm converges to this solution.

6 Case study: policy function iteration for a cohort of clients

We focus on the cohort perspective and consider a cohort of clients with an initial age of $T = 65$ years (retirement entry time). This section aims at solving the discrete-time infinite-horizon optimization problem with the policy function iteration algorithm as well as analyzing the performance of the optimal asset allocation strategy via simulation. First, we introduce the general setting and the parameter choices: For the market we suppose $r = 1 \%$, $\mu = 2.97 \%$, $\sigma = 11.75 \%$ (market parameters: risk-free interest rate; drift and volatility of one risky asset which is interpreted as a buy and hold portfolio that consists of the three asset classes government bonds, corporate bonds and equity with initial weights $\frac{1}{N} = \frac{1}{3}$ each, where we used the numbers from the parameter estimation in [9]). Moreover, the HARA utility function parameters are assumed to be $\beta = 3 \%$ (cf. [24]), $b = -1$, ${\hat{a}} = 1$ and^{Footnote 13} $F = 25.8$ which gives

$$\begin{aligned} U(t,p) = e^{- \beta (t-T)} {\tilde{U}}(p) = - e^{- 0.03 (t-T)} \frac{4}{p-25.8}. \end{aligned}$$

(6.1)
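For reference, Eq. (6.1) follows by inserting $b = -1$ and $F = 25.8$ into Eq. (5.11), reading the scale parameter as ${\hat{a}} = 1$:

$$\begin{aligned} {\tilde{U}}(p) = 1 \cdot \frac{1-(-1)}{-1} \left( \frac{1}{1-(-1)} (p-25.8)\right) ^{-1} = -2 \cdot \frac{2}{p-25.8} = - \frac{4}{p-25.8}. \end{aligned}$$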

Further let $\lambda _{x} = \lambda _{x(j)} = 1.18 \%$ (mortality rate of the cohort), determined such that the survival probability of a 65-year old client to survive one more year coincides with the average survival probability of $99.202956 \%$ (female) and $98.457889 \%$ (male) in Germany, cf. [7], and ${\bar{p}} = 112.5 \%$ (CCR of the investment portfolio at initial time and at every re-set).
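The stated mortality rate can be reproduced under two assumptions made for this sketch: the average survival probability is the equal-weight mean of the female and male values, and the one-year survival probability equals $e^{- \lambda _{x} \cdot 1}$, in line with the mortality discount factor $e^{- \lambda _{x} \Delta }$ used above.

```python
import math

p_female, p_male = 0.99202956, 0.98457889  # one-year survival probabilities from the text
p_avg = (p_female + p_male) / 2            # assumption: equal-weight average
lam = -math.log(p_avg)                     # solves exp(-lam * 1 year) = p_avg
print(round(100 * lam, 2))                 # mortality rate in percent
```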

For the discretization grids we suppose the following: The time grid is divided into points with distance or step size $\Delta = 1$ (annual rebalancing and adjustments of the pension payments) which implies $t^{(i)} = T + i$, $i = 0, \ldots , \infty $, and thus $t^{(i)} \in \left\{ T, T+1, T+2, \ldots \right\} $. The grid for the risk driver Z follows from Eq. (5.19) with probability intervals of length $q = 2.5 \%$, i.e. $q^{(i)} = q^{(0)} + \Delta ^{(q)} \cdot i,\ i = 0, \ldots , N_{q}$, with $\Delta ^{(q)} = q = 2.5 \%$, $N_{q} = \frac{1 - q}{q} = 39$. The corresponding representatives for Z start with $z(q^{(0)}) = - 2.2414$ and end with $z(q^{(39)}) = 2.2414$. Next, for the sake of simplicity let us consider steps of five percentage points for the decision interval ${\mathbb {A}}$, i.e. $a_{(0)} = 0 \%$, $a_{(N_{a})} = a_{(20)} = 100 \%$, $N_{a} = 20$ (equivalent to $\Delta ^{(a)} = 5 \%$) which translates to $a \in \left\{ 0 \%, 5 \%, 10 \%, \ldots , 90 \%, 95 \%, 100 \%\right\} $. Finally, let $V_{0} = 10,000$ (initial post-retirement wealth at time $t^{(0)} = T$) and let us define the state space grid by $V_{min} = 20 \% \times V_{0}$, $V_{max} = 500 \% \times V_{0}$, $n_{V} = 1,000$ for $\text {Grid}(V)$. We further select a step size of $1 \%$ for $\text {Grid}(CCR)$, i.e. $CCR \in \text {Grid}(CCR) = \{100 \%, 101 \%, \ldots , 124 \%, 125 \%\}$, and thus $n_{CCR} = 26$. This leads to a total grid size of $n_{S} = 26,000$ states. In particular, Assumptions 1 and 2 are fulfilled.

Let us consider three different values for the buffer parameter, namely $\alpha = (0 \% | 20 \% | 40 \%)$ (no | moderate | pronounced buffer). In what follows we demonstrate the presented policy function iteration algorithm, where we determine the optimal investment decision variables for every state under the infinite-horizon problem (stationary solution). Afterwards, a simulation analysis in the finite-horizon model, where the approximate optimal stationary solution to the infinite-horizon model is applied, provides the most relevant numbers and probabilities and compares the considered strategies for different $\alpha $ values.

6.1 Optimization

We seek a fixed-point solution to the value function according to the policy function iteration algorithm in Sect. 5. It takes at most seven iterations ($n_{iter} \le 7$) to find the fixed point for each $\alpha = (0 \% | 20 \% | 40 \%)$ and with that the stationary solution to the infinite-horizon optimization problem. Thus, the algorithm converges very quickly.

Figure 1 visualizes the average optimal risky relative asset allocations ${\hat{\pi }}^{\star \text {(inv)}} = a^{\star }$ (investment portfolio) and ${\hat{\pi }}^{\star \text {(total)}}$ (total cohort portfolio) for all $CCR_{j}^{\text {(total)}}$ values in the grid. Hence, for every $CCR \in \text {Grid}(CCR)$, we build the average over the $n_{V}$ values for $a^{\star }$ and ${\hat{\pi }}^{\star \text {(total)}}$ that have an equal CCR value. The pattern comes close to an S-shaped form: It can be seen that a higher buffer parameter $\alpha $, in particular for $\alpha = 40 \%$, leads to a lower relative risky investment for small $CCR_{j}^{\text {(total)}}$ values, but catches up for large $CCR_{j}^{\text {(total)}}$ values. This is a desired behavior, since it implies a lower risk of a pension shortening for small $CCR_{j}^{\text {(total)}}$ values within the range $[100 \%, 110 \%]$, without losing the upside potential of a pension enhancement for $CCR_{j}^{\text {(total)}}$ values close to $125 \%$. Furthermore, except for the region $CCR_{j}^{\text {(total)}} \in [100 \%, 105 \%]$, the average optimal risky relative investment increases with the $CCR_{j}^{\text {(total)}}$ value. This is meaningful since with a higher $CCR_{j}^{\text {(total)}}$ value, one is less exposed to the risk of falling outside the lower boundary of the $CCR_{j}^{\text {(total)}}$ corridor (pension reduction risk). The higher risky investment close to $100 \%$ is also reasonable. Imagine the $CCR_{j}^{\text {(total)}}$ is close to $100 \%$; if now the risky allocation is very small, even some positive return of the underlying asset class cannot compensate for the outflows (cohort-related pensions), which pushes the $CCR_{j}^{\text {(total)}}$ below $100 \%$ with a high probability.

6.2 Simulation study

We next carry out a simulation study with a finite time horizon of ${\tilde{T}} = T + 10 \Delta = T + 10$ years. We start with the initial states $S_{0} = (V_{0}, P_{0}) = (10,000, (277 | 270 | 258))$ for $\alpha = (0 \% | 20 \% | 40 \%)$, where $P_{0}$ comes from Eq. (3.16) (see also Eq. (4.24)). Intuitively, the higher the buffer parameter $\alpha $, the lower the initial pension $P_{0}$. Moreover, the initial distribution to the investment and the buffer portfolio is $\frac{V_{j}^{\text {(buffer)}}(T)}{V_{0}} = (0 \% | 2.7 \% | 6.9 \%)$, $\frac{V_{j}^{\text {(inv)}}(T)}{V_{0}} = (100 \% | 97.3 \% | 93.1 \%)$ due to Eq. (4.16). The initial capital coverage ratio is by definition $CCR_{j}^{\text {(total)}}(T) {\mathop {=}\limits ^{(4.27)}} \frac{{\bar{p}} - \alpha }{1 - \alpha } = (112.5 \% | 115.6 \% | 120.8 \%)$ (at every re-set time $t_{n}$ as well). We simulate 10,000 paths of the relevant processes, where in each step we apply the optimal stationary asset allocation at the grid point closest to the current state.
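The initial ratios quoted above can be reproduced with a few lines of Python. This is only a sketch: the value ${\bar{p}} = 112.5 \%$ is read off the $\alpha = 0 \%$ case of Eq. (4.27), and the function `buffer_share` rests on the assumption (not restated in this section) that the buffer equals $\alpha $ times the difference between total assets and liabilities, with liabilities equal to total wealth divided by the total capital coverage ratio. The function names are hypothetical.

```python
def initial_ccr(p_bar, alpha):
    """Initial total capital coverage ratio (p_bar - alpha) / (1 - alpha), cf. Eq. (4.27)."""
    return (p_bar - alpha) / (1.0 - alpha)


def buffer_share(p_bar, alpha):
    """Buffer wealth as a fraction of total wealth V_0.

    ASSUMPTION: buffer = alpha * (assets - liabilities) and
    liabilities = assets / CCR_total. This is an illustrative reading,
    not a restatement of Eq. (4.16).
    """
    return alpha * (1.0 - 1.0 / initial_ccr(p_bar, alpha))


P_BAR = 1.125  # implied by the alpha = 0 % case, where the ratio equals p_bar

for alpha in (0.0, 0.2, 0.4):
    ccr = initial_ccr(P_BAR, alpha)
    share = buffer_share(P_BAR, alpha)
    print(f"alpha={alpha:.0%}  CCR={ccr:.1%}  buffer={share:.1%}  inv={1 - share:.1%}")
```

Under these assumptions the printed values agree with the percentages stated above after rounding to one decimal place.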

We assume that the average mortality (explained in Sect. 4) for the cohort is realized. We look at the optimal relative pension evolution $\frac{P^{\star }(t)}{e^{- \lambda _{x(j)} (t-T)} P_{0}}$, where $P^{\star }(t)$ denotes the cohort pension at time t under the optimal stationary asset allocation strategy $a^{\star } = a^{\star }(S)$. As explained earlier, $\frac{P^{\star }(t)}{e^{- \lambda _{x(j)} (t-T)} P_{0}} = \frac{P^{\star }(t + \Delta )}{e^{- \lambda _{x(j)} (t+\Delta -T)} P_{0}}$ indicates a stable individual pension for the customers in the cohort from time t to $t + \Delta $, i.e. if $\frac{P^{\star }(t)}{e^{- \lambda _{x(j)} (t-T)} P_{0}}$ is stable, then the individual pensions remain stable. Consequently, due to the cohort view, $\frac{P^{\star }(t + \Delta )}{P^{\star }(t) e^{- \lambda _{x(j)} \Delta }} < 100 \%$ indicates an individual pension reduction, $\frac{P^{\star }(t + \Delta )}{P^{\star }(t) e^{- \lambda _{x(j)} \Delta }} = 100 \%$ a stable individual pension, and $\frac{P^{\star }(t + \Delta )}{P^{\star }(t) e^{- \lambda _{x(j)} \Delta }} > 100 \%$ an enhancement of the individual pension of the clients in the cohort from time t to $t + \Delta $.
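The three cases can be expressed as a small classification routine. The following is a minimal sketch, not the authors' implementation: the function name, the numerical tolerance `tol` and the example numbers are assumptions made for illustration, and the mortality rate of $1.18 \%$ is the value mentioned in the notes for the case study.

```python
import math


def classify_pension_change(p_now, p_next, lam, delta=1.0, tol=1e-9):
    """Classify the individual pension move from t to t + delta.

    p_now, p_next: cohort pensions P*(t) and P*(t + delta)
    lam:           cohort mortality rate lambda_{x(j)}
    tol:           hypothetical tolerance for floating-point comparison
    """
    # Ratio P*(t + delta) / (P*(t) * exp(-lam * delta)) from the text.
    ratio = p_next / (p_now * math.exp(-lam * delta))
    if ratio < 1.0 - tol:
        return "reduction"
    if ratio > 1.0 + tol:
        return "enhancement"
    return "stable"


LAM = 0.0118  # mortality rate used in the case study
print(classify_pension_change(277.0, 277.0 * math.exp(-LAM), LAM))  # cohort pension shrinks with mortality only
print(classify_pension_change(277.0, 277.0, LAM))                   # cohort pension unchanged despite deaths
print(classify_pension_change(277.0, 260.0, LAM))                   # cohort pension drops faster than mortality
```

A cohort pension that decays exactly at the mortality rate therefore corresponds to a stable individual pension, whereas an unchanged cohort pension already means an individual enhancement.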

In what follows we always look at the individual pension perspective in the cohort. Moreover, let $V^{\star }(t)$ denote the total cohort wealth at time t under $a^{\star } = a^{\star }(S)$. Analogously to the relative pension, we look at the optimal relative total wealth evolution $\frac{V^{\star }(t)}{V_{0}}$. Note that $\frac{P^{\star }(t)}{e^{- \lambda _{x(j)} (t-T)} P_{0}} = 1$ and $\frac{V^{\star }(t)}{V_{0}} = 1$ at initial time $t = T$.

Table 1 illustrates relevant probabilities of pension shortenings and enhancements. Table 2 provides risk and reward numbers for the relative pension and the total wealth. In general, we observe that a higher buffer parameter $\alpha $ significantly improves the probabilities in Table 1 from a client’s perspective. In particular, the probability that the average individual pension paid out over the entire period is larger than the initial pension level $P_{0}$ and the probability that there are more pension enhancements than reductions are quite high, especially for $\alpha = 40 \%$. Remarkably, neither the (relative) risk in terms of volatility and Value-at-Risk nor the (relative) reward in terms of expected value suffers. Actually, the opposite is the case: A higher buffer parameter $\alpha $ leads to a higher average of the relative pension level and a lower standard deviation (a lower standard deviation of the relative pension means a more stable pension development). Moreover, the worst-case relative pensions in the tail (Value-at-Risk) also exceed the ones for smaller $\alpha $. The single exception is the volatility of the pension, where $\alpha = 20 \%$ shows a slightly smaller number than $\alpha = 40 \%$. Those benefits of the $\alpha > 0 \%$ portfolios come at the cost of an initially lower pension level $P_{0} = P_{0}(\alpha )$, which represents a tradeoff between the initial pension level and future pension properties. The selection of the case-specific optimal $\alpha $ value, named $\alpha ^{\star }$, depends on the respective target or criterion. If, for instance, the probability of at least one pension shortening shall coincide with a pre-defined probability $p_{\text {red}}$, $\alpha ^{\star }$ can be selected such that the corresponding probability comes closest to $p_{\text {red}}$. Alternatively, $\alpha ^{\star }$ could be selected such that the expectation of the sum of pension cash flows is maximized.
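The first selection rule for $\alpha ^{\star }$ can be sketched as follows. The function name and the tie-breaking rule (prefer the smaller $\alpha $) are assumptions, and the probabilities in the example are placeholders chosen for illustration only; they are not the values of Table 1.

```python
def select_alpha(prob_by_alpha, p_red):
    """Return the alpha whose estimated probability of at least one pension
    shortening is closest to the target p_red (ties: smaller alpha)."""
    return min(prob_by_alpha, key=lambda a: (abs(prob_by_alpha[a] - p_red), a))


# PLACEHOLDER probabilities per buffer parameter, NOT taken from Table 1.
example_probs = {0.0: 0.60, 0.2: 0.45, 0.4: 0.30}

print(select_alpha(example_probs, p_red=0.40))
print(select_alpha(example_probs, p_red=0.10))
```

In practice the dictionary would be filled with simulated probabilities on a finer grid of buffer parameters; the alternative criterion (maximal expected sum of pension cash flows) would replace the distance in the `key` function by the negative of that expectation.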

In summary, in terms of the relative individual cohort pension, $\alpha = 40 \%$ outperforms the $\alpha = (0 \% | 20 \%)$ strategies, and $\alpha = 20 \%$ outperforms the $\alpha = 0 \%$ strategy. The higher the buffer parameter $\alpha $, the more the downside risk is limited, and even the upside potential is enhanced.

We conclude that our proposed model, in which we divide the total wealth into an investment and a buffer portfolio, leads to a sophisticated optimal dynamic asset allocation policy that is performance seeking while reducing downside risks and improving probabilities; hence, it provides remarkable and meaningful benefits to clients.

Table 1 Probabilities of pension rate changes for $\alpha = (0 \% | 20 \% | 40 \%)$


Table 2 Relative performance numbers for $\alpha = (0 \% | 20 \% | 40 \%)$ under 10,000 simulations


Finally, we simulate the optimal strategy $a^{\star }$, the pension $P^{\star }$ and the wealth $V^{\star }$ evolution under three different scenarios: a bullish, a bearish and a non-directional market. In each simulation we need to generate the risk driver Z for every period. Figure 2 provides the corresponding underlying risky asset class price processes, denoted by $V_{Z}(t)$, that correspond to the development of Z. Next, Fig. 3 illustrates the evolution of the relative pension, and Fig. 4 shows the same for the total wealth. From Fig. 3 we infer that

1. the individual pensions increase more often for higher $\alpha $ and even end up with a higher terminal pension (relative to $P_{0}$) in a bullish market,
2. the individual pensions decrease only once for $\alpha = 40 \%$ but twice for the remaining values ($\alpha = (0 \% | 20 \%)$) in a bearish market,
3. and the individual pensions do not decline for $\alpha = 40 \%$ but do decrease and behave in a very unstable and volatile manner for the remaining values ($\alpha = (0 \% | 20 \%)$) in a non-directional market.

In total, the number of pension reductions for $\alpha > 0 \%$ (with buffer) never exceeds the respective number for $\alpha = 0 \%$ (no buffer) in the considered representative scenarios.

Figure 5 complements the former figures on the pension and wealth evolution with a visualization of the $CCR_{j}^{\text {(total)}}(t)$ development. While the $CCR_{j}^{\text {(total)}}(t)$ values for $\alpha > 0 \%$ (with buffer) generally do not fall short of the respective values for $\alpha = 0 \%$ (no buffer), the $\alpha > 0 \%$ portfolios need fewer pension shortenings to keep the $CCR_{j}^{\text {(total)}}(t)$ inside its target corridor. Therefore, by selecting a higher $\alpha $ value, one can improve the management of the wealth such that the $CCR_{j}^{\text {(total)}}(t)$ remains more stable in its corridor without reducing the pension.

In addition, Figs. 6 and 7 show the optimal asset allocation policies $a^{\star }(t)$ for the investment wealth and ${\hat{\pi }}^{\star \text {(total)}}(t)$ for the total wealth. One can observe that the optimal strategy for $\alpha = 40 \%$ frequently moves in the opposite direction to the optimal strategy for $\alpha = 0 \%$. Moreover, Fig. 8 illustrates the kernel density estimates for the path-wise average pensions and wealths. Note that for one path, a higher path-wise average pension automatically implies a higher total sum of pension cash flows received by the customer. The figure shows that although the distributions of the wealths are rather close among all considered $\alpha $ values (see also expected values and volatilities in Table 2), the distributions of the relative pensions differ. The pension distribution for $\alpha = 40 \%$ has lower probability on the left end and is shifted further to the right; this is also reflected in Table 2. Thus, a pension fund client that follows the $\alpha = 40 \%$ strategy benefits in terms of the pension distribution, since pensions below the initial pension level $P_{0}$ are on average less likely. However, as already explained, these benefits come at the cost of an initially lower pension level $P_{0}$. We would like to comment that the averages over all simulated $a^{\star }(t)$ and ${\hat{\pi }}^{\star \text {(total)}}(t)$ values are very close to each other among the three considered buffer parameters $\alpha $. However, as analyzed above, the relative performance and characteristics of the optimal portfolios with a buffer ($\alpha > 0 \%$) are superior to those of the optimal portfolio without a buffer ($\alpha = 0 \%$). This shows that the dynamics and the structure of the asset allocation play a crucial role.

Closing this numerical case study, we provide a brief discussion about the fund behavior in good and bad times of the financial market: In good times, as the buffer represents a certain percentage of the difference between the assets and the liabilities, the buffer increases if the fund develops well. In this way, the development of the Geometric Brownian Motion is damped compared to the system without a buffer and the potential for pension increases is reduced. However, we observe in Fig. 4a that a pension increase happens more often for $\alpha > 0\%$ than for $\alpha = 0\%$, even though the growth in the pension rates is smaller. This behavior is reasonable as a higher buffer (higher $\alpha $) goes hand in hand with a higher funding at the beginning and at every reset time (which comes at the cost of a smaller initial pension rate), compared to the system without a buffer ($\alpha = 0\%$). Moreover, when $\alpha > 0\%$, the fund wealth increase is dampened but the buffer portfolio grows compared to the $\alpha = 0\%$ case, which can be regarded as a profit lock-in feature. If ${\bar{p}}$ were increased, then a pension increase would become more likely for the $\alpha = 0\%$ case as well, but as the pension increases would happen more frequently and the funding after a reset of the system would become higher, the difference between the pension rates before and after adjustments would be rather small. Generally, Eq. (3.19) shows that if ${\bar{p}}$ approaches the maximum value of $125\%$, the possible $\alpha $ values approach $0\%$. From a risk management perspective, when ${\bar{p}}$ is already very high (and thus the probability of pension reductions rather small), the additional buffering through $\alpha $ might not be that beneficial anymore.

Let us assume the fund is having bad times, but $CCR_{j}^{\text {(total)}}(t)$ is close to but still above $100 \%$. In this case, the buffer account is almost empty. Therefore, in such a scenario where the fund recently went down, one keeps almost all of the wealth in the fund. The fund nevertheless remains well-funded and there is no need for a pension reduction, so the implication for the fund in such bad times is actually not that bad. If the fund decreases further and $CCR_{j}^{\text {(total)}}(t)$ falls below $100\%$, then the system gets adjusted and the buffer is refilled. A nice feature of the buffer approaching zero as $CCR_{j}^{\text {(total)}}(t)$ approaches $100\%$ is its interpretation as a market re-entry component. While traditional strategic asset allocations realize big losses in V-markets because they fail to time market re-entries after market declines, the fund in this paper still holds risky assets in bad times; hence it stays invested and can participate if the market recovers. The fund’s wealth is not shifted to the buffer account as long as the fund stays well-funded ($CCR_{j}^{\text {(total)}}(t) \ge 100\%$). Hence, the definition of the buffer portfolio allows the fund to stay invested during bad times.

7 Conclusion

In this article a possible post-retirement phase implementation of an innovative pension plan without guarantees, currently under discussion in Germany, was studied. We transferred the product rules into a mathematical model and solved the resulting portfolio selection problem via the discrete-time Bellman equation, to the best of our knowledge for the first time. We draw the following conclusions:

Section 3 modeled the complex mechanism of the product in the decumulation phase, with the buffer balance and the pension adjustments as its ingredients. In particular, we proposed a special buffer rule. The resulting finite-horizon optimization problem was derived and solved via Bellman’s equation in Sect. 4. Moreover, we provided a possible aggregation from a single-client to a cohort perspective. The stationary asset allocation solution approach to the infinite-horizon problem in Sect. 5 further provided an elegant approximate solution to the problem. Therein, we introduced an efficient policy function iteration algorithm that converges to the unique stationary solution. A case study in Sect. 6 showed several meaningful benefits to customers. The following conclusions apply in the scope of the tested buffer parameters: First, a more pronounced buffer parameter can significantly improve the probabilities of interest (pension shortenings and pension level evolution). Furthermore, the higher the buffer parameter, the more the downside risk was limited while even the upside potential was enhanced, both relative to the initial pension. Of course, these benefits came at the cost of an initially smaller pension payment. In summary, there was a tradeoff between relative outperformance (more pronounced buffer) and initial pension level (less pronounced buffer).

Overall, we found that our proposed model leads to a sophisticated optimal dynamic asset allocation policy that is performance seeking while reducing downside risks and improving the probabilities of interest; hence, it provides remarkable and meaningful benefits to clients. Future research on such pension products could elaborate on alternative buffer processes or consider a more advanced mortality model with a mortality rate that is exposed to (unexpected) shocks; see, for example, [8], which studies a mortality model with a mortality improvement ratio in the framework of pricing variable annuities with guaranteed minimum repayments. Furthermore, future research could consider alternatives to our approach (for a discussion we refer to [6]) and could, for instance, deal with the design, the modeling and the optimal management of a pension fund plan that belongs to an entire collective of investors, where the wealth is managed identically for all clients instead of a cohort-specific treatment.

Notes

The considered pension scheme is named “Nahles–Rente” or “Sozialpartnermodell” in Germany, and is also known under “reine Beitragszusage”. It is regulated by “Bundesanstalt für Finanzdienstleistungsaufsicht (BaFin)” and its political starting point is the so-called “Betriebsrentenstärkungsgesetz (BRSG)” which came into force on January 1st, 2018. Some more details, information and current status can be found in [1] and [15].
One could also think about a deposit account with interest rate $r \ge 0$. A zero interest rate is assumed here as it strongly simplifies later calculations. In addition, as the buffer account may need to be adjusted on a more frequent basis than the capital allocation to the risk-less asset, higher internal or external account (management) costs could justify a lower interest rate.
Values of ${\bar{p}}$ very close to $100 \%$ or $125 \%$ are not suitable for practical purposes because it would require far too many adjustments over time.
Mathematically speaking, let $t \ge T$ and
$$\begin{aligned} n_{-}(t) := \sup \left\{ n \in {\mathbb {N}}_{0}: t_{n} \le t\right\} , \\ n_{+}(t) := \inf \left\{ n \in {\mathbb {N}}_{0}: t_{n} > t\right\} , \end{aligned}$$
then we have
$$\begin{aligned} P_{ij}(t) \equiv P_{ij}(t_{n_{-}(t)}),\ \forall t \in [t_{n_{-}(t)}, t_{n_{+}(t)}). \end{aligned}$$
(3.17)
Note: The rate $\lambda _{x(j)}$ decreases the average total sum of pensions $P_{(k+1)}$ that is to be paid in the next period because some fraction of the clients in the cohort died during the previous period. Alternatively, one could use this rate to increase the individual pensions $P_{ij}(t^{(k+1)})$ of the clients that survived step-by-step while keeping $P_{(k+1)}$ of the cohort constant.
The definition corresponds to the transition for a single client. If one aims for determining the optimal investment decisions for a cohort of customers, then we define
$$\begin{aligned} T_{B}^{(P)}(S, a, Z) = {\left\{ \begin{array}{ll} e^{- \lambda _{x(j)} \Delta } P,&{}\quad \text { if } \frac{T_{B}^{(V)}(S, a, Z)}{e^{- \lambda _{x(j)} \Delta } E(S)} \in [100 \%, 125 \%] \\ \frac{1 - \alpha }{{\bar{p}} - \alpha } (r + \lambda _{x(j)}) T_{B}^{(V)}(S, a, Z),&{}\quad \text { otherwise} \end{array}\right. } \end{aligned}$$
(5.7)
with
$$\begin{aligned} E(S) = \frac{P}{r + \lambda _{x(j)}}. \end{aligned}$$
(5.8)
To get a rough idea about the size of the relative error, let the pension payments stay constant over time. Then, the relative error simplifies to $error_{{\tilde{T}}, \infty }^{rel} = \frac{\int _{{\tilde{T}}}^{\infty } e^{- (\lambda _{x} + \beta ) (u - T)} du}{\int _{T}^{\infty } e^{- (\lambda _{x} + \beta ) (u - T)} du} = e^{- (\lambda _{x} + \beta ) ({\tilde{T}} - T)}$. If T, ${\tilde{T}}$ and $\lambda _{x}$ are given, then one can write and interpret the relative error as a function of $\beta $, i.e. $error_{{\tilde{T}}, \infty }^{rel} = error_{{\tilde{T}}, \infty }^{rel}(\beta )$. If exemplarily $T = 65$, ${\tilde{T}} = 120$ and $\lambda _{x} = 1.18 \%$ (compare the later case study in Sect. 6), then $error_{{\tilde{T}}, \infty }^{rel} = (10.0 \% | 3.3 \% | 1.1 \% )$ for the personal discount factors $\beta = (3 \% | 5 \% | 7 \% )$ which shows reasonable numbers.
In the case of a positive coefficient of risk aversion $0< b < 1$, the lower bound $P_{min}$ can be neglected ($P_{min} := F$). In the case of a negative coefficient of risk aversion $b < 0$, the upper bound $P_{max}$ can be neglected ($P_{max} := \infty $).
For the cohort model, the transition function for the pension needs to be modified to
$$\begin{aligned} T_{B}^{(P)}(S, a, Z) = {\left\{ \begin{array}{ll} \max \left\{ e^{- \lambda _{x(j)} \Delta } P, P_{min}\right\} &{} , \text { if } \frac{T_{B}^{(V)}(S, a, Z)}{e^{- \lambda _{x(j)} \Delta } E(S)} \in [100 \%, 125 \%] \\ \max \left\{ \min \left\{ \frac{1 - \alpha }{{\bar{p}} - \alpha } (r + \lambda _{x(j)}) T_{B}^{(V)}(S, a, Z), P_{max}\right\} , P_{min}\right\} &{} , \text { otherwise}. \end{array}\right. } \end{aligned}$$
(5.14)
This coincides with the applied setup in Sect. 6.
If $a^{(i)}(S^{(l)})$ is not unique, then we select the smallest value among all maximizers and thereby follow the most defensive strategy.
As long as $\epsilon < \Delta ^{(a)}$, it holds $a^{(n_{iter})}(S) = a^{(n_{iter} - 1)}(S)$.
It has to hold $P > F$ for all applied pensions P. We select $F = 10 \% \times P_{(0)}$ which will result in $F = 25.8$.
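As a numerical check of the note on the relative truncation error above (constant pension payments, $T = 65$, ${\tilde{T}} = 120$, $\lambda _{x} = 1.18 \%$), the closed form $e^{- (\lambda _{x} + \beta ) ({\tilde{T}} - T)}$ can be evaluated directly. The function name and default arguments below are illustrative choices.

```python
import math


def relative_truncation_error(beta, lam=0.0118, t_start=65.0, t_end=120.0):
    """Relative error exp(-(lam + beta) * (t_end - t_start)) of truncating the
    infinite horizon at t_end, assuming constant pension payments."""
    return math.exp(-(lam + beta) * (t_end - t_start))


for beta in (0.03, 0.05, 0.07):
    print(f"beta={beta:.0%}: relative error = {relative_truncation_error(beta):.1%}")
```

The error decays exponentially in the personal discount factor $\beta $, which is why even moderate values of $\beta $ make the finite-horizon truncation reasonably accurate.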

References

1. aba and IVS (2017) Die reine Beitragszusage gemäß dem Betriebsrentenstärkungsgesetz. Tech. rep., aba Arbeitsgemeinschaft für betriebliche Altersversorgung e. V. and IVS, Institut der Versicherungsmathematischen Sachverständigen für Altersversorgung e. V., Berlin
2. Bellman R (1952) On the theory of dynamic programming. Proc Natl Acad Sci USA 38:716–719
3. Bellman R (1955) Functional equations in the theory of dynamic programming. V. Positivity and quasi-linearity. Proc Natl Acad Sci USA 41(10):743–746
4. Bellman R (1957) Dynamic programming. Princeton University Press, Princeton
5. Bellman R (1958) Dynamic programming and stochastic control processes. Inf Control 1:228–239
6. Boado-Penas MC, Eisenberg J, Helmert A, Krühner P (2020) A new approach for satisfactory pensions with no guarantees. Eur Actuar J 10(1):3–21
7. Statistisches Bundesamt (2019) Sterbetafeln 2016/2018: Ergebnisse aus der laufenden Berechnung von Periodensterbetafeln für Deutschland und die Bundesländer. Tech. rep., Statistisches Bundesamt (Destatis), Wiesbaden
8. Escobar M, Krayzler M, Ramsauer F, Saunders D, Zagst R (2016) Incorporation of stochastic policyholder behavior in analytical pricing of GMABs and GMDBs. Risks 4(4):1–36
9. Escobar M, Kriebel P, Wahl M, Zagst R (2019) Portfolio optimization under Solvency II. Ann Oper Res 281(1–2):193–227
10. Hinderer K, Rieder U, Stieglitz M (2016) Dynamic optimization: deterministic and stochastic models. Universitext. Springer International Publishing AG
11. Horn RA, Johnson CR (2013) Matrix analysis, 2nd edn. Cambridge University Press, New York
12. Howard RA (1960) Dynamic programming and Markov processes. MIT Press, Cambridge
13. Karatzas I, Shreve SE (1998) Methods of mathematical finance. Springer, New York
14. Nisio M (2015) Stochastic control theory: dynamic programming principle. In: Probability theory and stochastic modelling, vol. 72, 2nd edn. Springer Japan, Tokyo. The first edition was published in the series ISI Lecture Notes, No 9, by MacMillan India Limited publishers, Delhi, 1981
15. Pohl D (2019) Erstes Sozialpartnermodell in Sicht. Portfolio Institutionell 12:30–31
16. Puterman ML (1977) Optimal control of diffusion processes with reflection. J Optim Theory Appl 22(1):103–116
17. Puterman ML (1981) On the convergence of policy iteration for controlled diffusions. J Optim Theory Appl 33(1):137–144
18. Puterman ML, Brumelle SL (1979) On the convergence of policy iteration in stationary dynamic programming. Math Oper Res 4(1):60–69
19. Rieder U (1988) Bayessche Kontrollmodelle. Universität Ulm, Lecture notes WS 1987/88
20. Santos MS, Rust J (2004) Convergence properties of policy iteration. SIAM J Control Optim 42(6):2094–2115
21. Searcóid MO (2007) Metric spaces. Springer Undergraduate Mathematics Series. Springer, London
22. Stokey NL, Lucas Jr RE (1999) Recursive methods in economic dynamics, 5th printing edn. Harvard University Press, Cambridge. With Edward C. Prescott
23. Wakuta K (1992) Optimal stationary policies in the vector-valued Markov decision process. Stoch Process Appl 42:149–156
24. Ye J (2008) Optimal life insurance, consumption and portfolio: a dynamic programming approach. 2008 American control conference, pp 356–362
25. Young NJ (1981) The rate of convergence of a matrix power series. Linear Algebra Appl 35:261–278


Acknowledgements

The authors acknowledge the financial support of the ERGO Center of Excellence in Insurance at Technical University of Munich promoted by ERGO Group. This article originated from a research project with ERGO Group and the authors thank the concerned colleagues at ERGO Group for the beneficial discussions and insights from a practical viewpoint.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Department of Mathematics, Technical University of Munich, Munich, Germany
Andreas Lichtenstern & Rudi Zagst

Authors

Andreas Lichtenstern
Rudi Zagst

Corresponding author

Correspondence to Rudi Zagst.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Policy function iteration: theoretical foundation

In this appendix section we provide further background related to and required for the theoretical foundation of the proposed policy function iteration algorithm. First, a contraction mapping and a fixed point are defined as follows.

Definition 1

(Contraction mapping ([21, Definition 9.9.1, p. 160], [22, p. 50])) Suppose (X, d) is a metric space. A map $\Gamma : X \rightarrow X$ is called a (strong) contraction mapping with modulus $\beta $ if, and only if, there exists $\beta \in [0,1)$ such that $d(\Gamma (f), \Gamma (g)) \le \beta \cdot d(f,g)$ for all $f, g \in X$.

Definition 2

(Fixed point [21, Definition 10.10.1, p. 180]) Suppose X is a non-empty set and $\Gamma : X \rightarrow X$. A point $f \in X$ is called a fixed point for $\Gamma $ if and only if $\Gamma (f) = f$.

[21] argues that “Strong contractions on a metric space, when iterated, tend to pull all the points of the space together into a single point”. The underlying theory is the Contraction Mapping Theorem (or Banach’s Fixed-Point Theorem).

Theorem 7

(Contraction Mapping Theorem ([21, Theorem 10.10.3, p. 181], [22, Theorem 3.2, p. 50])) Suppose (X, d) is a non-empty complete metric space and $\Gamma : X \rightarrow X$ is a (strong) contraction mapping with modulus $\beta \in (0,1)$. Then:

1. $\Gamma $ has a unique fixed point $f \in X$; and
2. for any $f_{0} \in X$, the sequence $(\Gamma ^{n}(f_{0}))$ converges to f with
$$\begin{aligned} d(\Gamma ^{n}(f_{0}), f) \le \beta ^{n} \cdot d(f_{0}, f),\ n = 0,1,2,\ldots \end{aligned}$$
(A.1)

Theorem 7 particularly ensures existence of a unique fixed point for a (strong) contraction mapping.
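To make the error bound (A.1) concrete, the following sketch iterates a simple contraction on the real line. The map $\Gamma (x) = 0.5 x + 1$ is a toy example chosen here for illustration (modulus $0.5$, fixed point $2$, metric $d(x,y) = |x - y|$); it is unrelated to the operator $\Gamma $ of the pension model.

```python
def iterate(gamma, f0, n):
    """Apply the map gamma n times to the starting point f0."""
    f = f0
    for _ in range(n):
        f = gamma(f)
    return f


BETA = 0.5          # modulus of the toy contraction
FIXED_POINT = 2.0   # solves x = 0.5 * x + 1


def gamma(x):
    """Toy contraction on the real line with modulus BETA."""
    return BETA * x + 1.0


f0 = 10.0
for n in range(6):
    dist = abs(iterate(gamma, f0, n) - FIXED_POINT)
    bound = BETA ** n * abs(f0 - FIXED_POINT)
    # Theorem 7 (A.1): the distance to the fixed point never exceeds the bound.
    print(n, dist, bound, dist <= bound + 1e-12)
```

For an affine map the bound holds with equality; for general contractions it is only an upper bound on the distance to the fixed point.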

Let ${\mathcal {V}}$ and $\Gamma $ be defined according to Eqs. (5.38) and (5.39). As ${\mathcal {V}}(s)$, respectively the reward r(s) or the utility ${\tilde{U}}(p)$, is not necessarily bounded in general, we need the notion of an upper barrier function, adjusted to our framework.

Definition 3

(Upper barrier function [19, Chapter 1]) A measurable function $b_{u} : S \rightarrow {\mathbb {R}}_{+}$ is called upper barrier function if there exist constants $c_{1}, c_{2} \ge 0$ such that

1. $r(s) \le c_{1} b_{u}(s)$ for all $s \in S$.
2. ${\mathbb {E}}\left[ b_{u}(T_{B}(s,a,Z)) | s\right] = \int Q(s,a;dz) b_{u}(T_{B}(s,a,z)) \le c_{2} b_{u}(s)$ for all $s \in S$ and $a \in {\mathbb {A}}$.

Q denotes the transition probability measure with $Q(s,a;\cdot )$ being a probability measure for all $(s,a) \in S \times {\mathbb {A}}$.

Furthermore, we later need the notion of a maximisator which we define next.

Definition 4

(Maximisator [19, Chapter 1]) Let $f \in X$. The policy $a_{f} = a_{f}(s)$ is called a maximisator for f if it maximizes

$$\begin{aligned} \left( \Gamma f\right) (s) {\mathop {=}\limits ^{(5.39)}} r(s) + {\overline{\beta }} \max _{a(s) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ f(T_{B}(s, a(s), Z)) \bigg | s\right] \right\} \end{aligned}$$

(A.2)

for all $s \in S$, i.e. if

$$\begin{aligned} a_{f}(s) = {{\,{\text{arg max}}\,}}_{a(s) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ f(T_{B}(s, a(s), Z)) \bigg | s\right] \right\} . \end{aligned}$$

(A.3)

We already explained earlier that, due to Assumption 2, the maximum of any function over the finite set ${\mathbb {A}}$ is attained. This holds in particular for the function $g : {\mathbb {A}} \rightarrow {\mathbb {R}}$, $a \mapsto g(a) = {\mathbb {E}}\left[ f(T_{B}(s, a(s), Z)) \bigg | s\right] $ for any given $s \in S$, and thus the maximisator $a_{f}$ exists for every $f \in X$. Moreover, from Assumption 1 we already inferred that ${\tilde{U}}(p)$ is bounded, i.e. $\exists 0< K_{{\tilde{U}}} < \infty $ with $\left| {\tilde{U}}(p)\right| < K_{{\tilde{U}}}$. This immediately implies that r(s) is bounded as well, with $0< K_{r} := \frac{1}{\lambda _{x} + \beta } \left( 1 - e^{- (\lambda _{x} + \beta ) \Delta }\right) K_{{\tilde{U}}} < \infty $:

$$\begin{aligned} \left| r(s)\right| {\mathop {=}\limits ^{(5.12)}} {}&\frac{1}{\lambda _{x} + \beta } \left( 1 - e^{- (\lambda _{x} + \beta ) \Delta }\right) \underbrace{\left| {\tilde{U}}(p)\right| }_{< K_{{\tilde{U}}}}< \frac{1}{\lambda _{x} + \beta } \left( 1 - e^{- (\lambda _{x} + \beta ) \Delta }\right) K_{{\tilde{U}}} = K_{r}. \end{aligned}$$

(A.4)

In view of Definition 3, it is thus clear that $b_{u} \equiv 1$ is an upper barrier function. In line with [19] we further define the set

$$\begin{aligned}&{\mathbb {B}}_{b_{u}} := \left\{ f : S \rightarrow {\overline{\mathbb {R}}}:\ f \text { measurable},\ f(s) \le c b_{u}(s) \text { for all } s \in S,\ \text {for one } c \in {\mathbb {R}}_{+}\right\} \\&\quad \subset \{f : S \rightarrow {\overline{\mathbb {R}}}\} \end{aligned}$$

(A.5)

and the weighted sup norm

$$\begin{aligned} \left\| f\right\| _{b_{u}} := \sup _{s \in S}\frac{|f(s)|}{b_{u}(s)} \end{aligned}$$

(A.6)

which reduces in our setting ($b_{u} \equiv 1$) to the usual sup norm

$$\begin{aligned} \left\| f\right\| _{b_{u}} = \sup _{s \in S}|f(s)| = \left\| f\right\| _{\infty }. \end{aligned}$$

(A.7)

Then ${\mathbb {B}}_{b_{u}}$ becomes

$$\begin{aligned} {\mathbb {B}} := {\mathbb {B}}_{1} = \left\{ f : S \rightarrow {\overline{\mathbb {R}}}:\ f \text { measurable},\ \left\| f^{+}\right\| _{\infty } < \infty \right\} , \end{aligned}$$

(A.8)

where $f^{+}$ denotes the positive part of f. Hence ${\mathbb {B}}$ denotes the measurable functions mapping from S to ${\overline{{\mathbb {R}}}}$ that have an upper bound. In addition, the boundedness of ${\tilde{U}}(p)$ not only implies boundedness of r(s) but also of ${\mathcal {J}}(a(s);s, c_{ij}^{\text {(buffer)}})$ and ${\mathcal {V}}(s)$ in Problem (5.1), since

$$\begin{aligned} \begin{aligned} \left| {\mathcal {J}}(a(s);s, c_{ij}^{\text {(buffer)}}) \right|&= {} \left| {\mathbb {E}}\left[ \sum _{i = 0}^{\infty } \int _{t^{(i)}}^{t^{(i+1)}} e^{- (\lambda _{x} + \beta ) (u - T)} {\tilde{U}}(P(t^{(i)})) du\right] \right| \\&\le {} {\mathbb {E}}\left[ \sum _{i = 0}^{\infty } \int _{t^{(i)}}^{t^{(i+1)}} e^{- (\lambda _{x} + \beta ) (u - T)} \underbrace{\left| {\tilde{U}}(P(t^{(i)})) \right| }_{< K_{{\tilde{U}}}} du\right] \\&< {} K_{{\tilde{U}}} {\mathbb {E}}\left[ \sum _{i = 0}^{\infty } \int _{t^{(i)}}^{t^{(i+1)}} e^{- (\lambda _{x} + \beta ) (u - T)} du\right] \\&= K_{{\tilde{U}}} \sum _{i = 0}^{\infty } \int _{t^{(i)}}^{t^{(i+1)}} e^{- (\lambda _{x} + \beta ) (u - T)} du \\&= K_{{\tilde{U}}} \int _{T}^{\infty } e^{- (\lambda _{x} + \beta ) (u - T)} du = \frac{K_{{\tilde{U}}}}{\lambda _{x} + \beta } =: K_{{\mathcal {J}}} \\&\Rightarrow \exists 0< K_{{\mathcal {J}}}< \infty : {} \left| {\mathcal {J}}(a(s);s, c_{ij}^{\text {(buffer)}}) \right| < K_{{\mathcal {J}}} \end{aligned} \end{aligned}$$

(A.9)

which leads to

$$\begin{aligned} \begin{aligned} \left| {\mathcal {V}}(s) \right|&= \left| \sup _{a(s) \in {\mathbb {A}}} {\mathcal {J}}(a(s);s, c_{ij}^{\text {(buffer)}}) \right| {\mathop {=}\limits ^{{\mathbb {A}} \text { finite}}} \left| \max _{a(s) \in {\mathbb {A}}} {\mathcal {J}}(a(s);s, c_{ij}^{\text {(buffer)}}) \right| \\&\le \max _{a(s) \in {\mathbb {A}}} \underbrace{\left| {\mathcal {J}}(a(s);s, c_{ij}^{\text {(buffer)}}) \right| }_{< K_{{\mathcal {J}}}}< \max _{a(s) \in {\mathbb {A}}} K_{{\mathcal {J}}} = K_{{\mathcal {J}}} =: K_{{\mathcal {V}}} \\&\Rightarrow \exists 0< K_{{\mathcal {V}}}< \infty : {} \left| {\mathcal {V}}(s) \right| < K_{{\mathcal {V}}}. \end{aligned} \end{aligned}$$

(A.10)

In line with these observations, we define the set $X \subset {\mathbb {B}}$ to contain bounded functions only:

$$\begin{aligned} X := \left\{ f : S \rightarrow {\overline{\mathbb {R}}}:\ f \text { measurable},\ \left\| f\right\| _{\infty } < \infty \right\} . \end{aligned}$$

(A.11)

From the above calculations it clearly follows ${\mathcal {V}}(s) \in X$ as well as ${\mathcal {J}}(a(s);s, c_{ij}^{\text {(buffer)}}) \in X$ for all $a(s) \in {\mathbb {A}}$.

We now come to the main result in [19] on the existence of a unique fixed point of $\Gamma $ and the optimality of the stationary policy. We restate the theorem, omitting the conditions that become unnecessary because $b_{u} \equiv 1$.

Theorem 8

[19, Satz 1.5] Let (X, d) be a non-empty complete metric space with $X \subset {\mathbb {B}}$ and let the following hold:

1.
For all $f \in X$ there exists a maximisator $a_{f}$ of f.
2.
$\Gamma $ is a contraction on X.
3.
$0 \in X$.

Then the following claims hold true:

(a)
${\mathcal {V}} \in X$, $\Gamma {\mathcal {V}} = {\mathcal {V}}$ and ${\mathcal {V}}$ is the unique fixed point of $\Gamma $ in X.
(b)
The stationary policy $(a_{{\mathcal {V}}})^{\infty }$ is the optimal solution to the infinite-horizon discrete-time optimization problem.

Appendix B: Some useful matrix properties

We define the notion of a (strictly) diagonally dominant matrix.

Definition 5

(Diagonally dominant matrix [11, Definition 6.1.9]) A matrix $A = \left( a_{ij}\right) _{i,j = 1,\ldots ,n} \in {\mathbb {R}}^{n \times n}$, $n \in {\mathbb {N}}$, is said to be diagonally dominant if

$$\begin{aligned} \left| a_{ii} \right| \ge \sum _{j = 1, j \ne i}^{n} \left| a_{ij} \right| ,\quad \forall i \in \{1, \ldots , n\}. \end{aligned}$$

(B.1)

If the inequality is strict for all $i \in \{1, \ldots , n\}$, the matrix A is called strictly diagonally dominant.

It can be shown that every strictly diagonally dominant matrix is invertible.

Theorem 9

[11, Theorem 6.1.10, part (a)] Let $A = \left( a_{ij}\right) _{i,j = 1,\ldots ,n} \in {\mathbb {R}}^{n \times n}$ be strictly diagonally dominant. Then A is non-singular.

Furthermore, we provide a specific eigenvalue result for stochastic matrices which are also known as probability or transition matrices.

Definition 6

(Stochastic matrix [11, p. 547]) A matrix $A = \left( a_{ij}\right) _{i,j = 1,\ldots ,n} \in {\mathbb {R}}^{n \times n}$, $n \in {\mathbb {N}}$, $a_{ij} \ge 0$, is a (row) stochastic matrix if $A {\mathbf {1}} = {\mathbf {1}}$, i.e. if all row sums of A are equal to one.

Definition 7

(Eigenvalue and eigenvector [11, Definition 1.1.2]) Let $A \in {\mathbb {R}}^{n \times n}$, $n \in {\mathbb {N}}$. If a scalar $\lambda \in {\mathbb {R}}$ and a vector ${\mathbf {v}} \in {\mathbb {R}}^{n}$, ${\mathbf {v}} \not \equiv {\mathbf {0}}$, satisfy the equation

$$\begin{aligned} A {\mathbf {v}} = \lambda {\mathbf {v}}, \end{aligned}$$

(B.2)

then $\lambda =: \lambda (A)$ is called an eigenvalue of A and ${\mathbf {v}}$ is called an eigenvector of A associated with $\lambda $.

We have the following result for the eigenvalues of a stochastic matrix.

Theorem 10

The maximal absolute eigenvalue of a stochastic matrix A is equal to one, i.e. $\max |\lambda (A)| = 1$.

Proof

We prove that any stochastic matrix A has the eigenvalue $\lambda (A) = 1$ and that the absolute value of any eigenvalue $\lambda (A)$ of A is less than or equal to one.

1.
Existence of eigenvalue $\lambda (A) = 1$: The vector ${\mathbf {1}}$ that consists of ones is an eigenvector to the eigenvalue $\lambda (A) = 1$ for any stochastic matrix $A = \left( a_{ij}\right) _{i,j = 1,\ldots ,n}$ because the rows of A sum up to one:
$$\begin{aligned} A {\mathbf {1}} = \left( \begin{matrix} a_{11} &{}\quad a_{12} &{}\quad \cdots &{}\quad a_{1n} \\ \vdots &{} \quad \vdots &{} \quad \ddots &{}\quad \vdots \\ a_{n1} &{}\quad a_{n2} &{}\quad \cdots &{}\quad a_{nn} \\ \end{matrix}\right) \left( \begin{matrix} 1 \\ \vdots \\ 1 \\ \end{matrix}\right) = \left( \begin{matrix} a_{11} + a_{12} + \cdots + a_{1n} \\ \vdots \\ a_{n1} + a_{n2} + \cdots + a_{nn} \\ \end{matrix}\right) = \left( \begin{matrix} 1 \\ \vdots \\ 1 \\ \end{matrix}\right) = 1 \cdot {\mathbf {1}}. \end{aligned}$$
(B.3)
2.
Eigenvalue bound $|\lambda (A)| \le 1$: Let $\lambda (A)$ be an eigenvalue of the stochastic matrix A and let ${\mathbf {v}} = \left( \begin{matrix} v_{1} \\ \vdots \\ v_{n} \\ \end{matrix}\right) \ne {\mathbf {0}}$ be the corresponding eigenvector, i.e.
$$\begin{aligned} A {\mathbf {v}} = \lambda (A) {\mathbf {v}} \end{aligned}$$
(B.4)
When we compare the i-th row of both sides of the equality, we obtain
$$\begin{aligned} \sum _{j = 1}^{n} a_{ij} v_{j} = \lambda (A) v_{i},\ i = 1, \ldots , n. \end{aligned}$$
(B.5)
Further let
$$\begin{aligned} m := {{\,{\text{arg max}}\,}}_{j \in \{1, \ldots , n\}}\{|v_{j}|\} \end{aligned}$$
(B.6)
so that $v_{m}$ denotes the entry of the eigenvector ${\mathbf {v}}$ with the maximal absolute value: $|v_{m}| \ge |v_{j}|$ $\forall j \in \{1, \ldots , n\}$. Since ${\mathbf {v}} \ne {\mathbf {0}}$, we have $|v_{m}| > 0$. Inserting $i = m$ in Eq. (B.5) and taking absolute values leads to
$$\begin{aligned} |\lambda (A)| \cdot |v_{m}| = |\lambda (A) v_{m}|&{\mathop {=}\limits ^{(B.5): \ i = m}}\left| \sum _{j = 1}^{n} a_{mj} v_{j}\right| {\mathop {\le }\limits ^{\text {triangle inequality}}} \sum _{j = 1}^{n} \left| a_{mj} v_{j}\right| {\mathop {=}\limits ^{a_{mj} \ge 0}} \sum _{j = 1}^{n} a_{mj} \left| v_{j} \right| \\&{\mathop {\le }\limits ^{|v_{j}| \le |v_{m}|}}\sum _{j = 1}^{n} a_{mj} \left| v_{m}\right| = \left| v_{m}\right| \sum _{j = 1}^{n} a_{mj} = \left| v_{m}\right| . \end{aligned}$$
(B.7)
Hence, as $|v_{m}| > 0$, we must have $|\lambda (A)| \le 1$ for any arbitrary eigenvalue $\lambda (A)$.

In total, this shows that $\max |\lambda (A)| = 1$ for any stochastic matrix A. $\square $
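The two steps of the proof can be illustrated on a small example. The sketch below, written in plain Python, uses a hypothetical $3 \times 3$ row stochastic matrix and checks that $A {\mathbf {1}} = {\mathbf {1}}$ (step 1) and that $\Vert A {\mathbf {v}}\Vert _{\infty } \le \Vert {\mathbf {v}}\Vert _{\infty }$ for a few test vectors, which is the estimate behind (B.7) (step 2). The matrix, the test vectors and all function names are assumptions made for illustration only.

```python
def mat_vec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def is_row_stochastic(A, tol=1e-12):
    """Non-negative entries and every row summing to one (Definition 6)."""
    return all(
        all(a >= 0 for a in row) and abs(sum(row) - 1.0) <= tol for row in A
    )


def sup_norm(v):
    return max(abs(x) for x in v)


# Hypothetical transition matrix, chosen only for illustration.
A = [
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 0.1, 0.9],
]

# Step 1: the vector of ones is an eigenvector for the eigenvalue 1.
ones = [1.0, 1.0, 1.0]
A_ones = mat_vec(A, ones)

# Step 2: A never increases the sup norm, so A v = lambda v forces |lambda| <= 1.
test_vectors = [[1.0, -2.0, 3.0], [-4.0, 0.5, 0.0], [0.1, 0.1, -7.0]]
never_expands = all(
    sup_norm(mat_vec(A, v)) <= sup_norm(v) + 1e-12 for v in test_vectors
)

print(is_row_stochastic(A), A_ones, never_expands)
```

The check in step 2 is a necessary consequence of the theorem rather than a proof; it only confirms the inequality chain on the chosen vectors.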

Appendix C: Technical appendix

Proof of Theorem 3

Notice that $\Gamma $ maps X into X because for any $f \in X$ with $\left| f(s) \right| < K_{f}$ for some $0< K_{f} < \infty $, there exists $0< K_{\Gamma f} < \infty $ such that $\left| \left( \Gamma f\right) (s) \right| < K_{\Gamma f}$ for all $s \in S$:

$$\begin{aligned} \begin{aligned} \left| \left( \Gamma f\right) (s) \right|&= {} \left| r(s) + {\overline{\beta }} \sup _{a(s) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ f(T_{B}(s, a(s), Z)) \bigg | s\right] \right\} \right| \\&{\mathop {=}\limits ^{{\mathbb {A}} \text { finite}}}\left| r(s) + {\overline{\beta }} \max _{a(s) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ f(T_{B}(s, a(s), Z)) \bigg | s\right] \right\} \right| \\&\le \underbrace{\left| r(s) \right| }_{< K_{r}} + {\overline{\beta }} \max _{a(s) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ \underbrace{\left| f(T_{B}(s, a(s), Z)) \right| }_{< K_{f}} \bigg | s\right] \right\} < K_{\Gamma f} \end{aligned} \end{aligned}$$

(C.1)

for every $K_{\Gamma f} \ge K_{r} + {\overline{\beta }} K_{f}$. Consequently, $\Gamma f \in X$. We further prove the claim that $d(\Gamma f, \Gamma g) \le {\overline{\beta }} \cdot d(f,g)$ for all $f, g \in X$. We deduce

$$\begin{aligned} d(\Gamma f, \Gamma g)&= \Vert \Gamma f - \Gamma g\Vert _{\infty } = \sup _{s \in S} \left| \left( \Gamma f - \Gamma g\right) (s)\right| \\&= \sup _{s \in S} \Bigg \{ \Bigg |r(s) + {\overline{\beta }} \sup _{a(s) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ f(T_{B}(s, a(s), Z)) \bigg | s\right] \right\} \\&\quad - r(s) - {\overline{\beta }} \sup _{a(s) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ g(T_{B}(s, a(s), Z)) \bigg | s\right] \right\} \Bigg | \Bigg \} \\&= {\overline{\beta }} \sup _{s \in S} \left\{ \left| \sup _{a(s) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ f(T_{B}(s, a(s), Z)) \bigg | s\right] \right\} \right. \right. \\&\quad \left. \left. - \sup _{a(s) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ g(T_{B}(s, a(s), Z)) \bigg | s\right] \right\} \right| \right\} \\&{\mathop {=}\limits ^{{\mathbb {A}} \text { finite}}}{\overline{\beta }} \sup _{s \in S} \left\{ \left| \max _{a(s) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ f(T_{B}(s, a(s), Z)) \bigg | s\right] \right\} \right. \right. \\&\quad \left. \left. - \max _{a(s) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ g(T_{B}(s, a(s), Z)) \bigg | s\right] \right\} \right| \right\} . \end{aligned}$$

(C.2)

Denote by $a_{f}(s) := {{\,{\text{arg max}}\,}}_{a(s) \in {\mathbb {A}}} {\mathbb {E}}\left[ f(T_{B}(s, a(s), Z)) \bigg | s\right] $ the corresponding maximisator. For the expression inside the outer supremum we use the following inequality, where we assume w.l.o.g. that $\max _{a(s) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ f(T_{B}(s, a(s), Z)) \bigg | s\right] \right\} \ge \max _{a(s) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ g(T_{B}(s, a(s), Z)) \bigg | s\right] \right\} $ (otherwise the roles of f and g are interchanged):

$$\begin{aligned} \begin{aligned}&\left| \underbrace{\max _{a(s) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ f(T_{B}(s, a(s), Z)) \bigg | s\right] \right\} - \max _{a(s) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ g(T_{B}(s, a(s), Z)) \bigg | s\right] \right\} }_{\ge 0}\right| \\&\quad = \max _{a(s) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ f(T_{B}(s, a(s), Z)) \bigg | s\right] \right\} - \max _{a(s) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ g(T_{B}(s, a(s), Z)) \bigg | s\right] \right\} \\&\quad = {\mathbb {E}}\left[ f(T_{B}(s, a_{f}(s), Z)) \bigg | s\right] - \underbrace{\max _{a(s) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ g(T_{B}(s, a(s), Z)) \bigg | s\right] \right\} }_{\ge {\mathbb {E}}\left[ g(T_{B}(s, a_{f}(s), Z)) \bigg | s\right] } \\&\quad \le {\mathbb {E}}\left[ f(T_{B}(s, a_{f}(s), Z)) \bigg | s\right] - {\mathbb {E}}\left[ g(T_{B}(s, a_{f}(s), Z)) \bigg | s\right] \\&\quad = {\mathbb {E}}\left[ f(T_{B}(s, a_{f}(s), Z)) - g(T_{B}(s, a_{f}(s), Z)) \bigg | s\right] . \end{aligned} \end{aligned}$$

(C.3)

Inserting this back gives

$$\begin{aligned} \begin{aligned} d(\Gamma f, \Gamma g)&= {\overline{\beta }} \sup _{s \in S} \left\{ \left| \max _{a(s) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ f(T_{B}(s, a(s), Z)) \bigg | s\right] \right\} \right. \right. \\&\quad \left. \left. - \max _{a(s) \in {\mathbb {A}}} \left\{ {\mathbb {E}}\left[ g(T_{B}(s, a(s), Z)) \bigg | s\right] \right\} \right| \right\} \\&\le {} {\overline{\beta }} \sup _{s \in S} \left\{ {\mathbb {E}}\left[ \underbrace{f(T_{B}(s, a_{f}(s), Z)) - g(T_{B}(s, a_{f}(s), Z))}_{\le \sup _{s \in S} |f(s) - g(s)|} \bigg | s\right] \right\} \\&\le {} {\overline{\beta }} \sup _{s \in S} \left| \left( f(s) - g(s)\right) \right| = {\overline{\beta }} \Vert f - g\Vert _{\infty } = {\overline{\beta }} \cdot d(f,g) \end{aligned} \end{aligned}$$

(C.4)

which was to be shown. $\square $
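The contraction property $d(\Gamma f, \Gamma g) \le {\overline{\beta }} \cdot d(f,g)$ can be checked empirically on a finite toy problem. The following Python sketch assumes a finite state space, a finite action set and randomly generated transition matrices `P[a]`, so that the conditional expectation becomes a matrix-vector product. The function names, the seed and the sizes are hypothetical and not part of the model in the paper.

```python
import random


def bellman(f, r, P, beta_bar):
    """(Gamma f)(s) = r(s) + beta_bar * max_a sum_t P[a][s][t] * f(t)."""
    n = len(r)
    return [
        r[s] + beta_bar * max(
            sum(P[a][s][t] * f[t] for t in range(n)) for a in range(len(P))
        )
        for s in range(n)
    ]


def dist(f, g):
    """Sup-norm distance d(f, g) on a finite state space."""
    return max(abs(x - y) for x, y in zip(f, g))


def random_stochastic(n, rng):
    """Random row stochastic matrix: normalise positive random weights per row."""
    rows = []
    for _ in range(n):
        w = [rng.random() + 1e-9 for _ in range(n)]
        total = sum(w)
        rows.append([x / total for x in w])
    return rows


rng = random.Random(0)
n_states, n_actions, beta_bar = 4, 3, 0.9      # hypothetical sizes and discount
P = [random_stochastic(n_states, rng) for _ in range(n_actions)]
r = [rng.uniform(-1.0, 1.0) for _ in range(n_states)]

worst_ratio = 0.0
for _ in range(200):
    f = [rng.uniform(-5.0, 5.0) for _ in range(n_states)]
    g = [rng.uniform(-5.0, 5.0) for _ in range(n_states)]
    ratio = dist(bellman(f, r, P, beta_bar), bellman(g, r, P, beta_bar)) / dist(f, g)
    worst_ratio = max(worst_ratio, ratio)

print(worst_ratio <= beta_bar + 1e-12)
```

The observed ratio never exceeds ${\overline{\beta }}$, in line with (C.4); random sampling of course only illustrates the inequality and does not replace the proof.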

Proof of Theorem 4

We prove that the conditions 1.-3. in Theorem 8 are fulfilled on (X, d). Hence, the conclusions a) and b) in Theorem 8 hold true and the statement in Theorem 4 is verified.

Let Assumptions 1 and 2 be satisfied and let X and d be as defined above. First, (X, d) is clearly non-empty and also complete as every Cauchy sequence converges within X. We prove 1.–3.:

1.
For all $f \in X$ there exists a maximisator $a_{f}$ of f: It was already shown below Definition 4 that the maximisator $a_{f}$ for every $f \in X$ exists due to ${\mathbb {A}}$ being finite according to Assumption 2.
2.
$\Gamma $ is a contraction on X: This was proven in Theorem 3.
3.
$0 \in X$: The zero function clearly satisfies $0 \in X$.

$\square $

Proof of Theorem 5

First, we have

$$\begin{aligned}&{\mathcal {V}}^{(i+1)}(S) - {\mathcal {V}}^{(i)}(S) \\&\quad {\mathop {=}\limits ^{(5.37): \text { policy evaluation step}}} \left( I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a^{(i+1)}(S))\right) ^{-1} r(S) - {\mathcal {V}}^{(i)}(S) \\&\quad = \left( I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a^{(i+1)}(S))\right) ^{-1} \\&\qquad \times \left[ r(S) - \left( I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a^{(i+1)}(S))\right) {\mathcal {V}}^{(i)}(S)\right] \\&\quad = \left( I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a^{(i+1)}(S))\right) ^{-1} \\&\qquad \times \left( r(S) + e^{- (\lambda _{x} + \beta ) \Delta } q Q(a^{(i+1)}(S)) {\mathcal {V}}^{(i)}(S) - {\mathcal {V}}^{(i)}(S)\right) \\&\quad {\mathop {=}\limits ^{(5.36): \text { policy improvement step}}} \left( I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a^{(i+1)}(S))\right) ^{-1} \\&\qquad \times \left( \max _{a(S) \in \text {Grid}(a)} \left\{ r(S) + e^{- (\lambda _{x} + \beta ) \Delta } q Q(a(S)) {\mathcal {V}}^{(i)}(S)\right\} - {\mathcal {V}}^{(i)}(S)\right) . \end{aligned}$$

(C.5)

For the second term we observe

$$\begin{aligned}&\max _{a(S) \in \text {Grid}(a)} \left\{ r(S) + e^{- (\lambda _{x} + \beta ) \Delta } q Q(a(S)) {\mathcal {V}}^{(i)}(S)\right\} - {\mathcal {V}}^{(i)}(S) \\&\qquad \ge \left\{ r(S) + e^{- (\lambda _{x} + \beta ) \Delta } q Q(a^{(i)}(S)) {\mathcal {V}}^{(i)}(S)\right\} - {\mathcal {V}}^{(i)}(S) \\&\qquad {\mathop {=}\limits ^{(5.37): \text { policy evaluation step}}} {\mathcal {V}}^{(i)}(S) - {\mathcal {V}}^{(i)}(S) = {\mathbf {0}}. \end{aligned}$$

(C.6)

Hence every entry in the second term is non-negative. The same holds for the first term by the following argument. We have $\left( I_{n} - H\right) ^{-1} = \sum _{k = 0}^{\infty } H^{k} = I_{n} + \sum _{k = 1}^{\infty } H^{k}$ for any matrix $H \in {\mathbb {R}}^{n \times n}$ for which the power series $\sum _{k = 1}^{\infty } H^{k}$ converges, because

$$\begin{aligned} \left( I_{n} - H\right) \sum _{k = 0}^{\infty } H^{k} = \sum _{k = 0}^{\infty } H^{k} - \sum _{k = 1}^{\infty } H^{k} = I_{n} = \left( I_{n} - H\right) \left( I_{n} - H\right) ^{-1}. \end{aligned}$$

(C.7)

According to [25], the matrix power series $\sum _{k = 1}^{\infty } H^{k}$ (also called “Neumann series”) converges if for every eigenvalue $\lambda (H)$ of the matrix H it holds $|\lambda (H)| < 1$, i.e. $\max |\lambda (H)| < 1$.

Set $G := q Q(a^{(i+1)}(S))$ and $H := e^{- (\lambda _{x} + \beta ) \Delta } G$. The matrix G is a (row) stochastic matrix (all rows sum up to one and all entries are non-negative) since it represents the transition matrix that contains the transition probabilities from one state to another as entries. With

$$\begin{aligned} G x = \lambda (G) x \Leftrightarrow H x = \lambda (H) x,\ \lambda (H) := e^{- (\lambda _{x} + \beta ) \Delta } \lambda (G), \end{aligned}$$

(C.8)

it follows that $\lambda (G)$ is an eigenvalue of the matrix G if and only if $\lambda (H) = e^{- (\lambda _{x} + \beta ) \Delta } \lambda (G)$ is an eigenvalue of the matrix H. From Theorem 10 in Appendix B it follows that $\max |\lambda (G)| = 1$ as G is a stochastic matrix. Thus,

$$\begin{aligned} \max |\lambda (H)| = \max |e^{- (\lambda _{x} + \beta ) \Delta } \lambda (G)| = e^{- (\lambda _{x} + \beta ) \Delta } \underbrace{\max |\lambda (G)|}_{= 1} = e^{- (\lambda _{x} + \beta ) \Delta } < 1. \end{aligned}$$

(C.9)

We conclude that

$$\begin{aligned} \left( I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a^{(i+1)}(S))\right) ^{-1} = I_{n_{S}} + \sum _{k = 1}^{\infty } \left( e^{- (\lambda _{x} + \beta ) \Delta } q Q(a^{(i+1)}(S))\right) ^{k}, \end{aligned}$$

(C.10)

where all entries of $I_{n_{S}}$ and $\left( e^{- (\lambda _{x} + \beta ) \Delta } q Q(a^{(i+1)}(S))\right) ^{k}$ are non-negative since $e^{- (\lambda _{x} + \beta ) \Delta } > 0$ and $G = q Q(a^{(i+1)}(S))$ is a stochastic matrix.

In summary, we multiply a matrix with non-negative entries (first term) by a vector with non-negative entries (second term) and therefore obtain a vector with non-negative entries, which implies monotonicity of the value function:

$$\begin{aligned} {\mathcal {V}}^{(i+1)}(S) - {\mathcal {V}}^{(i)}(S) \ge {\mathbf {0}}. \end{aligned}$$

(C.11)

$\square $
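The monotonicity in (C.11) can be observed on a toy example. The Python sketch below runs policy function iteration on a hypothetical problem with three states and two actions: the evaluation step approximates $\left( I - \delta G\right) ^{-1} r$ by a truncated Neumann series as in (C.10), and the improvement step picks, state by state, the action with the largest expected continuation value. The transition rows, the reward vector, the discount factor `disc` and all function names are illustrative assumptions and do not reproduce the grid or the calibration of the paper.

```python
def evaluate(policy, r, P, disc, n_terms=600):
    """Policy evaluation: approximate (I - disc * G)^{-1} r by the truncated
    Neumann series, where row s of G is the row of state s under policy[s]."""
    n = len(r)
    G = [P[policy[s]][s] for s in range(n)]
    v = list(r)
    for _ in range(n_terms):
        v = [r[s] + disc * sum(G[s][t] * v[t] for t in range(n)) for s in range(n)]
    return v


def improve(v, r, P, disc):
    """Policy improvement: maximise r(s) + disc * sum_t P[a][s][t] * v(t) over a."""
    n = len(r)
    return [
        max(
            range(len(P)),
            key=lambda a: r[s] + disc * sum(P[a][s][t] * v[t] for t in range(n)),
        )
        for s in range(n)
    ]


# Hypothetical data: P[a][s] is the transition row of state s under action a.
P = [
    [[0.9, 0.1, 0.0], [0.5, 0.5, 0.0], [0.3, 0.3, 0.4]],
    [[0.2, 0.3, 0.5], [0.1, 0.2, 0.7], [0.0, 0.1, 0.9]],
]
r = [0.0, 1.0, 2.0]
disc = 0.9

policy = [0, 0, 0]
values = [evaluate(policy, r, P, disc)]
for _ in range(20):
    new_policy = improve(values[-1], r, P, disc)
    if new_policy == policy:
        break                      # stationary policy reached
    policy = new_policy
    values.append(evaluate(policy, r, P, disc))

monotone = all(
    values[k + 1][s] >= values[k][s] - 1e-9
    for k in range(len(values) - 1)
    for s in range(len(r))
)
print(policy, monotone)
```

Summing the series instead of inverting the matrix keeps the example free of external libraries; for larger grids a linear solver would be the more natural choice.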

Proof

(Existence of inverse matrix of $I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a)$) We show that $I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a)$ is a strictly diagonally dominant matrix in the sense of Definition 5 in Appendix B. The matrix qQ(a) is a (row) stochastic matrix (all rows sum up to one and all entries are non-negative) as it represents the transition matrix which contains the transition probabilities from one state to another as entries. Hence, by construction we have $\left( q Q(a)\right) _{ij} \ge 0$, $\forall i,j \in \{1, \ldots , n_{S}\}$, and

$$\begin{aligned} \sum _{j = 1}^{n_{S}} \left( q Q(a)\right) _{ij} = 1,\ \forall i \in \{1, \ldots , n_{S}\}. \end{aligned}$$

(C.12)

Together with $\left( I_{n_{S}}\right) _{ii} = 1$ and $\left( I_{n_{S}}\right) _{ij} = 0$ for $i \ne j$, this automatically implies that

$$\begin{aligned} \left| \left( I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a)\right) _{ii} \right|&= \left| \underbrace{\left( I_{n_{S}}\right) _{ii}}_{= 1} - e^{- (\lambda _{x} + \beta ) \Delta } \underbrace{\left( q Q(a)\right) _{ii}}_{= 1 - \sum _{j = 1, j \ne i}^{n_{S}} \left( q Q(a)\right) _{ij}} \right| \\&= \left| 1 - e^{- (\lambda _{x} + \beta ) \Delta } \left( 1 - \sum _{j = 1, j \ne i}^{n_{S}} \left( q Q(a)\right) _{ij}\right) \right| \\&= \left| \underbrace{1 - e^{- (\lambda _{x} + \beta ) \Delta } + e^{- (\lambda _{x} + \beta ) \Delta } \sum _{j = 1, j \ne i}^{n_{S}} \left( q Q(a)\right) _{ij}}_{\ge 0} \right| \\&= 1 - e^{- (\lambda _{x} + \beta ) \Delta } + \left| e^{- (\lambda _{x} + \beta ) \Delta } \sum _{j = 1, j \ne i}^{n_{S}} \left( q Q(a)\right) _{ij} \right| \\&{\mathop {=}\limits ^{\left( q Q(a)\right) _{ij} \ge 0}} 1 - e^{- (\lambda _{x} + \beta ) \Delta } + \sum _{j = 1, j \ne i}^{n_{S}} \left| e^{- (\lambda _{x} + \beta ) \Delta } \left( q Q(a)\right) _{ij} \right| , \end{aligned}$$

(C.13)

where

$$\begin{aligned} \sum _{j = 1, j \ne i}^{n_{S}} \left| e^{- (\lambda _{x} + \beta ) \Delta } \left( q Q(a)\right) _{ij} \right|&= \sum _{j = 1, j \ne i}^{n_{S}} \left| - e^{- (\lambda _{x} + \beta ) \Delta } \left( q Q(a)\right) _{ij} \right| \\&{\mathop {=}\limits ^{\left( I_{n_{S}}\right) _{ij} = 0,\ j \ne i}} \sum _{j = 1, j \ne i}^{n_{S}} \left| \left( I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a)\right) _{ij} \right| . \end{aligned}$$

(C.14)

Thus it holds

$$\begin{aligned} \begin{aligned} \left| \left( I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a)\right) _{ii} \right|& = \underbrace{1 - e^{- (\lambda _{x} + \beta ) \Delta }}_{> 0} + \sum _{j = 1, j \ne i}^{n_{S}} \left| \left( I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a)\right) _{ij} \right| \\ &> \sum _{j = 1, j \ne i}^{n_{S}} \left| \left( I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a)\right) _{ij} \right| \end{aligned} \end{aligned}$$

(C.15)

which implies that the matrix $I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a)$ is strictly diagonally dominant. Therefore, the inverse of $I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a)$ always exists in view of Theorem 9 in Appendix B. $\square $
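The identity in (C.15) says that every row of $I_{n_{S}} - e^{- (\lambda _{x} + \beta ) \Delta } q Q(a)$ dominates its off-diagonal entries by exactly $1 - e^{- (\lambda _{x} + \beta ) \Delta }$. The Python sketch below checks this margin for a hypothetical stochastic matrix `G` standing in for $q Q(a)$; the parameter values and function names are assumptions chosen for illustration.

```python
import math


def identity_minus_scaled(G, disc):
    """Return the matrix I - disc * G as a list of rows."""
    n = len(G)
    return [
        [(1.0 if i == j else 0.0) - disc * G[i][j] for j in range(n)]
        for i in range(n)
    ]


def dominance_margins(M):
    """Per row: |M_ii| minus the sum of the absolute off-diagonal entries."""
    n = len(M)
    return [
        abs(M[i][i]) - sum(abs(M[i][j]) for j in range(n) if j != i)
        for i in range(n)
    ]


def is_strictly_diagonally_dominant(M):
    n = len(M)
    return all(
        abs(M[i][i]) > sum(abs(M[i][j]) for j in range(n) if j != i)
        for i in range(n)
    )


# Hypothetical stand-in for the transition matrix q Q(a).
G = [
    [0.9, 0.1, 0.0],
    [0.5, 0.5, 0.0],
    [0.3, 0.3, 0.4],
]
disc = math.exp(-(0.05 + 0.03) * 1.0)   # hypothetical lambda_x, beta and Delta

M = identity_minus_scaled(G, disc)
margins = dominance_margins(M)

print(is_strictly_diagonally_dominant(M), margins, 1.0 - disc)
```

With a discount factor equal to one the margin vanishes and strict dominance is lost, which shows that the strict inequality rests on $\lambda _{x} + \beta > 0$ and $\Delta > 0$.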

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Lichtenstern, A., Zagst, R. Optimal investment strategies for pension funds with regulation-conform dynamic pension payment management in the absence of guarantees. Eur. Actuar. J. 12, 647–700 (2022). https://doi.org/10.1007/s13385-021-00298-7

Received: 14 March 2021
Revised: 08 August 2021
Accepted: 05 October 2021
Published: 29 October 2021
Issue Date: December 2022
DOI: https://doi.org/10.1007/s13385-021-00298-7
