1 Introduction

This paper considers stochastic interest rate models that are state-wise deterministic, driven by an underlying Markov process on a finite state space. The spot rate r(u) at time u is assumed to be of the form

$$\begin{aligned} r(u) = r_{X(u)}(u), \end{aligned}$$
(1)

where \(\{ X(u)\}_{u\ge 0}\) denotes a time-inhomogeneous Markov jump process on a finite state space with p states, and \(r_i(u)\), \(i=1,\ldots ,p\), are deterministic functions. Assuming an arbitrage-free bond market, the price at time t of a zero-coupon bond with maturity T is then given by

$$\begin{aligned} B(t,T) = {\mathbb {E}}^{{\mathbb {Q}}}\! \left( \left. \text {e}^{-\int _t^T r_{X(u)}(u) \text {d}u}\, \right| \, {{\mathcal {F}}}(t) \right) \!, \ \ 0\le t \le T , \end{aligned}$$
(2)

where \({{\mathcal {F}}}(t) = \sigma (X(u): 0\le u \le t)\) is the \(\sigma \)-algebra generated by \(\{ X(u)\}_{u\ge 0}\) up to time t. The expectation is taken under a risk-neutral measure \({\mathbb {Q}}\) (see, e.g., [6, 14]).

If all \(r_i(u)\ge 0\), a key result of the paper is that, conditionally on \(X(t)\), the map \(T\mapsto B(t,T)\) is the survival function of an inhomogeneous phase-type distribution.

In the presence of negative interest rates, this is no longer the case, since B(t, T) may exceed one and need not be monotone in T. However, assuming that the interest rates are bounded from below by a number \(-\rho <0\), we get from (2) that

$$\begin{aligned} \text {e}^{-\rho (T-t)}B(t,T) = {\mathbb {E}}^{{\mathbb {Q}}} \!\left( \left. \text {e}^{-\int _t^T (r_{X(u)}(u)+\rho )\text {d}u}\, \right| \, {{\mathcal {F}}}(t) \right) \end{aligned}$$
(3)

is then, as a function of T, the survival function of an inhomogeneous phase-type distribution.

The interpretation of bond prices as (possibly scaled) phase-type survival functions enables us to fit (calibrate) the transition rates of \(\{ X(u)\}_{u\ge 0}\) to observed bond prices by maximum likelihood. Since phase-type distributions are dense, i.e. they can approximate any distribution on the positive reals arbitrarily closely given a sufficient number of phases, we may fit a PH distribution to the observed survival function (treated as a histogram) such that the observed bond prices are matched to any desired accuracy. The last observation point may be treated as right censored. All fitted transition rates are under a risk-neutral measure \({\mathbb {Q}}\).

The functional form of the state-wise bond price was already noted in [26, (3.17)], though its relation to phase-type theory was not mentioned and its potential was not explored further. We also believe that the “bond price representation” (2) of a phase-type survival function is new to the phase-type community.

In the context of multi-state life insurance, modelling stochastic interest rates also plays a crucial role. The literature ranges from SDE-based models, see e.g. [2, 4, 10, 22, 28], to the finite state-space Markov chain models of [24, 25] of the form (1). SDE-based methods often rely on an independence assumption between interest rates and biometric risk, so that available forward rate curves can be used for valuation; an exception is [10], where dependence between interest rates and biometric risk is incorporated. In either case, the SDE-based models do not integrate into the classic Thiele and Hattendorff type results, which limits time-dynamic valuations based on these traditional methods.

The spot rate model (1), however, can be fully incorporated into Thiele and Hattendorff type differential equations for reserves and higher order moments, as shown in [24, 25] and further explored in [26]. This allows for dependence between interest rates and transitions in life insurance, as well as for time-dynamic valuations, without altering the traditional methods. The latter refers to the model (1) as the Markov chain market, while [18] refers to it as Markovian interest intensities.

In this paper, we work with an extended version of the bond prices,

$$\begin{aligned} {\mathbb {E}}^{{\mathbb {Q}}}\!\!\left( \left. 1\{ X(T) =j \} \textrm{e}^{-\int _t^T r_{X(u)}(u)\textrm{d}u}\, \right| \, {{\mathcal {F}}}(t) \right) \!\!, \ \ \ j=1,\ldots ,p , \end{aligned}$$
(4)

which in an insurance context are the discount factors on the event that the terminal state is j. We provide a matrix representation of (4) and show how it integrates naturally into the matrix framework of [8]. The extension is convenient from a mathematical point of view and also relates to partial reserves ([8]) and to retrospective reserves in single states [23, Sec. 5E]; the treatment of the latter, however, is outside the scope of the current paper. We restate the results of the framework of [8] in the context of stochastic interest rates, although the proofs and parts of the exposition differ from those of [8].

Markov jump processes in finance are often used in regime-switching models, where the states modulate the parameters of processes that are usually SDE-driven. Here transitions take place under a physical measure and may have a real-world interpretation. The Markov chain interest rate model (1) can be thought of as a regime-switching model under a risk-neutral measure, particularly if the interest rates for each state are known a priori.

Markov jump processes can also be used to approximate diffusion-based bond price models; formal constructions are given in [5, 19,20,21]. Since phase-type distributions form a dense class of distributions on the positive reals, this paper offers an alternative and parsimonious way to approximate any zero-coupon bond price curve arbitrarily closely by one of the form (2).

The paper is organised as follows. Section 2 introduces some background and notation. Bond price modelling using phase-type distributions is developed in Sect. 3. In Sect. 4, we develop estimation of the Markovian interest rate model, both with and without restricted interest rates, and we provide examples of calibration to diffusion models and to real data. In Sect. 5, we adapt the life-insurance framework of [8] to allow for stochastic interest rates of the form (1). It contains examples of how to set up a model using the fitted bond parameters of Sect. 4, as well as a matrix-based method for calculating the equivalence premium, either via Newton’s method or by an explicit formula. In Sect. 6, we present a numerical example. For the sake of exposition, the proofs are deferred to Appendix B.

2 Background

2.1 Notation

Unless otherwise stated, row vectors are denoted by bold Greek lowercase letters (e.g., \(\varvec{{\pi }}\)) and column vectors by bold lowercase Roman letters (e.g., \(\varvec{{v}}\)). Elements of vectors are denoted by the same unbold, indexed letters (like \(\varvec{{v}}=(v_1,\ldots ,v_p)^\prime \)). The vector \(\varvec{{e}}_i\) is the column vector which is 1 at index i and zero otherwise whereas \(\varvec{{e}}=(1,1,\ldots ,1)^\prime \).

Matrices are denoted by bold capital letters (Greek or Roman) and their elements by the corresponding lowercase indexed letters (e.g., \(\varvec{{A}}=\{ a_{ij} \}\)). If \(\varvec{{v}}\) is a vector (row or column), then \(\varvec{{\Delta }}(\varvec{{v}})\) denotes the diagonal matrix with \(\varvec{{v}}\) on its diagonal.

2.2 The product integral

Consider a time-inhomogeneous Markov jump process \(X = \{X(t)\}_{t\ge 0}\) taking values in a finite state space \(E = \{1,\ldots ,p\}\), with intensity matrix (functions) \(\varvec{{M}}(t) =\{\mu _{ij}(t)\}_{i,j\in E}\). Denote by \(\varvec{{P}}(s,t)=\{ p_{ij}(s,t) \}\), \(s\le t\), the corresponding transition matrix, the elements of which are the transition probabilities \(p_{ij}(s,t)={\mathbb {P}} (X(t)=j\,|\,X(s)=i)\) for \(i,j\in E\). The transition matrix \(\varvec{{P}}(s,t)\) then satisfies Kolmogorov’s forward and backward differential equations,

$$\begin{aligned} \begin{aligned} \frac{\partial }{\partial t}\varvec{{{P}}}(s,t)&=\varvec{{{P}}}(s,t)\varvec{{{M}}}(t), \quad \ \ \varvec{{{P}}}(s,s) = \varvec{{{I}}}, \\ \frac{\partial }{\partial s}\varvec{{{P}}}(s,t)&=-\varvec{{{M}}}(s)\varvec{{{P}}}(s,t), \quad \varvec{{{P}}}(t,t)=\varvec{{{I}}} . \end{aligned} \end{aligned}$$
(5)

The solution to (5), which in general is not available in closed form, will be denoted by

$$\begin{aligned} \varvec{{P}}(s,t) = \prod _s^t \left( \varvec{{I}} + \varvec{{M}}(x)\textrm{d}x \right) \end{aligned}$$
(6)

and referred to as the product integral of \(\varvec{{M}}(x)\) from s to t. The same notation is used for general matrix functions \(\varvec{{M}}(t)\) that are not intensity matrices, in which case (6) denotes the solution to (5). Product integrals have several useful properties; see, e.g., [8, Section 2] for an overview.

Remark 2.1

The idea behind the notation of the product integral comes from a Riemann type of construction using step-functions. If we approximate \(\varvec{{M}}(x)\) by a piecewise constant matrix function taking values \(\varvec{{M}}(x_i)\) on \([x_i,x_i+\Delta x_i)\) for \(s=x_0<x_1<\cdots <x_N=t\) and where \(\Delta x_i=x_{i+1}-x_i\), then by [8, (2.5)] the product integral over \([x_i,x_i+\Delta x_i)\) equals the matrix exponential

$$\begin{aligned} \textrm{e}^{\varvec{{M}}(x_i)\Delta x_i} = \varvec{{I}} + \varvec{{M}}(x_i) \Delta x_i + O(\Delta x_i^2) . \end{aligned}$$

By letting \(\Delta x_i \rightarrow 0\) and using [8, (2.2)] we then arrive at the notation (6). \(\triangle \)
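Remark 2.1 suggests a direct numerical approximation of (6). The following is a minimal sketch, assuming NumPy and a user-supplied callable `M` (the function name `product_integral` and its parameters are hypothetical, chosen for illustration). It multiplies first-order factors \(\varvec{{I}}+\varvec{{M}}(x_k)h\) from left to right over a fine uniform grid; it is meant to illustrate the construction, not to be an efficient or high-order solver of (5).

```python
import numpy as np


def product_integral(M, s, t, n=20_000):
    """First-order approximation of prod_s^t (I + M(x) dx).

    M is a callable returning a square NumPy array for each time x. The
    interval [s, t] is split into n equal steps of length h, and the factors
    (I + M(x_k) h), with x_k the midpoint of step k, are multiplied from left
    to right in increasing time, as in the forward equation (5).
    """
    h = (t - s) / n
    p = M(s).shape[0]
    identity = np.eye(p)
    P = identity.copy()
    for k in range(n):
        x_mid = s + (k + 0.5) * h
        P = P @ (identity + M(x_mid) * h)
    return P


def M_diag(x):
    # Diagonal (hence commuting) matrix function: its product integral from
    # 0 to 1 is diag(exp(-1/2), exp(-1)).
    return np.diag([-x, -2.0 * x])


P = product_integral(M_diag, 0.0, 1.0)
exact = np.diag([np.exp(-0.5), np.exp(-1.0)])
print(np.abs(P - exact).max())  # small O(h) discretisation error
```

When M(x) is an intensity matrix and h is small enough, each factor is a stochastic matrix, so the approximation keeps unit row sums, mirroring the probabilistic meaning of (6).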

We briefly outline some additional properties that are relevant for the present paper. If \(\varvec{{A}}(x)\) and \(\varvec{{B}}(y)\) commute for all x, y, then

$$\begin{aligned} \prod _s^t (\varvec{{I}} + (\varvec{{A}}(x)+\varvec{{B}}(x))\textrm{d}x ) = \prod _s^t (\varvec{{I}} + \varvec{{A}}(x)\textrm{d}x )\prod _s^t (\varvec{{I}} + \varvec{{B}}(x)\textrm{d}x ) . \end{aligned}$$
(7)

Now, let

$$\begin{aligned} \varvec{{C}}(s,t) = \prod _s^t (\varvec{{I}} + \varvec{{A}}(x)\textrm{d}x) \otimes \varvec{{I}} , \end{aligned}$$

where \(\otimes \) denotes the Kronecker product. The Kronecker product between a \(p_1\times q_1\) matrix \(\varvec{{A}}=\{a_{ij}\} \) and a \(p_2\times q_2\) matrix \(\varvec{{B}} = \{ b_{ij} \}\) is defined as the \(p_1p_2\times q_1 q_2\) matrix

$$\begin{aligned} \varvec{{A}}\otimes \varvec{{B}} = \{ a_{ij}\varvec{{B}} \}_{i=1,\ldots ,p_1,j=1,\ldots ,q_1} = \{ a_{ij}b_{k\ell } \} . \end{aligned}$$

Using that \((\varvec{{A}}\otimes \varvec{{B}}) (\varvec{{C}}\otimes \varvec{{D}}) =(\varvec{{A}}\varvec{{C}})\otimes (\varvec{{B}}\varvec{{D}})\), we get

$$\begin{aligned} \frac{\partial }{\partial t} \varvec{{{C}}}(s,t)= \displaystyle{} & {} \prod \limits _s^t (\varvec{{{I}}} + \varvec{{{A}}}(x)\text {d}x)\varvec{{{A}}}(t)\otimes \varvec{{{I}}} \\ ={} & {} \left( \displaystyle \prod \limits _s^t (\varvec{{{I}}} + \varvec{{{A}}}(x)\text {d}x)\otimes \varvec{{{I}}}\right) \!\left( \varvec{{{A}}}(t)\otimes \varvec{{{I}}} \right) \\ ={} & {} \varvec{{{C}}}(s,t)\left( \varvec{{{A}}}(t)\otimes \varvec{{{I}}} \right) \!, \end{aligned}$$

and we conclude that

$$\begin{aligned} \varvec{{C}}(s,t) = \prod _s^t (\varvec{{I}} + (\varvec{{A}}(x) \otimes \varvec{{I}})\textrm{d}x) . \end{aligned}$$
(8)

A similar argument gives that

$$\begin{aligned} \varvec{{I}} \otimes \prod _s^t (\varvec{{I}} + \varvec{{A}}(x)\textrm{d}x) = \prod _s^t (\varvec{{I}} + (\varvec{{I}}\otimes \varvec{{A}}(x))\textrm{d}x) . \end{aligned}$$
(9)

Finally, if \(\varvec{{A}}(t)\) and \(\varvec{{B}}(t)\) are Riemann integrable matrix functions of dimensions \(q\times q\) and \(p\times p\) respectively, then

$$\begin{aligned} \prod _s^t (\varvec{{I}}+ (\varvec{{A}}(x)\oplus \varvec{{B}}(x))\textrm{d}x) =\prod _s^t (\varvec{{I}}+ \varvec{{A}}(x)\textrm{d}x)\otimes \prod _s^t (\varvec{{I}}+ \varvec{{B}}(x)\textrm{d}x), \end{aligned}$$
(10)

where \(\oplus \) denotes the Kronecker sum, defined by \(\varvec{{{A}}}(t)\oplus \varvec{{{B}}}(t) = \varvec{{{A}}}(t)\otimes \varvec{{{I}}} +\varvec{{{I}}}\otimes \varvec{{{B}}}(t)\), and where the first \(\varvec{{I}}\) has the dimension of \(\varvec{{B}}(t)\) and the second \(\varvec{{I}}\) has the dimension of \(\varvec{{A}}(t)\). To see this, we notice that \(\varvec{{A}}(t)\otimes \varvec{{I}}\) and \(\varvec{{I}}\otimes \varvec{{B}}(t)\) commute, so by (7) we get that

$$\begin{aligned} \prod _s^t (\varvec{{{I}}}+ (\varvec{{{A}}}(x)\oplus \varvec{{{B}}}(x))\text {d}x)= & {} {} \displaystyle \prod \limits _s^t (\varvec{{{I}}}+ (\varvec{{{A}}}(x)\otimes \varvec{{{I}}})\text {d}x) \displaystyle \prod \limits _s^t (\varvec{{{I}}}+ (\varvec{{{I}}}\otimes \varvec{{{B}}}(x))\text {d}x) \\ {}= & {} {} \left[ \displaystyle \prod \limits _s^t (\varvec{{{I}}}+ \varvec{{{A}}}(x)\text {d}x) \otimes \varvec{{{I}}} \right] \! \left[ \varvec{{{I}}} \otimes \displaystyle \prod \limits _s^t (\varvec{{{I}}}+ \varvec{{{B}}}(x)\text {d}x) \right] \\ {}= & {} {} \displaystyle \prod \limits _s^t (\varvec{{{I}}}+ \varvec{{{A}}}(x)\text {d}x) \otimes \prod _s^t (\varvec{{{I}}}+ \varvec{{{B}}}(x)\text {d}x) . \end{aligned}$$

For further details on Kronecker products and sums, we refer to [15].
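For constant matrix functions on [0, 1], the product integral reduces to the matrix exponential (cf. Remark 2.1), so identity (10) becomes \(\textrm{e}^{\varvec{{A}}\oplus \varvec{{B}}}=\textrm{e}^{\varvec{{A}}}\otimes \textrm{e}^{\varvec{{B}}}\). A minimal numerical check, assuming NumPy and SciPy are available (the helper `kron_sum` is a hypothetical name):

```python
import numpy as np
from scipy.linalg import expm


def kron_sum(A, B):
    """Kronecker sum A ⊕ B = A ⊗ I_p + I_q ⊗ B for A (q x q) and B (p x p)."""
    q, p = A.shape[0], B.shape[0]
    return np.kron(A, np.eye(p)) + np.kron(np.eye(q), B)


rng = np.random.default_rng(1)
A = rng.normal(size=(2, 2))  # arbitrary test matrices, not intensity matrices
B = rng.normal(size=(3, 3))
lhs = expm(kron_sum(A, B))
rhs = np.kron(expm(A), expm(B))
print(np.allclose(lhs, rhs))
```

The identity holds for arbitrary square matrices because \(\varvec{{A}}\otimes \varvec{{I}}\) and \(\varvec{{I}}\otimes \varvec{{B}}\) always commute, which is exactly the step used above via (7).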

2.3 Phase-type distributions

Consider a (time-inhomogeneous) Markov jump process \(\{ Y(t) \}_{t\ge 0}\), where state \(p+1\) is absorbing and \(1,\ldots ,p\) are transient. The intensity matrix \(\varvec{{M}}(x)\) for \(\{ Y(t) \}_{t\ge 0}\) is then of the form

$$\begin{aligned} \varvec{{{M}}}(x) = \begin{pmatrix} \varvec{{{T}}}(x) &{}{} \varvec{{{t}}}(x) \\ \varvec{{{0}}} &{}{} 0 \end{pmatrix}\!, \end{aligned}$$
(11)

where \(\varvec{{T}}(x)\) is a \(p \times p\) matrix consisting of transition rates between transient states, and \(\varvec{{t}}(x) = -\varvec{{T}}(x)\varvec{{e}}\) is a column vector of exit rates, i.e. rates for jumping to the absorbing state. The matrix \(\varvec{{T}}(x)\) is referred to as a sub-intensity matrix, which has non-positive row sums.

Then, by [8, Lemma 2], the transition matrix for \(\{ Y(t)\}_{t\ge 0}\) is given by

$$\begin{aligned} \varvec{{{P}}}(s,t) = \prod _s^t \!\left( \varvec{{{I}}} {+} \begin{pmatrix} \varvec{{{T}}}(u) &{}{} \varvec{{{t}}}(u) \\ \varvec{{{0}}} &{}{} 0 \end{pmatrix}\! \text {d}u\right) {=}\begin{pmatrix} \displaystyle \prod _s^t (\varvec{{{I}}} {+} \varvec{{{T}}}(u)\text {d}u) &{}{} \quad \varvec{{{e}}} - \displaystyle \prod _s^t (\varvec{{{I}}} {+}\varvec{{{T}}}(u)\text {d}u)\varvec{{{e}}} \\ \varvec{{{0}}} &{}{}1 \end{pmatrix}\!. \end{aligned}$$

Hence \(\prod _s^t (\varvec{{I}} + \varvec{{T}}(u)\textrm{d}u) \) is the matrix which contains the transition probabilities between the transient states from times s to t.

We assume that \({\mathbb {P}}(Y(0)=p+1)=0\), and define \(\pi _i={\mathbb {P}}(Y(0)=i)\). Hence \(\varvec{{\pi }}=(\pi _1,\ldots ,\pi _{p})\) satisfies that \(\varvec{{\pi }}\varvec{{e}}=\sum _i \pi _i=1 \), so that \(\varvec{{\pi }}\) is the initial distribution for \(\{ Y(t)\}_{t\ge 0}\) concentrated on the transient states only. Then

$$\begin{aligned} \left( {\mathbb {P}}(Y(t)=1),{\mathbb {P}}(Y(t)=2),\ldots ,{\mathbb {P}}(Y(t)=p) \right) =\varvec{{\pi }}\prod _0^t (\varvec{{I}} + \varvec{{T}}(u)\textrm{d}u) \end{aligned}$$
(12)

is a row vector that contains the probabilities of the process being in the different transient states at time t.

Now let

$$\begin{aligned} \tau = \inf \{ t>0 : Y(t) = p+1 \} \end{aligned}$$

denote the time until absorption. Then from (12) we immediately get that

$$\begin{aligned} {\mathbb {P}} (\tau >t) = \varvec{{\pi }}\prod _0^t (\varvec{{I}} +\varvec{{T}}(u)\textrm{d}u) \varvec{{e}} \end{aligned}$$
(13)

since the right-hand side equals the probability that the process is in one of the transient states at time t, i.e., that absorption has not yet occurred. Differentiating (13) and using (5) together with \(\varvec{{t}}(x)=-\varvec{{T}}(x)\varvec{{e}}\), we see that \(\tau \) has a density of the form

$$\begin{aligned} f_\tau (x) = \varvec{{\pi }}\prod _0^x (\varvec{{I}} + \varvec{{T}}(u)\textrm{d}u) \varvec{{t}}(x) . \end{aligned}$$
(14)
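In the time-homogeneous case, the product integrals in (13) and (14) are matrix exponentials. A minimal sketch, assuming NumPy and SciPy; the two-phase parameters below are hypothetical and chosen only for illustration. The central difference of the survival function should agree with the closed-form density (14).

```python
import numpy as np
from scipy.linalg import expm


def ph_survival(pi, T, x):
    """P(tau > x) = pi exp(T x) e for tau ~ PH(pi, T), cf. (13)."""
    return float(pi @ expm(T * x) @ np.ones(T.shape[0]))


def ph_density(pi, T, x):
    """f(x) = pi exp(T x) t with exit-rate vector t = -T e, cf. (14)."""
    t = -T @ np.ones(T.shape[0])
    return float(pi @ expm(T * x) @ t)


# Hypothetical two-phase example (illustrative numbers only).
pi = np.array([1.0, 0.0])
T = np.array([[-1.0, 0.6], [0.2, -0.5]])
x, h = 1.3, 1e-5
numeric = -(ph_survival(pi, T, x + h) - ph_survival(pi, T, x - h)) / (2 * h)
print(numeric, ph_density(pi, T, x))  # central difference vs closed form
```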

Definition 2.2

The distribution of \(\tau \) is called an inhomogeneous phase-type distribution, and we write \(\tau \sim \textrm{IPH}(\varvec{{\pi }},\varvec{{T}}(x))\), where the indexation of \(\varvec{{T}}(x)\) is over \(x\ge 0\).

We do not need to specify \(\varvec{{t}}(x)\) since it is implicitly given by \(\varvec{{T}}(x)\). Indeed, since row sums of intensity matrices (and hence of (11)) are zero, we have that \(\varvec{{t}}(x) = -\varvec{{T}}(x)\varvec{{e}}\). If \(\varvec{{T}}(x)\equiv \varvec{{T}}\), then we simply write \(\tau \sim \textrm{PH}(\varvec{{\pi }},\varvec{{T}})\). This corresponds to the underlying Markov jump process being time-homogeneous.

We also note that \(\varvec{{T}}(x) + \varvec{{\Delta }}(\varvec{{t}}(x))\) is an intensity matrix on the transient states (without the absorbing state).

The class of phase-type distributions (both PH and IPH) is dense (in the sense of weak convergence) in the class of distributions on the positive reals, so any distribution on \({\mathbb {R}}_+\) can be approximated arbitrarily closely by a phase-type distribution. Of considerable practical importance is also that phase-type distributions can be fitted both to data and to theoretical distributions by maximum likelihood. For the time-homogeneous case, PH, see [3], while for IPH we refer to [1].

3 Phase-type representations of bond prices

Consider the stochastic interest rate model of (1), and let \(E = \{1,\ldots ,p\}\) denote the state-space of the Markov jump process \(X = \{ X(t)\}_{t\ge 0}\) with intensity matrix \(\varvec{{M}}(t)=\{ \mu _{ij}(t) \}_{i,j\in E}\). Let \(\varvec{{r}}(t) = \left( r_1(t),\ldots ,r_p(t)\right) '\) be the column vector which contains the interest rate functions.

The main result of this section is the following.

Theorem 3.1

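For \(i,j\in E\) and \(0\le s\le t\), let

$$\begin{aligned} d_{ij}(s,t) = {\mathbb {E}}^{{\mathbb {Q}}}\!\left( \left. 1\{ X(t)=j\} \textrm{e}^{-\int _s^t r_{X(u)}(u)\textrm{d}u}\, \right| \, X(s)=i \right) \!. \end{aligned}$$

Then the matrix \(\varvec{{D}}(s,t)=\{ d_{ij}(s,t) \}_{i,j\in E}\) has the following representation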

$$\begin{aligned} \varvec{{{D}}}(s,t) = \prod _s^t \left( \varvec{{{I}}} + \left[ \varvec{{{M}}}(u) -\varvec{{{\Delta }}}(\varvec{{{r}}}(u)) \right] \text {d}u \right) \!. \end{aligned}$$
(15)

Proof

Conditioning on the state at time \(s+\textrm{d}s\), we get, up to terms of order \(o(\textrm{d}s)\), that

$$\begin{aligned} d_{ij}(s,t) {}= & {} (1+\mu _{ii}(s)\text {d}s) d_{ij}(s+\text {d}s,t)(1-r_{i}(s)\text {d}s )\\ {}{} & {} + \sum _{k\ne i} \mu _{ik}(s)\text {d}s d_{kj}(s+\text {d}s,t)(1-r_i(s)\text {d}s) \\{}= & {} d_{ij}(s+\text {d}s,t)(1-r_{i}(s)\text {d}s ) + \mu _{ii}(s) \text {d}s d_{ij}(s+\text {d}s,t)\\{}{} & {} + \sum _{k\ne i} \mu _{ik}(s) \text {d}s d_{kj}(s+\text {d}s,t), \end{aligned}$$

so that

$$\begin{aligned} -\frac{\partial }{\partial s} d_{ij}(s,t) = -r_i(s)d_{ij}(s,t) + \sum _k \mu _{ik}(s)d_{kj}(s,t) . \end{aligned}$$
(16)

In matrix form, this amounts to

$$\begin{aligned} \frac{\partial }{\partial s} \varvec{{{D}}}(s,t) = -\left( \varvec{{{M}}}(s) -\varvec{{{\Delta }}}(\varvec{{{r}}}(s)) \right) \! \varvec{{{D}}}(s,t). \end{aligned}$$
(17)

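Noting that \(\varvec{{D}}(t,t) = \varvec{{P}}(t,t) = \varvec{{I}}\), we see that (17) is the backward equation in (5) with \(\varvec{{M}}(s)\) replaced by \(\varvec{{M}}(s)-\varvec{{\Delta }}(\varvec{{r}}(s))\). Hence, by the definition (6) of the product integral, (15) holds. \(\square \)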
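Theorem 3.1 can be checked by simulation in the time-homogeneous case, where (15) reduces to \(\textrm{e}^{(\varvec{{M}}-\varvec{{\Delta }}(\varvec{{r}}))(t-s)}\). The sketch below assumes NumPy and SciPy, constant state-wise rates, and an intensity matrix with no absorbing states; the function name `simulate_D_row` and the model parameters are hypothetical. It estimates one row of \(\varvec{{D}}(0,t)\) by Monte Carlo and compares it with the matrix exponential.

```python
import numpy as np
from scipy.linalg import expm


def simulate_D_row(M, r, i, t, n_paths, rng):
    """Monte Carlo estimate of row i of D(0, t) for a time-homogeneous chain.

    Entry j estimates E[1{X(t)=j} exp(-int_0^t r_{X(u)} du) | X(0) = i],
    where M is an intensity matrix with strictly negative diagonal and r
    holds constant state-wise interest rates.
    """
    p = M.shape[0]
    total_rates = -np.diag(M)
    row = np.zeros(p)
    for _ in range(n_paths):
        state, clock, integral = i, 0.0, 0.0
        while True:
            holding = rng.exponential(1.0 / total_rates[state])
            if clock + holding >= t:
                # No further jump before t: accrue interest up to t and stop.
                integral += r[state] * (t - clock)
                break
            integral += r[state] * holding
            clock += holding
            jump_probs = M[state].copy()
            jump_probs[state] = 0.0
            state = rng.choice(p, p=jump_probs / total_rates[state])
        row[state] += np.exp(-integral)
    return row / n_paths


# Hypothetical two-state model (illustrative numbers only).
M = np.array([[-0.5, 0.5], [1.0, -1.0]])
r = np.array([0.01, 0.05])
t = 5.0
rng = np.random.default_rng(2024)
estimate = simulate_D_row(M, r, 0, t, 10_000, rng)
theory = expm((M - np.diag(r)) * t)[0]  # row 0 of (15) with constant M and r
print(estimate, theory)
```

The two rows agree up to Monte Carlo error, and summing either over j gives the bond price of Theorem 3.4 below.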

Remark 3.2

The quantities \(d_{ij}(s,t)\) in Theorem 3.1 are introduced as \(\varvec{{r}}\)-deflated transition probabilities in [12, Appendix A], where the authors derive the differential equation (17). While they give a martingale-based proof, we provide a probabilistic sample path argument and give a product integral representation. \(\square \)

Remark 3.3

Multiplying both sides of (17) with \(\varvec{{e}}\) from the right, we recover the differential equation for the state-wise discount factors obtained in [25, (4.4)].\(\square \)

Assume that all \(r_i(x)\) are bounded from below, and let

$$\begin{aligned} \rho = \max \!\left( 0, -\min _{i\in E} \inf _{x\ge 0} r_i(x) \right) \!. \end{aligned}$$

Then \(\rho =0\) if all interest rates are non-negative, and otherwise \(-\rho \) is a lower bound for all of them. We then have the following result.

Theorem 3.4

The price of the zero-coupon bond (2) satisfies

$$\begin{aligned} B(t,T) = {\mathbb {E}}^{{\mathbb {Q}}}\! \left( \left. \exp \left( -\int _t^T r_{X(u)}(u) \text {d}u \right) \right| X(t) \right) =\varvec{{{e}}}_{X(t)}^\prime \varvec{{{D}}}(t,T)\varvec{{{e}}}. \end{aligned}$$
(18)

Conditional on \(X(t)=i\),

$$\begin{aligned} T\mapsto \text {e}^{-\rho (T-t)} B(t,T) \end{aligned}$$

is the survival function of an IPH distributed random variable \(\tau (t)\) with initial distribution \(\varvec{{e}}_i^\prime \) and sub-intensity matrices \(\varvec{{M}}(x+t)-\varvec{{\Delta }}(\varvec{{r}}(x+t))-\rho \varvec{{I}}\), \(x\ge 0\).

In particular, if all interest rates are non-negative, then \(\rho =0\) and the price itself, \( T\mapsto B(t,T),\) is the survival function.

Proof

The formula (18) follows directly from the construction of the \(\varvec{{D}}(t,T)\) matrix by summing out over j in \(d_{ij}(t,T)\), which corresponds to post-multiplying \(\varvec{{D}}(t,T)\) by \(\varvec{{e}}\). Next, we notice that

$$\begin{aligned} e^{-\rho (T-t)}\prod _t^T \left( \varvec{{I}} + \left[ \varvec{{M}}(u) -\varvec{{\Delta }}(\varvec{{r}}(u)) \right] \textrm{d}u \right) =\prod _t^T \left( \varvec{{I}} + \left[ \varvec{{M}}(u) -\varvec{{\Delta }} (\varvec{{r}}(u)) - \rho \varvec{{I}} \right] \textrm{d}u \right) \!\!, \end{aligned}$$

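which follows from (7), since \(\rho \varvec{{I}}\) commutes with every matrix and \(\prod _t^T (\varvec{{I}} - \rho \varvec{{I}}\textrm{d}u) = \textrm{e}^{-\rho (T-t)}\varvec{{I}}\). The matrix \(\varvec{{M}}(x)-\varvec{{\Delta }}(\varvec{{r}}(x))-\rho \varvec{{I}}\) is a sub-intensity matrix: its off-diagonal elements coincide with those of \(\varvec{{M}}(x)\) and are therefore non-negative, and its row sums equal \(-(r_i(x)+\rho )\le 0\). Together with the distribution of X(t), it therefore defines a phase-type representation \((\varvec{{\pi }}_t, \varvec{{M}}(x+t)-\varvec{{\Delta }} (\varvec{{r}}(x+t))-\rho \varvec{{I}})\), \(x\ge 0\) (starting at time t). \(\square \)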
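The effect of the shift by \(\rho \) in Theorem 3.4 can be seen numerically. The sketch below assumes NumPy and SciPy and a hypothetical time-homogeneous two-state model in which the first state carries a negative rate. The bond price exceeds one at short maturities, while the scaled price \(\textrm{e}^{-\rho T}B(0,T)\) coincides with the survival function \(\varvec{{e}}_1^\prime \textrm{e}^{\varvec{{S}}T}\varvec{{e}}\) of the shifted sub-intensity matrix \(\varvec{{S}}\).

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical two-state model in which state 1 carries a negative rate.
M = np.array([[-0.3, 0.3], [0.4, -0.4]])
r = np.array([-0.01, 0.03])
rho = max(0.0, -r.min())              # -rho is a lower bound for all rates
S = M - np.diag(r) - rho * np.eye(2)  # sub-intensity matrix of Theorem 3.4
e = np.ones(2)

Ts = np.linspace(0.0, 10.0, 101)
# Bond prices B(0, T) = e_1' D(0, T) e, starting in the first state.
B = np.array([(expm((M - np.diag(r)) * T) @ e)[0] for T in Ts])
# Scaled prices exp(-rho T) B(0, T) should be a phase-type survival function.
scaled = np.exp(-rho * Ts) * B
survival = np.array([(expm(S * T) @ e)[0] for T in Ts])
print(B.max() > 1.0, np.allclose(scaled, survival))
```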

The forward rate f(t, T) is defined by

$$\begin{aligned} f(t,T) = -\frac{\partial }{\partial T}\log B(t,T) . \end{aligned}$$

Using Theorem 3.4, we may write

$$\begin{aligned} B(t,T) = \textrm{e}^{\rho (T-t)}\bar{F}_{\tau (t)}(T) , \end{aligned}$$

where \(\bar{F}_{\tau (t)}(T)=1-F_{\tau (t)}(T) \) denotes the survival function for \(\tau (t)\sim \text{ IPH } (\varvec{{e}}_{X(t)}^\prime ,\varvec{{M}}(x+t)-\varvec{{\Delta }} (\varvec{{r}}(x+t))-\rho \varvec{{I}})\). Then

$$\begin{aligned} -\frac{\partial }{\partial T}\log B(t,T) = -\rho +\frac{f_{\tau (t)}(T)}{1-F_{\tau (t)}(T)} , \end{aligned}$$

where \(f_{\tau (t)}\) denotes the density function for \(\tau (t)\). Hence we have proved the following result.

Corollary 3.5

Conditional on \(X(t)=i\), the forward rate f(t, T) equals the hazard rate at T of the random variable \(\tau (t) \sim \textrm{IPH}(\varvec{{e}}_i^\prime ,\varvec{{M}}(x+t)-\varvec{{\Delta }}(\varvec{{r}}(x+t))-\rho \varvec{{I}})\), less \(\rho \), i.e.

$$\begin{aligned} f(t,T) = \frac{f_{\tau (t)}(T)}{1-F_{\tau (t)}(T)} - \rho . \end{aligned}$$
(19)
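Corollary 3.5 can be illustrated with the same kind of hypothetical model, assuming NumPy and SciPy and taking t = 0. The sketch compares the hazard-rate formula (19) with a finite-difference approximation of \(-\partial _T\log B(0,T)\); the function names are hypothetical.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical two-state model with a negative rate in state 1.
M = np.array([[-0.3, 0.3], [0.4, -0.4]])
r = np.array([-0.01, 0.03])
rho = max(0.0, -r.min())
S = M - np.diag(r) - rho * np.eye(2)
e = np.ones(2)
s = -S @ e  # exit rates of tau, equal to r + rho


def forward_rate_hazard(i, T):
    """f(0, T) as the hazard rate of tau at T minus rho, cf. (19)."""
    row = expm(S * T)[i]
    return row @ s / (row @ e) - rho


def forward_rate_numeric(i, T, h=1e-5):
    """f(0, T) = -d/dT log B(0, T), by a central finite difference."""
    def log_price(u):
        return np.log((expm((M - np.diag(r)) * u) @ e)[i])
    return -(log_price(T + h) - log_price(T - h)) / (2 * h)


for T in (0.5, 2.0, 7.0):
    print(T, forward_rate_hazard(0, T), forward_rate_numeric(0, T))
```

At T = 0 the hazard equals the exit rate \(r_i+\rho \), so the forward rate reduces to the current spot rate \(r_i\), as expected.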

Another immediate consequence of Theorem 3.4 is the following.

Corollary 3.6

Assume that all interest rates are non-negative. Then, conditional on \(X(t)=i\), the random variable \(\tau (t) \sim \textrm{IPH}(\varvec{{e}}_i^\prime ,\varvec{{M}}(t+x)-\varvec{{\Delta }}(\varvec{{r}}(t+x)))\), \(x\ge 0\), has c.d.f.

$$\begin{aligned} F_{\tau (t)} (T)=1-B(t,T) = {\mathbb {E}}^{{\mathbb {Q}}}\! \left( \left. \int _t^T r_{X(y)}(y)\text {e}^{-\int _t^y r_{X(u)}(u)\textrm{d}u} \textrm{d}y\,\right| X(t) =i \right) \!. \end{aligned}$$

Proof

This follows from Theorem 3.4 with \(\rho =0\) and

$$\begin{aligned} f_{\tau (t)}(y)=&{} -\frac{\partial }{\partial y}B(t,y) ={\mathbb {E}}^{{\mathbb {Q}}}\!\left( \left. r_{X(y)}(y) \text {e}^{-\int _t^y r_{X(u)}(u)\text {d}u}\, \right| X(t)=i \right) \!. \end{aligned}$$

Integrating the expression then yields the result. \(\square \)

For \(t=0\), the above results reduce to the following.

Corollary 3.7

Assume that all interest rates are non-negative. Let \(\varvec{{\pi }}=(\pi _1,\ldots ,\pi _p)\) denote the (initial) distribution of X(0), and let \(\tau \sim \textrm{IPH}(\varvec{{\pi }},\varvec{{M}}(x)-\varvec{{\Delta }}(\varvec{{r}}(x)))\). Then

$$\begin{aligned} {\mathbb {P}} (\tau > T )= & {} {} {\mathbb {E}}^{{\mathbb {Q}}} \!\left( \exp \left( -\int _0^T r_{X(u)}(u) \textrm{d}u \right) \right) \!, \end{aligned}$$
(20)
$$\begin{aligned} F_{\tau } (T)= & {} {} {\mathbb {E}}^{{\mathbb {Q}}}\!\left( \int _0^T r_{X(y)}(y) \textrm{e}^{-\int _0^y r_{X(u)}(u)\textrm{d}u} \textrm{d}y \right) \!, \end{aligned}$$
(21)
$$\begin{aligned} f(0,T)= & {} {} \frac{f_{\tau }(T)}{1-F_{\tau }(T)} . \end{aligned}$$
(22)

Remark 3.8

The density has the following interpretation: \(f_\tau (t)\textrm{d}t\) is the expected present value of the interest \(r_{X(t)}(t)\textrm{d}t\) paid on a unit notional during a small time interval around t, and \(F_\tau (T)\) is the expected present value of all such interest payments during [0, T]. \(\square \)

Example 3.1

Assume that all interest rates are non-negative. If \(\{ X(t)\}_{t\ge 0}\) is time-homogeneous, \(\varvec{{r}}(t)\equiv \varvec{{r}}=(r_1,\ldots ,r_p)^\prime \) is constant, and \(\varvec{{M}}-\varvec{{\Delta }}(\varvec{{r}})\) is non-singular, then we also have that

$$\begin{aligned} {\mathbb {E}}^{{\mathbb {Q}}}\!\left( \int _0^T \textrm{e}^{-\int _0^y r_{X(u)}\textrm{d}u} \textrm{d}y\right) &= \int _0^T {\mathbb {E}}^{{\mathbb {Q}}}\! \left( \textrm{e}^{-\int _0^y r_{X(u)}\textrm{d}u} \right) \textrm{d}y \\ &= \int _0^T {\mathbb {P}}(\tau >y)\,\textrm{d}y \\ &= \int _0^T \varvec{{\pi }} \textrm{e}^{(\varvec{{M}}-\varvec{{\Delta }}(\varvec{{r}}))y} \varvec{{e}}\, \textrm{d}y \\ &= \varvec{{\pi }}(\varvec{{M}}-\varvec{{\Delta }}(\varvec{{r}}))^{-1} \textrm{e}^{(\varvec{{M}}-\varvec{{\Delta }}(\varvec{{r}}))T}\varvec{{e}} -\varvec{{\pi }}(\varvec{{M}}-\varvec{{\Delta }}(\varvec{{r}}))^{-1}\varvec{{e}} \\ &= \mu \left[ 1 - \tilde{\varvec{{\pi }}}\textrm{e}^{(\varvec{{M}}-\varvec{{\Delta }}(\varvec{{r}}))T}\varvec{{e}} \right] \\ &= \mu \, {\mathbb {P}}(\tilde{\tau }\le T) , \end{aligned}$$

where \(\mu = \varvec{{\pi }} \left[ -(\varvec{{M}}-\varvec{{\Delta }}(\varvec{{r}})) \right] ^{-1}\varvec{{e}}\) is the expectation of \(\tau \),

$$\begin{aligned} \tilde{\varvec{{\pi }}} = \frac{\varvec{{\pi }} \left[ -(\varvec{{M}} -\varvec{{\Delta }}(\varvec{{r}})) \right] ^{-1}}{\varvec{{\pi }} \left[ -(\varvec{{M}}-\varvec{{\Delta }}(\varvec{{r}})) \right] ^{-1}\varvec{{e}}} \end{aligned}$$

is the stationary distribution of a phase-type renewal process with inter-arrival times distributed as \(\textrm{PH}(\varvec{{\pi }}, \varvec{{M}}-\varvec{{\Delta }}(\varvec{{r}}))\), see [7, Th. 5.3.4], and \(\tilde{\tau }\sim \textrm{PH}(\tilde{\varvec{{\pi }}},\varvec{{M}}-\varvec{{\Delta }}(\varvec{{r}}))\). Hence the swap rate \(\rho \) can be expressed as

$$\begin{aligned} \rho &= \frac{{\mathbb {E}}^{{\mathbb {Q}}}\!\left( \int _0^T r_{X(y)}(y) \textrm{e}^{-\int _0^y r_{X(u)}(u)\textrm{d}u} \textrm{d}y\right) }{{\mathbb {E}}^{{\mathbb {Q}}}\! \left( \int _0^T \textrm{e}^{-\int _0^y r_{X(u)}(u)\textrm{d}u} \textrm{d}y\right) } =\frac{F_\tau (T)}{\mu \, {\mathbb {P}}(\tilde{\tau }\le T) } \\ &= \frac{1 - \varvec{{\pi }}\textrm{e}^{(\varvec{{M}}-\varvec{{\Delta }}(\varvec{{r}}))T} \varvec{{e}} }{\varvec{{\pi }}\left[ -(\varvec{{M}}-\varvec{{\Delta }}(\varvec{{r}})) \right] ^{-1}\left( \varvec{{I}}-\textrm{e}^{(\varvec{{M}}-\varvec{{\Delta }}(\varvec{{r}}))T}\right) \varvec{{e}}}. \qquad \square \end{aligned}$$

4 Estimation

Time-homogeneous phase-type distributions or inhomogeneous phase-type distributions where the sub-intensity matrices are of the form

$$\begin{aligned} \varvec{{T}}(x) = \lambda _\theta (x) \varvec{{T}} , \end{aligned}$$

for some parametric function \(\lambda _\theta (x)\), can be fitted by means of an EM algorithm.

Identical observations (absorption times) give rise to identical conditional expectations in the E-step, so repeated observations can be handled by weighting a single copy. This carries over to general non-negative weights, and hence the EM algorithm can efficiently fit histogram data. In particular, we may fit to theoretical distributions by treating their discretised densities as histograms. This provides the link to fitting the intensity matrix of \(\{ X(t) \}_{t\ge 0}\) in (1) through bond prices, (2) or (3), either to observed data or to a theoretical model.
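The weighting idea can be made concrete through the objective that such an EM algorithm increases: a weighted log-likelihood in which each histogram point carries its mass as a weight and the last point is right censored. The sketch below assumes NumPy and SciPy and a time-homogeneous PH model; it only evaluates the objective and is not the EM algorithm of [3] or [1]. The function name `weighted_ph_loglik` is hypothetical.

```python
import numpy as np
from scipy.linalg import expm


def weighted_ph_loglik(pi, T, points, weights, censor_point=None, censor_weight=0.0):
    """Weighted log-likelihood of tau ~ PH(pi, T) for histogram-type data.

    Each point y_k contributes w_k * log f(y_k), and an optional
    right-censored point c contributes w_c * log P(tau > c).
    """
    e = np.ones(T.shape[0])
    t = -T @ e  # exit-rate vector
    loglik = 0.0
    for y, w in zip(points, weights):
        loglik += w * np.log(pi @ expm(T * y) @ t)
    if censor_point is not None:
        loglik += censor_weight * np.log(pi @ expm(T * censor_point) @ e)
    return float(loglik)


pi = np.array([1.0])
T = np.array([[-2.0]])  # exponential distribution with rate 2
# A weight of 3 on one point equals three identical observations.
print(weighted_ph_loglik(pi, T, [0.5], [3.0]), 3 * (np.log(2.0) - 1.0))
```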

Indeed, consider bond prices \(B(0,T_i)\) available at different maturities \(T_1,T_2,\ldots ,T_n\). Then according to Theorem 3.4 we have that

$$\begin{aligned} B(0,T_i) = \varvec{{\pi }}\varvec{{D}}(0,T_i)\varvec{{e}} = \textrm{e}^{\rho T_i}{\mathbb {P}}(\tau > T_i), \ \ i=1,2,\ldots ,n , \end{aligned}$$

for some \(\rho \ge 0\) and where \(\tau \sim \text{ IPH }(\varvec{{\pi }},\varvec{{M}}(u) -\varvec{{\Delta }}(\varvec{{r}}(u))-\rho \varvec{{I}} )\). Then \(\rho \) must satisfy that

$$\begin{aligned} \textrm{e}^{-\rho T_i} B(0,T_i) \le 1, \ \ \ i=1,2,\ldots ,n. \end{aligned}$$

Since \(\rho \) is also required to be non-negative, this can be achieved by choosing

$$\begin{aligned} \rho = \max \!\left( 0, \max _{i\in \{1,\ldots ,n\}} \frac{\log B(0,T_i)}{T_i} \right) \!. \end{aligned}$$

In the Danish life-insurance context, regulation requires the bond prices (discount factors) to be computed from discrete forward rates \(f_d(0,T_i)\) published by the Danish Financial Supervisory Authority. Thus

$$\begin{aligned} B(0,T_i) = \left( 1 + f_d(0,T_i) \right) ^{-T_i} \end{aligned}$$

from which

$$\begin{aligned} \frac{\log B(0,T_i)}{T_i} = -\log (1+f_d(0,T_i)) . \end{aligned}$$

Hence

$$\begin{aligned} \rho = \max \!\left( 0, \max _i \left( -\log (1+f_d(0,T_i)) \right) \right) = \max \!\left( 0, -\min _i \log (1+f_d(0,T_i)) \right) \!. \end{aligned}$$
(23)
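Formula (23) is straightforward to evaluate. The sketch below uses only the Python standard library and a hypothetical list of discrete rates with negative values at short maturities (not published FSA figures). It computes \(\rho \) with the floor at zero, in line with the definition of \(\rho \) in Sect. 3, and the scaled prices \(\textrm{e}^{-\rho T_i}B(0,T_i)\), which should not exceed one.

```python
import math


def rho_from_discrete_rates(fd):
    """rho = max(0, -min_i log(1 + f_d(0, T_i))), cf. (23)."""
    return max(0.0, -min(math.log1p(f) for f in fd))


def scaled_prices(fd, maturities):
    """exp(-rho T_i) B(0, T_i) with B(0, T_i) = (1 + f_d(0, T_i))^(-T_i)."""
    rho = rho_from_discrete_rates(fd)
    return [math.exp(-rho * T) * (1.0 + f) ** (-T) for f, T in zip(fd, maturities)]


# Hypothetical rates with negative values at short maturities (illustrative only).
fd = [-0.006, -0.004, -0.001, 0.002, 0.005]
maturities = [1, 2, 3, 4, 5]
print(rho_from_discrete_rates(fd), scaled_prices(fd, maturities))
```

With only positive rates, the function returns \(\rho =0\) and the prices are left unscaled.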
(23)

Calibrating to data \(B(0,T_i)\), \(i=1,\ldots ,n\), can therefore be done by fitting PH or IPH distributions to \(\textrm{e}^{-\rho T_i}B(0,T_i)\) using an EM algorithm. The state-wise interest rates can either be estimated by the EM algorithm (unrestricted interest rates) or fixed in advance at values (or functions) of our choice (restricted interest rates).

In the former case, we obtain a maximum likelihood estimate \((\hat{\varvec{{\pi }}},\hat{\varvec{{T}}}(x))\) for the parameters. With estimated exit rates \(\hat{\varvec{{t}}}(x)=-\hat{\varvec{{T}}}(x)\varvec{{e}}\), the estimate for \(\varvec{{M}}(x)\) is then readily obtained from

$$\begin{aligned} \hat{\varvec{{M}}}(x) = \hat{\varvec{{T}}}(x) + \varvec{{\Delta }}(\hat{\varvec{{t}}}(x)) . \end{aligned}$$

To find the induced interest rates, we also have from Theorem 3.4 that

$$\begin{aligned} \hat{\varvec{{T}}}(x) = \hat{\varvec{{M}}}(x) - \varvec{{\Delta }}(\varvec{{r}}(x)) -\rho \varvec{{I}} . \end{aligned}$$

Multiplying by \(\varvec{{e}}\) from the right and using that the rows of the intensity matrix \(\hat{\varvec{{M}}}(x)\) sum to zero, we conclude that the estimated exit rates must satisfy

$$\begin{aligned} \hat{\varvec{{t}}}(x) = \varvec{{r}}(x) + \rho \varvec{{e}} . \end{aligned}$$

Hence the induced interest rates are given by

$$\begin{aligned} \varvec{{r}}(x) = \hat{\varvec{{t}}}(x) - \rho \varvec{{e}} . \end{aligned}$$
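For time-homogeneous fits, the recovery of \(\hat{\varvec{{M}}}\) and of the induced rates is a few lines of linear algebra. A minimal sketch assuming NumPy; the function name `induced_model` is hypothetical, and as input we reuse the \(p=3\) estimate reported in Example 4.1 below (where \(\rho =0\)).

```python
import numpy as np


def induced_model(T_hat, rho):
    """Recover (M_hat, r) from a fitted sub-intensity matrix T_hat.

    The exit rates are t_hat = -T_hat e; then M_hat = T_hat + Δ(t_hat)
    and r = t_hat - rho e.
    """
    t_hat = -T_hat.sum(axis=1)
    M_hat = T_hat + np.diag(t_hat)
    r = t_hat - rho
    return M_hat, r


# The p = 3 estimate reported in Example 4.1 (there rho = 0).
T_hat = np.array([[-0.13, 0.10, 0.00],
                  [0.00, -0.41, 0.34],
                  [0.14, 0.00, -0.24]])
M_hat, r = induced_model(T_hat, rho=0.0)
print(M_hat)
print(r)  # induced state-wise interest rates
```

The same reconstruction applies in the restricted case, where the exit rates equal the fixed \(\varvec{{r}}(x)+\rho \varvec{{e}}\) by construction.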

Neither the transition rates nor the interest rates are unique, but the resulting discount factor (bond price) is invariant under different representations, which is all that matters regarding reserving in the insurance context.

If instead we choose the state-wise interest rates \(r_i(x)\) ourselves, the EM algorithm is modified so that it does not update the exit rates. This is easily done by removing the corresponding updates from the original EM algorithm of [3] or [1]; see Appendix A for details. In this case, the exit rates are fixed at

$$\begin{aligned} \varvec{{t}}(x) = \varvec{{r}}(x) + \rho \varvec{{e}} \end{aligned}$$

so

$$\begin{aligned} \hat{\varvec{{M}}}(x) = \hat{\varvec{{T}}}(x) + \varvec{{\Delta }}(\varvec{{r}}(x)) + \rho \varvec{{I}} . \end{aligned}$$

While the parametrisation of the transition rates may not be unique, the interest rates remain fixed.

We now present two examples of fitting to real data and one example of fitting to a theoretical model. The estimation is carried out using the R package matrixdist.

Example 4.1

(Fitting to observed bond prices with restricted interest rates) Bond prices B(0, T) as of 31/12/2003 (time zero) with maturities \(T=1,2,\ldots ,30\) years are available from the Danish Financial Supervisory Authority. They define an empirical survival function to which we can fit phase-type distributions of different dimensions. For the discretisation, we write \(B(i)=B(0,i)\) and let \(0.5+i\), \(i=0,\ldots ,29\), denote the data points with probability mass \(B(i)-B(i+1)\), where \(B(0)=1\), together with a right-censored data point at 30 with probability mass \(B(30)=0.1994495\). Since all observed bond prices are less than one, we have \(\rho = 0\), corresponding to an environment with non-negative interest rates.
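The discretisation just described can be sketched as follows, assuming NumPy. The input curve is synthetic (a flat 4% continuously compounded rate) and is not the Danish FSA data; the function name `bond_prices_to_histogram` is hypothetical. The masses telescope, so together with the censored mass they sum to one.

```python
import numpy as np


def bond_prices_to_histogram(B):
    """Turn B = [B(0,0), B(0,1), ..., B(0,n)] with B(0,0) = 1 into weighted data.

    Returns the points 0.5 + i (i = 0, ..., n-1) with masses B(i) - B(i+1),
    and the right-censored point n with mass B(n).
    """
    B = np.asarray(B, dtype=float)
    if B[0] != 1.0 or np.any(np.diff(B) > 0):
        raise ValueError("expected B(0) = 1 and non-increasing prices")
    n = len(B) - 1
    points = 0.5 + np.arange(n)
    masses = B[:-1] - B[1:]
    return points, masses, float(n), float(B[-1])


# Synthetic curve, NOT the Danish FSA data: a flat 4% continuously compounded rate.
B = np.exp(-0.04 * np.arange(31))
points, masses, censor_point, censor_mass = bond_prices_to_histogram(B)
print(points[:3], masses[:3], censor_point, censor_mass)
```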

We used \(p=2,3,4,5,10\) and 15 phases, with state-wise interest rates \(r_i^p=i/(10p)\), \(i=1,\ldots ,p\), for the different dimensions p. This choice assumes that the interest rates lie in \((0, 10\%]\). The \(r_i^p\) are the right endpoints of the p subintervals of equal length into which [0, 0.1] is divided, so for \(p=10\), for example, they range from \(1\%\) to \(10\%\). The vectors \(\varvec{{r}}^p=(r_1^p,\ldots ,r_p^p)^\prime \) serve as exit rate vectors of the phase-type distributions to be fitted.
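The rate grid \(r_i^p=i/(10p)\) can be generated with a short Python sketch, assuming NumPy. The helper `rate_grid` is a hypothetical name.

```python
import numpy as np


def rate_grid(p, upper=0.1):
    """Right endpoints r_i^p = i * upper / p, i = 1, ..., p, of the p equal
    subintervals of [0, upper]; with upper = 0.1 this is r_i^p = i / (10 p)."""
    return upper * np.arange(1, p + 1) / p


for p in (2, 3, 4, 5, 10, 15):
    print(p, np.round(rate_grid(p), 4))
```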

In Fig. 1, we have plotted the phase-type fits against the empirical density, yield curve and survival curve. Dimension 3 gives a decent fit, and dimensions 4 and 5 give excellent fits. Above dimension 5 the fits are indistinguishable, so we conclude that dimension 4 or 5 suffices to approximate the bond prices.

Fig. 1: Fitted phase-type densities (left), corresponding yield curves (middle) and bond prices (right) for dimensions \(p=2, 3, 4, 5, 10, 15\) based on bond price data as of 31/12/2003

The estimates of the sub-intensity matrix \(\varvec{{M}}-\varvec{{\Delta }}(\varvec{{r}})\) (under a risk neutral measure \({\mathbb {Q}}\)) for dimensions \(p=3,4,5\) are given by

$$\begin{aligned} {}&{} \left( \begin{array}{ccc} -0.13 &{}{} 0.1 &{}{} 0 \\ 0 &{}{} -0.41 &{}{} 0.34 \\ 0.14 &{}{} 0 &{}{} -0.24 \\ \end{array}\right) \!,\, \left( \begin{array}{cccc} -0.25 &{}{} 0.22 &{}{} 0.01 &{}{} 0 \\ 0.14 &{}{} -1.11 &{}{} 0.75 &{}{} 0.18 \\ 0.06 &{}{} 0.29 &{}{} -0.63 &{}{} 0.2 \\ 0.09 &{}{} 0.22 &{}{} 0.65 &{}{} -1.05 \end{array}\right) \!,\\{}&{} \left( \begin{array}{ccccc} -0.26 &{}{} 0.02 &{}{} 0.06 &{}{} 0.07 &{}{} 0.08 \\ 0.07 &{}{} -1.68 &{}{} 0.69 &{}{} 0.23 &{}{} 0.65 \\ 0.19 &{}{} 0.32 &{}{} -1.99 &{}{} 0.93 &{}{} 0.48 \\ 0.04 &{}{} 0.35 &{}{} 0.27 &{}{} -1.2 &{}{} 0.46 \\ 0.07 &{}{} 0.82 &{}{} 0.07 &{}{} 0.8 &{}{} -1.85 \end{array} \right) \!. \end{aligned}$$

In all fits, the initial distribution of the Markov process was of the form \((1,0,\ldots ,0)\) of appropriate dimension, i.e. the process initiates in state 1. \(\square \)
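The following Python sketch, assuming NumPy and SciPy, shows how the fitted model translates back into bond prices. It evaluates the phase-type survival function \(B(0,T)=\varvec{{\pi }}\exp ((\varvec{{M}}-\varvec{{\Delta }}(\varvec{{r}}))T)\varvec{{e}}\) for the rounded \(p=3\) estimate above, together with the corresponding continuously compounded yields. The printed entries are rounded to two decimals, so the output only approximates the curves in Fig. 1. With these rounded entries the state-wise rates lie in [0.03, 0.10], and the yields must therefore lie in the same interval.

```python
import numpy as np
from scipy.linalg import expm

# Rounded p = 3 estimate of the sub-intensity matrix M - Delta(r) from the text
S3 = np.array([[-0.13, 0.10, 0.00],
               [0.00, -0.41, 0.34],
               [0.14, 0.00, -0.24]])
pi = np.array([1.0, 0.0, 0.0])   # initiation in state 1


def bond_price(pi, S, T):
    """Phase-type survival function pi exp(S T) e, i.e. B(0, T) when rho = 0."""
    return float(pi @ expm(S * T) @ np.ones(S.shape[0]))


maturities = np.arange(1, 31)
prices = np.array([bond_price(pi, S3, T) for T in maturities])
yields = -np.log(prices) / maturities   # continuously compounded zero yields
```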

Example 4.2

(Fitting to 2019 bond prices with unrestricted interest rates) To illustrate that our methods also apply in a negative interest rate environment, we now fit to bond prices as of 31/12/2019 from the Danish Financial Supervisory Authority. This dataset covers maturities \(T=1,2,\ldots ,120\) years. In this case, we let the EM algorithm choose the necessary positive and negative interest rates.

The bond prices for the first five maturities exceed one. They are 1.00231736, 1.00403337, 1.00445679, 1.00382807 and 1.00197787, reflecting the (slightly) negative interest rate environment at the time. From (23), we get \(\rho = 0.002314677\) as the exponential factor that scales the prices down below one.
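Equation (23) is not shown in this section. The sketch below assumes that (23) selects the smallest \(\rho \ge 0\) for which \(\textrm{e}^{-\rho T}B(0,T)\le 1\) at every observed maturity, i.e. \(\rho =\max (0,\max _T T^{-1}\log B(0,T))\). Under that assumption, the Python sketch (NumPy) reproduces the reported value to about eight decimals from the five prices quoted above. Maturities with prices below one cannot increase the maximum. For these data the scaled prices are also non-increasing, as a survival function requires.

```python
import numpy as np


def shift_rho(maturities, prices):
    """Smallest rho >= 0 with exp(-rho T) B(0, T) <= 1 for all observed T."""
    T = np.asarray(maturities, dtype=float)
    B = np.asarray(prices, dtype=float)
    return max(0.0, float(np.max(np.log(B) / T)))


# The five 2019 prices above one quoted in the text (T = 1, ..., 5)
B_2019 = [1.00231736, 1.00403337, 1.00445679, 1.00382807, 1.00197787]
rho = shift_rho(range(1, 6), B_2019)
scaled = np.exp(-rho * np.arange(1, 6)) * np.array(B_2019)
```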

Fig. 2: Fitted phase-type densities (left), corresponding yield curves (middle) and bond prices (right) for dimensions \(p=5, 10, 15\) based on bond price data as of 31/12/2019

In Fig. 2, we show the phase-type fits to the bond prices. We have used the subclass of time-homogeneous Coxian distributions. In this subclass, initiation is always in state 1, and the only possible transitions from a state i are to the next state \(i+1\) or to the absorbing state.

If the fits are primarily to serve as discount factors in a life insurance model, then all fits could probably be used (right plot). If the yield curve is the main concern, only dimensions 10 and 15 seem to capture the appropriate curvature. For the phase-type density, the 15-dimensional fit is the best. As an illustration, we consider the ten-dimensional fit. The fitted intensity matrix \(\hat{\varvec{{M}}}\) of \(\{ X(u)\}_{u\ge 0}\) is given by

$$\begin{aligned} \left( \begin{array}{rrrrrrrrrr} -0.5212 &{} 0.5212 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 \\ 0.0000 &{} -0.5212 &{} 0.5212 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 \\ 0.0000 &{} 0.0000 &{} -0.5185 &{} 0.5185 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 \\ 0.0000 &{} 0.0000 &{} 0.0000 &{} -0.5161 &{} 0.5161 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 \\ 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} -0.5152 &{} 0.5152 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 \\ 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} -0.4664 &{} 0.4664 &{} 0.0000 &{} 0.0000 &{} 0.0000 \\ 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} -0.3099 &{} 0.3099 &{} 0.0000 &{} 0.0000 \\ 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} -0.3099 &{} 0.3099 &{} 0.0000 \\ 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} -0.3099 &{} 0.3099 \\ 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 &{} 0.0000 \\ \end{array}\right) \!. \end{aligned}$$

The matrix contains six distinct parameter values, and its structure is inherited from the Coxian phase-type fit to the (discounted) bond prices. Blocks with the same parameters correspond to Erlang blocks, i.e. convolutions of exponential distributions with a common parameter. The induced (estimated) interest rates of states \(1,\ldots ,10\) are, in \(\%\) (where \(-\rho \) corresponds to \(-0.2314677\%\)),

$$\begin{aligned} -\rho , -\rho , 0.03468739, 0.28218594, -\rho , 4.64627655,-\rho , -\rho , -\rho , 3.86252219 . \end{aligned}$$

These should also be counted as parameters. \(\square \)
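The following Python sketch, assuming NumPy and SciPy, rebuilds the ten-state Coxian intensity matrix above and the induced interest rates. It reads the non-trivial rates as percentages and \(-\rho \) as the absolute value \(-0.002314677\). It then evaluates \(B(0,T)=\varvec{{\pi }}\exp ((\hat{\varvec{{M}}}-\varvec{{\Delta }}(\varvec{{r}}))T)\varvec{{e}}\). The printed estimates are rounded, so the output is only indicative. The sketch also checks that the rescaled prices \(\textrm{e}^{-\rho T}B(0,T)\) behave like a phase-type survival function, and that the one-year price exceeds one, as observed.

```python
import numpy as np
from scipy.linalg import expm

rho = 0.002314677
# Rates to the next state in the fitted ten-state Coxian matrix (rounded, from the text)
up = [0.5212, 0.5212, 0.5185, 0.5161, 0.5152, 0.4664, 0.3099, 0.3099, 0.3099]
M = np.diag(up, k=1) - np.diag(up + [0.0])   # intensity matrix; last row is zero
# Induced rates: -rho (absolute) or the reported percentages divided by 100
r = np.full(10, -rho)
r[[2, 3, 5, 9]] = np.array([0.03468739, 0.28218594, 4.64627655, 3.86252219]) / 100
pi = np.zeros(10)
pi[0] = 1.0                                  # Coxian: start in state 1


def bond_price(T):
    """B(0, T) = pi exp((M - Delta(r)) T) e."""
    return float(pi @ expm((M - np.diag(r)) * T) @ np.ones(10))


maturities = np.arange(1, 121)
prices = np.array([bond_price(T) for T in maturities])
discounted = np.exp(-rho * maturities) * prices   # phase-type survival function
```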

Example 4.3

(Fitting to initial bond prices of a two-factor Vasicek model) In this example we consider the two-factor Vasicek short rate model G2++ (see [9]) with an initial negative interest rate. Here the bond prices as of time zero are given by

$$\begin{aligned} B(0,T) = \exp \left\{ \psi (T)+\frac{1}{2} V^{2}(0, T)\right\} \!\!, \end{aligned}$$

where

$$\begin{aligned} V^{2}(0, T)&= \sum _{i=1}^2 \frac{\sigma _{i}^{2}}{k_{i}^{2}} \left( T-B_{k_{i}}(0, T)-\frac{k_{i}}{2} B_{k_{i}}^{2}(0, T)\right) \\ {}&\quad +\frac{2 \sigma _{1} \sigma _{2} \sigma _{12}}{k_{1} k_{2}} \left( T-B_{k_{1}}(0, T)-B_{k_{2}}(0, T)+B_{k_{1}+k_{2}}(0, T)\right) \!,\\ B_{k}(0, T)&=\frac{1-\text {e}^{-kT}}{k} \ \ \text { and } \ \ \ \psi (T) = \frac{(\theta -r_0) (1-\text {e}^{-k_1 T}) - k_1\theta T}{k_1}. \end{aligned}$$

We chose the same parameters as in [13], Fig. 3, apart from the initial interest rate \(r_0\), which was set to \(-1\%\). Hence the parameters are

$$\begin{aligned} r_0 &= -0.01,\quad k_1 = 0.401,\quad k_2 = 0.178,\quad \sigma _1 = 0.0378,\quad \sigma _2 =0.0372,\quad \theta = 0.01297, \\ \sigma _{12} &= -0.996. \end{aligned}$$

We fitted three-, four- and five-dimensional time-homogeneous phase-type distributions with a Coxian structure to the discounted bond prices \(\textrm{e}^{-\rho T}B(0,T)\), where \(\rho = 0.005955398\). The estimated intensity matrix \(\hat{\varvec{{M}}}\) based on 4 phases is given by

$$\begin{aligned} \hat{\varvec{{M}}} = \left( \begin{array}{rrrr} -0.17 &{} 0.17 &{} 0.00 &{} 0.00 \\ 0.00 &{} -0.66 &{} 0.66 &{} 0.00 \\ 0.00 &{} 0.00 &{} -0.61 &{} 0.61 \\ 0.00 &{} 0.00 &{} 0.00 &{} 0.00 \\ \end{array}\right) \end{aligned}$$

with corresponding interest rates \( -\rho ,-\rho , 0.078298752, 0.006307674\), while for 5 phases, we get

$$\begin{aligned} \hat{\varvec{{{M}}}} = \left( \begin{array}{rrrrr} -0.65 &{}{} 0.65 &{}{} 0.00 &{}{} 0.00 &{}{} 0.00 \\ 0.00 &{}{} -1.79 &{}{} 1.79 &{}{} 0.00 &{}{} 0.00 \\ 0.00 &{}{} 0.00 &{}{} -1.89 &{}{} 1.89 &{}{} 0.00 \\ 0.00 &{}{} 0.00 &{}{} 0.00 &{}{} -0.12 &{}{} 0.12 \\ 0.00 &{}{} 0.00 &{}{} 0.00 &{}{} 0.00 &{}{} 0.00 \\ \end{array}\right) \!. \end{aligned}$$

Fig. 3: Fitted phase-type densities (left), corresponding yield curves (middle) and bond prices (right) for dimensions \(p=3, 4, 5\) based on bond prices from the two-factor Vasicek G2++ model

The corresponding (estimated) interest rates are \(-\rho ,-\rho ,-\rho , 0.012793967, 0.006280658\). A total of six parameters specify the four-dimensional model, while seven parameters determine the five-dimensional one. \(\square \)
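The following Python sketch (NumPy) reproduces the target curve of this example from the G2++ prices with the parameters above. It rests on three assumptions. First, it takes \(\psi (T)=(\theta -r_0)B_{k_1}(0,T)-\theta T\), the form that gives \(B(0,0)=1\). Second, it uses the maturities \(T=1,\ldots ,30\). Third, it computes ρ as the smallest non-negative shift with \(\textrm{e}^{-\rho T}B(0,T)\le 1\) over those maturities. Under these assumptions the value \(\rho =0.005955398\) reported above is reproduced to about seven decimals, with the maximum attained at \(T=1\).

```python
import numpy as np

# Parameters of the example (sigma12 is the correlation between the two factors)
r0, k1, k2 = -0.01, 0.401, 0.178
s1, s2, theta, s12 = 0.0378, 0.0372, 0.01297, -0.996


def Bk(k, T):
    return (1.0 - np.exp(-k * T)) / k


def V2(T):
    """V^2(0, T) as in the example."""
    v = sum(s**2 / k**2 * (T - Bk(k, T) - k / 2 * Bk(k, T) ** 2)
            for s, k in ((s1, k1), (s2, k2)))
    return v + 2 * s1 * s2 * s12 / (k1 * k2) * (
        T - Bk(k1, T) - Bk(k2, T) + Bk(k1 + k2, T))


def psi(T):
    """Assumed form (theta - r0) B_{k1}(0, T) - theta T, giving B(0, 0) = 1."""
    return (theta - r0) * Bk(k1, T) - theta * T


def g2_bond_price(T):
    return np.exp(psi(T) + 0.5 * V2(T))


T = np.arange(1, 31, dtype=float)
B = g2_bond_price(T)
rho = max(0.0, float(np.max(np.log(B) / T)))   # smallest rho with exp(-rho T) B <= 1
discounted = np.exp(-rho * T) * B
```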

5 Applications to life insurance

In this section, we incorporate the stochastic interest rate model of the previous sections into life insurance valuations. We consider the model introduced by [24, 25] and extend their results on reserves and higher order moments to so-called partial reserves and partial higher order moments, that is, to the corresponding quantities restricted to events concerning the terminal state. Partial reserves and moments play important roles when dealing with so-called retrospective reserves in single states (cf. [5, Section 5.E]), which, however, is outside the scope of the present paper. We obtain this extension by following the matrix approach of [8], so that these types of results are extended to allow for stochastic interest rates of the form (1). The extensions of the results of these papers are pointed out in a series of remarks throughout the section.

5.1 A Life insurance model with stochastic interest rates

Let \(X=\{ X(t) \}_{t\ge 0}\) be a time-inhomogeneous Markov jump process with a finite state-space E and intensity matrix \(\varvec{{\Lambda }}(t)=\{ \lambda _{ij}(t) \}_{i,j\in E}\). Then we define a payment process \(\{ B(t) \}_{t\ge 0}\) by

$$\begin{aligned} \textrm{d}B(t) = \sum _{i\in E} 1\{ X(t-)=i\} \bigg (b_{i}(t)\textrm{d}t + \sum _{j\in E} b_{ij}(t)\textrm{d}N_{ij}(t)\bigg ), \end{aligned}$$
(24)

where \(b_i(t)\) are continuous payment rates (negative for premiums) and \(b_{ij}(t)\) are lump sum payments, which occur according to the counting processes \(N_{ij}(t)\). The intensity matrix is decomposed into

$$\begin{aligned} \varvec{{\Lambda }}(t) = \varvec{{\Lambda }}^0(t) + \varvec{{\Lambda }}^1(t), \end{aligned}$$
(25)

where \(\varvec{{\Lambda }}^1(t)\) is a non-negative matrix and, consequently, \(\varvec{{\Lambda }}^0(t)\) a sub-intensity matrix, i.e. row sums are non-positive. The counting process is linked to the transitions of X in the following way. Upon transition from i to j, \(i\ne j\), in X at time t, a lump sum payment of \(b_{ij}(t)\) will be triggered with probability

$$\begin{aligned} \frac{\lambda ^1_{ij}(t)}{\lambda ^0_{ij}(t)+\lambda ^1_{ij}(t)} . \end{aligned}$$
(26)

If \(i=j\), then \(N_{ii}(t)\) denotes an inhomogeneous Poisson process with intensity \(\lambda ^1_{ii}(t)\), so that a lump sum during a sojourn in state i is triggered in \([t,t+\textrm{d}t)\) with probability \(\lambda ^1_{ii}(t)\textrm{d}t\).

Finally, we assume that the spot interest rate in state i is a deterministic function \(r_i(t)\), so that the interest rate follows the model (1).

Remark 5.1

The classic Markov chain life insurance setting of, e.g., [16, 23], is recovered if \(r_i(t)\equiv r(t)\), \(b_{ii}(t)=0\), and the probabilities (26) are either zero or one. An extension of the classic setting allowing for different interest rates in the different states was considered in [24, 25], where Thiele-type differential equations for the reserves and higher order moments were derived. \(\square \)

For the purpose of computing reserves and higher order moments, we follow [8, (3.8)–(3.11)]. We let \(\varvec{{b}}(t)=(b_i(t))_{i\in E}\) denote the vector containing the continuous rates, and define the matrices

$$\begin{aligned} \varvec{{B}}(t)&= \left\{ b_{ij}(t)\right\} _{i,j\in E}, \end{aligned}$$
(27)
$$\begin{aligned} \varvec{{R}}(t)&= \varvec{{\Lambda }}^1(t)\bullet \varvec{{B}}(t) +\varvec{{\Delta }}(\varvec{{b}}(t)), \end{aligned}$$
(28)
$$\begin{aligned} \varvec{{C}}^{(k)}(t)&= \varvec{{\Lambda }}^1(t)\bullet \varvec{{B}}^{\bullet k}(t), \quad k\ge 2, \end{aligned}$$
(29)

where \(\varvec{{\Delta }}(\varvec{{b}}(t))\) denotes the diagonal matrix with \(\varvec{{b}}(t)\) as diagonal. The operator \(\bullet \) denotes Schur (entrywise) matrix product, defined by \(\varvec{{A}}\bullet \varvec{{B}} = \{ a_{ij}b_{ij} \}\) for matrices \(\varvec{{A}}=\{a_{ij}\}\) and \(\varvec{{B}}=\{ b_{ij} \}\).

Hence \(\varvec{{B}}(t)\) is the matrix containing the lump sum payments at transitions and at Poisson arrivals during sojourns. \(\varvec{{R}}(t)\) is the matrix whose ij'th element, multiplied by \(\textrm{d}t\), is the expected reward accumulated during \([t,t+\textrm{d}t)\) through a transition from i to j, or through a sojourn in state i if \(i=j\). The matrices \(\varvec{{C}}^{(k)}(t)\) are more technical and are used when dealing with higher order moments.
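In NumPy the Schur product is the ordinary `*` operator, so (28) and (29) can be sketched as follows. The three-state disability model and all numbers are hypothetical illustrations, not taken from the paper.

```python
import numpy as np


def reward_matrices(Lam1, Bmat, b, kmax):
    """R = Lam1 * B + Delta(b) as in (28) and C^(k) = Lam1 * B**k as in (29),
    k = 2, ..., kmax; '*' and '**' act entrywise (Schur product and powers)."""
    Lam1 = np.asarray(Lam1, dtype=float)
    Bmat = np.asarray(Bmat, dtype=float)
    R = Lam1 * Bmat + np.diag(b)
    C = {k: Lam1 * Bmat**k for k in range(2, kmax + 1)}
    return R, C


# Hypothetical disability model: 1 = active, 2 = disabled, 3 = dead
Lam1 = np.array([[0.0, 0.0, 0.01],
                 [0.0, 0.0, 0.05],
                 [0.0, 0.0, 0.0]])    # only deaths trigger a lump sum
Bmat = np.array([[0.0, 0.0, 10.0],
                 [0.0, 0.0, 10.0],
                 [0.0, 0.0, 0.0]])    # death benefit 10
b = np.array([-0.3, 1.0, 0.0])        # premium rate while active, annuity while disabled
R, C = reward_matrices(Lam1, Bmat, b, kmax=3)
```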

Finally, we let

$$\begin{aligned} \varvec{{{r}}}(t) = (r_{i}(t))_{i\in E} \end{aligned}$$

denote the vector of interest rates.

Now assume that the interest rate process is modelled and fitted to bond prices as in Sect. 3. Accordingly, there is a Markov jump process \(X_r=\{ X_r(t)\}_{t\ge 0}\) with state-space \(E_r=\{1,2,\ldots ,p\}\) and intensity matrix \(\varvec{{{\Lambda }}}_r(t)=\{ \lambda _{ij}^r(t)\}_{i,j\in E_r}\), say, such that the corresponding bond prices B(t, T) are given as in Theorem 3.4. Similarly, we let \(X_b=\{ X_b(t)\}_{t\ge 0}\) denote the Markov jump process governing the transitions between the biometric states, with state-space \(E_b=\{1,2,\ldots ,q\}\) and intensity matrix \(\varvec{{{\Lambda }}}_b(t)=\{ \lambda _{ij}^b(t)\}_{i,j\in E_b}\). Hence the Markov jump process appearing in (24) can be written in the form

$$\begin{aligned} X(t)=(X_b(t), X_r(t)) \end{aligned}$$
(30)

with state-space \(E=E_b\times E_r\). The product space is ordered lexicographically. This means that state (i, j) is identified with state \(k(i,j)=(i-1)p+j\), \(j=1,\ldots ,p\), \(i=1,\ldots ,q\), of the state-space \(\{1,2,\ldots ,pq \}\), and that \((i,j)<(\tilde{i},\tilde{j})\) iff \(k(i,j)<k(\tilde{i},\tilde{j})\) (see Fig. 4).

Fig. 4: Lexicographical ordering: for each biometric state (blue), several sub-states (orange) define the underlying interest rate level
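The lexicographic ordering coincides with the ordering used by the Kronecker product, which is what makes Kronecker representations such as those in Example 5.1 possible. The following Python check assumes NumPy and the 1-based index map \(k(i,j)=(i-1)p+j\); the helper `lex_index` is hypothetical.

```python
import numpy as np


def lex_index(i, j, p):
    """1-based position of the pair (i, j), i in E_b, j in E_r = {1, ..., p}."""
    return (i - 1) * p + j


q, p = 3, 4
ks = [lex_index(i, j, p) for i in range(1, q + 1) for j in range(1, p + 1)]
# Entry k(i, j) of kron(u, v) is u_i * v_j, i.e. the same ordering
u = np.arange(1, q + 1) * 10.0
v = np.arange(1, p + 1) * 1.0
w = np.kron(u, v)
```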

The processes \(X_b\) and \(X_r\) may or may not be independent, and the payment process (24) likewise may or may not be independent of \(X_r\). In the independent case, \(X_b\) and \(X_r\) are defined on their own state-spaces, and the common state-space is the product set of the two. If the processes share states, so that simultaneous jumps are possible, the processes become dependent. For example, a rise in the interest rate could cause an increased intensity of transitions to surrender or free-policy states (see, e.g., [10]).

In the following example, we consider the simplifications in the representations when assuming independence.

Example 5.1

(Independence) If the processes \(X_b\) and \(X_r\) are independent, then the transition intensity matrix of X is of the form (see e.g. [7], p. 56)

$$\begin{aligned} \varvec{{\Lambda }}(t) = \varvec{{\Lambda }}_b(t) \oplus \varvec{{\Lambda }}_r(t) =\varvec{{\Lambda }}_b(t)\otimes \varvec{{I}}_p + \varvec{{I}}_q \otimes \varvec{{\Lambda }}_r(t) , \end{aligned}$$

where \(\oplus \) denotes the Kronecker sum, and where \(\varvec{{I}}_n\) denotes the identity matrix of dimension \(n\times n\). We recall that the Kronecker product, \(\otimes \), is defined by \(\varvec{{A}}\otimes \varvec{{B}} = \{ a_{ij}\varvec{{B}} \}\), where \(\varvec{{A}}=\{ a_{ij}\}\). The interest rates are not influenced by the biometric states so

$$\begin{aligned} \varvec{{r}}(t) = \varvec{{e}}\otimes (r_1(t),\ldots ,r_p(t))^\prime , \end{aligned}$$

where \(\varvec{{e}}=(1,1,\ldots ,1)^\prime \). If the payment process (24) is also independent of the interest rate process \(X_r\), then

$$\begin{aligned} \varvec{{B}}(t) = \varvec{{B}}^b(t) \otimes \varvec{{I}}, \, \ \ \ \varvec{{b}}(t) = \varvec{{b}}^b(t)\otimes \varvec{{e}}, \end{aligned}$$

for some

$$\begin{aligned} \varvec{{b}}^b(t) = \left( b_1^b(t),\ldots ,b_q^b(t)\right) ' \qquad \textrm{and}\qquad \varvec{{B}}^b(t) =\left\{ b_{ij}^b(t)\right\} _{i,j\in E_b}. \qquad \qquad \qquad \square \end{aligned}$$
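The following minimal Python sketch (NumPy) illustrates the independent construction. It assumes time-homogeneous intensities and uses hypothetical numbers: two biometric states (active and dead) and two interest-rate regimes. It builds \(\varvec{{\Lambda }}=\varvec{{\Lambda }}_b\oplus \varvec{{\Lambda }}_r\), the state-wise rates \(\varvec{{e}}\otimes (r_1,\ldots ,r_p)^\prime \) and the payment quantities \(\varvec{{B}}=\varvec{{B}}^b\otimes \varvec{{I}}\) and \(\varvec{{b}}=\varvec{{b}}^b\otimes \varvec{{e}}\). Simultaneous jumps have intensity zero under independence and carry no payment.

```python
import numpy as np


def kron_sum(A, B):
    """Kronecker sum A (+) B = A (x) I + I (x) B."""
    return np.kron(A, np.eye(B.shape[0])) + np.kron(np.eye(A.shape[0]), B)


# Hypothetical time-homogeneous example: active/dead (q = 2), two rate regimes (p = 2)
Lam_b = np.array([[-0.02, 0.02],
                  [0.00, 0.00]])
Lam_r = np.array([[-0.5, 0.5],
                  [0.3, -0.3]])
r_r = np.array([0.01, 0.04])
q, p = Lam_b.shape[0], Lam_r.shape[0]

Lam = kron_sum(Lam_b, Lam_r)        # intensity matrix of X = (X_b, X_r)
r = np.kron(np.ones(q), r_r)        # r = e (x) (r_1, ..., r_p)'
B_b = np.array([[0.0, 10.0],
                [0.0, 0.0]])        # lump sum 10 on the transition active -> dead
b_b = np.array([-0.2, 0.0])         # premium rate while active
B = np.kron(B_b, np.eye(p))         # B = B^b (x) I
b = np.kron(b_b, np.ones(p))        # b = b^b (x) e
```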

5.2 Reserves

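We now consider the valuation of the payment process B. Introduce the matrix of partial state-wise prospective reserves,

$$\begin{aligned} \varvec{{V}}(s,t) = \left\{ V_{ij}(s,t)\right\} _{i,j\in E}, \qquad V_{ij}(s,t) = {\mathbb {E}}\!\left. \left( 1\{ X(t)=j\} \int _s^t \text {e}^{-\int _s^x r_{X(u)}(u)\text {d}u} \,\text {d}B(x) \, \right| X(s)=i \right) \!. \end{aligned}$$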

Because of the stochastic interest rates, this is an extension of the corresponding quantity in [8]. With \(\varvec{{D}}(s,t)\), introduced in (15), modified to the setup of this section as

$$\begin{aligned} \varvec{{{D}}}(s,t) = \prod _s^t \left( \varvec{{{I}}} + \left[ \varvec{{{\Lambda }}}(u) -\varvec{{{\Delta }}}(\varvec{{{r}}}(u)) \right] \text {d}u \right) \!, \end{aligned}$$

we have the following result.

Theorem 5.2

The matrix of partial state-wise prospective reserves \(\varvec{{V}}(s,t)\) has the following integral representation:

$$\begin{aligned} \varvec{{V}}(s,t)=\int _s^t \varvec{{D}}(s,x)\varvec{{R}}(x)\varvec{{P}}(x,t)\textrm{d}x, \end{aligned}$$
(31)

where \(\varvec{{R}}(x)\) is the reward matrix (28) and \(\varvec{{P}}(x,t)\) denotes the transition probability matrix of the Markov process \(\{ X(t)\}_{t\ge 0}\).

For a formal proof, see Appendix B. The intuition, however, is straightforward. The matrix \(\varvec{{D}}(s,x)\) contains the transition probabilities discounted by the state-wise interest rates. The integral is therefore a weighted sum of the instantaneous expected rewards \(\varvec{{R}}(x)\). The weights are these discounted transition probabilities, multiplied by the probabilities \(\varvec{{P}}(x,t)\) of terminating in the required state at time t.

In practice, the reserves can be computed efficiently with the following Van Loan-type formula, which avoids numerical integration.

Corollary 5.3

\(\varvec{{V}}(s,t)\) can be extracted from the relation

$$\begin{aligned} \prod _s^t \left( \varvec{{{I}}} + \begin{pmatrix} \varvec{{{\Lambda }}}(u) -\varvec{{{\Delta }}}(\varvec{{{r}}}(u)) &{}{} \varvec{{{R}}}(u) \\ \varvec{{{0}}} &{}{} \varvec{{{\Lambda }}}(u) \end{pmatrix} \text {d}u\right) = \begin{pmatrix} \varvec{{{D}}}(s,t) &{}{} \varvec{{{V}}}(s,t) \\ \varvec{{{0}}} &{}{} \varvec{{{P}}}(s,t) \end{pmatrix}\!. \end{aligned}$$
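When all inputs are time-homogeneous, the product integral reduces to a matrix exponential, and Corollary 5.3 can be sketched in Python with `scipy.linalg.expm`. The time-homogeneity is an assumption made for illustration; time-varying inputs would require a numerical approximation of the product integral, for instance by piecewise-constant matrices. The one-state check compares the result with the closed-form present value \(b(1-\textrm{e}^{-r\tau })/r\) of a continuous annuity.

```python
import numpy as np
from scipy.linalg import expm


def reserves_van_loan(Lam, r, R, tau):
    """Time-homogeneous case of Corollary 5.3: return D(s, t), V(s, t), P(s, t)
    from the exponential of the block matrix, with tau = t - s."""
    n = Lam.shape[0]
    F = np.block([[Lam - np.diag(r), R],
                  [np.zeros((n, n)), Lam]])
    E = expm(F * tau)
    return E[:n, :n], E[:n, n:], E[n:, n:]


# One-state check: constant payment rate b, constant interest, no transitions
b, rate, tau = 1.0, 0.03, 10.0
D, V, P = reserves_van_loan(np.zeros((1, 1)), np.array([rate]), np.array([[b]]), tau)
annuity = b * (1 - np.exp(-rate * tau)) / rate   # closed-form present value
```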

Finally, we state and prove Thiele’s differential equations for partial reserves with stochastic interest rates.

Theorem 5.4

(Thiele)

$$\begin{aligned} \frac{\partial }{\partial s}\varvec{{{V}}}(s,t) =-\left[ \varvec{{{\Lambda }}}(s) - \varvec{{{\Delta }}} (\varvec{{{r}}}(s)) \right] \! \varvec{{{V}}}(s,t) - \varvec{{{R}}}(s)\varvec{{{P}}}(s,t), \end{aligned}$$

where \(\varvec{{V}}(t,t)=\varvec{{0}}\). For the conventional state-wise prospective reserves, \(\varvec{{V}}^{Th}(t)=\varvec{{V}}(t,T)\varvec{{e}}\), this has the form

$$\begin{aligned} \frac{\partial }{\partial t} \varvec{{V}}^{Th}(t) = \varvec{{\Delta }} (\varvec{{r}}(t))\varvec{{V}}^{Th}(t) - \varvec{{\Lambda }}(t)\varvec{{V}}^{Th} (t) - \varvec{{R}}(t)\varvec{{e}} , \end{aligned}$$

where \(\varvec{{V}}^{Th}(T) = \varvec{{0}}\).

Proof

See Appendix B. \(\square \)

Remark 5.5

Writing out the elements of the differential equation for \(\varvec{{V}}^{Th}\), we get for \(i\in E\),

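$$\begin{aligned} \frac{\partial }{\partial t} V^{Th}_{i}(t)&= r_{i}(t)V^{Th}_{i}(t) - b_{i}(t) - \sum _{j\in E} \left( \lambda ^1_{ij}(t)\, b_{ij}(t) + \lambda _{ij}(t) \left( V^{Th}_{j}(t) - V^{Th}_{i}(t)\right) \right) \!,\\ V_{i}^{Th}(T)&= 0, \end{aligned}$$

where we used that the rows of \(\varvec{{\Lambda }}(t)\) sum to zero. In the classic setting of Remark 5.1, set \(b_{ij}(t)=0\) for transitions that trigger no lump sum, so that \(\lambda ^1_{ij}(t)b_{ij}(t)=\lambda _{ij}(t)b_{ij}(t)\). With this convention, the equation above is the differential equation obtained in [24, (3.2)] in the case of a first-order moment. \(\square \)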
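The following Python sketch (NumPy, SciPy) checks Theorem 5.4 numerically in the time-homogeneous case. It solves the Thiele equation for \(\varvec{{V}}^{Th}\) backwards from \(\varvec{{V}}^{Th}(T)=\varvec{{0}}\) with `solve_ivp`, and compares the result with \(\varvec{{V}}(0,T)\varvec{{e}}\) from the Van Loan formula. The two-state model and its rates are hypothetical.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import expm

# Hypothetical time-homogeneous two-state model
Lam = np.array([[-0.3, 0.3],
                [0.2, -0.2]])
r = np.array([0.01, 0.05])              # state-wise interest rates
R = np.array([[-0.5, 0.3 * 4.0],        # premium rate -0.5; lump sum 4 on 1 -> 2
              [0.2 * 1.0, 1.0]])        # lump sum 1 on 2 -> 1; benefit rate 1
T_end, n = 20.0, 2
e = np.ones(n)


def thiele_rhs(t, V):
    """Thiele: dV/dt = Delta(r) V - Lam V - R e."""
    return r * V - Lam @ V - R @ e


# Integrate backwards from V^Th(T) = 0 to time 0
sol = solve_ivp(thiele_rhs, (T_end, 0.0), np.zeros(n), rtol=1e-10, atol=1e-12)
V_thiele = sol.y[:, -1]

# Van Loan: the top-right block of exp(F T) is V(0, T); multiply by e
F = np.block([[Lam - np.diag(r), R],
              [np.zeros((n, n)), Lam]])
V_vanloan = expm(F * T_end)[:n, n:] @ e
```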

5.3 Equivalence premium

Assume that \(\varvec{{R}}(t)=\varvec{{R}}(t;\theta )\), where the parameter \(\theta \) enters only through \(\varvec{{B}}(t)\) and/or \(\varvec{{\Delta }}(\varvec{{b}}(t))\). For example, \(\theta \) could be a premium rate in state 1 or a transition payment between some states. We then write \(\varvec{{V}}(t)=\varvec{{V}}(t;\theta )\), so that

$$\begin{aligned} \varvec{V}(t;\theta )=\int _{t}^{T} \prod _{t}^{u} \left( \varvec{I}+[\varvec{\Lambda }(s) -\varvec{{\Delta }}(\varvec{{r}}(s))] \textrm{d} s\right) \varvec{R}(u;\theta ) \prod _{u}^{T}(\varvec{I} +\varvec{\Lambda }(s) \textrm{d} s) \textrm{d} u. \end{aligned}$$

If the interest rates satisfy \(\varvec{{\Delta }}(\varvec{{r}}(s))\ge \varvec{{0}}\), then \(\varvec{\Lambda }(s)-\varvec{{\Delta }}(\varvec{{r}}(s))\) is a sub-intensity matrix, so that \(\prod _{t}^{u}(\varvec{I} +[\varvec{\Lambda }(s)-\varvec{{\Delta }}(\varvec{{r}}(s))] \textrm{d} s) \) is a sub-probability matrix, i.e.

$$\begin{aligned} 0\le \prod _{t}^{u}\left( \varvec{I}+[\varvec{\Lambda }(s) -\varvec{{\Delta }}(\varvec{{r}}(s))] \textrm{d} s\right) \varvec{{e}} \le \varvec{{e}}. \end{aligned}$$

If \(\varvec{{R}}(\cdot ;\theta )\) is continuously differentiable with respect to \(\theta \), and \(\varvec{{\Lambda }}\) and \(\varvec{{r}}\) are continuous, then by Leibniz's integral rule

$$\begin{aligned} \frac{\partial }{\partial \theta } \varvec{{V}}(t;\theta ) =\int _{t}^{T} \prod _{t}^{u}\left( \varvec{I} +[\varvec{\Lambda }(s)-\varvec{{\Delta }}(\varvec{{r}}(s))] \textrm{d} s\right) \frac{\partial }{\partial \theta } \varvec{R}(u;\theta ) \prod _{u}^{T}(\varvec{I} +\varvec{\Lambda }(s) \textrm{d} s) \textrm{d} u . \end{aligned}$$

Hence we get from the Van Loan formula [8, Lemma 2]

$$\begin{aligned} \prod _t^T \left( \varvec{{{I}}} +\begin{pmatrix} \varvec{{{\Lambda }}}(u)-\varvec{{{\Delta }}}(\varvec{{{r}}}(u)) &{}{} \frac{\partial }{\partial \theta }\varvec{{{R}}}(u;\theta ) \\ \varvec{{{0}}} &{}{} \varvec{{{\Lambda }}}(u) \end{pmatrix} \text {d}u \right) = \begin{pmatrix} \varvec{{{D}}}(t,T) &{}{} \frac{\partial }{\partial \theta }\varvec{{{V}}}(t;\theta ) \\ \varvec{{{0}}} &{}{} \varvec{{{P}}}(t,T) \end{pmatrix}\!. \end{aligned}$$
(32)

Remark 5.6

Similar kinds of derivatives as those of (32) are considered in [17], where differential equations for reserves concerning valuation elements and payments are derived. The formulas presented here may thus be seen as corresponding matrix representations. \(\square \)

If state \(i\in E\) is the starting state, we can formulate the equivalence principle by finding the \(\theta \) that solves

$$\begin{aligned} V_i^{Th}(0;\theta )=\varvec{{e}}_i^\prime \varvec{{V}}(0;\theta )\varvec{{e}} = 0 \end{aligned}$$

using Newton’s method:

$$\begin{aligned} \theta _{n+1}=\theta _{n} - \frac{ \varvec{{e}}_i^\prime \varvec{{V}}(0;\theta _n)\varvec{{e}}}{\varvec{{e}}_i^\prime \varvec{{V}}_{\theta }(0;\theta _n)\varvec{{e}}} , \end{aligned}$$

where \(\varvec{{V}}_{\theta }\) denotes the partial derivative with respect to \(\theta \). For example, if \(\theta \) is a constant premium (rate) such that

$$\begin{aligned} \varvec{{R}}_\theta (t;\theta ) = \varvec{{A}}(t) , \end{aligned}$$

i.e. a matrix function not depending on \(\theta \), then \(\varvec{{V}}_{\theta }(t;\theta ) = \varvec{{V}}_{\theta }(t)\) does not depend on \(\theta \) either. We conclude that the map \(\theta \mapsto V^{Th}_i(t;\theta )\) is affine (for fixed t), so that in particular

$$\begin{aligned} V^{Th}_i(0;\theta ) = a\theta + b \end{aligned}$$

for some constants a, b. Then b can be computed as \(b=V^{Th}_i(0;0) = \varvec{{e}}_i^\prime \varvec{{V}}(0;0)\varvec{{e}}\), and \(a=\varvec{{e}}_i^\prime \varvec{{V}}_{\theta }(0;0)\varvec{{e}}\). Hence Newton's method converges in one iteration, and the \(\theta \) which fulfils the equivalence principle is given by

$$\begin{aligned} \theta = -\frac{ \varvec{{e}}_i^\prime \varvec{{V}}(0;0) \varvec{{e}}}{\varvec{{e}}_i^\prime \varvec{{V}}_{\theta }(0;0)\varvec{{e}}}. \end{aligned}$$
(33)

Hence this formula yields the equivalence premium whenever the premium is assumed to be (piecewise) constant over time, which is often the case in practice. The formulation in terms of derivatives is, however, rarely seen in the literature, [17, (3.5)] being one of the few exceptions. If the constancy assumption is not satisfied and the premium is instead given by a parametrised expression in \(\theta \), the equivalence premium can be calculated by Newton's method.
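The following Python sketch applies (32) and (33) to a hypothetical term insurance with constant mortality μ and constant interest r. The premium rate θ is paid while alive, and the sum insured S is paid on death. The inputs are assumed time-homogeneous, so that the product integrals are matrix exponentials. For this model one can check by hand that the equivalence premium is \(\theta =\mu S\), and the sketch recovers that value.

```python
import numpy as np
from scipy.linalg import expm


def van_loan_top_right(A, R, Lam, tau):
    """Top-right block of exp([[A, R], [0, Lam]] tau)."""
    n = A.shape[0]
    F = np.block([[A, R], [np.zeros((n, n)), Lam]])
    return expm(F * tau)[:n, n:]


mu, rate, S, T_end = 0.01, 0.03, 100.0, 30.0   # hypothetical inputs
Lam = np.array([[-mu, mu],
                [0.0, 0.0]])                   # states: 1 = alive, 2 = dead
A = Lam - np.diag([rate, rate])
R0 = np.array([[0.0, mu * S],
               [0.0, 0.0]])                    # R(t; 0): death benefit only
R_theta = np.array([[-1.0, 0.0],
                    [0.0, 0.0]])               # dR/dtheta: premium rate paid while alive
e, e1 = np.ones(2), np.array([1.0, 0.0])

V0 = e1 @ van_loan_top_right(A, R0, Lam, T_end) @ e          # e_1' V(0; 0) e
V_theta = e1 @ van_loan_top_right(A, R_theta, Lam, T_end) @ e
theta = -V0 / V_theta                                        # formula (33)
```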

5.4 Higher order moments

In this section, we consider the computation of higher order moments. While the first few moments may be of direct interest in actuarial practice, higher moments are mainly of theoretical importance, namely for constructing the distribution of the future payments in Sect. 5.5. Given a theoretical model, the moments can be computed to any desired accuracy within the usual limits of numerical precision and computational capacity.

Moments obtained from fitted models are less reliable and should be treated with more care. In particular, higher order moments are notoriously difficult to estimate and usually require many data points, especially from the tail. For this reason, models estimated by maximum likelihood from relatively few data points may not yield adequate estimates of the moments.

Consider the matrix of partial state-wise higher order moments of future payments, given by, for \(k\in {\mathbb {N}}\) (see [8, (3.6)–(3.7)]),

$$\begin{aligned} \varvec{{{V}}}^{(k)}(t,T)&=\left\{ V^{(k)}_{ij}(t,T) \right\} _{i,j\in E}, \\ V^{(k)}_{ij}(t,T)&= {\mathbb {E}}\!\left. \left( 1\{ X(T)=j\} \left( \int _t^T \text {e}^{-\int _t^x r_{X(u)}(u)\text {d}u} \,\text {d}B(x)\right) ^{\!k} \, \right| X(t)=i \right) \!, \end{aligned}$$

and introduce what we shall term the reduced partial state-wise higher order moments:

$$\begin{aligned} \varvec{{V}}^{(k)}_r(t,T) = \frac{\varvec{{V}}^{(k)}(t,T)}{k!}. \end{aligned}$$

Since all payment functions and transition rates are deterministic, results for these higher-order moments are straightforward to obtain from the corresponding result for undiscounted payments,

$$\begin{aligned} \varvec{{m}}^{(k)}_r(t,T)= & {} \int _t^T \varvec{{P}}(t,x)\varvec{{R}}(x) \varvec{{m}}_r^{(k-1)}(x,T)\,\textrm{d}x \\{} & {} + \sum _{m=2}^k \int _t^T \varvec{{P}}(t,x) \varvec{{C}}_r^{(m)}(x) \varvec{{m}}_r^{(k-m)}(x,T)\,\textrm{d}x , \end{aligned}$$

where \(\varvec{{m}}^{(k)}_r(t,T)\), \(k\in {\mathbb {N}}\), contains the partial state-wise k'th moments, normalised by k!, of the undiscounted future payments (see [8, (7.4)]), i.e. \(\varvec{{{V}}}^{(k)}_r(t,T)\) with no interest. Here \(\varvec{{C}}_r^{(m)}(x)=\varvec{{C}}^{(m)}(x)/m!\) is the reduced version of (29). To include interest, the rates \(b_{i}(t)\) and lump sums \(b_{ij}(t)\) must be replaced by their discounted versions, using the discount factor \(\exp (-\int _s^t r_{X(u)}(u)\textrm{d}u)\) (for fixed \(s\le t\)). Powers of lump sums such as \(b_{ij}(t)^m\), \(m\in {\mathbb {N}}\), are discounted by \(\exp (-m\int _s^t r_{X(u)}(u)\textrm{d}u)\). Denoting

$$\begin{aligned} \varvec{{D}}^{(m)}(s,t) = \prod _s^t (\varvec{{I}} +\left[ \varvec{{\Lambda }}(u) - m\varvec{{\Delta }}(\varvec{{r}}(u)) \right] \textrm{d}u ), \quad m\in {\mathbb {N}}, \end{aligned}$$

we then obtain the following version of Hattendorff’s theorem for partial reserves with stochastic interest rate.

Theorem 5.7

The matrix of reduced partial state-wise higher order moments satisfies the integral equation, for \(k\in {\mathbb {N}}\),

$$\begin{aligned} \varvec{{V}}^{(k)}_r(t,T)= & {} \int _t^T \varvec{{D}}^{(k)}(t,x) \varvec{{R}}(x)\varvec{{V}}_r^{(k-1)}(x,T)\textrm{d}x \\{} & {} +\sum _{m=2}^k \int _t^T \varvec{{D}}^{(k)}(t,x) \varvec{{C}}_r^{(m)}(x) \varvec{{V}}_r^{(k-m)}(x,T)\textrm{d}x . \end{aligned}$$

Proof

See Appendix B. \(\square \)

Defining

$$\begin{aligned} \varvec{{F}}_U^{(k)}(x)=\left( \begin{array}{cccccc} \varvec{{\Lambda }}(x)-k\varvec{{\Delta }}(\varvec{{r}}(x)) & \varvec{{R}}(x) & \varvec{{C}}_r^{(2)}(x) & \cdots & \varvec{{C}}_r^{(k-1)}(x) & \varvec{{C}}_r^{(k)}(x) \\ \varvec{{0}} & \varvec{{\Lambda }}(x)-(k-1)\varvec{{\Delta }}(\varvec{{r}}(x)) & \varvec{{R}}(x) & \cdots & \varvec{{C}}_r^{(k-2)}(x) & \varvec{{C}}_r^{(k-1)}(x) \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ \varvec{{0}} & \varvec{{0}} & \varvec{{0}} & \cdots & \varvec{{\Lambda }}(x)-\varvec{{\Delta }}(\varvec{{r}}(x)) & \varvec{{R}}(x) \\ \varvec{{0}} & \varvec{{0}} & \varvec{{0}} & \cdots & \varvec{{0}} & \varvec{{\Lambda }}(x) \end{array}\right) \end{aligned}$$

we get by Van Loan that

$$\begin{aligned} \prod _t^T (\varvec{{{I}}} + \varvec{{{F}}}_U^{(k)}(x)\, \text {d}x) =\left( \begin{array}{ccccccc} * & * & * & * & \cdots & * & \varvec{{{V}}}_r^{(k)}(t) \\ * & * & * & * & \cdots & * & \varvec{{{V}}}_r^{(k-1)}(t) \\ * & * & * & * & \cdots & * & \varvec{{{V}}}_r^{(k-2)}(t) \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ * & * & * & * & \cdots & * & \varvec{{{V}}}_r^{(1)}(t) \\ * & * & * & * & \cdots & * & \varvec{{{P}}}(t,T) \end{array} \right) \!. \end{aligned}$$
(34)
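For time-homogeneous inputs, (34) can be sketched in Python by assembling \(\varvec{{F}}_U^{(k)}\) block by block and taking a matrix exponential. The sketch uses \(\varvec{{C}}_r^{(m)}=\varvec{{C}}^{(m)}/m!\) as the reduced version of (29), a reading that is consistent with the check below. The check uses a single state with lump sums b paid at Poisson rate λ and constant interest r. For this model the mean of the discounted payments is \(\lambda b(1-\textrm{e}^{-r\tau })/r\), and the second moment is \(\lambda b^2(1-\textrm{e}^{-2r\tau })/(2r)\) plus the squared mean.

```python
import numpy as np
from scipy.linalg import expm


def higher_moments(Lam, r, R, C_r, k, tau):
    """Time-homogeneous sketch of (34): assemble F_U^(k) and return the last
    block column [V_r^(k), V_r^(k-1), ..., V_r^(1), P] of exp(F_U^(k) tau).
    C_r maps m = 2, ..., k to the reduced matrix C_r^(m) = C^(m) / m!."""
    n = Lam.shape[0]
    F = np.zeros(((k + 1) * n, (k + 1) * n))
    for i in range(k + 1):
        # diagonal blocks Lam - k Delta(r), Lam - (k-1) Delta(r), ..., Lam
        F[i * n:(i + 1) * n, i * n:(i + 1) * n] = Lam - (k - i) * np.diag(r)
        for j in range(i + 1, k + 1):
            # first superdiagonal holds R, the d-th superdiagonal holds C_r^(d)
            F[i * n:(i + 1) * n, j * n:(j + 1) * n] = R if j - i == 1 else C_r[j - i]
    E = expm(F * tau)
    return [E[i * n:(i + 1) * n, k * n:(k + 1) * n] for i in range(k + 1)]


# Check: one state, lump sums b at Poisson rate lam, constant interest rate
lam, b, rate, tau = 0.5, 2.0, 0.03, 10.0
Lam = np.zeros((1, 1))                      # Lam = Lam0 + Lam1 = -lam + lam
R = np.array([[lam * b]])                   # Lam1 * B (no continuous rate)
C_r = {2: np.array([[lam * b**2 / 2.0]])}   # C^(2) / 2!
V2, V1, P = higher_moments(Lam, np.array([rate]), R, C_r, k=2, tau=tau)

mean = lam * b * (1 - np.exp(-rate * tau)) / rate
second = lam * b**2 * (1 - np.exp(-2 * rate * tau)) / (2 * rate) + mean**2
```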

Differentiation of (34) then gives the following slight extension of a classical result.

Theorem 5.8

The matrix of reduced partial state-wise higher order moments satisfies the system of differential equations, for \(k\in {\mathbb {N}}_0\),

$$\begin{aligned} \frac{\partial }{\partial t}\varvec{{{V}}}_r^{(k)}(t) =\left( k\varvec{{{\Delta }}}(\varvec{{{r}}}(t)) -\varvec{{{\Lambda }}}(t) \right) \! \varvec{{{V}}}_r^{(k)}(t) - \varvec{{{R}}}(t)\varvec{{{V}}}_r^{(k-1)}(t) - \sum _{i=2}^k \varvec{{{C}}}_r^{(i)}(t)\varvec{{{V}}}_r^{(k-i)}(t) , \end{aligned}$$

with terminal condition \(\varvec{{V}}_r^{(k)}(T)=1_{(k=0)}\varvec{{I}}\).

Remark 5.9

A martingale-based proof for the corresponding (unreduced) state-wise moments, \(k!\varvec{{V}}^{(k)}_r(t)\varvec{{e}}\), can be found in [24]. \(\square \)

Example 5.2

(Independence continued) We can continue our decompositions from the independence case of Example 5.1 to reserves and higher-order moments. Indeed, since

$$\begin{aligned} \varvec{{\Lambda }}_b(u) \oplus \varvec{{\Lambda }}_r(u) - k\varvec{{\Delta }} (\varvec{{e}}\otimes \varvec{{r}}(u))&= \varvec{{\Lambda }}_b(u)\otimes \varvec{{I}} + \varvec{{I}}\otimes \left( \varvec{{\Lambda }}_r(u) - k\varvec{{\Delta }}( \varvec{{r}}(u)) \right) \\&= \varvec{{\Lambda }}_b(u) \oplus \left( \varvec{{\Lambda }}_r(u) - k\varvec{{\Delta }} (\varvec{{r}}(u))\right) , \end{aligned}$$

we get from (10) that

$$\begin{aligned} &\prod _s^t \left( \varvec{{I}} + \left( \varvec{{\Lambda }}_b(u) \oplus \varvec{{\Lambda }}_r(u) - k\varvec{{\Delta }}(\varvec{{e}}\otimes \varvec{{r}}(u)) \right) \textrm{d}u \right) \\ &\quad = \prod _s^t (\varvec{{I}}+\varvec{{\Lambda }}_b(u)\textrm{d}u) \otimes \prod _s^t \left( \varvec{{I}}+ \left( \varvec{{\Lambda }}_r(u) - k\varvec{{\Delta }}( \varvec{{r}}(u))\right) \textrm{d}u\right) \\ &\quad = \prod _s^t (\varvec{{I}}+\varvec{{\Lambda }}_b(u)\textrm{d}u) \otimes \varvec{{D}}^{(k)}(s,t). \end{aligned}$$

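Thus, when setting up the matrix \(\varvec{{F}}_U^{(k)}\) for the computation of higher order moments, each diagonal block of the product integral in (34) factorises into a Kronecker product of a biometric part and an interest-rate part, and can be computed from these representations.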
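Numerically, the factorisation is the identity \(\exp (\varvec{{A}}\oplus \varvec{{B}})=\exp (\varvec{{A}})\otimes \exp (\varvec{{B}})\). It holds because \(\varvec{{A}}\otimes \varvec{{I}}\) and \(\varvec{{I}}\otimes \varvec{{B}}\) commute. The following short Python check (NumPy, SciPy) uses randomly generated time-homogeneous intensity matrices, which serve purely as an illustration.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)


def random_intensity(n):
    """Random intensity matrix: non-negative off-diagonals, zero row sums."""
    A = rng.uniform(0.0, 1.0, (n, n))
    np.fill_diagonal(A, 0.0)
    np.fill_diagonal(A, -A.sum(axis=1))
    return A


q, p, k, tau = 3, 2, 2, 5.0
Lam_b, Lam_r = random_intensity(q), random_intensity(p)
r = np.array([0.01, 0.04])

# Left-hand side: Lam_b (+) Lam_r - k Delta(e (x) r) on the product space
gen = (np.kron(Lam_b, np.eye(p)) + np.kron(np.eye(q), Lam_r)
       - k * np.diag(np.kron(np.ones(q), r)))
lhs = expm(gen * tau)
# Right-hand side: biometric part (x) discounted interest-rate part
rhs = np.kron(expm(Lam_b * tau), expm((Lam_r - k * np.diag(r)) * tau))
```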

In particular, for partial state-wise reserves (i.e. \(k=1\)), we obtain a more direct expression. Assume that the initial biometric state is \(i\in E_b\), the terminal biometric state is \(j\in E_b\), and the initial distribution of the fitted interest rate phase-type distribution is \(\varvec{{\pi }}\). Then

$$\begin{aligned} V_{ij}(t,T)= & {} {} (\varvec{{{e}}}_i^\prime \otimes \varvec{{{\pi }}}) \int _t^T \left( \displaystyle \prod \limits _t^x (\varvec{{{I}}}+\varvec{{{\Lambda }}}_b(u)\text {d}u) \otimes \varvec{{{D}}}(t,x) \right) \left( \varvec{{{R}}}(x)\otimes \varvec{{{I}}} \right) \\{}{} & {} {} \times \left( \displaystyle \prod \limits _x^T (\varvec{{{I}}}+\varvec{{{\Lambda }}}_b(u)\text {d}u) \otimes \displaystyle \prod \limits _x^T (\varvec{{{I}}}+\varvec{{{\Lambda }}}_r(u)\text {d}u) \right) \text {d}x\, (\varvec{{{e}}}_j\otimes \varvec{{{e}}}) \\ {}= & {} {} \int _t^T \varvec{{{\pi }}}\varvec{{{D}}}(t,x)\varvec{{{e}}} \varvec{{{e}}}_i^\prime \varvec{{{P}}}_b(t,x)\varvec{{{R}}}(x)\varvec{{{P}}}_b(x,T)\varvec{{{e}}}_j\text {d}x \\ {}= & {} {} \int _t^T{\mathbb {E}}^{{\mathbb {Q}}}\left( \left. \text {e}^{-\int _t^x r_{X_r(u)}(u)\text {d}u } \right| {{\mathcal {F}}}(t) \right) \varvec{{{e}}}_i^\prime \varvec{{{P}}}_b(t,x)\varvec{{{R}}}(x)\varvec{{{P}}}_b(x,T) \varvec{{{e}}}_j\text {d}x, \end{aligned}$$

which is consistent with similar expressions obtained in [25]. \(\square \)
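The direct expression can be checked numerically. The Python sketch below assumes time-homogeneous inputs and hypothetical numbers. It computes \(V_{ij}(0,\tau )\) in two ways. First, it works on the full product space via the Van Loan formula with rewards \(\varvec{{R}}_b\otimes \varvec{{I}}\). Second, it evaluates the one-dimensional integral of the expected discount factor \(\varvec{{\pi }}\varvec{{D}}(0,x)\varvec{{e}}\) times the biometric term, using `scipy.integrate.quad`. The names `R_b` and `integrand` are introduced for illustration only.

```python
import numpy as np
from scipy.integrate import quad
from scipy.linalg import expm

# Hypothetical constant inputs: biometric states 1 = active, 2 = dead (q = 2),
# two interest-rate states (p = 2) with initial distribution pi
Lam_b = np.array([[-0.02, 0.02], [0.0, 0.0]])
R_b = np.array([[-0.2, 0.02 * 10.0], [0.0, 0.0]])   # premium rate, death benefit 10
Lam_r = np.array([[-0.5, 0.5], [0.3, -0.3]])
r = np.array([0.01, 0.04])
pi = np.array([0.6, 0.4])
q, p, tau = 2, 2, 10.0
i, j = 0, 1                                          # start active, end dead

# (a) Product space: Lam = Lam_b (+) Lam_r, rates e (x) r, rewards R_b (x) I
n = q * p
Lam = np.kron(Lam_b, np.eye(p)) + np.kron(np.eye(q), Lam_r)
F = np.block([[Lam - np.diag(np.kron(np.ones(q), r)), np.kron(R_b, np.eye(p))],
              [np.zeros((n, n)), Lam]])
V = expm(F * tau)[:n, n:]
V_full = np.kron(np.eye(q)[i], pi) @ V @ np.kron(np.eye(q)[j], np.ones(p))


# (b) Direct expression: expected discount factor times biometric integrand
def integrand(x):
    disc = pi @ expm((Lam_r - np.diag(r)) * x) @ np.ones(p)
    bio = (expm(Lam_b * x) @ R_b @ expm(Lam_b * (tau - x)))[i, j]
    return disc * bio


V_direct, _ = quad(integrand, 0.0, tau, epsabs=1e-12, epsrel=1e-10)
```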

5.5 Distributions of future payments based on reduced moments

In this section, we briefly comment on the implementation of the Gram–Charlier series for the density and distribution functions based on reduced moments, following along the lines of [8]; for an approach based on PDEs and integral equations (though not implemented numerically), we refer to [27, Section 5].

The goal is to approximate the distribution of

$$\begin{aligned} X = \int _0^T \text {e}^{-\int _0^x r_{X(u)}(u)\text {d}u} \,\text {d}B(x) \end{aligned}$$

using a Gram–Charlier series expansion. In [8], it was shown that under suitable regularity conditions, the density f for X can be approximated by

$$\begin{aligned} f(x) \approx f^*(x)\sum _{n=0}^N c_n p_n(x) , \end{aligned}$$

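where \(f^*\) is a reference density, \(\{p_n\}\) is an orthonormal basis of polynomials for the Hilbert space \(L^2(f^*)\), and \(c_n = {\mathbb {E}}(p_n (X))\). The reference density \(f^*\) can be chosen freely as long as \(f/f^*\in L^2(f^*)\). In practice, it is advisable to choose \(f^*\) as close to f as possible: since \(c_n = \int p_n(x)\,(f(x)/f^*(x))\,f^*(x)\,\textrm{d}x\) is the n-th coefficient of \(f/f^*\) in this basis, the coefficients for \(n\ge 1\) are small when \(f/f^*\) is nearly constant, so fewer terms are needed.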
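To illustrate that the coefficients \(c_n\) only require the moments of X, the following sketch uses a hypothetical toy case: \(f^*\) is the uniform density on \([-1,1]\), whose orthonormal polynomials are \(\sqrt{2n+1}P_n\) with \(P_n\) the Legendre polynomials, and X has density \(f(x)=\tfrac{3}{4}(1-x^2)\). Since \(f/f^*\) is then a polynomial of degree 2, the expansion is exact for \(N\ge 2\) and the higher coefficients vanish.

```python
import numpy as np
from numpy.polynomial import legendre
from numpy.polynomial import polynomial as P

N = 4

def p_coeffs(n):
    """Power-series coefficients (lowest degree first) of p_n = sqrt(2n+1) P_n,
    orthonormal for the uniform density f*(x) = 1/2 on [-1, 1]."""
    return np.sqrt(2 * n + 1) * legendre.leg2poly([0] * n + [1])

def moment(m):
    """Raw moment E(X^m) for the toy density f(x) = 0.75 (1 - x^2) on [-1, 1]."""
    if m % 2:
        return 0.0
    return 0.75 * (2 / (m + 1) - 2 / (m + 3))

# c_n = E(p_n(X)), obtained by replacing x^m with E(X^m) in p_n.
c = np.array([sum(cm * moment(m) for m, cm in enumerate(p_coeffs(n)))
              for n in range(N + 1)])

def f_approx(x):
    """Gram-Charlier approximation f*(x) * sum_n c_n p_n(x)."""
    return 0.5 * sum(c[n] * P.polyval(x, p_coeffs(n)) for n in range(N + 1))

print(c)
```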

For a given reference density \(f^*\), the polynomials

$$\begin{aligned} q_n(x) =\left| \begin{array}{cccc} a_0 &{} \cdots &{} a_{n-1} &{} 1 \\ a_1 &{} \cdots &{} a_n &{} x \\ &{} &{} \ddots &{} \\ a_n &{} \cdots &{} a_{2n-1} &{} x^n \end{array}\right| , \end{aligned}$$

where

$$\begin{aligned} a_n = \int _a^b x^n f^*(x)\textrm{d}x , \ \ \ n=0,1,\ldots \end{aligned}$$

defines an orthogonal basis for the Hilbert space \(L^2(f^*)\) with inner product

$$\begin{aligned} \langle g,h \rangle = \int _a^b g(x)h(x)f^*(x)\textrm{d}x . \end{aligned}$$

With the Hankel determinants

$$\begin{aligned} A_{-1}=1, \ \ \ \ \ A_n = \left| \begin{array}{cccc} a_0 &{}{} \cdots &{}{} a_{n-1} &{}{} a_n \\ a_1 &{}{} \cdots &{}{} a_n &{}{} a_{n+1} \\ {} &{}{} &{}{} \ddots &{}{} \\ a_n &{}{} \cdots &{}{} a_{2n-1} &{}{} a_{2n} \end{array}\right| ,\quad n=0,1,\ldots . \end{aligned}$$

it can then be shown that

$$\begin{aligned} p_n(x) = \frac{q_n(x)}{\sqrt{A_{n-1} A_{n}}}, \ \ n=0,1,\ldots \end{aligned}$$

is an orthonormal basis (ONB) in \(L^2(f^*)\). Also, it is immediate that

$$\begin{aligned} c_n={\mathbb {E}}(p_n(X)) = \frac{1}{\sqrt{A_{n-1}A_n}} \left| \begin{array}{cccc} a_0 &{} \cdots &{} a_{n-1} &{} 1 \\ a_1 &{} \cdots &{} a_n &{} {\mathbb {E}}(X) \\ &{} &{} \ddots &{} \\ a_n &{} \cdots &{} a_{2n-1} &{} {\mathbb {E}}(X^n) \end{array}\right| . \end{aligned}$$
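The determinant formulas translate directly into code. The sketch below is our own illustration, not the implementation of [8]: it expands \(q_n\) along its last column to obtain power-series coefficients, normalizes by \(\sqrt{A_{n-1}A_n}\), and computes \(c_n\) by replacing \(x^i\) with \({\mathbb {E}}(X^i)\). As a check it uses the standard normal moments as \(a_n\), for which \(p_n\) should coincide with \(\textrm{He}_n/\sqrt{n!}\), and a hypothetical \(X\sim N(\mu ,1)\), for which \(c_n=\mu ^n/\sqrt{n!}\). Hankel matrices of moments become ill-conditioned quickly, so direct determinants like these are only advisable for small n.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e

def hankel_det(a, n):
    """Hankel determinant A_n = det[a_{i+j}], i, j = 0..n, with A_{-1} = 1."""
    if n < 0:
        return 1.0
    return np.linalg.det(np.array([[a[i + j] for j in range(n + 1)] for i in range(n + 1)]))

def q_coeffs(a, n):
    """Coefficients of q_n (lowest degree first): cofactors of the last column."""
    if n == 0:
        return np.array([1.0])
    H = np.array([[a[i + j] for j in range(n)] for i in range(n + 1)])  # first n columns
    return np.array([(-1) ** (i + n) * np.linalg.det(np.delete(H, i, axis=0))
                     for i in range(n + 1)])

def p_coeffs(a, n):
    """Orthonormal p_n = q_n / sqrt(A_{n-1} A_n)."""
    return q_coeffs(a, n) / math.sqrt(hankel_det(a, n - 1) * hankel_det(a, n))

def c_coeff(a, n, mom_X):
    """c_n = E(p_n(X)): replace x^i in p_n by the raw moment E(X^i)."""
    return float(np.dot(p_coeffs(a, n), mom_X[: n + 1]))

# Reference moments a_n of the standard normal: a_n = (n - 1) a_{n-2}, odd moments zero.
N = 6
a = np.zeros(2 * N + 1)
a[0] = 1.0
for k in range(2, 2 * N + 1, 2):
    a[k] = (k - 1) * a[k - 2]

# Hypothetical X ~ N(mu, 1): E(X^k) = sum_j C(k, j) mu^(k-j) E(Z^j).
mu = 0.5
mom_X = np.array([sum(math.comb(k, j) * mu ** (k - j) * a[j] for j in range(k + 1))
                  for k in range(N + 1)])

for n in range(N + 1):
    print(n, p_coeffs(a, n), c_coeff(a, n, mom_X))
```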

If \(f^*\) is chosen to be the standard normal density, the corresponding polynomials \(p_n\) are the normalized (probabilists') Hermite polynomials \(\textrm{He}_n/\sqrt{n!}\). While the Hermite polynomials were used in [8] up to very high orders, in the example of Sect. 6 the expansion based on them fails already at low orders. This is likely caused by the tail of the normal distribution being too light. As an alternative, we propose a class of reference distributions based on a shifted beta distribution, closely related to the Jacobi polynomials. This distribution has finite support but a much heavier tail. Finite support is usually not a problem in a life insurance context, where the present value of payments over a finite contract horizon is typically bounded.

Define a reference distribution \(f^*\) with support on a finite interval \([a,b]\) by

$$\begin{aligned} f^*(x){}&{} =\frac{\Gamma (\alpha +\beta +2)}{\Gamma (\alpha +1)\Gamma (\beta +1)} (b-a)^{-\alpha -\beta -1} (b-x)^\alpha (x-a)^\beta , \ \ x \in [a,b],\\{}&{} \alpha , \beta >-1. \end{aligned}$$

We therefore need an orthonormal basis for \(L^2(f^*)\). The starting point is the weight function

$$\begin{aligned} w^{\alpha ,\beta }(x) = (1-x)^\alpha (1+x)^\beta . \end{aligned}$$

The space \(L^2(w^{\alpha ,\beta })\) on \([-1,1]\) has an orthogonal basis of Jacobi polynomials given by

$$\begin{aligned} q_{n}^{(\alpha , \beta )}(x)=\frac{(\alpha +1)_{n}}{n !} \sum _{k=0}^{n} \frac{(\alpha +\beta +1+n)_{k}(-n)_{k}}{(\alpha +1)_{k} k !} \left( \frac{1-x}{2}\right) ^{k} , \end{aligned}$$

where \((a)_n=a(a+1)\cdots (a+n-1)\), with \((a)_0=1\), denotes the Pochhammer symbol.

By normalizing the weight function into a density on \([-1,1]\) and then transforming it into a density on \([a,b]\), we obtain an ONB of polynomials for \(L^2(f^*)\) given by

$$\begin{aligned} p_n^{\alpha ,\beta }(x)=\sqrt{\frac{n !(2 n+\alpha +\beta +1) (\alpha +\beta +1)_{n}}{ (\alpha +1)_n (\beta +1)_n (\alpha +\beta +1) } } q_{n}^{(\alpha , \beta )}\! \left( \frac{2x -a-b}{b-a} \right) \!. \end{aligned}$$
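The ONB can be checked numerically. The sketch below implements \(f^*\), the hypergeometric sum for \(q_n^{(\alpha ,\beta )}\) and the normalized \(p_n^{\alpha ,\beta }\) on \([a,b]\), compares \(q_n^{(\alpha ,\beta )}\) with SciPy's `scipy.special.eval_jacobi`, and verifies \(\int _a^b p_m p_n f^* \textrm{d}x\approx \delta _{mn}\) by quadrature. The values of \(\alpha ,\beta ,a,b\) are hypothetical, and the normalizing constant requires \(\alpha +\beta \ne -1\).

```python
import math
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_jacobi, gammaln

def poch(x, n):
    """Pochhammer symbol (x)_n = x (x+1) ... (x+n-1), with (x)_0 = 1."""
    out = 1.0
    for i in range(n):
        out *= x + i
    return out

def f_star(x, al, be, a, b):
    """Shifted beta reference density on [a, b]."""
    logc = gammaln(al + be + 2) - gammaln(al + 1) - gammaln(be + 1)
    return math.exp(logc) * (b - a) ** (-al - be - 1) * (b - x) ** al * (x - a) ** be

def q_jacobi(n, al, be, y):
    """Jacobi polynomial q_n^{(al, be)}(y) on [-1, 1] via the finite hypergeometric sum."""
    s = sum(poch(al + be + 1 + n, k) * poch(-n, k) / (poch(al + 1, k) * math.factorial(k))
            * ((1 - y) / 2) ** k for k in range(n + 1))
    return poch(al + 1, n) / math.factorial(n) * s

def norm_const(n, al, be):
    """Normalizing constant of p_n (requires al + be != -1)."""
    return math.sqrt(math.factorial(n) * (2 * n + al + be + 1) * poch(al + be + 1, n)
                     / (poch(al + 1, n) * poch(be + 1, n) * (al + be + 1)))

def p_jacobi(n, al, be, a, b, x):
    """Orthonormal polynomial p_n^{al, be}(x) for L^2(f*) on [a, b]."""
    return norm_const(n, al, be) * q_jacobi(n, al, be, (2 * x - a - b) / (b - a))

al, be, a, b = 1.5, 0.5, -2.0, 5.0  # hypothetical parameters
K = 5
gram = np.empty((K, K))
for m in range(K):
    for n in range(K):
        gram[m, n] = quad(lambda x: p_jacobi(m, al, be, a, b, x) * p_jacobi(n, al, be, a, b, x)
                          * f_star(x, al, be, a, b), a, b)[0]
print(np.round(gram, 10))
```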

So, for given a and b, we need to compute

$$\begin{aligned} c_n={\mathbb {E}}\left( p_{n}^{(\alpha , \beta )}(X) \right) =\sqrt{\frac{n !(2 n{+}\alpha {+}\beta {+}1) (\alpha {+}\beta {+}1)_{n}}{(\alpha {+}1)_n (\beta {+}1)_n (\alpha {+}\beta {+}1) } } {\mathbb {E}}\left( q_{n}^{(\alpha , \beta )}\!\left( \frac{2X{-}a{-}b}{b{-}a}\right) \right) \!. \end{aligned}$$

Here

$$\begin{aligned} {\mathbb {E}}\left( q_{n}^{(\alpha , \beta )} \left( \frac{2X{-}a{-}b}{b{-}a}\right) \right) {=} \frac{(\alpha {+}1)_{n}}{n!} \sum _{k{=}0}^{n} \frac{(\alpha {+}\beta {+}1{+}n)_{k}({-}n)_{k}}{(\alpha {+}1)_{k}} \frac{1}{k!} {\mathbb {E}}\left( \left( \frac{1{-} (2X{-}a{-}b)/(b{-}a)}{2} \right) ^{\!k} \right) \!, \end{aligned}$$

where the inner expectation is computed as

$$\begin{aligned} \frac{1}{k!}{\mathbb {E}}\left( \left( \frac{1- (2X-a-b)/(b-a) }{2} \right) ^{\!k} \right) =\frac{1}{(b-a)^k}\sum _{i=0}^k (-1)^i \frac{b^{k-i}}{(k-i)!} \frac{{\mathbb {E}}(X^i)}{i!} . \end{aligned}$$
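The following sketch evaluates \(c_n\) from the raw moments \({\mathbb {E}}(X^i)\) exactly as in the two sums above. Two hypothetical checks are included: if X itself has density \(f^*\), then \(c_0=1\) and \(c_n=0\) for \(n\ge 1\); and for X uniform on \([a,b]\), the result should match \(\int _a^b p_n(x)\,\textrm{d}x/(b-a)\) computed by quadrature. The alternating sums can lose precision in floating point at high orders (such as \(N=20\)), so higher-precision arithmetic may be worth considering in that case.

```python
import math
from scipy.integrate import quad

def poch(x, n):
    """Pochhammer symbol (x)_n, with (x)_0 = 1."""
    out = 1.0
    for i in range(n):
        out *= x + i
    return out

def norm_const(n, al, be):
    """Normalizing constant of p_n^{al, be} (requires al + be != -1)."""
    return math.sqrt(math.factorial(n) * (2 * n + al + be + 1) * poch(al + be + 1, n)
                     / (poch(al + 1, n) * poch(be + 1, n) * (al + be + 1)))

def q_jacobi(n, al, be, y):
    """Jacobi polynomial on [-1, 1] via the finite hypergeometric sum."""
    return poch(al + 1, n) / math.factorial(n) * sum(
        poch(al + be + 1 + n, k) * poch(-n, k) / (poch(al + 1, k) * math.factorial(k))
        * ((1 - y) / 2) ** k for k in range(n + 1))

def c_from_moments(n, al, be, a, b, mom):
    """c_n = E(p_n^{al, be}(X)) from raw moments mom[i] = E(X^i), i = 0..n."""
    total = 0.0
    for k in range(n + 1):
        # (1/k!) E(((1 - Y)/2)^k) with Y = (2X - a - b)/(b - a)
        inner = sum((-1) ** i * b ** (k - i) / math.factorial(k - i) * mom[i] / math.factorial(i)
                    for i in range(k + 1)) / (b - a) ** k
        total += poch(al + be + 1 + n, k) * poch(-n, k) / poch(al + 1, k) * inner
    return norm_const(n, al, be) * poch(al + 1, n) / math.factorial(n) * total

al, be, a, b = 1.5, 0.5, 0.5, 3.0  # hypothetical parameters
M = 6

# Check 1: X ~ f*, i.e. X = a + (b - a) Z with Z ~ Beta(be + 1, al + 1).
mom_Z = [math.prod((be + 1 + j) / (al + be + 2 + j) for j in range(m)) for m in range(M)]
mom_fstar = [sum(math.comb(i, m) * a ** (i - m) * (b - a) ** m * mom_Z[m] for m in range(i + 1))
             for i in range(M)]
c_fstar = [c_from_moments(n, al, be, a, b, mom_fstar) for n in range(M)]

# Check 2: X uniform on [a, b], compared with quadrature of p_n / (b - a).
mom_unif = [(b ** (i + 1) - a ** (i + 1)) / ((i + 1) * (b - a)) for i in range(M)]
c_unif = [c_from_moments(n, al, be, a, b, mom_unif) for n in range(M)]
c_unif_quad = [quad(lambda x: norm_const(n, al, be)
                    * q_jacobi(n, al, be, (2 * x - a - b) / (b - a)) / (b - a), a, b)[0]
               for n in range(M)]
print(c_fstar)
print(c_unif, c_unif_quad)
```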

Finally, the approximation is given by

$$\begin{aligned} f(x)\approx f^*(x) \sum _{n=0}^N c_n p_{n}^{(\alpha , \beta )}(x) . \end{aligned}$$
(35)

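For the distribution function, we integrate (35) term by term. Using \(c_0={\mathbb {E}}(p_0(X))=1\) and the identity

$$\begin{aligned} \frac{\textrm{d}}{\textrm{d}u}\left[ (1-u)^{\alpha +1}(1+u)^{\beta +1} q_{n-1}^{(\alpha +1,\beta +1)}(u)\right] = -2n\,(1-u)^{\alpha }(1+u)^{\beta }\, q_{n}^{(\alpha ,\beta )}(u), \quad n\ge 1, \end{aligned}$$

we obtain

$$\begin{aligned} F(y)\approx & {} F^*(y) -\frac{b-a}{4} \left( 1 - \left( \frac{2y-a-b}{b-a} \right) ^2 \right) \! f^*(y)\\{}{} & {} \times \sum _{n=1}^Nc_n\sqrt{\frac{1}{n} \,\frac{(\alpha +\beta +2)(\alpha +\beta +3)}{(\alpha +1) (\beta +1)(\alpha +\beta +n+1)}}\; p_{n-1}^{(\alpha +1, \beta +1)}(y), \end{aligned}$$

where \(F^*\) denotes the distribution function of \(f^*\).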
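Because a normalization slip in such a closed form is easy to make, it is worth checking it against direct numerical integration of the density approximation (35). The sketch below uses hypothetical coefficients \(c_n\) (the identity does not depend on X, only on \(c_0=1\)), evaluates \(F^*\) as the regularized incomplete beta function \(I_{(y-a)/(b-a)}(\beta +1,\alpha +1)\), and uses the per-term factor \(\sqrt{(\alpha +\beta +2)(\alpha +\beta +3)/(n(\alpha +1)(\beta +1)(\alpha +\beta +n+1))}\), which is what the Jacobi derivative identity yields.

```python
import math
import numpy as np
from scipy.integrate import quad
from scipy.special import betainc, eval_jacobi, gammaln

def poch(x, n):
    """Pochhammer symbol (x)_n, with (x)_0 = 1."""
    out = 1.0
    for i in range(n):
        out *= x + i
    return out

def norm_const(n, al, be):
    return math.sqrt(math.factorial(n) * (2 * n + al + be + 1) * poch(al + be + 1, n)
                     / (poch(al + 1, n) * poch(be + 1, n) * (al + be + 1)))

def f_star(x, al, be, a, b):
    logc = gammaln(al + be + 2) - gammaln(al + 1) - gammaln(be + 1)
    return np.exp(logc) * (b - a) ** (-al - be - 1) * (b - x) ** al * (x - a) ** be

def p_n(n, al, be, a, b, x):
    """Orthonormal Jacobi-type polynomial on [a, b]."""
    return norm_const(n, al, be) * eval_jacobi(n, al, be, (2 * x - a - b) / (b - a))

def f_approx(x, c, al, be, a, b):
    return f_star(x, al, be, a, b) * sum(cn * p_n(n, al, be, a, b, x) for n, cn in enumerate(c))

def F_approx(y, c, al, be, a, b):
    """Closed-form integral of f_approx from a to y; assumes c[0] = 1."""
    Y = (2 * y - a - b) / (b - a)
    F_ref = betainc(be + 1, al + 1, (y - a) / (b - a))
    s = 0.0
    for n in range(1, len(c)):
        factor = math.sqrt((al + be + 2) * (al + be + 3)
                           / (n * (al + 1) * (be + 1) * (al + be + n + 1)))
        s += c[n] * factor * p_n(n - 1, al + 1, be + 1, a, b, y)
    return F_ref - (b - a) / 4 * (1 - Y ** 2) * f_star(y, al, be, a, b) * s

al, be, a, b = 2.0, 1.0, 0.0, 4.0       # hypothetical parameters
c = [1.0, 0.3, -0.2, 0.1, 0.05]         # hypothetical coefficients
for y in (0.5, 1.7, 3.2, 4.0):
    numeric = quad(f_approx, a, y, args=(c, al, be, a, b), epsabs=1e-12, epsrel=1e-12)[0]
    print(y, F_approx(y, c, al, be, a, b), numeric)
```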

Hence, these formulas can be used to approximate the density and distribution function of X via the (shifted) Jacobi polynomials.

6 Numerical Example

We now present a numerical example based on Example 5.1, in which interest rates and biometric risk are assumed independent, and we carry over the interest rate transition rates calibrated from the bond prices of Sect. 3.

Consider the numerical example of [11] as the model for the biometric risk and the corresponding life insurance contract. That is, the state of the insured, \(X_b\), is modelled as a time-inhomogeneous Markov jump process taking values in \(E_b=\{1,2,3\}\), the three-state disability model depicted in Fig. 5.

Fig. 5

The classic three-state disability model with reactivation

We consider a 40-year-old male today (at time 0) with a retirement age of 65 and the following life insurance contract:

  • A disability annuity of rate 1 while disabled until retirement at age 65.

  • A life annuity of rate 1 while alive from retirement at age 65.

  • A constant premium rate \(\theta \) paid while active until retirement at age 65, priced under the equivalence principle at time 0.

The maximum contract time is \(T = 70\), corresponding to a maximum age of 110 years. The transition rates are given by

$$\begin{aligned} \lambda ^b_{12}(s)&= \left( 0.0004+10^{4.54+0.06(s+40)-10}\right) \! 1_{(s\le 25)}, \\ \lambda _{21}^b(s)&= \left( 2.0058 e^{-0.117(s+40)}\right) \! 1_{(s\le 25)}, \\ \lambda _{13}^b(s)&= 0.0005+10^{5.88+0.038(s+40)-10}, \\ \lambda _{23}^b(s)&= \lambda _{13}^b(s)\!\left( 1+1_{(s\le 25)}\right) \!. \end{aligned}$$

The payment matrices for this product combination correspond to \(\varvec{{B}}(t) = \varvec{{\Lambda }}^1(t) = \varvec{{0}}\), and

$$\begin{aligned} \varvec{{{b}}}(t; \theta ) = {\left\{ \begin{array}{ll} (\theta , 1, 0), \qquad &{}{}\text{ for } \ t\le 25, \\ (1,1,0), &{}{}\text{ for } \ t>25. \end{array}\right. } \end{aligned}$$

For the stochastic interest rate model, we use the model fitted to bond prices in Example 4.1 with \(p = 4\) phases, so that the interest rate is given by \(r(t) = r_{X_r(t)}\), with

$$\begin{aligned} \varvec{{r}} = (0.025, 0.050, 0.075, 0.100), \end{aligned}$$

and where \(X_r\) is a time-homogeneous Markov jump process taking values in the finite state space \(E_r = \{1,2,3,4\}\) with initial distribution \(\varvec{{\pi }} = (1,0,0,0)\) and transition intensity matrix

$$\begin{aligned} \varvec{{\Lambda }}_r = \begin{pmatrix} -0.25 &{} 0.22 &{} 0.01 &{} 0 \\ 0.14 &{} -1.11 &{} 0.75 &{} 0.18 \\ 0.06 &{} 0.29 &{} -0.63 &{} 0.2 \\ 0.09 &{} 0.22 &{} 0.65 &{} -1.05 \end{pmatrix} + \varvec{{\Delta }}(\varvec{{r}}). \end{aligned}$$

We then determine the equivalence premium \(\theta \) using the method outlined in Sect. 5.3. Since \(\varvec{{b}}(t;\cdot )\) is affine for fixed t, the premium is given explicitly in the form (33), and we get \(\theta = 0.1583467\). This is roughly a third of the premium rate obtained when pricing with a constant first-order interest rate of \(1\%\) as in [11], which is to be expected since every interest rate in the present model (the smallest being \(2.5\%\)) exceeds this level.
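A sketch of the premium computation for this example, under stated assumptions: we treat the premium as an outgoing payment for the insured, so that the equivalence principle reads \(A=\theta C\), with A the expected present value of the benefits and C that of a unit premium rate paid while active; we use the printed (rounded) matrix \(\varvec{{\Lambda }}_r-\varvec{{\Delta }}(\varvec{{r}})\) as the discounted sub-generator of the interest component; and, by independence, we solve the forward equations for the joint discounted occupation probabilities with generator \(\varvec{{\Lambda }}_b(s)\oplus (\varvec{{\Lambda }}_r-\varvec{{\Delta }}(\varvec{{r}}))\) using `scipy.integrate.solve_ivp`, rather than the method of Sect. 5.3. Because of the rounding, the result need not reproduce \(\theta = 0.1583467\) to all digits.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Discounted interest sub-generator Lambda_r - Delta(r): the printed, rounded matrix.
M = np.array([[-0.25, 0.22, 0.01, 0.00],
              [0.14, -1.11, 0.75, 0.18],
              [0.06, 0.29, -0.63, 0.20],
              [0.09, 0.22, 0.65, -1.05]])
pi_r = np.array([1.0, 0.0, 0.0, 0.0])
e_r = np.ones(4)

def lam_b(s, working):
    """Biometric intensity matrix at time s (age 40 + s); `working` means before age 65."""
    ind = 1.0 if working else 0.0
    l12 = (0.0004 + 10 ** (4.54 + 0.06 * (s + 40) - 10)) * ind
    l21 = 2.0058 * np.exp(-0.117 * (s + 40)) * ind
    l13 = 0.0005 + 10 ** (5.88 + 0.038 * (s + 40) - 10)
    l23 = l13 * (1 + ind)
    return np.array([[-(l12 + l13), l12, l13],
                     [l21, -(l21 + l23), l23],
                     [0.0, 0.0, 0.0]])

def rhs(s, y, working):
    """Forward equations: y[:12] holds E[exp(-int_0^s r) 1{X_b(s)=j, X_r(s)=k}],
    y[12] and y[13] accumulate the discounted benefit and unit-premium streams."""
    v = y[:12]
    G = np.kron(lam_b(s, working), np.eye(4)) + np.kron(np.eye(3), M)
    b_benefit = np.array([0.0, 1.0, 0.0]) if working else np.array([1.0, 1.0, 0.0])
    b_premium = np.array([1.0, 0.0, 0.0]) if working else np.zeros(3)
    return np.concatenate([v @ G,
                           [v @ np.kron(b_benefit, e_r), v @ np.kron(b_premium, e_r)]])

y0 = np.concatenate([np.kron([1.0, 0.0, 0.0], pi_r), [0.0, 0.0]])
opts = dict(rtol=1e-10, atol=1e-12)
# Integrate separately before and after retirement, where the payment rates jump.
y25 = solve_ivp(rhs, (0.0, 25.0), y0, args=(True,), **opts).y[:, -1]
y70 = solve_ivp(rhs, (25.0, 70.0), y25, args=(False,), **opts).y[:, -1]
pv_benefits, pv_premium_annuity = y70[12], y70[13]
theta = pv_benefits / pv_premium_annuity  # equivalence principle A = theta * C
print(theta)
```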

We then calculate the moments up to order 20 of the present value of future payments and use them to approximate its density and distribution function via Gram–Charlier expansions based on the (shifted) Jacobi polynomials, as outlined in Sect. 5.5. The parameters used in the procedure are shown in Table 1, and the resulting density and distribution function are shown in Fig. 6.

Table 1 Parameters for the Gram–Charlier implementation with (shifted) Jacobi polynomials
Fig. 6

Left: Density approximation based on 20 moments and a histogram based on 1,000,000 simulations. Right: Distribution function approximation based on the same 20 moments and the empirical distribution function from the same simulations

From the fitted distribution function, one may compute different quantities of interest, e.g., quantiles of the present value. In Table 2, we show various quantiles based on the empirical (simulated) distribution function and the approximated distribution function based on 20 moments.
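Quantiles can be read off an approximated distribution function by numerical root finding. The sketch below uses Brent's method (`scipy.optimize.brentq`); for the check it is applied to the reference distribution function \(F^*\) with hypothetical parameters, whose exact quantiles are available from `scipy.stats.beta`. A Gram–Charlier approximation of F need not be monotone, so in practice one may want to check monotonicity on a grid before inverting it.

```python
from scipy.optimize import brentq
from scipy.special import betainc
from scipy.stats import beta as beta_dist

def quantile(F, q, lo, hi):
    """Solve F(y) = q on [lo, hi] with Brent's method.
    Assumes F is continuous with F(lo) < q < F(hi)."""
    return brentq(lambda y: F(y) - q, lo, hi, xtol=1e-12)

al, be, a, b = 2.0, 1.0, 0.0, 4.0  # hypothetical reference parameters

def F_star(y):
    """Reference distribution function: regularized incomplete beta I_z(be + 1, al + 1)."""
    return betainc(be + 1, al + 1, (y - a) / (b - a))

qs = [0.005, 0.5, 0.995]
vals = [quantile(F_star, q, a, b) for q in qs]
ref = beta_dist(be + 1, al + 1, loc=a, scale=b - a).ppf(qs)
print(vals, ref)
```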

Table 2 Selected quantiles of the present value from the empirical distribution of 1,000,000 simulations and from the Gram–Charlier approximation using 20 moments