Phase-type representations of stochastic interest rates with applications to life insurance

The purpose of the present paper is to incorporate stochastic interest rates into a matrix-approach to multi-state life insurance, where formulas for reserves, moments of future payments and equivalence premiums can be obtained as explicit formulas in terms of product integrals or matrix exponentials. To this end we consider the Markovian interest model, where the rates are piecewise deterministic (or even constant) in the different states of a Markov jump process, and which is shown to integrate naturally into the matrix framework. The discounting factor then becomes the price of a zero-coupon bond which may or may not be correlated with the biometric insurance process. Another nice feature about the Markovian interest model is that the price of the bond coincides with the survival function of a phase-type distributed random variable. This, in particular, allows for calibrating the Markovian interest rate models using a maximum likelihood approach to observed data (prices) or to theoretical models like e.g. a Vasiˇcek model. Due to the denseness of phase-type distributions, we can approximate the price behaviour of any zero-coupon bond with interest rates bounded from below by choosing the number of possible interest rate values sufﬁciently large. For observed data models with few data points, lower dimensions will usually sufﬁce, while for theoretical models the dimensionality is only a computational issue.


Introduction
This paper considers stochastic interest models, which are state-wise deterministic dependent on an underlying finite state-space Markov process.The spot rate r(u) at time u is assumed to be on the form where {X(u)} u≥0 denotes a time-inhomogeneous Markov jump process on a p-dimensional state-space, and r i (u), i = 1, ..., p, are deterministic functions.Assuming an arbitrage free bond market, a zero-coupon bond with terminal date T can then be defined in terms of its prices by where F(t) = σ(X(u) : 0 ≤ u ≤ t) is the σ-algebra generated by {X(u)} u≥0 .The expectation is taken under some risk-neutral measure Q (see, e.g., Björk [2009], Elliott and Kopp [1999]).If all r i (u) ≥ 0, a key result of the paper is that, conditionally on X(t), T → B(t, T ) equals the survival function of an inhomogeneous phase-type distribution.
In the presence of negative interest rates, this is longer certain since B(t, T ) may be larger than one and non-monotone.However, assuming that the negative interest rates are bounded from below by a number −ρ < 0, we get from (2) that e −ρ(T −t) B(t, T ) = E Q e − T t (r X(u) +ρ) du F(t) then equals a survival function of an inhomogeneous phase-type distribution.
The interpretation that the bond prices are (possibly scaled) phase-type survival functions enables us to fit (calibrate) the transition rates of {X(u)} u≥0 from the observed bond prices by using a maximum likelihood approach.Since phase-type distributions are dense, i.e. can approximate any distribution with a sufficient number of phases, we may then fit a PH to the observed survival function (equivalent to a histogram) such that all observations (bond prices) are hit.The last point of observation may be considered right censored.All fitted transition rates are under a risk-neutral measure Q.
The functional form of the state-wise price of the bond was noted already in [Norberg, 2003, (3.17)], though its relation to phase-type theory was not mentioned, and its potential was not further explored.We also believe that the "bond price representation" (2) of a phase-type survival function is unknown to the phase-type community.
In the context of multi-state life insurance, modelling stochastic interest rates also play a crucial role.The literature varies from SDE based models, see e.g.Norberg and Møller [1996], Møller and Steffensen [2007], Buchardt [2014], Baños [2020], Asmussen and Steffensen [2020], to the finite state-space Markov chain models of Norberg [1995b,a] on the form (1).
In the SDE-based methods, one often relies on an independence assumption between interest rates and biometric risk so that available forward rate curves can be used for valuation; an exception is Buchardt [2014], where dependence between interest rates and biometric risk is incorporated.In either case, the SDE-based models do not integrate into classic Thiele and Hattendorf type of results, which limits time-dynamic valuations based on these traditional methods.
The spot rate model (1), however, can be wholly incorporated into Thiele and Hattendorf type of differential equations for reserves and higher order moments, as shown by Norberg [1995a,b] and further explored in Norberg [2003].These observations allow for dependency between interest rates and transitions in life insurance, as well as time-dynamic valuations, without altering the traditional methods.The latter refers to the model (1) as the Markov chain market while Koller [2012] refers to it as Markovian interest intensities.
In this paper, we work with an extended version of the bond prices, which in an insurance context are the discounting factors on the event that the terminal state will be j.Providing a matrix-representation for (4), we then find how it naturally integrates into the matrix framework of Bladt et al. [2020].The extension is convenient from a mathematical point of view and also relates to the partial (Bladt et al. [2020]) and retrospective reserves in single states ( [Norberg, 1991, Sec. 5E]).The treatment of the latter, however, is outside the scope of the current paper.We restate the results of the latter framework in the context of stochastic interest rates.The proofs, and parts of the exposition, will differ from that of Bladt et al. [2020].
Markov jump processes in finance are often used in connection with regime switching models or where the different states are used to alter the parameters of usually SDE-driven processes.
Here transitions can take place under some physical measure and may have a real-world interpretation.The Markov chain model for interest rates (1) can be thought of as a regimeswitching model under a risk-neutral measure, particularly if the interest rates for each state are known a priori.
The Markov jump process approach can approximate bond price modelling in terms of diffusions.Formal constructions have been made in Bharucha-Reid [1960], Kurtz [1970Kurtz [ , 1978]], Mijatović and Pistorius [2013].Since phase-type distributions form a dense class of distributions on the positive reals, this paper will offer an alternative and parsimonious way to approximate any zero-coupon bond (arbitrarily close) by a bond on the form (2).
The paper is organised as follows.Section 2 introduces some background and notation.Bond price modelling using phase-type distribution is developed in Section 3. In Section 4, we develop estimation of the Markovian interest rate model, both with and without restricted interest rates, and we provide examples of calibration to diffusion models and real data.In Section 5 we adjust the life-insurance framework of Bladt et al. [2020] to allow for stochastic interest rates of the form (1). It contains examples of how to set up a model using the fitted bond parameters of Section 3 as well as a matrix-based method for calculating the equivalence premium, either via Newton's method or as an explicit formula.In Section 6 we present a numerical example.For the sake of exposition, the proofs are deferred to Appendix B.
Unless otherwise stated, row vectors are denoted by bold Greek lowercase letters (e.g., π) and column vectors by bold lowercase Roman letters (e.g., v).Elements of vectors are denoted by the same unbold, indexed letters (like v = (v 1 , ..., v p ) ).The vector e i is the column vector which is 1 at index i and zero otherwise whereas e = (1, 1, ..., 1) .
Matrices are denoted by bold capital letters (Greek or Roman) and their elements by their corresponding lowercase indexed letters (e.g.A = {a ij }).If v is a vector (row or column), then ∆(v) denotes the diagonal matrix, which has v as diagonal.
2.2.The product integral.(5) The solution to (5), which in general is not explicitly available, will be denoted by and referred to as the product integral of M (x) from s to t.This is also true for general matrix functions M (t), which satisfy (5) but are not intensity matrices.
Product integrals have several nice properties.For any s, t, u ≥ 0, it satisfies the product rule which in turn implies that the product integral is invertible with If all M (x) commute, then In particular, for M (x) ≡ M , we get If A(x) and B(y) commute for all x, y, then In particular, e −r(t−s) t s where I denotes the identity matrix.
Remark 2.1.The idea behind the notation of the product integral comes from a Riemann type of construction using step-functions.If we approximate M (x) by a piecewise constant matrix function taking values . By letting ∆x i → 0 and using (7) we then arrive at the notation (6).
A valuable formula for computing integrals involving product integrals is the so-called Van-Loan's formula for product integrals (see [Bladt et al., 2020, Lemma 2]), which states that This formula is valid for matrix functions A(x), B(x) and C(x), which are piecewise continuous.The matrices A(x) and C(x) are square matrices of possibly different dimensions, so B(x) is not necessarily a square matrix.

Let
where ⊗ denotes the Kronecker product.The Kronecker product between a p 1 × q 1 matrix A = {a ij } and a p 2 × q 2 matrix B = {b ij } is defined as the p 1 p 2 × q 1 q 2 matrix and we conclude that A similar argument gives that Finally, if A(t) and B(t) are Riemann integrable matrix functions of dimensions q × q and p × p respectively, then where ⊕ denotes the Kronecker sum, defined by , and where the first I has the dimension of B(t) and the second I has the dimension of A(t).To see this, we notice that A(t) ⊗ I and I ⊗ B(t) commute, so by (11) we get that For further details on Kronecker products and sums, we refer to Graham [1981] 2.3.Phase-type distributions.
Consider a (time-inhomogeneous) Markov jump process {Y (t)} t≥0 , where state p + 1 is absorbing and 1, ..., p are transient.The intensity matrix M (x) for {Y (t)} t≥0 is then on the form where T (x) is a p × p sub-intensity matrix consisting of transition rates between transient states, and t(x) = −T (x)e is a column vector of exit rates, i.e. rates for jumping to the absorbing state.Then by Van-Loan's formula (13), the transition matrix for {Y (t)} t≥0 is given by Hence t s (I + T (u) du) is the matrix which contains the transition probabilities between the transient states from times s to t.
We assume that P(Y (0) = p + 1) = 0, and define π i = P(Y (0) = i).Hence π = (π 1 , ..., π p ) satisfies that πe = i π i = 1, so that π is the initial distribution for {Y (t)} t≥0 concentrated on the transient states only.Then is a row vector that contains the probabilities of the process being in the different transient states at time t.
Now let τ = inf{t > 0 : Y (t) = p + 1} denote the time until absorption.Then from (18) we immediately get that since the right-hand side equals the probability of the process belonging to any of the transient states by time t, i.e., absorption has not yet occurred.Differentiating (19) and using (5) we see that τ has a density on the form Definition 2.2.The distribution of τ is called an inhomogeneous phase-type distribution, and we write τ ∼ IPH(π, T (x)), where the indexation of T (x) is over x ≥ 0.
We do not need to specify t(x) since it is implicitly given by T (x).Indeed, since row sums of intensity matrices (and hence of ( 17)) are zero, we have that t(x) = −T (x)e.If T (x) ≡ T , then we simply write τ ∼ PH(π, T ).This corresponds to the underlying Markov jump process being time-homogeneous.
We also notice T (x)+∆(t(x)) defines an intensity matrix (without the absorbing state).
The class of phase-type distributions (both PH and IPH) is dense (in the sense of weak convergence) in the class of distributions on the positive reals, implying that any distribution with support R + may be approximated arbitrarily close by a phase-type distribution.This result is also of considerable practical importance since phase-type distributions can be fitted both to data and distributions using a maximum likelihood approach.For the time-homogenous case, PH, see Asmussen et al. [1996] while for IPH we refer to Albrecher et al. [2022].

Phase-type representations of bond prices
Consider the stochastic interest rate model of (1), and let E = {1, . . ., p} denote the statespace of the Markov jump process X = {X(t)} t≥0 with intensity matrix Let r(t) = r 1 (t), . . ., r p (t) be the column vector which contains the interest rate functions.
The main result of this section is the following result.
Theorem 3.1.For i, j ∈ E, let Then the matrix D(s, t) = {d ij (s, t)} i,j∈E has the following representation Proof.Conditioning on the state of s + ds, we get that In matrix form, this amounts to Noting that D(t, t) = P (t, t) = I, we hence conclude that (21) holds.
Remark 3.2.The quantities d ij (s, t) in Theorem 3.4 are introduced as r-deflated transition probabilities in [Buchardt et al., 2020, Appendix 1], where the authors derive the differential equation ( 23).While they give a martingale-based proof, we provide a probabilistic sample path argument and give a product integral representation.
Remark 3.3.Multiplying both sides of ( 23) with e from the right, we recover the differential equation for the state-wise discount factors obtained in [Norberg, 1995b, (4.4)].
Assume that all r i (x) are bounded from below, and let Then ρ = 0 if all interest rates are non-negative, and otherwise −ρ provides a lower bound for all of them.Then we have the following result.
Theorem 3.4.The price of the zero-coupon bond (2) satisfies Conditional on X(t) = i, T → e −ρ(T −t) B(t, T ) is the survival function for an IPH distributed random variable, τ (t), with initial distribution e i and intensity matrices In particular, if all interest rates are non-negative, then ρ = 0 and the price itself, T → B(t, T ) becomes the survival function.
Proof.The formula (24) follows directly from the construction of the D(s, t) matrix by summing out over j in d ij (t, T ), which corresponds to post-multiplying D(t, T ) by e. Next, we notice that e −ρ(T −t) which follows from (12).The matrix M (x) − ∆(r(x)) − ρI is a sub-intensity matrix, which together with the distribution for X(t) defines a phase-type representation (π t , M (x + t) − ∆(r(x + t) − ρI), x ≥ 0 (starting at time t).
The forward rate f (t, T ) is defined by Using Theorem 3.4, we may write where Fτ(t , where f τ (t) denotes the density function for τ (t).Hence we have proved the following result.
Another immediate consequence of Theorem 3.4 is the following.
Corollary 3.6.Assume that all interest rates are non-negative.Then conditional on X(t) = i, the random variable τ (t) ∼ IPH(e i , M (t + x) − ∆(r(t + x))), x ≥ 0 then has a c.d.f.given by Proof.This follows from Theorem 3.4 with ρ = 0 and Integrating the expression then yields the result.
For the case where t = 0, the above results are reduced to the following.
Corollary 3.7.Assume that all interest rates are non-negative.Let τ ∼ IPH(π, M (x) − ∆(r(x))) and let π = (π 1 , ..., π p ) denote the (initial) distribution of X(0).Then Remark 3.8.The density f τ (t) has the interpretation of being the expected present value of the current interest rate accumulated in a small time interval arround t, and F τ (T ) is the present value of the total accumulated interest rate during [0, T ].

Estimation
Time-homogeneous phase-type distributions or inhomogeneous phase-type distribution where the sub-intensity matrices are on the form for some parametric function λ θ (x), can be estimated in terms of an EM algorithm.
An observation from a phase-type distribution is hence considered to be the time until a Markov jump process is absorbed, where all transitions and sojourn times in the different states are unobserved.This makes the estimation an incomplete data problem, which an EM algorithm can solve.Essentially the unobserved sufficient statistics (number of jumps between states, total time in then different states) are replaced by their conditional expectations given data and used in the explicit formulas for the maximum likelihood estimators.This updates the parameters, and the procedure is repeated until convergence.Convergence is secured as the likelihood increases in each step.The limit may be a global or only a local maximum.
Repeated data (absorption times), of course result in the same conditional expectations given their data.This carries over to weighted data as well, and hence the EM algorithm may efficiently estimate data in histograms.In particular, we may estimate to theoretical distributions by treating their discretised density as a histogram.This provides the link to fitting the intensity matrix of {X(t)} t≥0 in (1) through bond prices, (2) or (3), either in terms of observed data or to a theoretical model.
Indeed, consider bond prices B(0, T i ) available at different maturities T 1 , T 2 , ..., T n .Then according to Theorem 3.4 we have that for some ρ > 0 and where τ ∼ IPH(π, M (u) − ∆(r(u)) − ρI).Then ρ must satisfy that This can be achieved by choosing In the life-insurance context in Denmark, by regulation the bond prices (discounting factors) must be computed from discrete forward rates, f d (0, T i ), published by the Danish Financial Supervisory Authority.Thus Hence Hence calibrating to data B(0, T i ), i = 1, ..., n can be done by fitting PH or IPH distributions to e −ρT i B(0, T i ) using an EM algorithm.The possible interest rates can either be picked by the EM algorithm (referred to as unrestricted interest rates), or we can fix the possible rates to values (or functions) of our choice (restricted interest rates).
In the former case, we obtain a maximum likelihood estimate ( π, T (x)) for the parameters.
The estimate for M (x) is then readily obtained from To find the induced interest rates, we also have from Theorem 3.4 that T (x) = M (x) − ∆(r(x)) − ρI so we conclude that the estimated exit rates t(x) must satisfy where e is the vector of ones.Hence the induced interest rates are given by r(x) = t(x) − ρe.
Neither the transition rates nor the interest rates are unique, but the resulting discount factor (bond price) is invariant under different representations, which is all that matters regarding reserving in the insurance context.
If, in turn, we decide to choose the possible range of interest rates r i (x) ourselves, then the EM-algorithm is modified not to update the exit rates.This modification is easily dealt with by simply removing updates of the latter in the original EM algorithm of Asmussen et al. [1996] or Albrecher et al. [2022].See Appendix A for details.In this case, the exit rates will be fixed at While the parametrisation of the transition rates may not be unique, the interest rates remain fixed.
We now present two examples of fitting to real data and one example to a theoretical model.The estimation is computed using the R-package matrixdist.This corresponds to an empirical survival distribution to which we can then fit phase-type distributions of different dimensions.Regarding the discretisation, we let 0.5 + i, i = 0, ..., 29 denote the data points with probability mass B(i) − B(i + 1), where B(0) = 1, and a right censored data point at 30 with probability mass B(30) = 0.1994495.Since all observed bond prices are less than one, we have ρ = 0, corresponding to an environment with non-negative interest rates.
We used p = 2, 3, 4, 5, 10 and 15 phases, with state-wise interest rates being r p i = i/(10p), i = 1, ..., p for the different dimensions p. Underlying this choice is the assumption that the interest rates fluctuate between 1% and 10%, and the r i 's are obtained as the points that divide the interval [0, 0.1] into p, including the right endpoint.The vectors r p = (r p 1 , ..., r p p ) will serve as exit rate vectors of the phase-type distributions to be fitted.
In Figure 1 (left), we have plotted the phase-type fits to the empirical survival curve for dimensions p = 2, 3, 4, 5.At dimension 3, we obtain a decent fit and excellent fits for dimensions 4 and 5.The likelihood values for 4 cases are -3.178171,-3.16838,-3.166633,and -3.166182.Further experimentation with dimensions 10 and 15 resulted in likelihoods of -3.165002 and -3.164654, respectively.However, the plots of bond prices and yields are indistinguishable from the plots corresponding to dimension 5, see Figure 2. We can also assess the quality of the fits by plotting the estimated density function vs. the weighted data, shown in Figure 3. Again the plots for dimensions p = 5, 10, 15 are almost indistinguishable.Therefore, we conclude that dimension 4 or 5 will suffice to approximate the bond prices.
The estimates of the sub-intensity matrix M − ∆(r) (under a risk neutral measure Q) for dimensions p = 3, 4, 5 are given by To fit the bond prices, the initial distributions of Markov processes were all on the form (1, 0, ..., 0) of appropriate dimension, i.e., initiation in state 1.
• Example 4.2 (Fitting to 2019 bond prices with unrestricted interest rates).To illustrate the applicability of our methods also in the case of a negative interest rate environment, we can instead fit to bond prices as of 31/12/2019 from the Danish Financial Supervisory Authority; this dataset consists of maturities of T = 1, 2, ..., 120 years.In this case, we let the EM algorithm choose the necessary positive and negative interest rates.
The first five years have bond prices above one and given by 1.00231736, 1.00403337, 1.00445679, 1.00382807, and 1.00197787, which reflects the (slightly) negative interest rate environment at the time.From (29), we get ρ = 0.002314677 as the exponential factor to down-scale prices to below one.In Figure 5, we show the phase-type fits to the bond prices.We have used the subclass of time-homogeneous Coxian distributions, where initiation is always in state 1, and the only possible transitions are from a state, i say, to the following, i + 1, or to exit to the absorbing state.If the primary purpose is using the fits as a discounting factor in a life-insurance model, then probably all fits could be used (right plot).If the yield curve fitting is the concern, then only dimensions 10 and 15 seem to catch the appropriate curvature.Regarding the probability density of the phase-type, the 15-dimensional fit is the best.
These should also be counted as parameters.

•
Example 4.3 (Fitting to a two-factor Vasicek model).In this example we consider the two-factor Vasicek short rate model G2++ (see Brigo and Mercurio [2006]) with an initial negative interest rate.

Applications to life insurance
In this section, we incorporate the stochastic interest rate model of the previous sections to life insurance valuations.We consider the model introduced by Norberg [1995a,b] and extend their results on reserves and higher order moments to so-called partial reserves and higher order moments, that is, corresponding results on events of the terminal state.Partial reserves and moments play important roles when dealing with so-called retrospective reserves in single states (cf.[5, Section 5.E]), which, however is outside the scope of the present paper.We provide this extension following the matrix approach of Bladt et al. [2020] so that these types of results are extended to allow for stochastic interest rates on the form (1). The extensions of the results of these papers are pointed out in a series of remarks throughout the section.

A Life insurance model with stochastic interest rates.
Let X = {X(t)} t≥0 be a time-inhomogeneous Markov jump process with a finite state-space E and intensity matrix Λ(t) = {λ ij (t)} i,j∈E .Then we define a payment process {B(t)} t≥0 by where b i (t) are continuous payment rates (negative if premiums) and b ij (t) lump sum payments, which occur according to the counting measure N ij (t).The intensity matrix is decomposed into where Λ 1 (t) is a non-negative matrix and, consequently, Λ 0 (t) a sub-intensity matrix, i.e. row sums are non-positive.The counting process is linked to the transitions of X in the following way.Upon transition from i to j, i = j, in X at time t, a lump sum payment of b ij (t) will be triggered with probability If i = j, then N ii (t) denotes an inhomogeneous Poisson process with intensity λ ii (t), and a lump sum during a sojourn in state i will then be triggered in [t, t + dt) with probability λ 1 ii (t) dt.Finally, we assume that the spot interest rates in state i follow a deterministic function r i (t).Hence the interest rates follow the model (1).
Remark 5.1.The classic Markov chain life insurance setting of, e.g., Hoem [1969], Norberg [1991], is the recovered if r i (t) ≡ r(t), b ii (t) = 0 and if the probabilities (32) are either zero or one.Extending the classic setting to allow for different interest rates in the different states was considered in Norberg [1995a,b], where Thiele type of differential equations for the reserves and higher order moments were derived.
For the purpose of computing reserves and higher order moments, [Bladt et al., 2020, (3.8)-(3.11)],we let b(t) = (b i (t)) i∈E denote the vector containing the continuous rates, and define matrices where ∆(b(t)) denotes the diagonal matrix with b(t) as diagonal.The operator • denotes Schur (entrywise) matrix product, defined by Hence B(t) is the matrix containing the lump payments at transitions and at Poisson arrivals during sojourns, R(t) is the matrix whose ij'th element is the expected reward accumulated during [t, t + dt) upon transition from i to j, or during a sojourn in state i if i = j.The C (k) (t) matrix is more technical to be used when dealing with higher order moments.
Finally, we let Now assume that the interest rate process is modelled and fitted using bond prices like in Section 3. Accordingly there is a Markov jump process X r = {X r (t)} t≥0 with state-space E r = {1, 2, ..., p} and intensity matrix Λ r (t) = {λ r ij (t)} t≥0 , say, such that the corresponding bond prices B(t, T ) are given as in Theorem 3.4.Similarly, we let X b = {X b (t)} t≥0 denote the Markov jump process governing the transition between the biometric states with the statespace E b = {1, 2, ..., q} and intensity matrix Λ b (t) = {λ b ij (t)} t≥0 .Hence the Markov jump process appearing in (30) can be written on the form Hence we need to decide upon an ordering of E, which will be lexicographical.This means the for elements (i, ĩ), (j, j) ∈ E, (i, ĩ) < (j, j) ⇐⇒ (i − 1)p + ĩ < (j − 1)p + j.
In other words, each biometric state i consists of sub-states (i, 1), ..., (i, q) depending on the state of the underlying Markov process X r , see Figure 6.
The processes X b and X r may or may not be independent, and the payment processes (30) likewise may or may not be independent of X r .In the independent case the processes X b and X r are defined on each their state-space, and the common state-space will be the product set of the two.If the processes are sharing states, with the possibility of having simultaneous jumps, then we obtain dependency of the processes.Such a case could, e.g.be a rise in the interest rate causing an increased intensity of jumping to surrender or free-policy states (see, e.g., Buchardt [2014]).
In the following example, we consider the simplifications in the representations when assuming independence.
Example 5.1 (Independence).If the transition rates of X satisfy, for all i, j ∈ E b , j = i, and ĩ, j ∈ E r , j = ĩ, λ (i, ĩ),(i, j) (t) = λ (j, ĩ),(j, j) (t) =: λ r ĩj (t), we have that X b and X r are independent.Using the lexicographical ordering, we can, in this case, obtain compact matrix representations in terms of the two processes as follows.The transition intensity matrix of X is now of the form where ⊕ denotes the Kronecker sum, and where I n denotes the identity matrix of dimension n × n.We recall that the Kronecker product, ⊗, is defined by The interest rate vector satisfies r(t) = e ⊗ (r 1 (t), ..., r p (t)), where e = (1, 1, ..., 1) .
If we further assume that the payment process (30) is independent of X r , i.e. such that the payment functions satisfy, for all i, j ∈ E b and ĩ, j we have that the payment matrices are on the form Similarly, we may directly decompose Λ b : The conceptual difference in the decomposition of Λ 1 and Λ 0 lies in the absence of lump sum payments upon transition between interest levels. •

Reserves.
We now consider the valuation of the payment process B. Introduce the matrix of partial state-wise prospective reserves, Due to the stochastic interest rates, this is an extension of Bladt et al. [2020].With D(s, t), introduced in ( 21), modified to the setup of this section as we have the following result.
Theorem 5.2.The matrix of partial state-wise prospective reserves V (s, t) has the following integral representation: Proof.See Appendix B.
The actual computation of the reserves can be effectively executed using the following Van-Loan type of formula, which avoids integration.
Corollary 5.3.V (s, t) can be extracted from the relation Finally, we state and prove Thiele's differential equations for partial reserves with stochastic interest rates.
where V (t, t) = 0.For the conventional state-wise prospective reserves, V T h (t) = V (t, T )e, this has the form where V T h (T ) = 0.
Proof.See Appendix B.
Remark 5.5.Writing out the elements of the differential equation for V T h , we get for i ∈ E, V T h i (T ) = 0, which is the differential equation obtained in [Norberg, 1995a, (3.2)] in the case of a first-order moment.
5.3.Higher order moments.Consider the matrix of partial state-wise higher order moments of future payments, given by, for k ∈ N (see [Bladt et al., 2020, (3.6)-(3.7)]), and introduce what we shall term the reduced partial state-wise higher order moments: Since all payment functions and transition rates are deterministic, results for these higherorder moments are now straightforward to obtain by using the undiscounted result, where m (k) r (t, T ), k ∈ N, contains the partial state-wise k'th moment, normalised by k!, of the undiscounted future payments (see [Bladt et al., 2020, (7.4)]), i.e.V (k) r (s, T ) with no interest rate.Indeed, rates b i (t) and lump sums b ij (t) must be replaced by the discounted versions with discounting factor, exp(− we then obtain the following version of Hattendorff's theorem for partial reserves with stochastic interest rate.
Theorem 5.6.The matrix of reduced partial state-wise higher order moments satisfies the integral equation, for k ∈ N 0 , Proof.See Appendix B. Defining we get by Van Loan that From these results, we can derive a number of classical results.Differentiation of ( 35) gives We then obtain the following differential equation by only considering the first row times the last column.
Theorem 5.7.The matrix of reduced partial state-wise higher order moments satisfies the system of differential equations, for k ∈ N 0 , Remark 5.8.A martingal-based proof for the corresponding (unreduced) state-wise moments, k!V (k) r (t)e, can be found in Norberg [1995a].
Example 5.2 (Independence continued).We can continue our decompositions from the independence case of Example 5.1 to reserves and higher-order moments.Indeed, since we get from (16) that Thus, each diagonal block element can be computed using these representations when setting up the matrix F U for the computation of these higher order moments.
In particular, for partial state-wise reserves (i.e.k = 1), we obtain a more direct expression.
Assuming that the initial biometric state is i ∈ E b , the terminal j ∈ E b and that the initial distribution of the fitted interest rate phase-type distribution is π.Then which is consistent with similar expressions obtained in Norberg [1995b].• 5.4.Equivalence premium.
Assume that R(t) = R(t; θ) such that θ is a parameter of either B(t) and/or ∆(b(t)) only.Hence, θ could, e.g., be a premium rate in state 1 or a transition payment between some states.We then write V (t) = V (t; θ) so that Hence we get from the Van Loan formula (13), Remark 5.9.Similar kinds of derivatives as those of (36) are considered in Kalashnikov and Norberg [2003], where differential equations for reserves concerning valuation elements and payments are derived.The formulas presented here may thus be seen as corresponding matrix representations.
If state i ∈ E is the starting state, we can formulate the equivalence principle by finding the θ that solves V T h i (0; θ) = e i V (0; θ)e = 0 using Newton's method, where V θ denotes the partial derivative wrt.θ.For example, if θ is a constant premium (rate) such that R θ (t; θ) = A(t), i.e. a matrix function not depending on θ, then V θ (t; θ) = V θ (t) will not depend on θ either, so we conclude that the map θ → V T h i (t; θ) is linear (for fixed t), so that in particular V T h i (0; θ) = aθ + b for some constants a, b.Then b can be computed from b = V T h i (0; 0) = e i V (0; 0)e and a = e i V θ (0; 0)e.Hence, Newton's method converges in one iteration, and the θ which fulfils the equivalence principle is given by θ = − e i V (0; 0)e e i V θ (0; 0)e .
Hence, this formula can compute the equivalence premium if it is assumed to be (piecewise) constant over time, which is often the case in practical examples.However, the formulation in terms of derivatives is usually not seen, with [Kalashnikov and Norberg, 2003, (3.5)] being one of few exceptions.If the constancy assumption is not satisfied, a parametrised expression in terms of θ can be calculated by Newton's method.
5.5.Distributions of future payments based on reduced moments.
In this section, we briefly comment on the implementation of the Gram-Charlier series for the density and distribution functions based on reduced moments, following along the lines of Bladt et al. [2020]; for an approach based on PDEs and integral equations (though not implemented numerically), we refer to [Norberg, 2005, Section 5].
The goal is to approximate the distribution of using a Gram-Charlier series expansion.In Bladt et al. [2020], it was shown that under suitable regularity conditions, the density f for X can be approximated by where f * is a reference density, p n (x) an orthonormal basis of polynomials for Hilbert space L 2 (f * ), and c n = E(p n (X)).The reference distribution f * can be chosen arbitrarily as long as f /f * ∈ L 2 (f * ).Hence it is advisable to choose f * as close to f as possible.
For a given reference density f * , the polynomials x n f * (x) dx, n = 0, 1, ... defines an orthogonal basis for Hilbert space L 2 (f * ) with inner product

With the Hankel determinants
, n = 0, 1, ..... it can then be shown that is an orthonormal basis (ONB) in L 2 (f * ).Also, it is immediate that If f * is chosen to be the standard normal distribution, the corresponding polynomials p n are the (probabilists) Hermite polynomials.While the Hermite polynomials were used in Bladt et al. [2020] up to very high orders, their use in the following example fails already at low orders.This is likely caused by the tail of the normal distribution being too light.We propose a class of reference distributions based on a shifted beta distribution closely related to the Jacobi Polynomials as an alternative.This distribution will have finite support but a much heavier tail.Finite support is usually not a problem in a life insurance context.Define a reference distribution f * with support on a finite interval [a, b] by Thus we need to find an orthonormal basis for L 2 (f * ).The starting point is the weight function The space L 2 (w) has an orthogonal basis of Jacobi polynomials given by So for given a, b, we need to compute Here where the inner expectation is computed as Finally, the approximation is then given by Concerning the corresponding distribution function, we integrate the above equation to obtain Hence, these formulas can be used to approximate the density and distribution via these Jacobi types of polynomials.

Numerical Example
We now present a numerical example based on Example 5.1, where interest rates and biometric risk are assumed independent, where we carry over the estimation of interest transition rates from the calibrated bond prices of Section 3.
Consider the numerical example of Buchardt and Møller [2015] as the model for the biometric risk and corresponding life insurance contract.That is, the states of the insured X b are modelled as a time-inhomogeneous Markov jump process taking values E b = {1, 2, 3}, the three-state disability model depicted in Figure 7.We consider a 40-year-old male today (at time 0) with a retirement age of 65 and the following life insurance contract: • A disability annuity of rate 1 while disabled until the retirement of age 65.
• A life annuity of rate 1 while alive until the retirement of age 65.
• A constant premium rate θ paid while active until the retirement of age 65, priced under the equivalence principle at time 0. The payment matrices for this product combination corresponds to having B(t) = Λ 1 (t) = 0, and .
For the stochastic interest rate model, we take the fitted bond prices from Example 4.1 with p = 4 phases, so that the interest rates are given as r(t) = r Xr(t) , with r = (0.025, 0.050, 0.075, 0.100), and where X r is a time-homogeneous Markov jump process taking values the finite state space E r = {1, 2, 3, 4} with initial distribution π = (1, 0, 0, 0) and transition intensity matrix We then determine the equivalence premium θ using the method outlined in Section 5.4.This is explicit on the form (37) due to b(t; •) being affine (for fixed t), and we get θ = 0.1583467.This is almost three times lower than the premium rate obtained when pricing with a constant first-order interest rate of 1% as in Buchardt and Møller [2015], which makes sense since the present interest rate model always gives interest rates above this level.
We then calculate moments of up to order 20 of the present value of future payments to approximate its density and distribution function via Gram-Charlier expansions based on the (shifted) Jacobi polynomials, as outlined in Section 5.5.The parameters used in the procedure are shown in  In matrix form this amounts to (34).
Proof of Theorem 5.4.Using that the product integral satisfies Kolmogorov's forward and backward equations, we get that  (t) dt + b X(t−)j (t) dN X(t−)j (t), and assume that s ∈ [t, T ] is a point of increase for the counting process x → N ab (x) which trigger lump sum payments.Then in the computation of the above integral, there will be jump contributions at time s, where any number m ∈ {1, 2, ..., k} of the variables x 1 , ..., x k may be equal to s, say Indeed, since there are precisely m coincidences, the remaining integrals must start from s+ = s; otherwise, the integration intervals would contain s as well.Changing the lower limits of the integrals appearing in the exponentials, we can further rewrite the expression as On the event that X(x) = , the contribution to the expectation of the above integral ( 46), where no coincidences are allowed (i.e., the reward at time x from at most one jump and benefit rates), amounts to E 1{X(T ) = j} Adding ( 45) and ( 47) then prove the result.

Figure 5 .
Figure 5. Fitted phase-type densities (left) and corresponding yield curves (middle) and bond prices (right) for dimensions p = 3,4,5 based on bond prices from the two-factor Vasicek G2++ model.

Figure 6 .
Figure 6.Lexicographical ordering: for each biometric state (blue), several sub-states (orange) define the underlying interest rate level.
a) n = a(a + 1) • • • (a + n − 1) denotes the Pochammer symbol.By normalizing the weight function into a density on [−1, 1] and then transforming it into a density on [a, b], we obtain an ONB for f * of polynomials given by

Figure 8 .Ep
Figure 8. Left: Density approximation based on 20 moments and a histogram based on 1,000,000 simulations.Right: Distribution function approximation based on the same 20 moments and the empirical distribution function from the same simulations.
's differential equation can be pulled out from the upper right corner of each side of the equation.Proof of Theorem 5.6.Write T t e − x t r X(u) du dA(x) u) du • • • e − x k t r X(u) du dA(x k ) • • • dA(x 1 ).(41)Now dA(t) = b X s.We can pick m out of the k variables in k m = k!/(m!(k−m)!) ways.If m variables coincide at the jump time s, then a contribution of b ab (s) m is added.Hence only looking at jump coincidences, i.e. m ≥ 2, the contribution to the integral (41) is u) du • • • e − x k t r X(u) du dA(x k ) • • • dA(x m+1 ) dN ab (s).

Table 1
, and the resulting density and distribution function are shown in Figure8.From the fitted distribution function, one may compute different quantities of interest, e.g., quantiles of the present value.In Table2, we show various quantiles based on the empirical (simulated) distribution function and the approximated distribution function based on 20 moments.