Abstract
A phase-type distribution is the distribution of the time until absorption in a finite state-space time-homogeneous Markov jump process, with one absorbing state and the rest being transient. These distributions are mathematically tractable and conceptually attractive to model physical phenomena due to their interpretation in terms of a hidden Markov structure. Three recent extensions of regular phase-type distributions give rise to models which allow for heavy tails: discrete or continuous scaling; fractional-time semi-Markov extensions; and inhomogeneous time-change of the underlying Markov process. In this paper, we present a unifying theory for heavy-tailed phase-type distributions for which all three approaches are particular cases. Our main objective is to provide useful models for heavy-tailed phase-type distributions, but any other tail behavior is also captured by our specification. We provide relevant new examples and also show how existing approaches are naturally embedded. Subsequently, two multivariate extensions are presented, inspired by the univariate construction, which can be considered a matrix version of a frailty model. We provide fully explicit EM algorithms for all models and illustrate them using synthetic and real-life data.
1 Introduction
Phase-type (PH) distributions have been employed extensively in applied probability since they often provide exact and explicit solutions to complex stochastic problems. Another attractive property of PH distributions is that they form a dense class in the set of distributions on the positive half-line in the sense of weak convergence (see Bladt and Nielsen (2017), Section 3.2.1). However, despite their denseness, PH distributions are always light-tailed, which is a limitation when heavy tails are present.
At least three approaches to remedy this problem have been introduced in the literature. The first one, originally introduced in Bladt et al. (2015) and called the NPH class of distributions, consists of considering PH distributions scaled by nonnegative discrete random variables N. This construction principle has the advantage that the resulting distribution maintains the interpretation as the absorption time of a homogeneous Markov jump process, but on an infinite-dimensional state-space. This, indeed, allows for genuinely heavy tails in the resulting distribution. For instance, in Rojas-Nandayapa and Xie (2018), the authors showed that if the scaling component is unbounded (but otherwise arbitrary), then the resulting distribution is always heavy-tailed in the sense of a non-existent moment generating function (see also Su and Chen (2006) for more general results). However, the functionals of these distributions are expressed in terms of infinite-dimensional matrices, which in practice can only be computed up to a finite number of terms. More recently, in Albrecher et al. (2021a), the authors considered continuous scaling and showed that closed-form expressions for different functionals of the resulting distributions can be obtained. They denoted this class by CPH. Another advantage of continuous scaling is that it maintains the (finite) dimension of the underlying PH distribution.
A second approach was introduced in Albrecher et al. (2020a) by considering a time-fractional version of the underlying stochastic process dynamics, effectively moving into the semi-Markov domain. Together with subsequent multivariate extensions based on rewards (cf. Albrecher et al. (2021, 2020b)), these models were shown to be feasible for applications such as non-life insurance modeling. More recently, Bladt (2021) showed that these models are relevant for describing lifetimes and performing the corresponding life-insurance calculations.
The third approach, introduced in Albrecher and Bladt (2019), consists of allowing the Markov jump process in the construction principle of PH distributions to be time-inhomogeneous, leading to the class of inhomogeneous phase-type (IPH) distributions. An advantage of this approach is that one gains substantial flexibility in the tails: not only are heavy tails possible, but also, e.g., tails lighter than exponential decay can be obtained. Further extensions to covariate-dependent distributions can be found in Albrecher et al. (2021b), which are particularly well-suited for survival analysis applications.
Estimation of PH distributions was initially developed to calibrate such stochastic models to real-life data, and it is a well-developed topic in the literature. It is typically done via an expectation-maximization (EM) algorithm (Asmussen et al. (1996)), although other methods, such as an MCMC approach, have been introduced (Bladt et al. (2003)). More recent trends have moved towards considering PH-based models purely as flexible models for statistical fitting, irrespective of their explicit and closed-form formulas. This data-driven approach is particularly attractive compared to classical alternatives (for instance, kernel smoothing), since it carries the implicit interpretation of an underlying process traversing different states before it terminates, which is easy to justify in many application areas. Algorithms for discretely scaled PH distributions, IPH models, and continuously scaled PH distributions can be found, respectively, in Bladt and Rojas-Nandayapa (2018), Albrecher et al. (2020), and Albrecher et al. (2021a). To the best of the authors' knowledge, an EM-based estimation procedure for fractional phase-type distributions (also called matrix Mittag-Leffler distributions) has not been considered before the present work; Albrecher et al. (2020a) performed a purely numerical multidimensional maximum-likelihood estimation.
The primary purpose of this paper is to present a unified theory that encompasses the above approaches to produce heavy-tailed phase-type distributions. The construction principle of the proposed models is simple to conceptualize and can be seen as a matrix extension of the frailty model in survival analysis. However, the flexibility of the underlying Markov structure allows very different objects to be constructed as special cases. More precisely, we study IPH distributions with intensity matrices scaled by an arbitrary nonnegative random variable. In other words, we impose both a random and a deterministic component which modify the speed at which the finite state-space is traversed by the Markov process, such that absorption times can possess any desired tail and body behavior, in particular heavy tails. Inhomogeneous generalizations of Albrecher et al. (2021a) and Rojas-Nandayapa and Xie (2018), the matrix Mittag-Leffler models of Albrecher et al. (2020a), and randomly scaled generalizations of Albrecher and Bladt (2019) and Albrecher et al. (2021b) (with the possibility of missing covariates) are all contained in this rich class.
In terms of physical interpretation, the latent variables play different roles. The underlying Markov dynamics aim to model heterogeneity by assuming that an unobserved traversal of states has occurred. The interpretation of the scaling component, on the other hand, is closely related to the statistical concept of frailty. Recall that frailty models (see, e.g., Wienke (2010) for a comprehensive account) specify a multiplicative random effect on the hazard rate of a distribution, effectively accounting for unobserved covariates in a Cox proportional hazards model. In contrast, we specify a multiplicative random effect on the intensity function of a Markov jump process. Nonetheless, since the hazard rate and the intensity function of IPH distributions are asymptotically equivalent (cf. Albrecher et al. (2021b)), the scaling variable can also be interpreted as accounting for heterogeneity or missing covariates in an asymptotically proportional hazards model.
The secondary aim of the paper is to present multivariate models based on this construction, which can be interpreted as generalizations of the shared and correlated frailty models (cf. Wienke (2010)). We derive EM algorithms for maximum-likelihood estimation of all the proposed models, which can be implemented either in full generality or under simplifying assumptions tailored to the specific application. For pedagogical reasons, we build up the multivariate case from the univariate one, although a top-down approach is also possible.
The rest of the paper is organized as follows. In Section 2, we present an overview of the class of IPH distributions and some important properties for our present purposes. In Section 3, we introduce our main univariate model, which we call scaled inhomogeneous phase-type, derive its main properties, give several parametric examples relevant for real-life applications, and propose a generalized EM algorithm for its maximum-likelihood estimation. In Section 4, we present a multivariate extension inspired by the shared frailty model and show how estimation of the proposed models can be performed via EM algorithms. In Section 5, we present a different multivariate extension, now based on the construction principle of correlated frailty models, and derive an EM algorithm for its maximum-likelihood estimation. In Section 6, we present several numerical illustrations. Finally, Section 7 concludes.
2 Preliminaries
This section presents the relevant preliminaries on time-inhomogeneous Markov jump processes and their absorption times. The distributions of these absorption times are the building blocks for the scaled models introduced in Section 3. For distributional equality between two random variables X, Y, we use the notation \(X{\mathop {=}\limits ^{d}}Y\), while the notation \(X\sim F\), for F a distribution function, density, or acronym, means that X follows the distribution uniquely associated with F. Unless stated otherwise, equalities between random objects hold almost surely. For two real-valued functions g, h, the notation \(g(t)\sim h(t)\) as \(t\rightarrow a\in \mathbb {R}\cup \{-\infty ,+\infty \}\) is defined as \(\lim _{t\rightarrow a}g(t)/h(t)=1\). If a is not explicitly mentioned, it is assumed to be \(+\infty\).
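Let \(( X_t )_{t \ge 0}\) denote a time-inhomogeneous Markov jump process on the state-space \(E = \{1, \dots , p, p+1\}\), where states \(1,\dots ,p\) are transient and state \(p+1\) is absorbing. In this way, \(( X_t)_{t \ge 0}\) has an intensity matrix of the form
$$\begin{aligned} \varvec{\Lambda }(t)=\left( \begin{array}{cc} \varvec{T}(t) & \mathbf{t} (t)\\ \mathbf{0} & 0 \end{array}\right) ,\quad t\ge 0, \end{aligned}$$
where \(\varvec{T}(t)\) is a \(p\times p\) matrix and \(\mathbf{t} (t)\) is a p-dimensional column vector.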
Since \(\varvec{\Lambda }(t)\) is an intensity matrix, each of its rows sums to zero for any time \(t\ge 0\), and so the identity \(\mathbf{t} (t)= -\varvec{T}(t) \, \mathbf{e}\) holds, where \(\mathbf{e}\) is the p-dimensional column vector of ones. Moreover, the transition probability matrix \(\varvec{P}(s, t) = \{p_{k,l}(s,t)\}_{k,l \in E}\) of \(( X_t )_{t \ge 0}\), where
$$\begin{aligned} p_{k,l}(s,t)={\mathbb P}(X_t=l\mid X_s=k),\quad k,l\in E,\ 0\le s\le t, \end{aligned}$$
is given in terms of the product integral (see Albrecher and Bladt (2019))
$$\begin{aligned} \varvec{P}(s,t)=\prod _{s}^{t}\left( \varvec{I}+\varvec{\Lambda }(u)\,du\right) . \end{aligned}$$
To avoid degeneracies, we assume that the process starts almost surely in a non-absorbing state \(k\le p\) with probabilities \(\pi _{k} = {\mathbb P}(X_0 = k)\), \(k = 1,\dots , p\). In vector notation, we write \(\varvec{\pi }= (\pi _1 ,\dots ,\pi _p )\). In the sequel, we follow the convention that Greek boldface lowercase letters are row vectors, while Roman boldface lowercase letters are column vectors. Thus \(\sum _{k=1}^p\pi _k=\varvec{\pi }\mathbf{e} =1\).
For our present purposes, the main quantity of interest of such a process is the time taken to reach the absorbing state, denoted by
$$\begin{aligned} \tau =\inf \{t\ge 0: X_t=p+1\}, \end{aligned}$$
which has an inhomogeneous phase-type distribution (cf. Albrecher and Bladt (2019)) with representation \((\varvec{\pi },\varvec{T}(t))\), and we write \(\tau \sim \text{ IPH }(\varvec{\pi },\varvec{T}(t) )\). Applications of such random variables to statistical modeling often focus on the special case \(\varvec{T}(t) = \lambda (t)\,\varvec{T}\), where \(\lambda (t)\) is a known nonnegative real function, called the intensity function, and \(\varvec{T}\) is a fixed sub-intensity matrix. We adopt this approach in the present text, and thus simply write \(\tau \sim \text{ IPH }(\varvec{\pi }, \varvec{T}, \lambda )\). The interested reader is referred to Bladt and Nielsen (2017) for a comprehensive account of the case \(\lambda \equiv 1\) (regular PH distributions) and to Albrecher and Bladt (2019) for further reading on general IPH distributions.
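The restricted class of IPH distributions is nonetheless quite versatile. Whenever \(Y \sim \text{ IPH }(\varvec{\pi }, \varvec{T}, \lambda )\), there exists a function h such that
$$\begin{aligned} Y{\mathop {=}\limits ^{d}}h(Z), \end{aligned}$$
where \(Z \sim \text{ PH }(\varvec{\pi }, \varvec{T})\). More specifically, the relationship between h and \(\lambda\) is given by
$$\begin{aligned} h^{-1}(y)=\int _{0}^{y}\lambda (t)\,dt, \end{aligned}$$
or, in terms of derivatives,
$$\begin{aligned} \lambda (y)=\frac{d}{dy}h^{-1}(y). \end{aligned}$$
To make sure that Y is positive, unbounded, and almost surely finite, we require that
$$\begin{aligned} \int _{0}^{\infty }\lambda (t)\,dt=\infty . \end{aligned}$$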
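The density \(f_Y\) and survival function \(S_Y\) of \(Y \sim \text{ IPH }(\varvec{\pi }, \varvec{T}, \lambda )\) are explicit in terms of matrix exponentials and are given by
$$\begin{aligned} f_Y(y)&=\lambda (y)\,\varvec{\pi }\exp \left( \int _{0}^{y}\lambda (t)\,dt\ \varvec{T}\right) \mathbf{t} ,\\ S_Y(y)&=\varvec{\pi }\exp \left( \int _{0}^{y}\lambda (t)\,dt\ \varvec{T}\right) \mathbf{e} ,\quad y\ge 0. \end{aligned}$$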
The tail behavior of IPH distributions is driven by the asymptotic behavior of the function \(\lambda\). Table 1 presents an overview of some commonly used intensities and transforms for generating parametric IPH distributions (see Bladt and Yslas (2021)). Applications and estimation can be found, for instance, in Albrecher and Bladt (2019) and Albrecher et al. (2020, 2021b). Their names are inspired by the \(p=1\) case; e.g., a matrix-Weibull distribution reduces to the regular Weibull distribution when \(\varvec{T}\) is a \(1\times 1\) matrix. In general, the additional parameters allow for more flexible modeling of the body of the distribution while preserving the same tail behavior as in the scalar case.
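As a minimal sketch of how these IPH formulas can be evaluated numerically, the Python snippet below computes \(S_Y(y)=\varvec{\pi }\exp (h^{-1}(y)\varvec{T})\mathbf{e}\) and \(f_Y(y)=\lambda (y)\varvec{\pi }\exp (h^{-1}(y)\varvec{T})\mathbf{t}\) for the matrix-Weibull intensity \(\lambda (y)=\eta y^{\eta -1}\), so that \(h^{-1}(y)=y^{\eta }\), using SciPy's `expm`. The function names and the values of \(\varvec{\pi }\), \(\varvec{T}\), and \(\eta\) are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.linalg import expm


def iph_survival(y, pi, T, h_inv):
    """S_Y(y) = pi exp(h^{-1}(y) T) e for Y ~ IPH(pi, T, lambda)."""
    e = np.ones(T.shape[0])
    return float(pi @ expm(h_inv(y) * T) @ e)


def iph_density(y, pi, T, lam, h_inv):
    """f_Y(y) = lambda(y) pi exp(h^{-1}(y) T) t, with exit vector t = -T e."""
    t = -T @ np.ones(T.shape[0])
    return float(lam(y) * (pi @ expm(h_inv(y) * T) @ t))


# Matrix-Weibull: lambda(y) = eta * y^(eta - 1), hence h^{-1}(y) = y^eta.
eta = 1.5
lam = lambda y: eta * y ** (eta - 1)
h_inv = lambda y: y ** eta

# Illustrative two-phase representation (hypothetical values).
pi = np.array([0.7, 0.3])
T = np.array([[-2.0, 1.0], [0.5, -3.0]])

print(iph_survival(1.0, pi, T, h_inv), iph_density(1.0, pi, T, lam, h_inv))
```

With a \(1\times 1\) matrix \(\varvec{T}=(-c)\), `iph_survival` reduces to the Weibull survival function \(\exp (-c\,y^{\eta })\), which is the naming convention described above.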
3 Scaled inhomogeneous phase-type distributions
In this section, we introduce the main general specification of the paper and then derive some special cases, together with a detailed analysis of their tail asymptotics. The central assumption underpinning our model is that an individual's intensity function depends on an unobservable nonnegative random variable \(\Theta\). More specifically, we focus on the case where \(\Theta\) acts multiplicatively on the intensity function, that is,
$$\begin{aligned} \lambda (t\mid \Theta )=\Theta \,\lambda (t),\quad t\ge 0, \end{aligned}\qquad (2)$$
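where \(\lambda\) is the baseline intensity function. If we denote by Y a random variable with intensity (2), then
$$\begin{aligned} Y\mid \Theta \sim \text{ IPH }(\varvec{\pi },\varvec{T},\Theta \lambda ). \end{aligned}\qquad (3)$$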
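For the representation of these distributions, we make use of functional calculus. More specifically, if g is an analytic function and \(\varvec{A}\) is a matrix, we can express \(g(\varvec{A})\) by Cauchy's formula
$$\begin{aligned} g(\varvec{A})=\frac{1}{2\pi i}\oint _{\Gamma }g(z)\,(z\varvec{I}-\varvec{A})^{-1}\,dz, \end{aligned}$$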
where \(\Gamma\) is a simple closed path in \(\mathbb {C}\) that encloses the eigenvalues of \(\varvec{A}\) and lies, together with its interior, in the region where g is analytic (cf. Bladt and Nielsen (2017), Section 3.4, for details).
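To make the functional calculus concrete, the following sketch approximates Cauchy's formula numerically by discretizing a circular contour with the trapezoidal rule, and compares the result for \(g=\exp\) with SciPy's `expm`. The helper name `cauchy_matrix_function`, the example matrix, and the contour center and radius are assumptions chosen by hand so that the circle encloses the eigenvalues of the example matrix.

```python
import numpy as np
from scipy.linalg import expm


def cauchy_matrix_function(g, A, center, radius, n_nodes=128):
    """Approximate g(A) = (1 / (2 pi i)) * contour integral of g(z) (zI - A)^{-1} dz.

    The contour is the circle z = center + radius * exp(i phi); it must enclose all
    eigenvalues of A and lie in a region where g is analytic.
    """
    p = A.shape[0]
    I = np.eye(p)
    acc = np.zeros((p, p), dtype=complex)
    for k in range(n_nodes):
        phi = 2.0 * np.pi * k / n_nodes
        z = center + radius * np.exp(1j * phi)
        # dz = i * radius * exp(i phi) dphi; the factor i cancels the i in 1 / (2 pi i)
        acc += g(z) * np.linalg.solve(z * I - A, I) * radius * np.exp(1j * phi)
    # Trapezoidal rule: (1 / (2 pi)) * (2 pi / n_nodes) * sum = sum / n_nodes
    return (acc / n_nodes).real


T = np.array([[-2.0, 1.0], [0.5, -3.0]])  # eigenvalues approx -1.63 and -3.37
approx = cauchy_matrix_function(np.exp, T, center=-2.5, radius=2.0)
print(np.max(np.abs(approx - expm(T))))
```

Any Laplace transform that is analytic on and inside the contour can be passed as `g` in the same way, e.g., the Gamma transform \((1+z)^{-\alpha }\) applied to \(-x\varvec{T}\), provided the contour stays to the right of the branch point at \(z=-1\).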
The following result characterizes the density and survival functions of Y. In particular, observe that the asymptotic behavior of the tail of Y depends both on the shape of \(\mathcal {L}_\Theta\), the Laplace transform of \(\Theta\), and on \(\lambda\). In Section 3.1, we give an in-depth asymptotic analysis of the new parametric models presented in this paper.
Proposition 3.1
Let Y be given by (3). Then we have that, for \(y\ge 0\),

I
\(S_{Y}(y) = \varvec{\pi }\mathcal {L}_\Theta ( - h^{-1}(y) \varvec{T}) \mathbf{e}\),

II
\(f_Y(y) = -\lambda (y) \varvec{\pi }\mathcal {L}_\Theta ^{\prime } ( - h^{-1}(y) \varvec{T}) \mathbf{t}\),
where \(h^{-1}(y)= \int _{0}^{y}\lambda (t)dt\).
Proof
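Property I follows from
$$\begin{aligned} S_Y(y)=\int _{0}^{\infty }\varvec{\pi }\exp \left( \theta \,h^{-1}(y)\,\varvec{T}\right) \mathbf{e} \,dF_\Theta (\theta )=\varvec{\pi }\,\mathcal {L}_\Theta \left( -h^{-1}(y)\varvec{T}\right) \mathbf{e} , \end{aligned}$$
where we have used functional calculus to define the Laplace transform evaluated at a matrix. Taking derivatives in the expression above and using \(\mathbf{t} =-\varvec{T}\mathbf{e}\) yields
$$\begin{aligned} f_Y(y)=-\frac{d}{dy}S_Y(y)=-\lambda (y)\,\varvec{\pi }\,\mathcal {L}_\Theta ^{\prime }\left( -h^{-1}(y)\varvec{T}\right) \mathbf{t} , \end{aligned}$$
from which Property II follows.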
The following lemma shows that Y has the same distribution as the transformation of a scaled PH distribution. Such a representation is useful for simulation and for estimation, as is apparent in later sections.
Lemma 3.1
Let Y be given in terms of (3). Then \(Y {\mathop {=}\limits ^{d}}h(Z /\Theta )\), where \(Z \sim \text{ PH }(\varvec{\pi }, \varvec{T})\) is independent of \(\Theta\), and \(h^{-1}(y)= \int _{0}^{y}\lambda (t)\ dt\).
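Proof
Since h is increasing and Z is independent of \(\Theta\),
$$\begin{aligned} {\mathbb P}(h(Z/\Theta )>y)={\mathbb P}(Z>\Theta \,h^{-1}(y))=\int _{0}^{\infty }\varvec{\pi }\exp \left( \theta \,h^{-1}(y)\,\varvec{T}\right) \mathbf{e} \,dF_\Theta (\theta )=S_Y(y), \end{aligned}$$
where the last equality follows from the proof of Proposition 3.1.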
We now make the following formal definition of a random variable Y satisfying (3).
Definition 3.1
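A random variable Y is said to have a scaled inhomogeneous phase-type (SIPH) distribution with representation \((\varvec{\pi }, \varvec{T}, \lambda )\) and scaling distribution \(F_\Theta\) if its survival function is given by
$$\begin{aligned} S_Y(y)=\int _{0}^{\infty }\varvec{\pi }\exp \left( \theta \int _{0}^{y}\lambda (t)\,dt\ \varvec{T}\right) \mathbf{e} \,dF_\Theta (\theta ),\quad y\ge 0. \end{aligned}$$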
We write \(Y\sim \text{ SIPH }(\varvec{\pi }, \varvec{T}, \lambda , \Theta )\).
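Lemma 3.1 suggests a direct simulation recipe for SIPH random variables: draw \(Z\sim \text{PH}(\varvec{\pi },\varvec{T})\) by running the underlying Markov jump process until absorption, draw \(\Theta\) independently, and return \(h(Z/\Theta )\). The sketch below does this in plain NumPy for \(\Theta \sim \text{Gamma}(\alpha ,1)\) and \(h(x)=x^{1/\eta }\), i.e., \(\lambda (y)=\eta y^{\eta -1}\). The function names, the seed, and the parameter values are illustrative assumptions. With \(\alpha =2\), Proposition 3.1 with the Gamma Laplace transform \((1+u)^{-\alpha }\) gives the exact value \({\mathbb P}(Y>1)=\varvec{\pi }(\varvec{I}-\varvec{T})^{-2}\mathbf{e}\), which can serve as a sanity check for the empirical frequency.

```python
import numpy as np


def sample_ph(pi, T, rng):
    """Simulate Z ~ PH(pi, T) as the absorption time of a Markov jump process."""
    n_states = len(pi)
    exit_rates = -T @ np.ones(n_states)            # t = -T e
    state = rng.choice(n_states, p=pi)
    time = 0.0
    while True:
        total_rate = -T[state, state]
        time += rng.exponential(1.0 / total_rate)  # holding time in the current state
        # Jump probabilities: off-diagonal rates (diagonal clipped to 0), then absorption.
        jump_rates = np.append(np.clip(T[state], 0.0, None), exit_rates[state])
        nxt = rng.choice(n_states + 1, p=jump_rates / total_rate)
        if nxt == n_states:                        # absorbed
            return time
        state = nxt


def sample_siph(n, pi, T, h, sample_theta, rng):
    """Lemma 3.1: Y = h(Z / Theta) with Z ~ PH(pi, T) independent of Theta."""
    return np.array([h(sample_ph(pi, T, rng) / sample_theta(rng)) for _ in range(n)])


rng = np.random.default_rng(2021)
pi = np.array([0.7, 0.3])
T = np.array([[-2.0, 1.0], [0.5, -3.0]])
alpha, eta = 2.0, 1.5
ys = sample_siph(20_000, pi, T, h=lambda x: x ** (1 / eta),
                 sample_theta=lambda r: r.gamma(alpha), rng=rng)
print(np.mean(ys > 1.0))
```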
Remark 3.1
(Existing special cases of heavy-tailed PH models).

i)
For \(\lambda \equiv 1\) and \(\Theta \in \mathbb {N}\) almost surely, we obtain the class of NPH distributions introduced in Bladt et al. (2015), while for \(\lambda \equiv 1\) and \(\Theta \in \mathbb {R}_+\) almost surely, we recover the CPH class in Albrecher et al. (2021a) and Rojas-Nandayapa and Xie (2018).

ii)
Consider a matrix Mittag-Leffler (fractional phase-type) random variable \(Y \sim \text{ MML }(\alpha , \varvec{\pi }, \varvec{T})\) as introduced in Albrecher et al. (2020a). Then it can be shown that
$$\begin{aligned} Y \ {\mathop {=}\limits ^{d}} \ Z^{1/\alpha } S_{\alpha } = (Z S_{\alpha }^{\alpha } )^{1/\alpha } \,, \end{aligned}$$
where \(Z \sim \text{ PH }(\varvec{\pi }, \varvec{T})\) and \(S_{\alpha }\) is an independent positive stable random variable with Laplace transform \(\exp (-u^{\alpha })\), \(\alpha \in (0,1]\). Hence, Y is SIPH distributed with \(h(x) = x^{1/\alpha }\) and \(\Theta = 1/S_{\alpha }^{\alpha }\). This class of distributions is the time-fractional counterpart of PH distributions and can be interpreted as the absorption times of a stochastic process that traverses a finite number of states. For \(\alpha <1\), the holding times of this process are Mittag-Leffler distributed, hence regularly varying, and can thus be abnormally large compared to those in a Markov framework. However, the boundary case \(\alpha =1\) corresponds to the usual exponential holding times, and thus there is a regime shift with respect to tail behavior.

iii)
When the scaling component \(\Theta\) degenerates to a point \(\Theta \equiv k\in \mathbb {R}_+\), we recover the class of IPH distributions. This also implies that the class of SIPH distributions, with a given and fixed intensity, is dense in the class of distributions on the positive real line. The argument is omitted, but it is a simple application of convergence through the diagonal of an array, for instance, by choosing a sequence of scalings \(\Theta _n\) with constant mean k and variances shrinking to zero.
Remark 3.2
Recall that for a continuous and positive random variable Y, the hazard function \(\mu _Y\) is given by
$$\begin{aligned} \mu _Y(t)=\frac{f_Y(t)}{S_Y(t)},\quad t\ge 0. \end{aligned}$$
Sometimes, it is convenient to deal with the cumulative or integrated hazard function \(M_Y\), which is given by
$$\begin{aligned} M_Y(t)=\int _{0}^{t}\mu _Y(s)\,ds=-\log S_Y(t). \end{aligned}$$
The classical frailty model in survival analysis assumes that the hazard function of an individual depends on an unobservable random variable \(\Theta\). More specifically, it assumes that \(\Theta\) acts multiplicatively on a baseline hazard function \(\mu\), that is,
$$\begin{aligned} \mu (t\mid \Theta )=\Theta \,\mu (t). \end{aligned}\qquad (4)$$
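Here, the random variable \(\Theta\) is known as the frailty. If we denote by Y the random variable with the above hazard, then the survival function of \(Y \mid \Theta = \theta\) is given by
$$\begin{aligned} S_{Y\mid \Theta }(y\mid \theta )=\exp \left( -\theta M(y)\right) ,\quad M(y)=\int _{0}^{y}\mu (t)\,dt. \end{aligned}$$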
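Thus, the unconditional survival function of Y is given by
$$\begin{aligned} S_Y(y)=\mathbb {E}\left[ \exp \left( -\Theta M(y)\right) \right] =\mathcal {L}_\Theta \left( M(y)\right) . \end{aligned}$$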
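Furthermore, model (4) can incorporate covariates \(\mathbf{X} = (X_1,\dots , X_{q})^{\top }\in \mathbb {R}^q\) in a similar way to Cox's proportional hazards model via
$$\begin{aligned} \mu (t\mid \Theta ,\mathbf{X} )=\Theta \,\mu (t)\exp (\varvec{\beta }\mathbf{X} ), \end{aligned}$$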
where \(\varvec{\beta }\in \mathbb {R}^q\) is a q-dimensional parameter row vector. Note that when the frailty degenerates to \(\Theta \equiv 1\), one recovers the proportional hazards model, meaning that the frailty model generalizes the proportional hazards model. Commonly employed frailty distributions include the Gamma and positive stable distributions, among others.
In Albrecher et al. (2021b), it was shown that the intensity function of an IPH distribution is asymptotically equivalent to its hazard function. More specifically, we have that \(\lambda (t) \sim C \mu (t)\) as \(t \rightarrow \infty\) for some constant \(C>0\). In particular, when \(p = 1\), the asymptotic relation holds with equality for all t. It follows that the frailty model is the special case \(p=1\) of our more general matrix specification of SIPH distributions.
Remark 3.3
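(Incorporating regressors). As in the frailty model, we can introduce covariates into (2) via
$$\begin{aligned} \lambda (t\mid \Theta ,\mathbf{X} )=\Theta \,\lambda (t)\exp (\varvec{\beta }\mathbf{X} ). \end{aligned}$$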
In this case, we write \(Y \sim \text{ SIPH }(\varvec{\pi }, \varvec{T}, \lambda , \Theta , \varvec{\beta })\) to denote a random variable with the above intensity. Note that the proportional intensities model introduced in Albrecher et al. (2021b) is retrieved if the scaling distribution degenerates to \(\Theta \equiv 1\) for all individuals. Consequently, the SIPH model is a generalization of the proportional intensities model.
In what follows, we mostly restrict ourselves to the model (2) without covariates, the extension being straightforward but somewhat distracting to the current train of thought. Moreover, we assume that \(\Theta\) is a continuous random variable unless stated otherwise.
3.1 Novel examples
Next, we present a suite of new examples that arise naturally as matrix extensions of some well-known frailty models, providing along the way some insight into the precise asymptotic behavior of the proposed models. The definitions of the different classes of heavy-tailed distributions are provided in Appendix 1.
Example 3.4
(Gamma scaling). Consider \(\Theta \sim \text{ Gamma }(\alpha , 1)\), \(\alpha >0\), with Laplace transform
$$\begin{aligned} \mathcal {L}_\Theta (u)=(1+u)^{-\alpha }. \end{aligned}$$
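Then, the survival function \(S_{Y}\) of Y is given by
$$\begin{aligned} S_Y(y)=\varvec{\pi }\left( \varvec{I}-h^{-1}(y)\varvec{T}\right) ^{-\alpha }\mathbf{e} . \end{aligned}$$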
As for the matrix-Pareto type II laws introduced in Albrecher et al. (2021a), taking the more general \(\Theta \sim \text{ Gamma }(\alpha , \gamma )\), \(\gamma >0\), results in the same class of distributions. For this reason, we work only with \(\text{ Gamma }(\alpha , 1)\). Consider now the particular case \(\lambda (y) = \eta y^{\eta - 1}\), \(\eta >0\), so that \(h^{-1}(y)=y^{\eta }\). Then
$$\begin{aligned} S_Y(y)=\varvec{\pi }\left( \varvec{I}-y^{\eta }\varvec{T}\right) ^{-\alpha }\mathbf{e} . \end{aligned}$$
We call this the matrix-Burr distribution.
Regarding the asymptotic behavior, we have that
$$\begin{aligned} S_Y(y)\sim C\left( h^{-1}(y)\right) ^{-\alpha },\quad y\rightarrow \infty , \end{aligned}$$
where C is a positive constant, which follows from an eigenvalue decomposition of \(\varvec{T}\). The first-order precise asymptotics for the different intensities from Table 1 are provided in Table 2, where D, b, and c denote positive real-valued constants, which may change between intensities, but we write the same symbol for display purposes. Throughout the rest of this section, we use the same notational convention.
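For the matrix-Burr case, the constant can be identified explicitly from \((\varvec{I}-x\varvec{T})^{-\alpha }=x^{-\alpha }(x^{-1}\varvec{I}-\varvec{T})^{-\alpha }\), which suggests \(C=\varvec{\pi }(-\varvec{T})^{-\alpha }\mathbf{e}\). The sketch below checks numerically that \(y^{\alpha \eta }S_Y(y)/C\) approaches 1. Matrix powers are computed via an eigendecomposition, assuming the matrices involved are diagonalizable with positive eigenvalues; the helper names and parameter values are illustrative.

```python
import numpy as np


def mat_power(A, s):
    """A^s via eigendecomposition (assumes A diagonalizable with positive eigenvalues)."""
    w, V = np.linalg.eig(A)
    return (V @ np.diag(w.astype(complex) ** s) @ np.linalg.inv(V)).real


pi = np.array([0.7, 0.3])
T = np.array([[-2.0, 1.0], [0.5, -3.0]])
e = np.ones(2)
alpha, eta = 2.5, 1.5


def burr_survival(y):
    """Matrix-Burr survival: pi (I - y^eta T)^(-alpha) e."""
    return float(pi @ mat_power(np.eye(2) - y ** eta * T, -alpha) @ e)


C = float(pi @ mat_power(-T, -alpha) @ e)   # candidate tail constant pi (-T)^(-alpha) e
for y in (10.0, 100.0, 1000.0):
    print(y, burr_survival(y) * y ** (alpha * eta) / C)  # ratio should approach 1
```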
Example 3.5
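(Positive stable scaling). Consider \(\Theta\) positive stable with stability parameter \(\alpha \in (0,1]\), that is, \(\mathcal {L}_\Theta (u)=\exp (-u^{\alpha })\). Then
$$\begin{aligned} S_Y(y)=\varvec{\pi }\exp \left( -\left( h^{-1}(y)\right) ^{\alpha }(-\varvec{T})^{\alpha }\right) \mathbf{e} . \end{aligned}$$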
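As a particular case, take \(\lambda (y) = \eta y^{\eta - 1}\), \(\eta >0\). Then
$$\begin{aligned} S_Y(y)=\varvec{\pi }\exp \left( -y^{\alpha \eta }(-\varvec{T})^{\alpha }\right) \mathbf{e} . \end{aligned}$$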
It was noted in Albrecher et al. (2021a) that \((\varvec{\pi }, -(-\varvec{T})^{\alpha })\) is a PH representation. Thus, some simple calculations show that these distributions span the same class as the matrix-Weibull laws introduced in Albrecher and Bladt (2019). This is in contrast to the class of CPH distributions with stable mixing in Albrecher et al. (2021a), which only spans the matrix-Weibull laws with \(\eta \in (0,1)\).
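The PH-representation claim for \((\varvec{\pi },-(-\varvec{T})^{\alpha })\) can be spot-checked numerically: a valid sub-intensity matrix must have nonnegative off-diagonal entries and nonpositive row sums. The sketch computes the principal fractional power by eigendecomposition (assuming \(-\varvec{T}\) is diagonalizable with positive eigenvalues) for one illustrative \(\varvec{T}\) and \(\alpha\), and then evaluates the resulting matrix-Weibull survival function with shape \(\alpha \eta\). It is a check on a single example, not a proof.

```python
import numpy as np
from scipy.linalg import expm


def mat_power(A, s):
    """Principal power A^s via eigendecomposition (assumes positive real eigenvalues)."""
    w, V = np.linalg.eig(A)
    return (V @ np.diag(w.astype(complex) ** s) @ np.linalg.inv(V)).real


pi = np.array([0.7, 0.3])
T = np.array([[-2.0, 1.0], [0.5, -3.0]])
alpha, eta = 0.6, 2.0

T_alpha = -mat_power(-T, alpha)                 # candidate sub-intensity -(-T)^alpha
off_diag = T_alpha[~np.eye(2, dtype=bool)]      # entries (0, 1) and (1, 0)
row_sums = T_alpha @ np.ones(2)                 # equals minus the new exit vector
print(T_alpha, off_diag.min() >= 0, row_sums.max() <= 0)


def stable_siph_survival(y):
    """pi exp(-y^(alpha eta) (-T)^alpha) e: matrix-Weibull with shape alpha * eta."""
    return float(pi @ expm(y ** (alpha * eta) * T_alpha) @ np.ones(2))
```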
Regarding their asymptotic behavior, we have
Table 3 gives the precise asymptotics for the different intensities of Table 1.
Example 3.6
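(Inverse Gaussian scaling). Consider inverse Gaussian scaling with mean \(\nu >0\), shape parameter \(\eta >0\), and density
$$\begin{aligned} f_\Theta (\theta )=\sqrt{\frac{\eta }{2\pi \theta ^{3}}}\exp \left( -\frac{\eta (\theta -\nu )^{2}}{2\nu ^{2}\theta }\right) ,\quad \theta >0. \end{aligned}$$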
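Then, the corresponding Laplace transform of \(\Theta\) is given by
$$\begin{aligned} \mathcal {L}_\Theta (u)=\exp \left( \frac{\eta }{\nu }\left( 1-\sqrt{1+\frac{2\nu ^{2}u}{\eta }}\right) \right) . \end{aligned}$$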
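We take the particular case \(\nu = 1\) and \(\sigma ^2 = 1/\eta\), so that \(\sigma ^2\) is the variance of \(\Theta\). In this way,
$$\begin{aligned} \mathcal {L}_\Theta (u)=\exp \left( \frac{1}{\sigma ^{2}}\left( 1-\sqrt{1+2\sigma ^{2}u}\right) \right) . \end{aligned}$$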
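Thus,
$$\begin{aligned} S_Y(y)=\varvec{\pi }\exp \left( \frac{1}{\sigma ^{2}}\left( \varvec{I}-\left( \varvec{I}-2\sigma ^{2}h^{-1}(y)\varvec{T}\right) ^{1/2}\right) \right) \mathbf{e} . \end{aligned}$$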
Regarding the asymptotic behavior, we have that
Table 4 gives the precise asymptotics for the different intensities of Table 1.
Example 3.7
(PVF scaling). Consider the family of power variance function (PVF) distributions with Laplace transform
where \(\nu >0\), \(\eta >0\) and \(0 < \gamma \le 1.\) This family includes the Gamma, inverse Gaussian and the positive stable distributions as particular cases. Here we assume that \(\nu = 1\), which results in
Regarding the asymptotic behavior, we have that
which results in the same asymptotics of Table 3 for the positive stable case, but with \(\alpha\) replaced by \(\gamma\).
Example 3.8
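(Compound Poisson scaling). Consider a compound model \(\Theta = \sum _{i = 1}^N V_i\) with \(V_1, V_2, \dots\) i.i.d. nonnegative random variables independent of the counting variable N. In general, the Laplace transform of \(\Theta\) is given by
$$\begin{aligned} \mathcal {L}_\Theta (u)=\mathbb {E}\left[ \left( \mathcal {L}_V(u)\right) ^{N}\right] . \end{aligned}$$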
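In particular, for \(V\sim \text{ Gamma }(\alpha , 1)\) and \(N \sim \text{ Poisson }(\rho )\), we obtain
$$\begin{aligned} \mathcal {L}_\Theta (u)=\exp \left( -\rho \left( 1-(1+u)^{-\alpha }\right) \right) . \end{aligned}$$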
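Thus,
$$\begin{aligned} S_Y(y)=\varvec{\pi }\exp \left( -\rho \left( \varvec{I}-\left( \varvec{I}-h^{-1}(y)\varvec{T}\right) ^{-\alpha }\right) \right) \mathbf{e} . \end{aligned}$$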
Note that this distribution has an atom at infinity with probability \(\exp (-\rho )\), corresponding to \({\mathbb P}(N= 0)\). In survival analysis terms, this means that an individual never experiences the event of interest with this probability. Considering \(N+1\) instead of N removes such an atom.
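The atom at infinity can be seen numerically by evaluating \(S_Y(y)=\varvec{\pi }\mathcal {L}_\Theta (-h^{-1}(y)\varvec{T})\mathbf{e}\) for the compound Poisson-Gamma Laplace transform and letting y grow: the survival function levels off at \(\exp (-\rho )\) instead of decaying to zero. The sketch assumes \(\lambda \equiv 1\) (so \(h^{-1}(y)=y\)), illustrative values of \(\alpha\) and \(\rho\), and a diagonalizable \(\varvec{T}\) for the eigendecomposition-based functional calculus.

```python
import numpy as np


def matrix_function(g, A):
    """g(A) via eigendecomposition; assumes A is diagonalizable (illustrative helper)."""
    w, V = np.linalg.eig(A)
    return (V @ np.diag(g(w.astype(complex))) @ np.linalg.inv(V)).real


pi = np.array([0.7, 0.3])
T = np.array([[-2.0, 1.0], [0.5, -3.0]])
alpha, rho = 2.0, 1.2


def laplace_cp_gamma(u):
    """Laplace transform of a compound Poisson(rho) sum of Gamma(alpha, 1) variables."""
    return np.exp(-rho * (1.0 - (1.0 + u) ** (-alpha)))


def survival(y, h_inv=lambda x: x):
    """S_Y(y) = pi L_Theta(-h^{-1}(y) T) e (Proposition 3.1, property I)."""
    return float(pi @ matrix_function(laplace_cp_gamma, -h_inv(y) * T) @ np.ones(2))


for y in (0.0, 1.0, 10.0, 1e6):
    print(y, survival(y))
print("P(N = 0) =", np.exp(-rho))
```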
Example 3.9
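(Discrete scaling). Assume that \(\Theta\) is a discrete random variable taking values in \(\{\eta _1, \eta _2, \dots \}\subset \mathbb {R}_+\) with corresponding probabilities \(\varvec{\alpha }= (\alpha _1, \alpha _2, \dots )\), that is, \({\mathbb P}(\Theta = \eta _i ) = \alpha _i\), \(i=1,2,\dots\). Then,
$$\begin{aligned} S_Y(y)=\sum _{i=1}^{\infty }\alpha _i\,\varvec{\pi }\exp \left( \eta _i\,h^{-1}(y)\varvec{T}\right) \mathbf{e} . \end{aligned}$$
Define the linear transformation \(\tilde{\varvec{T}}\) on \(\mathbb {R}^\mathbb {N}\) given by the block-diagonal matrix
$$\begin{aligned} \tilde{\varvec{T}}=\left( \begin{array}{ccc} \eta _1\varvec{T} & \mathbf{0} & \cdots \\ \mathbf{0} & \eta _2\varvec{T} & \cdots \\ \vdots & \vdots & \ddots \end{array}\right) . \end{aligned}$$
Then, we can rewrite the survival function of Y as
$$\begin{aligned} S_Y(y)=(\varvec{\alpha }\otimes \varvec{\pi })\exp \left( h^{-1}(y)\tilde{\varvec{T}}\right) \tilde{\mathbf{e }}, \end{aligned}$$
where \(\otimes\) denotes the Kronecker product, and \(\tilde{\mathbf{e }}\) is a column vector of ones of appropriate dimension. This can be thought of as an infinite-dimensional IPH distribution. The case \(\lambda \equiv 1\) recovers the class of NPH distributions introduced in Bladt et al. (2015).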
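When the support of \(\Theta\) is finite (or truncated), the block-diagonal representation is an ordinary finite-dimensional IPH representation and can be compared directly with the mixture formula. The sketch below builds \(\tilde{\varvec{T}}\) as a Kronecker product of \(\text{diag}(\eta _1,\eta _2,\dots )\) with \(\varvec{T}\) and checks that both expressions agree, writing \(x=h^{-1}(y)\). The four-point support and its probabilities are hypothetical.

```python
import numpy as np
from scipy.linalg import expm

pi = np.array([0.7, 0.3])
T = np.array([[-2.0, 1.0], [0.5, -3.0]])
etas = np.array([0.5, 1.0, 2.0, 4.0])      # hypothetical (truncated) support of Theta
alphas = np.array([0.1, 0.4, 0.3, 0.2])    # P(Theta = eta_i)

T_tilde = np.kron(np.diag(etas), T)        # block-diagonal with blocks eta_i T
pi_tilde = np.kron(alphas, pi)             # alpha (x) pi
e_tilde = np.ones(len(pi_tilde))


def survival_mixture(x):
    """sum_i alpha_i pi exp(eta_i x T) e, with x = h^{-1}(y)."""
    return float(sum(a * (pi @ expm(eta * x * T) @ np.ones(2)) for a, eta in zip(alphas, etas)))


def survival_block(x):
    """(alpha (x) pi) exp(x T_tilde) e_tilde."""
    return float(pi_tilde @ expm(x * T_tilde) @ e_tilde)


for x in (0.1, 1.0, 3.0):
    print(x, survival_mixture(x), survival_block(x))
```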
Note that another approach to study the asymptotic behavior, which is particularly convenient in the discrete scaling case, is to use the representation \(Y = h(Z/\Theta )\), so that
$$\begin{aligned} S_Y(y)={\mathbb P}\left( Z/\Theta >h^{-1}(y)\right) , \end{aligned}$$
and employ the asymptotics of \(Z/\Theta\). For instance, taking \(\Theta \sim \text{ Gamma }(\alpha , 1)\), \(Z/\Theta\) is regularly varying with tail index \(\alpha\) (see Albrecher et al. (2021a) for details). This leads to the same asymptotic results as in Table 2 for the different choices of intensity \(\lambda\). For discrete scaling, one could take, for instance, \(1/\Theta\) with a Zeta distribution, which leads to asymptotic results of the same type.
As a second case, take \(V := 1/\Theta\) with a Weibull-type tail, so that VZ has a Weibull-type tail with shape parameter in (0, 1) (see Rojas-Nandayapa and Xie (2018)). Thus, the asymptotic behavior for the different intensities resembles that in Table 3.
Example 3.10
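(Missing covariates in the proportional intensities model). Consider the proportional intensities model (also known as PH regression) introduced in Albrecher et al. (2021b) with vectors of observed and unobserved covariates \(\mathbf{X} _1\) and \(\mathbf{X} _2\), respectively. Namely, the intensity is of the form
$$\begin{aligned} \lambda (t\mid \mathbf{X} _1,\mathbf{X} _2)=\lambda (t)\exp (\varvec{\beta }_1\mathbf{X} _1+\varvec{\beta }_2\mathbf{X} _2). \end{aligned}$$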
Given that the vector \(\mathbf{X} _2\) is unknown, the model cannot be employed in practice. However, we can assume that
$$\begin{aligned} \Theta =\exp (\varvec{\beta }_2\mathbf{X} _2) \end{aligned}$$
is an unobserved random variable independent of \(\mathbf{X} _1\). In this way, the scaled intensity model can be employed to account for the effect of omitted covariates by considering a parametric model for \(\Theta\). Such an additional random component can thus help account for additional variability in the data that cannot be explained by a simpler model.
3.2 Parameter estimation
In order to derive an EM algorithm for SIPH distributions, we first recall the corresponding algorithm for CPH distributions in Albrecher et al. (2021a) (see Bladt and Rojas-Nandayapa (2018) for the discrete scaling case). Consider an i.i.d. sample \(y_1, \dots , y_K\) from a CPH distributed random variable Y, which we also denote by \(\mathbf{y}\). Here, we assume that the scaling component \(\Theta\) belongs to a parametric family depending on the parameter vector \(\varvec{\alpha }\), and denote by \(f_\Theta\) its corresponding density. We now make the following definitions. Let \(B_k\) be the number of times the underlying Markov jump process of Y starts in state k, \(N_{kl}\) the total number of transitions from state k to state l until absorption, \(N_k\) the number of times that k was the last state visited before absorption, and, finally, \(Z_k\) the total time that the Markov jump process spent in state k. The detailed routine for estimation of CPH distributions is given in Algorithm 1.
We now derive a generalized EM algorithm for maximum-likelihood estimation of SIPH distributions. Assume that \(\lambda (\,\cdot \, ; \mathbf \eta )\) is a nonnegative parametric function depending on the vector \(\mathbf \eta\). Let \(Y \sim \text{ SIPH }(\varvec{\pi }, \varvec{T}, \lambda (\,\cdot \, ; \mathbf \eta ), \Theta , \varvec{\beta })\); then
$$\begin{aligned} Y{\mathop {=}\limits ^{d}}h\left( \frac{Z}{\Theta \exp (\varvec{\beta }\mathbf{X} )};\mathbf \eta \right) , \end{aligned}$$
where \(Z \sim \text{ PH }(\varvec{\pi }, \varvec{T})\). In particular, this implies that \(h^{-1}(Y; \mathbf \eta ) \exp (\varvec{\beta }\mathbf{X} ) {\mathop {=}\limits ^{d}}Z /\Theta\), meaning that \(h^{-1}(Y; \mathbf \eta ) \exp (\varvec{\beta }\mathbf{X} )\) is scaled PH distributed. Consider now an i.i.d. sample \(y_1,\dots ,y_K\) from this Y; then the EM algorithm for parameter estimation is as follows.
Proposition 3.2
Algorithm 2 increases the likelihood function at each iteration. Since for fixed p, the likelihood of SIPH distributions is bounded, convergence towards a (possibly local) maximum is guaranteed.
Proof
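By the change of variable theorem, we have that
$$\begin{aligned} f_Y(y)=f_{Z/\Theta }\left( h^{-1}(y;\mathbf \eta )\exp (\varvec{\beta }\mathbf{X} )\right) \lambda (y;\mathbf \eta )\exp (\varvec{\beta }\mathbf{X} ). \end{aligned}$$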
Consider parameter values \((\mathbf \pi _i,\varvec{T}_i,\mathbf \alpha _i,\mathbf \eta _i, \varvec{\beta }_i)\) after the i-th iteration. Then the data log-likelihood after the i-th iteration is given by
In the \((i + 1)\)-th iteration, we first obtain \((\mathbf \pi _{i+1},\varvec{T}_{i+1},\mathbf \alpha _{i+1})\) in step 1, so that
Finally, by step 2,
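The change-of-variables identity used in the proof means that the SIPH log-likelihood splits into a scaled-PH term evaluated at the transformed observation \(w=h^{-1}(y;\mathbf \eta )\exp (\varvec{\beta }\mathbf{X} )\) plus the log-Jacobian \(\log \lambda (y;\mathbf \eta )+\varvec{\beta }\mathbf{X}\). The sketch below illustrates this for \(\Theta \sim \text{Gamma}(\alpha ,1)\) with integer \(\alpha\), so that \((\varvec{I}-w\varvec{T})^{-\alpha }\) is an ordinary matrix power, \(\lambda (y;\eta )=\eta y^{\eta -1}\), and a single scalar covariate. The scaled PH density \(\alpha \varvec{\pi }(\varvec{I}-w\varvec{T})^{-\alpha -1}\mathbf{t}\) follows from Proposition 3.1 with \(\lambda \equiv 1\); the parameter values and function names are assumptions for illustration.

```python
import numpy as np

pi = np.array([0.7, 0.3])
T = np.array([[-2.0, 1.0], [0.5, -3.0]])
e = np.ones(2)
t = -T @ e                        # exit vector
alpha = 2                         # integer Gamma shape keeps the matrix powers exact
eta, beta, x_cov = 1.5, 0.4, 1.0  # hypothetical intensity parameter, coefficient, covariate


def scaled_ph_survival(w):
    """Survival of W = Z / Theta, Theta ~ Gamma(alpha, 1): pi (I - w T)^(-alpha) e."""
    R = np.linalg.inv(np.eye(2) - w * T)
    return float(pi @ np.linalg.matrix_power(R, alpha) @ e)


def scaled_ph_density(w):
    """Density of W: alpha * pi (I - w T)^(-alpha - 1) t."""
    R = np.linalg.inv(np.eye(2) - w * T)
    return float(alpha * pi @ np.linalg.matrix_power(R, alpha + 1) @ t)


def siph_loglik(y):
    """log f_Y(y) via w = h^{-1}(y; eta) exp(beta x) and the Jacobian lambda(y; eta) exp(beta x)."""
    w = y ** eta * np.exp(beta * x_cov)
    lam = eta * y ** (eta - 1)
    return np.log(scaled_ph_density(w)) + np.log(lam) + beta * x_cov


print(siph_loglik(0.8))
```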
Remark 3.4
The optimization problem
of Algorithm 2 is computationally heavy. However, observe that a few iterations of any optimization routine that do not decrease the objective are sufficient for the proof and conclusion of Proposition 3.2 to hold, and full convergence of (5) is not necessary. For instance, a single step of the \(\arg \max\) routine can already provide good results.
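A minimal sketch of such a partial update follows, under the simplifying assumption of a \(p=1\) model with unit rate and the single intensity parameter \(\eta\) in \(\lambda (y)=\eta y^{\eta -1}\) (a Weibull likelihood), rather than the full objective (5). It runs one BFGS iteration (SciPy's `minimize` with `maxiter=1`) on \(\log \eta\), which keeps \(\eta\) positive, and falls back to the previous value if the log-likelihood did not improve, which is the monotonicity a generalized EM step requires. The synthetic data, seed, and helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic p = 1 data: Weibull with shape 1.5 and unit rate, i.e. Y = Z^(1/1.5), Z ~ Exp(1).
rng = np.random.default_rng(7)
y = rng.exponential(size=500) ** (1 / 1.5)


def loglik(eta):
    """Log-likelihood in eta of f(y) = eta y^(eta - 1) exp(-y^eta)."""
    return float(np.sum(np.log(eta) + (eta - 1) * np.log(y) - y ** eta))


def partial_m_step(eta_old, n_iter=1):
    """GEM-style update: a few BFGS iterations on log(eta); keep eta_old if no improvement."""
    res = minimize(lambda z: -loglik(np.exp(z[0])), x0=[np.log(eta_old)],
                   method="BFGS", options={"maxiter": n_iter})
    eta_new = float(np.exp(res.x[0]))
    return eta_new if loglik(eta_new) >= loglik(eta_old) else eta_old


eta0 = 0.8
eta1 = partial_m_step(eta0)
print(eta0, eta1, loglik(eta0), loglik(eta1))
```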
Remark 3.5
(Incorporating right-censoring). Algorithm 2 can be modified to work with censored data. We illustrate the changes by considering only right-censoring, since it is the most common scenario in survival analysis applications; left-censoring and interval-censoring can be treated by similar means. In this case, we no longer observe \(Y = y\) but only that \(Y \in [v, \infty )\). By monotonicity of h, we have that \(h^{-1}(Y; \mathbf \eta ) \exp (\varvec{\beta }\mathbf{X} ) \in [h^{-1}(v; \mathbf \eta ) \exp (\varvec{\beta }\mathbf{X} ) , \infty )\), which can be interpreted as a censored observation of a scaled PH distributed random variable. Moreover, a modified EM algorithm for the estimation of scaled PH distributions from censored observations is presented in Albrecher et al. (2021a) (and Bladt and Rojas-Nandayapa (2018)). This means that the main change in Algorithm 2 is in step 2, where we must now compute
3.3 Estimation for fractional PH distributions
A key distinction of the matrix Mittag-Leffler (or fractional PH) distribution with respect to the other models introduced in Section 3.1 is that the transformation \(h(x) = x^{1/\alpha }\) and the scaling distribution \(\Theta = 1/S_{\alpha }^{\alpha }\) depend on the same parameter \(\alpha\). This makes statistical estimation by ad hoc methods very challenging, and thus embedding the model into the SIPH class is useful for this purpose. Note that, for the previously presented models, the transformation parameters are distinct from the parameters of the scaling component, and this separation is the central assumption in the derivation of Algorithm 2. Thus, special care must be taken when estimating matrix Mittag-Leffler distributions as SIPH distributions. We solve this by employing a modified EM algorithm, with the details given in Algorithm 3.
By the same method of proof of Algorithm 2, one can show that Algorithm 3 increases the likelihood in each iteration, and hence we omit the details for brevity.
4 Shared scaling
This section presents a multivariate extension of SIPH distributions, inspired by the construction principle of the shared frailty model. The key idea is to consider an underlying random variable that acts as a common scaling factor for all the coordinates of a random vector with independent components, creating dependence and heavy tails at once through the same mechanism.
4.1 A class of multivariate CPH distributions
Before going into full generality, we consider the case where there is no deterministic time-transform component. This allows for a more transparent treatment with explicit formulas. Thus, consider random variables \(\mathbf{Y} =(Y_1,\dots ,Y_d)^{\top }\) that are conditionally independent given \(\Theta =\theta\), with
$$\begin{aligned} Y_i \mid \Theta = \theta \sim \text{ PH }(\varvec{\pi }_i, \theta \varvec{T}_i)\,, \quad i = 1, \dots , d\,. \end{aligned}$$
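Then, the joint survival function of \(\mathbf{Y}\) is given by
$$\begin{aligned} S(\mathbf{y}) = \mathbb{P}(Y_1 > y_1, \dots , Y_d > y_d) = (\varvec{\pi }_1\otimes \cdots \otimes \varvec{\pi }_d)\, \mathcal {L}_\Theta \big (-(\varvec{T}_1 y_1\oplus \cdots \oplus \varvec{T}_d y_d)\big )\, \mathbf{e}\,, \end{aligned}$$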
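where \(\oplus , \otimes\) denote the Kronecker sum and product, respectively. In particular, this yields the joint density
$$\begin{aligned} f(\mathbf{y}) = (-1)^d\, (\varvec{\pi }_1\otimes \cdots \otimes \varvec{\pi }_d)\, \mathcal {L}_\Theta ^{(d)}\big (-(\varvec{T}_1 y_1\oplus \cdots \oplus \varvec{T}_d y_d)\big )\, \tilde{\mathbf{t }}\,, \end{aligned}$$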
where \(\tilde{\mathbf{t }} = \mathbf{t}_1\otimes \cdots \otimes \mathbf{t}_d\) and \(\mathcal {L}_\Theta ^{(d)}(u)\) is the derivative of order d of \(\mathcal {L}_\Theta (u)\); this can again be shown by functional calculus through Cauchy's formula. Moreover, each marginal is a continuously scaled PH distribution:
$$\begin{aligned} \mathbb{P}(Y_i > y) = \varvec{\pi }_i\, \mathcal {L}_\Theta (-\varvec{T}_i y)\, \mathbf{e}\,, \quad i = 1, \dots , d\,. \end{aligned}$$
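Alternatively, it is easy to see that \(\mathbf{Y}\) has the representation \((Y_1, \dots , Y_d)^{\top } = (Z_1, \dots ,Z_d)^{\top }/ \Theta\), where the \(Z_i\) are independent \(\text{ PH }(\mathbf \pi _i,\varvec{T}_i)\) distributed random variables, independent of \(\Theta\), \(i =1, \dots , d\). Indeed,
$$\begin{aligned} \mathbb{P}(Z_1/\Theta > y_1, \dots , Z_d/\Theta > y_d) = \mathbb{E}\Big (\prod _{i=1}^d \mathbb{P}(Z_i > \Theta y_i \mid \Theta )\Big ) = \mathbb{E}\Big (\prod _{i=1}^d \varvec{\pi }_i \exp (\Theta \varvec{T}_i y_i)\mathbf{e}\Big )\,, \end{aligned}$$
which is the joint survival function of \(\mathbf{Y}\).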
These multivariate distributions were studied from another perspective in Furman et al. (2021), where the authors derived some properties in the context of risk management. Here we derive further probabilistic properties, provide an estimation method, and extend the class to allow for deterministic time transforms. In Section 5 we also allow for scaling of different components of the random vector by different (but correlated) scaling random variables. Since these distributions are the building blocks of the more general time-inhomogeneous multivariate models presented in Section 4.3, a good understanding of the former facilitates the treatment of the latter.
Example 4.1
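(Gamma scaling). Consider \(\Theta \sim \text{ Gamma }(\alpha , 1)\), \(\alpha >0\), so that \(\mathcal {L}_\Theta (u) = (1+u)^{-\alpha }\). Then the joint survival function of \(\mathbf{Y}\) is given by
$$\begin{aligned} S(\mathbf{y}) = (\varvec{\pi }_1\otimes \cdots \otimes \varvec{\pi }_d)\, \big (\varvec{I} - (\varvec{T}_1 y_1\oplus \cdots \oplus \varvec{T}_d y_d)\big )^{-\alpha }\, \mathbf{e}\,. \end{aligned}$$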
This distribution can be seen to be a matrix version of Mardia’s multivariate Pareto distribution (see Mardia et al. (1962)).
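The Kronecker-sum formula is straightforward to evaluate numerically. The following sketch (hypothetical function names, bivariate case only) computes \((\varvec{\pi}_1\otimes\varvec{\pi}_2)(\varvec{I} - (\varvec{T}_1 y_1\oplus\varvec{T}_2 y_2))^{-\alpha}\mathbf{e}\), the Gamma(α, 1) case where \(\mathcal{L}_\Theta(u) = (1+u)^{-\alpha}\), with SciPy's `fractional_matrix_power`. With single-phase components it reduces to Mardia's multivariate Pareto form \((1 + \lambda_1 y_1 + \lambda_2 y_2)^{-\alpha}\), and setting \(y_2 = 0\) must return the CPH marginal; both serve as checks.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power


def kron_sum(A, B):
    """Kronecker sum A ⊕ B = A ⊗ I + I ⊗ B."""
    return np.kron(A, np.eye(B.shape[0])) + np.kron(np.eye(A.shape[0]), B)


def shared_gamma_survival(y1, y2, pi1, T1, pi2, T2, alpha):
    """P(Y1 > y1, Y2 > y2) for Y = (Z1, Z2) / Theta, Theta ~ Gamma(alpha, 1):
    (pi1 ⊗ pi2) (I - (T1 y1 ⊕ T2 y2))^(-alpha) e."""
    A = kron_sum(T1 * y1, T2 * y2)
    M = np.eye(A.shape[0]) - A
    return float(np.kron(pi1, pi2) @ np.real(fractional_matrix_power(M, -alpha)) @ np.ones(A.shape[0]))
```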
4.2 Parameter estimation: multivariate CPH distributions
We now present a generalized EM algorithm for maximum-likelihood estimation of the class of multivariate CPH distributions introduced above. The complete data consist of the scaling component \(\Theta\) together with the paths of the conditionally independent Markov jump processes. We further assume that \(\Theta\) belongs to a parametric family depending on the vector \(\varvec{\alpha }\) and denote by \(f_\Theta\) its density.
Consider observations \(\mathbf{y} _n = (y_{n}^{(1)}, \dots , y_{n}^{(d)})^{\top }\), \(n =1 ,\dots , K\), from a multivariate CPH distributed random vector, and let \(\tilde{\mathbf{y }}\) denote the whole data set. We also denote by \(\tilde{\varvec{\pi }}\) and \(\tilde{\varvec{T}}\) the sets of parameters \(\{\varvec{\pi }_1, \dots , \varvec{\pi }_d\}\) and \(\{\varvec{T}_1,\dots , \varvec{T}_d\}\), respectively, and by \(\pi _{k}^{(i)}\) and \(t_{kl}^{(i)}\) the entries of \(\varvec{\pi }_i\) and \(\varvec{T}_i\), \(i = 1, \dots , d\). In order to write down the complete likelihood \(L_c(\tilde{\varvec{\pi }}, \tilde{\varvec{T}},\varvec{\alpha };\tilde{\mathbf{y }})\), we need the following definitions. For each \(i = 1, \dots ,d\), let \(B_k^i\) be the number of times the underlying Markov jump process of \(Y_i\) starts in state k, \(N_{kl}^i\) the total number of transitions from state k to l until absorption, \(N_k^i\) the number of times that k was the last state visited before absorption, and, finally, \(Z_k^i\) the total time that the Markov jump process spent in state k.
Then, the complete likelihood is given by
with corresponding log-likelihood (discarding the terms that do not depend on any parameters)
Regarding the E-step, which consists of computing the conditional expectation of the log-likelihood given the observed data, the calculations are similar to those of Albrecher et al. (2021a). We illustrate the procedure by computing the conditional expectation of the logarithmic term. Consider one (generic) data point (\(K = 1\)) and let \(\mathbf{y} = \mathbf{y} _1\). Then
The formulas for all the other statistics are derived by similar calculations.
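Conditional expectations of this kind can also be cross-checked numerically. The sketch below is a hedged alternative to the closed-form E-step expressions, not the authors' derivation: it computes \(\mathbb{E}(\log \Theta \mid \mathbf{Y} = \mathbf{y})\) by one-dimensional quadrature, weighting the density \(f_\Theta\) by the conditional density \(\prod_i \varvec{\pi}_i \exp(\theta \varvec{T}_i y_i)\,\theta \mathbf{t}_i\). Function names are illustrative. For single-phase components and Gamma(α, 1) scaling, the conditional law of Θ is Gamma(α + d, 1 + Σ λ_i y_i), which gives an exact reference value.

```python
import numpy as np
from scipy.integrate import quad
from scipy.linalg import expm


def conditional_density(y, theta, pis, Ts):
    """prod_i f_i(y_i | theta), where Y_i | Theta = theta ~ PH(pi_i, theta * T_i)."""
    val = 1.0
    for yi, pi, T in zip(y, pis, Ts):
        t = -T.sum(axis=1)                           # exit-rate vector
        val *= pi @ expm(theta * T * yi) @ (theta * t)
    return val


def expected_log_theta(y, pis, Ts, f_theta):
    """E(log Theta | Y = y) for one observation y, by numerical quadrature."""
    weight = lambda th: conditional_density(y, th, pis, Ts) * f_theta(th)
    num = quad(lambda th: np.log(th) * weight(th), 0, np.inf)[0]
    den = quad(weight, 0, np.inf)[0]
    return num / den
```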
Concerning the M-step, which consists of maximizing the conditional expected log-likelihood with respect to the parameters, for the parameter \(\varvec{\alpha }\) of the scaling component we have in full generality
Regarding the parameters of the PH components, the entries of the sub-intensity matrices can be found by direct differentiation of the log-likelihood, while for the vectors of initial probabilities we can employ a Lagrange multiplier argument. We omit further details for brevity. We summarize the complete procedure in Algorithm 4.
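For a concrete instance of the M-step for \(\varvec{\alpha}\), take Gamma(α, 1) scaling as in Example 4.1 (an assumption of this sketch; other families need their own update). Dropping terms that do not involve α, the expected complete log-likelihood contributes \(\alpha \sum_n \mathbb{E}(\log \Theta_n \mid \mathbf{y}_n) - K \log \Gamma(\alpha)\), so the update solves \(\psi(\alpha) = K^{-1}\sum_n \mathbb{E}(\log \Theta_n \mid \mathbf{y}_n)\), where ψ is the digamma function. The function name below is hypothetical.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma


def mstep_gamma_shape(expected_logs):
    """Solve digamma(alpha) = mean_n E(log Theta_n | y_n) for alpha > 0."""
    c = float(np.mean(expected_logs))
    f = lambda a: digamma(a) - c                   # strictly increasing in a
    hi = 1.0
    while f(hi) < 0:                               # expand the bracket to the right
        hi *= 2.0
    return brentq(f, 1e-8, hi)
```

Since ψ is strictly increasing, the root is unique and a bracketing solver is sufficient.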
4.3 A class of multivariate SIPH distributions
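We now proceed to incorporate deterministic time-inhomogeneity into the shared scaling construction. Consider random variables \((Y_1,\dots ,Y_d)^{\top }\) that are conditionally independent given \(\Theta =\theta\), with
$$\begin{aligned} Y_i \mid \Theta = \theta \sim \text{ IPH }(\varvec{\pi }_i, \theta \varvec{T}_i, \lambda _i)\,, \quad i = 1, \dots , d\,. \end{aligned}$$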
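Then
$$\begin{aligned} S(\mathbf{y}) = (\varvec{\pi }_1\otimes \cdots \otimes \varvec{\pi }_d)\, \mathcal {L}_\Theta \big (-(\varvec{T}_1 h^{-1}_1(y_1)\oplus \cdots \oplus \varvec{T}_d h^{-1}_d(y_d))\big )\, \mathbf{e} \end{aligned}$$
and
$$\begin{aligned} f(\mathbf{y}) = (-1)^d\, (\varvec{\pi }_1\otimes \cdots \otimes \varvec{\pi }_d)\, \mathcal {L}_\Theta ^{(d)}\big (-(\varvec{T}_1 h^{-1}_1(y_1)\oplus \cdots \oplus \varvec{T}_d h^{-1}_d(y_d))\big )\, \tilde{\mathbf{t }} \prod _{i=1}^d \lambda _i(y_i)\,, \end{aligned}$$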
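where \(h^{-1}_i (y) = \int _0^{y} \lambda _i(t)\, dt\), \(i = 1, \dots , d\). Note that \(\mathbf{Y}\) has the representation \((Y_1, \dots ,Y_d)^{\top } = (h_1(Z_1/\Theta ), \dots ,h_d(Z_d/\Theta ))^{\top }\), with \(Z_i\) and \(\Theta\) as in Section 4.1. This follows from the monotonicity of the \(h_i\):
$$\begin{aligned} \mathbb{P}\big (h_1(Z_1/\Theta ) > y_1, \dots , h_d(Z_d/\Theta ) > y_d\big ) = \mathbb{P}\big (Z_1/\Theta > h^{-1}_1(y_1), \dots , Z_d/\Theta > h^{-1}_d(y_d)\big )\,, \end{aligned}$$
which is the joint survival function of Section 4.1 evaluated at \((h^{-1}_1(y_1), \dots , h^{-1}_d(y_d))\).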
Example 4.2
(Positive stable scaling). Let \(\Theta\) be positive stable with stability parameter \(\alpha \in (0, 1]\). Then
For the particular case \(\lambda _i(y) \equiv \eta _i y^{\eta _i - 1}\), \(\eta _i>0\), \(i = 1, \dots , d\), we have
This joint distribution can be seen as a matrix-parameter version of the multivariate Weibull distribution introduced in Manatunga and Oakes (1999).
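Assuming the positive stable law is normalized so that \(\mathcal{L}_\Theta(u) = \exp(-u^{\alpha})\), the bivariate survival function with Weibull intensities \(\lambda_i(y) = \eta_i y^{\eta_i - 1}\), i.e. \(h^{-1}_i(y) = y^{\eta_i}\), becomes \((\varvec{\pi}_1\otimes\varvec{\pi}_2)\exp\{-(-(\varvec{T}_1 y_1^{\eta_1}\oplus\varvec{T}_2 y_2^{\eta_2}))^{\alpha}\}\mathbf{e}\). The sketch below evaluates it (hypothetical names). Single-phase components give \(\exp\{-(\lambda_1 y_1^{\eta_1} + \lambda_2 y_2^{\eta_2})^{\alpha}\}\), the scalar positive stable frailty form with Weibull baseline, and \(y_2 = 0\) returns the marginal.

```python
import numpy as np
from scipy.linalg import expm, fractional_matrix_power


def kron_sum(A, B):
    """Kronecker sum A ⊕ B = A ⊗ I + I ⊗ B."""
    return np.kron(A, np.eye(B.shape[0])) + np.kron(np.eye(A.shape[0]), B)


def stable_weibull_survival(y1, y2, pi1, T1, eta1, pi2, T2, eta2, alpha):
    """Bivariate SIPH survival with positive stable scaling (0 < alpha <= 1) and
    h_i^{-1}(y) = y**eta_i; requires y1 > 0 or y2 > 0 so that -A is invertible."""
    A = kron_sum(T1 * y1 ** eta1, T2 * y2 ** eta2)
    L = expm(-np.real(fractional_matrix_power(-A, alpha)))   # L_Theta(-A) = exp(-(-A)^alpha)
    return float(np.kron(pi1, pi2) @ L @ np.ones(A.shape[0]))
```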
Remark 4.1
Covariates can be incorporated into the model by assuming that the intensities are of the form
Remark 4.2
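(Shared frailty model). In the shared frailty model, the lifetimes of a group of individuals are assumed to be conditionally independent given a common frailty \(\Theta\). In this way, the conditional joint survival function of \(\mathbf{Y} \mid \Theta =\theta\), \(\mathbf{Y} = (Y_1, \dots , Y_d)^{\top }\), is given by
$$\begin{aligned} \mathbb{P}(Y_1 > y_1, \dots , Y_d > y_d \mid \Theta = \theta ) = \exp \Big (-\theta \sum _{i=1}^d M_i(y_i)\Big )\,, \end{aligned}$$
where the \(M_i\) are baseline cumulative hazards, \(i = 1,\dots , d\). Thus, the joint survival function of \(\mathbf{Y}\) is given by
$$\begin{aligned} S(\mathbf{y}) = \mathcal {L}_\Theta \Big (\sum _{i=1}^d M_i(y_i)\Big )\,. \end{aligned}$$
Using that
$$\begin{aligned} S_i(y_i) = \mathbb{P}(Y_i > y_i) = \mathcal {L}_\Theta (M_i(y_i))\,, \quad i = 1, \dots , d\,, \end{aligned}$$
the above joint survival function can be rewritten as
$$\begin{aligned} S(\mathbf{y}) = \mathcal {L}_\Theta \Big (\sum _{i=1}^d \mathcal {L}_\Theta ^{-1}\big (S_i(y_i)\big )\Big )\,. \end{aligned}$$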
In particular, this means that the survival copula of \(\mathbf{Y}\) is an Archimedean copula. Note that the shared frailty model is a particular case of the class of multivariate SIPH distributions introduced here when \(p = 1\).
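We now study the dependence structure of multivariate SIPH distributions. When \(p = 1\), the survival copula of \(\mathbf{Y}\) is an Archimedean copula. To study the more general case, note that all the transformations presented in Table 1 are strictly increasing. Since copulas are invariant under strictly increasing transformations of the marginals, the copulas of models based on these intensities coincide with those of the models presented in Section 4.1, and thus it is enough to study the latter case. Define the coefficient of upper tail dependence as
$$\begin{aligned} \lambda _U = \lim _{u \rightarrow 1^-} \mathbb{P}\big (F_1(Y_1) > u \mid F_2(Y_2) > u\big )\,, \end{aligned}$$
where \(F_1\) and \(F_2\) denote the marginal distribution functions of \(Y_1\) and \(Y_2\).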
Proposition 4.1
Let \(V := 1/ \Theta\) be regularly varying with index \(\alpha >0\). Then
where \(\tilde{\varvec{T}}_i := \varvec{T}_i \mathbb {E}(Z_i^\alpha )^{1/\alpha }\), \(i =1,2\).
Proof
Given the definition of our model, Proposition 1 of Section 2 in Engelke et al. (2019) yields
where \(Z_i\) are PH(\(\varvec{\pi }_i,\varvec{T}_i\)), and
Moreover, \({Z_i}/{\mathbb {E}(Z_i^\alpha )^{1/\alpha }}\) is PH distributed with the same vector of initial probabilities \(\varvec{\pi }_i\) and subintensity matrix \(\tilde{\varvec{T}}_i = \varvec{T}_i \mathbb {E}(Z_i^\alpha )^{1/\alpha }\), \(i =1 ,2\). This implies that
which now yields
Note that the resulting explicit expression for \(\lambda _U\) is in terms of the parameters of the PH components. For instance, when \(\Theta \sim \text{ Gamma }(\alpha ,1)\), the survival copula of the model can differ from the Clayton copula (which is obtained when \(p = 1\)), for which \(\lambda _U = 2^{-\alpha }\). In Figure 1, we take \(\alpha = 1\) and plot the implicit copulas of two multivariate CPH distributions, one with upper tail dependence coefficient smaller than \(2^{-1}\) and the other larger than \(2^{-1}\), achieved solely by changing the parameters of the PH components.
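One way to evaluate \(\lambda_U\) numerically, assuming (as the proof via Engelke et al. (2019) suggests) that it equals \(\mathbb{E}\{\min(\tilde Z_1, \tilde Z_2)^{\alpha}\}\) with \(\tilde Z_i = Z_i/\mathbb{E}(Z_i^\alpha)^{1/\alpha} \sim \text{PH}(\varvec{\pi}_i, \tilde{\varvec{T}}_i)\), combines two standard PH facts: \(\min(\tilde Z_1, \tilde Z_2) \sim \text{PH}(\varvec{\pi}_1\otimes\varvec{\pi}_2, \tilde{\varvec{T}}_1\oplus\tilde{\varvec{T}}_2)\), and \(\mathbb{E}(Z^{a}) = \Gamma(a+1)\varvec{\pi}(-\varvec{T})^{-a}\mathbf{e}\) for \(Z \sim \text{PH}(\varvec{\pi}, \varvec{T})\). With exponential components the sketch recovers the Clayton value \(2^{-\alpha}\), which supports the assumed form. Function names are hypothetical.

```python
import numpy as np
from math import gamma as gamma_fn
from scipy.linalg import fractional_matrix_power


def kron_sum(A, B):
    """Kronecker sum A ⊕ B = A ⊗ I + I ⊗ B."""
    return np.kron(A, np.eye(B.shape[0])) + np.kron(np.eye(A.shape[0]), B)


def ph_moment(pi, T, a):
    """E(Z**a) = Gamma(a + 1) pi (-T)^(-a) e for Z ~ PH(pi, T), a > 0."""
    M = np.real(fractional_matrix_power(-T, -a))
    return gamma_fn(a + 1) * float(pi @ M @ np.ones(len(pi)))


def upper_tail_dependence(pi1, T1, pi2, T2, alpha):
    # normalize each Z_i so that E(Z_i^alpha) = 1: T_i -> T_i E(Z_i^alpha)^(1/alpha)
    T1n = T1 * ph_moment(pi1, T1, alpha) ** (1 / alpha)
    T2n = T2 * ph_moment(pi2, T2, alpha) ** (1 / alpha)
    # min of the normalized variables is PH(pi1 ⊗ pi2, T1n ⊕ T2n); take its alpha-th moment
    return ph_moment(np.kron(pi1, pi2), kron_sum(T1n, T2n), alpha)
```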
4.4 Parameter estimation: multivariate SIPH distributions
Assume that \(\lambda _i(\,\cdot \, ; \mathbf \eta _i)\) is a parametric function depending on the vector \(\mathbf \eta _i\), \(i =1 ,\dots , d\), and let \(\mathbf \eta = (\mathbf \eta _1, \dots , \mathbf \eta _d)\). Then we can use that \((h^{-1}_1(Y_1 ; \mathbf \eta _1), \dots , h^{-1}_d(Y_d;\mathbf \eta _d))^{\top } {\mathop {=}\limits ^{d}}(Z_1/\Theta , \dots ,Z_d/\Theta )^{\top }\) to formulate a generalized EM algorithm for maximum-likelihood estimation, which extends Algorithm 2 to the multivariate case.
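The likelihood that such an algorithm increases can be written down directly. As a sketch under stated assumptions (bivariate case, shared Gamma(α, 1) scaling, hypothetical Weibull intensities \(\lambda_i(y;\eta_i) = \eta_i y^{\eta_i-1}\), illustrative function names), differentiating \((\varvec{\pi}_1\otimes\varvec{\pi}_2)(\varvec{I}-\varvec{A})^{-\alpha}\mathbf{e}\), with \(\varvec{A} = \varvec{T}_1 h^{-1}_1(y_1)\oplus\varvec{T}_2 h^{-1}_2(y_2)\), once in each argument gives the density \(\alpha(\alpha+1)(\varvec{\pi}_1\otimes\varvec{\pi}_2)(\varvec{I}-\varvec{A})^{-\alpha-2}(\mathbf{t}_1\otimes\mathbf{t}_2)\lambda_1(y_1)\lambda_2(y_2)\). The check compares it with a finite-difference mixed derivative of the survival function.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power


def kron_sum(A, B):
    """Kronecker sum A ⊕ B = A ⊗ I + I ⊗ B."""
    return np.kron(A, np.eye(B.shape[0])) + np.kron(np.eye(A.shape[0]), B)


def bivariate_siph_logpdf(y1, y2, pi1, T1, eta1, pi2, T2, eta2, alpha):
    """Log-density of a bivariate SIPH vector with shared Gamma(alpha, 1) scaling
    and hypothetical Weibull intensities lambda_i(y) = eta_i * y**(eta_i - 1)."""
    A = kron_sum(T1 * y1 ** eta1, T2 * y2 ** eta2)              # h_i^{-1}(y) = y**eta_i
    M = np.real(fractional_matrix_power(np.eye(A.shape[0]) - A, -alpha - 2))
    t12 = np.kron(-T1.sum(axis=1), -T2.sum(axis=1))              # t_1 ⊗ t_2
    core = alpha * (alpha + 1) * float(np.kron(pi1, pi2) @ M @ t12)
    lam12 = eta1 * y1 ** (eta1 - 1) * eta2 * y2 ** (eta2 - 1)    # lambda_1(y1) * lambda_2(y2)
    return np.log(core * lam12)


def loglik(pairs, *params):
    """Observed-data log-likelihood of a list of (y1, y2) observations."""
    return sum(bivariate_siph_logpdf(a, b, *params) for a, b in pairs)
```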
5 Correlated scaling
We now extend the scaling of the sub-intensity matrix of SIPH distributions to the case where we condition on a random vector whose components are the scaling factors. We first consider the conditionally PH case, i.e. when no deterministic time-transform is present: let \(\mathbf \Theta = (\Theta _1, \dots ,\Theta _d)^{\top }\) be a scaling vector and \(\mathbf{Y} = (Y_1, \dots , Y_d)^{\top }\) be such that the random variables \(Y_i\) are conditionally independent given \(\mathbf \Theta\) with laws
$$\begin{aligned} Y_i \mid \mathbf \Theta = \varvec{\theta } \sim \text{ PH }(\varvec{\pi }_i, \theta _i \varvec{T}_i)\,, \quad i = 1, \dots , d\,. \end{aligned}$$
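Then, in full generality, the joint survival function of \(\mathbf{Y}\) is given by
$$\begin{aligned} S(\mathbf{y}) = \mathbb{E}\Big (\prod _{i=1}^d \varvec{\pi }_i \exp (\Theta _i \varvec{T}_i y_i)\mathbf{e}\Big )\,. \end{aligned}$$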
Consider the bivariate case. Then, using functional calculus, we have that the joint survival function takes the explicit form
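where \(\mathcal {L}_{\mathbf \Theta }\) is the joint Laplace transform of \(\mathbf \Theta\), that is
$$\begin{aligned} \mathcal {L}_{\mathbf \Theta }(u_1, \dots , u_d) = \mathbb{E}\big (\exp (-u_1\Theta _1 - \cdots - u_d\Theta _d)\big )\,. \end{aligned}$$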
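Note that \(\mathbf{Y}\) has the representation \((Z_1 /\Theta _1, \dots , Z_d /\Theta _d)^{\top }\), where the \(Z_i\) are independent \(\text{ PH }(\varvec{\pi }_i, \varvec{T}_i)\) distributed random variables, \(i = 1, \dots , d\), independent of \(\varvec{\Theta }\). Indeed,
$$\begin{aligned} \mathbb{P}(Z_1/\Theta _1 > y_1, \dots , Z_d/\Theta _d > y_d) = \mathbb{E}\Big (\prod _{i=1}^d \mathbb{P}(Z_i > \Theta _i y_i \mid \mathbf \Theta )\Big ) = \mathbb{E}\Big (\prod _{i=1}^d \varvec{\pi }_i \exp (\Theta _i \varvec{T}_i y_i)\mathbf{e}\Big )\,. \end{aligned}$$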
5.1 Parameter estimation: correlated CPH distributions
Maximum-likelihood estimation of this class of multivariate distributions can be performed via a generalized EM algorithm. The derivation is analogous to that of Algorithm 4 and is thus omitted for brevity. Again, for estimation we assume that \(\varvec{\Theta }\) belongs to a parametric family depending on the vector \(\varvec{\alpha }\) and denote by \(f_{\varvec{\Theta }}\) its joint density. The resulting detailed routine is provided in Algorithm 6.
Remark 5.1
This algorithm suffers from the curse of dimensionality. The integrals above must typically be computed numerically, since explicit expressions are not available, and the number of summands needed for the approximation increases rapidly with the dimension. It is also worth mentioning that correlated frailty models are typically employed only in the bivariate case, where the algorithm remains computationally feasible; hence its relevance.
5.2 Correlated SIPH distributions
We now introduce an analogue of the correlated frailty model based on IPH distributions, effectively the most general of our models. Consider a multivariate random scaling component \(\mathbf \Theta = (\Theta _1, \dots ,\Theta _d)^{\top }\) and \(\mathbf{Y} = (Y_1, \dots ,Y_d)^{\top }\), both in \(\mathbb {R}_+^d\), such that the \(Y_i\) are conditionally independent given \(\mathbf \Theta\) with conditional distribution
$$\begin{aligned} Y_i \mid \mathbf \Theta = \varvec{\theta } \sim \text{ IPH }(\varvec{\pi }_i, \theta _i \varvec{T}_i, \lambda _i)\,, \quad i = 1, \dots , d\,. \end{aligned}$$
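The joint survival function of \(\mathbf{Y}\) is then given by
$$\begin{aligned} S(\mathbf{y}) = \mathbb{E}\Big (\prod _{i=1}^d \varvec{\pi }_i \exp \big (\Theta _i \varvec{T}_i h^{-1}_i(y_i)\big )\mathbf{e}\Big )\,, \end{aligned}$$
where \(h^{-1}_i(y) = \int _0^y \lambda _i(t)\, dt\).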
In the bivariate case, we have by simple calculations (using functional calculus) the explicit expression
Note that an alternative representation for \(\mathbf{Y}\) is \(\mathbf{Y} = ( h_1 (Z_1 / \Theta _1), \dots , h_d (Z_d / \Theta _d))^{\top }\), where the \(Z_i\) are independent \(\text{ PH }(\varvec{\pi }_i, \varvec{T}_i)\) distributed random variables, independent of \(\varvec{\Theta }\). The proof is analogous to those in the previous sections.
Now we consider a specific example with explicit joint density, namely the correlated Gamma case.
Example 5.1
(Correlated Gamma scaling). Inspired by Yashin et al. (1995), we consider \(\mathbf \Theta = (\Theta _1, \Theta _2)^{\top }\) such that
where \(W_i \sim \text{ Gamma }(\kappa _i, \eta _i)\), \(\kappa _i,\eta _i>0\), \(i = 0,1,2\), are independent. Then we have that
This yields
One typically sets \(\eta _1 = \kappa _0 + \kappa _1\) and \(\eta _2 = \kappa _0 + \kappa _2\). In this way \(\mathbb {E}(\Theta _1) = \mathbb {E}(\Theta _2) = 1\), \(\text{ Var }(\Theta _1) = \eta _1^{-1}\), \(\text{ Var }(\Theta _2) = \eta _2^{-1}\) and \(\text{ Corr }(\Theta _1, \Theta _2) = \kappa _0 / \sqrt{(\kappa _0 + \kappa _1 )(\kappa _0 + \kappa _2)}\).
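The construction of \(\varvec{\Theta}\) from the \(W_i\) is not reproduced in this text. One construction consistent with the moments stated above, and assumed in the sketch below, is \(\Theta_i = (\eta_0/\eta_i)W_0 + W_i\), \(i = 1, 2\), so that \(\Theta_i \sim \text{Gamma}(\kappa_0+\kappa_i, \eta_i)\) and \(\mathcal{L}_{\varvec{\Theta}}(u_1,u_2) = (1 + u_1/\eta_1 + u_2/\eta_2)^{-\kappa_0}(1+u_1/\eta_1)^{-\kappa_1}(1+u_2/\eta_2)^{-\kappa_2}\). Because \(\varvec{T}_1 y_1\otimes\varvec{I}\) and \(\varvec{I}\otimes\varvec{T}_2 y_2\) commute, this formula can be evaluated at \(u_i = -\varvec{T}_i y_i\) in Kronecker form to obtain the bivariate correlated CPH survival function. Function names are hypothetical; the check verifies the stated mean and correlation, a Monte Carlo value, and the marginal.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power


def mpow(M, r):
    """Real matrix power M**r for M with eigenvalues of positive real part."""
    return np.real(fractional_matrix_power(M, r))


def sample_correlated_gamma(k0, k1, k2, eta0, eta1, eta2, size, rng):
    """Assumed construction Theta_i = (eta0 / eta_i) W0 + W_i, W_i ~ Gamma(k_i, rate eta_i)."""
    W0 = rng.gamma(k0, 1.0 / eta0, size)
    W1 = rng.gamma(k1, 1.0 / eta1, size)
    W2 = rng.gamma(k2, 1.0 / eta2, size)
    return eta0 / eta1 * W0 + W1, eta0 / eta2 * W0 + W2


def correlated_gamma_cph_survival(y1, y2, pi1, T1, pi2, T2, k0, k1, k2, eta1, eta2):
    """P(Z1/Theta1 > y1, Z2/Theta2 > y2) with Z_i ~ PH(pi_i, T_i), via the joint
    Laplace transform evaluated at commuting Kronecker-structured matrices."""
    p1, p2 = len(pi1), len(pi2)
    B1 = np.kron(T1 * y1 / eta1, np.eye(p2))      # plays the role of -u1 / eta1
    B2 = np.kron(np.eye(p1), T2 * y2 / eta2)      # plays the role of -u2 / eta2
    I = np.eye(p1 * p2)
    L = mpow(I - B1 - B2, -k0) @ mpow(I - B1, -k1) @ mpow(I - B2, -k2)
    return float(np.kron(pi1, pi2) @ L @ np.ones(p1 * p2))
```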
Remark 5.2
(Estimation). Maximum-likelihood estimation can be performed via a modified EM algorithm of the same form as Algorithm 5, the only change being in step 1, where we now employ Algorithm 6. We omit further details.
Remark 5.3
(Correlated frailty). The correlated frailty model assumes that the frailties of individuals are correlated and not necessarily shared. More specifically, in a bivariate correlated frailty model, the conditional joint density of \(\mathbf{Y} \mid \varvec{\Theta }= \varvec{\theta }\) is
In this way, the joint survival function of \(\mathbf{Y}\) is given by
This is indeed a particular case of the correlated intensities model introduced in the present section when \(p = 1\).
6 Numerical illustrations
In this section, we present some numerical illustrations of practical relevance. In the first example, we test the performance of Algorithm 3 for the estimation of matrix Mittag-Leffler distributions in a simulation study. In the second example, we consider fitting a SIPH distribution to a given theoretical distribution. In the third example, we fit a SIPH distribution to a real-life mortality data set. As a final example, we perform a simulation study for a multivariate CPH distribution. In all cases, we ran the generalized EM algorithms until the changes in successive log-likelihoods became negligible.
6.1 Matrix Mittag-Leffler distributions
We generated an i.i.d. sample of size 1,000 from a matrix Mittag-Leffler distribution of 4 phases with parameters
We then fitted a matrix Mittag-Leffler distribution with the same number of phases to the resulting sample using Algorithm 3, obtaining the following parameters:
Up to a permutation of the states (whose labels carry no meaning), the original parameters are approximately recovered. Figure 2 shows that the algorithm recovers the structure of the data. Moreover, note that \(\hat{\alpha }= 0.7928\), which determines the heaviness of the tail, is close to the original value \(\alpha = 0.8\). As further evidence of the quality of the fit, the log-likelihood of the fitted model is \(-1,769.596\), while the original distribution parameters yield \(-1,773.453\).
6.2 Matrix-Weibull
Algorithm 2 can easily be modified to approximate given theoretical distributions. As in the PH case (Asmussen et al. (1996)), the idea consists of considering sequences of empirical distributions with increasing sample size. For instance, if we denote by g the given theoretical density that we want to approximate, in step 1 we have that, as \(K\rightarrow \infty\),
The rest of the formulas in step 1 are adapted through the same limit. Regarding step 2, we have
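As a concrete example, we consider a Matrix-Weibull distribution (as introduced in Albrecher and Bladt (2019), having no random scaling component) with density function
$$\begin{aligned} f(y) = \varvec{\pi }\exp (\varvec{T} y^{\beta })\mathbf{t}\, \beta y^{\beta - 1}\,, \quad y > 0\,, \end{aligned}$$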
and parameters
We then fitted a SIPH distribution of 3 phases with baseline intensity \(\lambda (y) = \eta y^{\eta - 1}\), \(\eta >0\), and positive stable scaling. The fitted parameters are the following
The quality of the approximation is supported by Figure 3, which shows that we recover the shape of the original distribution. Moreover, the product \(\hat{\alpha }\hat{\eta } = 1.9867\), which determines the heaviness of the tail, can be compared with \(\beta = 2\) for the given theoretical model.
6.3 Reallife data
The Gamma-Gompertz frailty model is commonly employed for modeling human mortality at old ages (see, e.g., Missov (2013); Vaupel et al. (1979)). In the present example, we propose using SIPH distributions with Gamma scaling and Gompertz baseline intensity for modeling this type of data.
As a concrete case study, we consider the lifetimes of the Swedish population who died in 2011 at ages between 50 and 100. These data were obtained from the Human Mortality Database (HMD). We add covariate information by distinguishing between females (\(X=1\)) and males (\(X=0\)) in the population. We then fitted a SIPH distribution of 4 phases with a general Coxian structure in the PH component. The estimated parameters are
Figure 4 shows that the fitted distribution provides a reasonable model for both groups. If an even closer fit is sought, other parameters of the model need to be regressed as well.
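A minimal sketch of this specification follows, under assumptions that are ours rather than the paper's (its Table 1 parametrization is not reproduced here): scaling \(\Theta\sim\text{Gamma}(\alpha,1)\), Gompertz intensity \(\lambda(y) = e^{\eta y}\), so that \(h^{-1}(y) = (e^{\eta y}-1)/\eta\), and a covariate acting multiplicatively on \(h^{-1}\) through \(\exp(\beta X)\). The hypothetical function returns survival and hazard at a given age. Since \(\varvec{\pi}(\varvec{I}-\varvec{T}u)^{-\alpha}\mathbf{e}\) decays like \(u^{-\alpha}\), the hazard levels off at \(\alpha\eta\) at very high ages for any PH component, generalizing the mortality plateau of the scalar Gamma-Gompertz model.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power


def mpow(M, r):
    """Real matrix power M**r for M with eigenvalues of positive real part."""
    return np.real(fractional_matrix_power(M, r))


def gamma_gompertz_siph(y, x, pi, T, alpha, eta, beta):
    """Survival and hazard at age y for covariate x, assuming Gamma(alpha, 1)
    scaling, Gompertz intensity exp(eta*y) and covariate effect exp(beta*x)."""
    p = len(pi)
    e = np.ones(p)
    t = -T @ e                                   # exit-rate vector
    s = np.exp(beta * x)
    u = np.expm1(eta * y) / eta * s              # h^{-1}(y) exp(beta x)
    du = np.exp(eta * y) * s                     # du/dy
    M = np.eye(p) - T * u
    surv = float(pi @ mpow(M, -alpha) @ e)
    dens = float(alpha * pi @ mpow(M, -alpha - 1) @ t) * du
    return surv, dens / surv
```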
6.4 Multivariate example
We generated an i.i.d. sample of size 2,500 from a bivariate CPH distribution with parameters
and Gamma scaling with \(\alpha = 1.5\). The upper tail dependence coefficient of the theoretical model is \(\lambda _U = 0.2765\), while the empirical estimate from the sample is \(\hat{\lambda }_U = 0.28\). We then fitted a bivariate CPH model of the same dimensions using Algorithm 4, obtaining the parameters
Figure 5 shows that we recover the structure of both marginals. The dependence structure is also recovered, as supported by the contour plots in Figure 6. Moreover, the estimate of \(\alpha\), which determines the heaviness of the tails of the marginals, is close to its original value, and the fitted coefficient of upper tail dependence \(\lambda _U = 0.254\) is close to the original (and sample) one. Finally, the original model's log-likelihood is \(-11,753.27\), compared with \(-11,752.45\) for the fitted model.
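The simulation design of this section can be sketched as follows. The PH parameters used above are not reproduced in this text, so the check uses single-phase (exponential) components, for which the survival copula is Clayton and the limit is \(\lambda_U = 2^{-\alpha}\). The tail-dependence estimator (joint exceedances of the marginal 0.99 quantiles divided by 0.01) is one simple choice and not necessarily the one used for \(\hat\lambda_U\) above. Function names are hypothetical.

```python
import numpy as np


def sample_ph(pi, T, size, rng):
    """Absorption times of a Markov jump process with initial law pi and
    sub-intensity matrix T, simulated jump by jump (vectorized over samples)."""
    p = len(pi)
    rates = -np.diag(T)
    exits = -T.sum(axis=1)
    P = np.hstack([T + np.diag(rates), exits[:, None]]) / rates[:, None]
    cumP = np.cumsum(P, axis=1)                    # jump-chain CDFs; last column = absorption
    state = rng.choice(p, size=size, p=pi)
    time = np.zeros(size)
    alive = np.ones(size, dtype=bool)
    while alive.any():
        idx = np.flatnonzero(alive)
        s = state[idx]
        time[idx] += rng.exponential(1.0 / rates[s])
        nxt = np.minimum((rng.random(idx.size)[:, None] > cumP[s]).sum(axis=1), p)
        absorbed = nxt == p
        alive[idx[absorbed]] = False
        state[idx[~absorbed]] = nxt[~absorbed]
    return time


def sample_shared_gamma_cph(pis, Ts, alpha, size, rng):
    """Rows (Z_1, ..., Z_d) / Theta with independent Z_i ~ PH(pi_i, T_i) and a
    shared Theta ~ Gamma(alpha, 1) per row."""
    theta = rng.gamma(alpha, 1.0, size)
    Z = np.column_stack([sample_ph(pi, T, size, rng) for pi, T in zip(pis, Ts)])
    return Z / theta[:, None]


def empirical_upper_tail_dependence(y, u=0.99):
    """Joint exceedances of the marginal u-quantiles, divided by 1 - u."""
    q1, q2 = np.quantile(y[:, 0], u), np.quantile(y[:, 1], u)
    return float(np.mean((y[:, 0] > q1) & (y[:, 1] > q2)) / (1 - u))
```

At a finite threshold this estimator is biased slightly upward relative to the limit, so a large sample and a high quantile level are advisable.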
7 Conclusion
We have provided a phase-type-based model which can result in non-exponential tail behavior by introducing random and deterministic transformations. The resulting model is generally tractable in terms of matrix calculus through the Laplace transform of the random component, and thus closed-form formulas allow for statistical and probabilistic treatments, for instance, fully explicit generalized EM algorithms. In the univariate case, the three current main ways of generating heavy-tailed phase-type distributions fall into our framework, and several new models are introduced to complement the existing suite of hidden Markov models. In the multivariate case, we obtain generalizations of well-known frailty models with fully explicit densities, in contrast to other approaches to multivariate phase-type distributions in the literature (in terms of rewards or copulas). Finally, we show the feasibility of the statistical implementation of our models using four different examples.
Heavy-tailed phase-type distributions are statistically attractive since their interpretation in terms of an underlying evolving process is natural in many domains of application involving processes that traverse numerous states through time, for instance, human lifetimes or legal cases. With the models and algorithms provided in this paper, we aim to give a clearer picture of the possibilities and limitations of Markov models for practitioners who require non-standard but interpretable models. A promising further direction of research for generating uni- and multivariate scaled phase-type distributions is to consider a general stochastic process as time-change, which for certain choices may provide fully explicit functionals and estimation procedures while remaining conceptually simple.
Data availability statement
The datasets generated and analyzed during the current study are available in the Zenodo repository, at https://doi.org/10.5281/zenodo.5115819.
References
Albrecher, H., Bladt, M.: Inhomogeneous phase-type distributions and heavy tails. J. Appl. Probab. 56(4), 1044–1064 (2019)
Albrecher, H., Bladt, M., Bladt, M.: Matrix Mittag-Leffler distributions and modeling heavy-tailed risks. Extremes 23(3), 425–450 (2020)
Albrecher, H., Bladt, M., Bladt, M.: Multivariate fractional phase-type distributions. Fractional Calculus and Applied Analysis 23(5), 1431–1451 (2020)
Albrecher, H., Bladt, M., Bladt, M.: Multivariate matrix Mittag-Leffler distributions. Ann. Inst. Stat. Math. 73(2), 369–394 (2021)
Albrecher, H., Bladt, M., Bladt, M., Yslas, J.: Continuous scaled phase-type distributions. (2021) https://arxiv.org/abs/2103.02965
Albrecher, H., Bladt, M., Bladt, M., Yslas, J.: Mortality modeling and regression with matrix distributions. (2021) https://arxiv.org/abs/2011.03219
Albrecher, H., Bladt, M., Yslas, J.: Fitting inhomogeneous phase-type distributions to data: The univariate and the multivariate case. Scand. J. Stat., 1–34 (2020)
Asmussen, S., Nerman, O., Olsson, M.: Fitting phase-type distributions via the EM algorithm. Scand. J. Stat. 23(4), 419–441 (1996)
Bladt, M.: Fractional inhomogeneous multi-state models in life insurance. Scand. Actuar. J., to appear (2021)
Bladt, M., Gonzalez, A., Lauritzen, S.L.: The estimation of phase-type related functionals using Markov chain Monte Carlo methods. Scand. Actuar. J. 2003(4), 280–300 (2003)
Bladt, M., Nielsen, B.F.: Matrix-Exponential Distributions in Applied Probability. Springer (2017)
Bladt, M., Nielsen, B.F., Samorodnitsky, G.: Calculation of ruin probabilities for a dense class of heavy tailed distributions. Scand. Actuar. J. 2015(7), 573–591 (2015)
Bladt, M., Rojas-Nandayapa, L.: Fitting phase-type scale mixtures to heavy-tailed data and distributions. Extremes 21(2), 285–313 (2018)
Bladt, M., Yslas, J.: matrixdist: An R package for inhomogeneous phase-type distributions. (2021) https://arxiv.org/abs/2101.07987
Engelke, S., Opitz, T., Wadsworth, J.: Extremal dependence of random scale constructions. Extremes 22(4), 623–666 (2019)
Furman, E., Kye, Y., Su, J.: Multiplicative background risk models: Setting a course for the idiosyncratic risk factors distributed phase-type. Insurance Math. Econom. 96, 153–167 (2021)
Manatunga, A.K., Oakes, D.: Parametric analysis for matched pair survival data. Lifetime Data Anal. 5(4), 371–387 (1999)
Mardia, K.V., et al.: Multivariate Pareto distributions. Ann. Math. Stat. 33(3), 1008–1015 (1962)
Missov, T.I.: Gamma-Gompertz life expectancy at birth. Demogr. Res. 28, 259–270 (2013)
Rojas-Nandayapa, L., Xie, W.: Asymptotic tail behaviour of phase-type scale mixture distributions. Annals of Actuarial Science 12(2), 412–432 (2018)
Su, C., Chen, Y.: On the behavior of the product of independent random variables. Science in China Series A 49(3), 342–359 (2006)
Vaupel, J.W., Manton, K.G., Stallard, E.: The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography 16(3), 439–454 (1979)
Wienke, A.: Frailty Models in Survival Analysis. CRC Press (2010)
Yashin, A.I., Vaupel, J.W., Iachine, I.A.: Correlated individual frailty: an advantageous approach to survival analysis of bivariate data. Math. Popul. Stud. 5(2), 145–159 (1995)
Acknowledgements
MB would like to acknowledge financial support from the Swiss National Science Foundation Project 200021_191984. JY would like to acknowledge financial support from the Swiss National Science Foundation Project IZHRZ0_180549.
Funding
Open access funding provided by University of Lausanne.
Ethics declarations
Conflict of interest statement
MB and JY declare no conflict of interest related to the current manuscript.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Heavy-tailed definitions
Definition 8.1
A distribution function F on \(\mathbb {R}_{+}=[0, \infty )\), with corresponding survival function \(S = 1 - F\), is called:

1. Regularly varying with index \(\alpha \ge 0\) if
$$\begin{aligned} \lim _{x \rightarrow \infty } \frac{S(\lambda x)}{S( x)} = \lambda ^{-\alpha } \end{aligned}$$
for all \(\lambda >0\). If \(\alpha = 0\), then F is called slowly varying.

2. Weibull-type if
$$\begin{aligned} S(x) \sim c x^{\beta } \exp ( -\lambda x ^{\tau } )\,, \quad x\rightarrow \infty \,, \end{aligned}$$
for some constants \(\beta \in \mathbb {R}\) and \(\tau , \, \lambda , \, c > 0\). A Weibull-type distribution has heavier than exponential tail behavior if \(\tau \in (0,1)\), exponential-type behavior if \(\tau = 1\), and lighter than exponential tail behavior otherwise.

3. Lognormal-type if
$$\begin{aligned} S(x) \sim c x^{\beta } (\log x)^{\xi }\exp ( -\lambda (\log x )^{\gamma } )\,, \quad x\rightarrow \infty \,, \end{aligned}$$
for some constants \(\beta , \xi \in \mathbb {R}\), \(\gamma > 1\) and \(\lambda , \, c > 0\). Note that, in particular, the lognormal distribution is lognormal-type with \(\gamma = 2\).
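As a numerical illustration of item 1 (an example of ours, not part of the original definitions): a CPH distribution with Gamma(α, 1) scaling has survival function \(\varvec{\pi}(\varvec{I}-\varvec{T}x)^{-\alpha}\mathbf{e} \sim x^{-\alpha}\varvec{\pi}(-\varvec{T})^{-\alpha}\mathbf{e}\), so it is regularly varying with index α, whereas an ordinary PH survival function \(\varvec{\pi}\exp(\varvec{T}x)\mathbf{e}\) decays exponentially and the ratio \(S(\lambda x)/S(x)\) tends to 0. Parameters and function names below are hypothetical.

```python
import numpy as np
from scipy.linalg import expm, fractional_matrix_power


def cph_gamma_survival(x, pi, T, alpha):
    """S(x) = pi (I - T x)^(-alpha) e: PH(pi, T) scaled by 1/Theta, Theta ~ Gamma(alpha, 1)."""
    M = np.eye(len(pi)) - T * x
    return float(pi @ np.real(fractional_matrix_power(M, -alpha)) @ np.ones(len(pi)))


def ph_survival(x, pi, T):
    """S(x) = pi exp(T x) e for an ordinary (light-tailed) PH distribution."""
    return float(pi @ expm(T * x) @ np.ones(len(pi)))


def tail_ratio(S, lam, x):
    """S(lam x) / S(x); tends to lam**(-alpha) when S is regularly varying with index alpha."""
    return S(lam * x) / S(x)
```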
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Bladt, M., Yslas, J. Heavy-tailed phase-type distributions: a unified approach. Extremes 25, 529–565 (2022). https://doi.org/10.1007/s10687-022-00436-8
DOI: https://doi.org/10.1007/s10687-022-00436-8
Keywords
 Frailty models
 Heavy tails
 Parameter estimation
Phase-type
 Scale mixtures
AMS 2000 Subject Classifications
 Primary 60E05
 Secondary 60G70
 62N02
 62F10
 60J22