Statistics and Computing, Volume 24, Issue 2, pp 155–164

Estimating parametric semi-Markov models from panel data using phase-type approximations


DOI: 10.1007/s11222-012-9360-6

Cite this article as:
Titman, A.C. Stat Comput (2014) 24: 155. doi:10.1007/s11222-012-9360-6

Abstract

Inference for semi-Markov models under panel data presents considerable computational difficulties. In general the likelihood is intractable, but a tractable likelihood with the form of a hidden Markov model can be obtained if the sojourn times in each of the states are assumed to have phase-type distributions. However, using phase-type distributions directly may be undesirable as they require estimation of parameters which may be poorly identified. In this article, an approach to fitting semi-Markov models with standard parametric sojourn distributions is developed. The method involves establishing a family of Coxian phase-type distribution approximations to the parametric distribution and merging approximations for different states to obtain an approximate semi-Markov process with a tractable likelihood. Approximations are developed for Weibull and Gamma distributions and demonstrated on data relating to post-lung-transplantation patients.

Keywords

B-splines, Gamma distribution, Hidden Markov model, Misclassification, Panel data, Phase-type distribution, Semi-Markov, Weibull

1 Introduction

Processes from a wide range of fields may be modelled as multi-state stochastic processes on a finite discrete state space in continuous time. Often continuous monitoring of the process is not possible. Instead, the data consist of a series of snapshots of the process at potentially irregular and subject-specific time points, with no information on the trajectory of the process between these times. Such data are referred to as panel data. The term is sometimes restricted to the case where all subjects are observed at a common set of observation times, but it is used here more broadly to include cases where the observation times may be irregularly spaced and subject specific. We also allow for the possibility that the observed state may be subject to classification error.

Multi-state models under panel observation have applications in a wide range of fields. These include finance, for instance credit risk scoring (Bladt and Sorensen 2009), and the social sciences, for instance monitoring spells of unemployment (Lancaster and Nickell 1980). They are also widely used in medicine and biostatistics, for modelling spells of infection (Crespi et al. 2005) or the progression of diseases (Gentleman et al. 1994). An advantage of multi-state models is that they allow estimation of many outcomes of interest, including sojourn times in states, first hitting times and mean time to a terminal event (Mandel 2010). However, the accuracy of these estimates depends on correctly specifying the model. Analysis of panel data, with or without classification error, is generally performed using a Markov or hidden Markov model. In practice, however, processes can act on several different time scales. In particular, if the transition intensities between states of the process depend on the length of time already spent in the current state, the process is semi-Markov.

Inference for semi-Markov models under panel data presents considerable computational difficulties. Such models can be categorized into those without recovery, i.e. where a state cannot be re-entered once it has been exited, and those with recovery. When recovery to previous states is not possible, the likelihood can be computed using numerical integration (Foucher et al. 2010), although this approach becomes difficult for models with many states. For more general semi-Markov models under panel observation that allow recovery, the likelihood is generally intractable. As a result, there has been relatively little work in this area. Kang and Lagakos (2007) developed methods based on the numerical solution of integral equations. These methods are appropriate when at least one of the transient states of the process has an exponential sojourn distribution and the non-exponential states have a guarantee time, i.e. a minimum sojourn length.

Titman and Sharples (2010) developed methods for fitting semi-Markov models with phase-type sojourn distributions to panel data. The advantage of using phase-type distributions is that the model can be represented as an aggregated Markov model: occupancy in a particular state of the semi-Markov model corresponds to occupancy in a set of states of a latent Markov model. This greatly simplifies computation of the likelihood. Using 2-phase Coxian phase-type distributions, each transition intensity has three parameters. These correspond to an initial intensity, a limiting intensity as the time spent in the state tends to infinity, and the rate at which the intensity evolves between the two values. While this formulation provides a fairly flexible class of models, it has some disadvantages. First, phase-type distributions are used for computational convenience, and they make interpretation harder than more familiar distributions would. Second, there is often a need for parsimony in multi-state models from panel data, but the 2-phase Coxian phase-type distribution requires two additional parameters compared with the exponential distribution. The inter-phase rate parameter, in particular, is often difficult to estimate. Third, the inter-phase rate parameter is not identifiable under the null (Markov) model, which makes testing the Markov assumption with the phase-type approach difficult.

In this article, an alternative application of phase-type distributions for semi-Markov models is developed that addresses these problems. Rather than fitting models with phase-type distributions, we use models with standard parametric sojourn distributions and apply phase-type distributions only to provide a computationally tractable approximate likelihood. For a particular parametric distribution, a family of phase-type approximations is developed. Establishing this family of approximations involves a relatively large optimization problem. However, it only needs to be done once, after which semi-Markov models with a wide range of state spaces can be fitted very quickly. Using well-known survival distributions such as the Weibull or Gamma makes interpretation simpler and requires fewer parameters than the 2-phase Coxian distribution.

The remainder of the article is organized as follows. Section 2 outlines likelihood computation for panel data. In Sect. 3 a method of obtaining functional approximations to families of distributions via phase-type distributions is developed. In Sect. 4, phase-type approximations to parametric semi-Markov multi-state models are developed. Section 5 is a simulation study investigating the performance of estimates based on the phase-type approximations. Section 6 gives an illustrative example of the methodology on the BOS data from post-lung-transplantation patients. The article concludes with a discussion.

2 Likelihood computation for panel data

We consider a continuous time process {X(t),t≥0} on a finite set of states indexed 1,…,R. The observed data for an individual subject consist of observed states x0,x1,…,xN at times t0<t1<⋯<tN, where these time points and the number of times N may be subject specific. The key aspect of panel data is that while the state is known at these time points, nothing is known about the trajectory of the process between these times.

Multi-state models may be defined according to the transition intensities between states
$$q_{rs}(t, \mathcal{F}_{t}) = \lim_{\delta t \downarrow0} \frac{P\{X(t + \delta t) = s | X(t) = r, \mathcal{F}_{t} \}}{\delta t} $$
where \(\mathcal{F}_{t}\) is the past history, or filtration, of the process up to time t.
Analysis of multi-state panel-observed data has commonly been performed under a Markov assumption (Kalbfleisch and Lawless 1985). Under this assumption the transition intensities are a function of t only, and the likelihood for an individual can be written as
$$L(\theta) = \prod_{i=1}^{N} p_{x_{i-1},x_{i}}(t_{i-1},t_{i} ; \theta) $$
where \(p_{x_{i-1},x_{i}}(t_{i-1},t_{i};\theta) = P\{X(t_{i}) = x_{i} | X(t_{i-1}) = x_{i-1}\}\) are the transition probabilities. In time homogeneous models the transition probabilities can be found by evaluating a matrix exponential, \(\mathbf{P}(t) = \exp(\mathbf{Q}t)\), where Q is the matrix of transition intensities. More generally, the transition probabilities are the solution of the Kolmogorov forward equations, a first-order system of ordinary differential equations (Cox and Miller 1965).
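The Markov panel likelihood above can be illustrated with a short Python sketch, assuming NumPy and SciPy (`scipy.linalg.expm`). In a time-homogeneous model the transition probability matrix over a gap of length t is exp(Qt), so the log-likelihood for one subject is a sum of log entries of such matrices. The function name `markov_panel_loglik` and the illness-death generator are illustrative assumptions and are not part of the article.

```python
import numpy as np
from scipy.linalg import expm


def markov_panel_loglik(Q, times, states):
    """Log-likelihood of one subject's panel data under a time-homogeneous
    Markov model with generator Q (off-diagonal rates, rows summing to zero).

    times  : observation times t_0 < t_1 < ... < t_N
    states : observed states x_0, ..., x_N as 0-based indices
    """
    loglik = 0.0
    for i in range(1, len(times)):
        # Transition probability matrix over the gap: P = exp(Q (t_i - t_{i-1}))
        P = expm(Q * (times[i] - times[i - 1]))
        loglik += np.log(P[states[i - 1], states[i]])
    return loglik


# Illustrative illness-death generator: 0 = healthy, 1 = ill, 2 = dead (absorbing)
Q = np.array([[-0.30, 0.25, 0.05],
              [0.04, -0.14, 0.10],
              [0.00, 0.00, 0.00]])
print(markov_panel_loglik(Q, [0.0, 1.0, 2.5, 4.0], [0, 0, 1, 1]))
```

Because the likelihood is a product over gaps, irregular and subject-specific observation times require only one matrix exponential per gap.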

2.1 Semi-Markov models

In a semi-Markov model the transition intensities depend on the time spent in the current state,
$$q_{rs}(t, \mathcal{F}_{t}) = q_{rs}(t - T),$$
where T denotes the time of entry into the current state. Semi-Markov models can be thought of as a generalization of time homogeneous Markov models in which the waiting times in each state are no longer constrained to have exponential distributions. Computing the likelihood for a set of panel-observed states is much more difficult than in the Markov case.

In progressive semi-Markov models, where there is a finite number of possible paths that an individual can take conditional on their observed states, computation of the likelihood requires considering each path and integrating over the possible sojourn times in each state of the path (Foucher et al. 2010). Numerical quadrature methods can be applied to compute the likelihood but become unattractive for models with more than 3 or 4 states because of the increasing dimension of the integrals.

For more general models where backward transitions are possible, direct integration is not possible because the number of possible state visits is unbounded. The transition probabilities are defined as \(p_{rs}(u,t) = P\{X(t)=s \mid X(u)=r, T=u\}\), i.e. the probability of being in state s at time t given entry into state r at time u. Computing them requires the solution of a system of integral equations (Howard 1964; De Dominicis and Manca 1984). However, the likelihood for a set of panel-observed states cannot be expressed simply as a product of transition probabilities, because the observation times will not in general coincide with the entry times into the observed states.

2.2 State classification error

In observational studies of chronic diseases, the disease status at a clinic visit may be subject to classification error. A common model in this circumstance assumes that the observed state at time \(t_i\), \(O_i\), depends only on \(X(t_i)\). It also assumes that \(O_i\) is independent of the classifications at other time points, conditional on the true disease status at those times. The classification probabilities are \(e_{rs} = P\{O_i = s \mid X(t_i) = r\}\). When the underlying process is Markov, this leads to a hidden Markov model (Jackson et al. 2003). When classification error is present, the likelihood may be calculated by summing over all the possible sequences of true states that could have resulted in the observed states. For hidden Markov models, the forward algorithm provides an efficient way to do this recursively. Defining \(\phi_i(s) = P\{O_0,\ldots,O_i, X(t_i)=s\}\), it exploits the relationship
$$\phi_{i}(s) = \sum_{r=1}^{R} \phi_{i-1}(r)\, p_{rs}(t_{i-1},t_{i};\theta)\, e_{sO_{i}},$$
with \(\phi_0(s) = P\{X(t_0)=s\}\, e_{sO_0}\), so that the likelihood is \(L(\theta) = \sum_{s=1}^{R} \phi_{N}(s)\).
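The forward recursion can be sketched in Python, assuming NumPy/SciPy and a time-homogeneous latent Markov process, so that \(p_{rs}(t_{i-1},t_i) = [\exp\{\mathbf{Q}(t_i - t_{i-1})\}]_{rs}\). The classification matrix `E` holds `E[r, s]` = \(e_{rs}\). The vector is rescaled at every step to avoid numerical underflow on long sequences. This rescaling is a standard device that does not come from the source, and with it the log-likelihood becomes the sum of the log scaling constants. The function name is illustrative.

```python
import numpy as np
from scipy.linalg import expm


def hmm_forward_loglik(Q, E, init, times, obs):
    """Forward algorithm for a continuous-time hidden Markov model.

    Q    : R x R generator of the true (latent) Markov process
    E    : R x R classification matrix, E[r, s] = e_rs = P(O = s | X = r)
    init : length-R distribution of X(t_0)
    obs  : observed (possibly misclassified) states O_0, ..., O_N, 0-based
    """
    phi = init * E[:, obs[0]]              # phi_0(s) = P(X(t_0) = s) e_{s O_0}
    c = phi.sum()
    loglik, phi = np.log(c), phi / c
    for i in range(1, len(times)):
        P = expm(Q * (times[i] - times[i - 1]))
        # phi_i(s) = sum_r phi_{i-1}(r) p_rs e_{s O_i}
        phi = (phi @ P) * E[:, obs[i]]
        c = phi.sum()                      # rescale to avoid underflow
        loglik, phi = loglik + np.log(c), phi / c
    return loglik
```

With `E` equal to the identity matrix, the recursion reduces to the product of transition probabilities used in the Markov likelihood.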

However, the forward algorithm cannot be applied to a general hidden semi-Markov model, so a more general summation over all possible true state sequences is required. If backward transitions are possible, the number of summation terms increases exponentially with the sequence length.

3 Phase-type approximations to parametric distributions

In this section, phase-type distributions are introduced and an approach for constructing functional phase-type approximations to parametric distributions is developed.

For a non-negative random variable T with density function f(t), we define the survivor function as S(t)=P(T>t) and the hazard function as
$$q(t) = \lim_{\delta t \downarrow0} \frac{P(t \leq T < t + \delta t | T \geq t)}{\delta t} = \frac{f(t)}{S(t)}. $$
Phase-type distributions are probability distributions that can be represented as the distribution of the time to absorption of a continuous-time Markov process with one absorbing state. A general k-phase phase-type distribution can be characterized by the initial state vector π and the k×k sub-generator matrix S. The density is given by \(f_{\pi,\mathbf{S}}(t) = \pi^{T}\exp(\mathbf{S}t)\mathbf{S}_{0}\) and the survivor function by \(S_{\pi,\mathbf{S}}(t) = \pi^{T}\exp(\mathbf{S}t)\mathbf{1}\), where \(\mathbf{S}_{0} = -\mathbf{S}\mathbf{1}\) and \(\mathbf{1}\) is a length k vector of 1's.
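The two phase-type formulas translate directly into code. The following Python sketch, assuming NumPy and SciPy, evaluates the density and survivor function of a general phase-type distribution from π and S, with \(\mathbf{S}_0 = -\mathbf{S}\mathbf{1}\). The function names and the 2-phase example are illustrative.

```python
import numpy as np
from scipy.linalg import expm


def phasetype_pdf(t, pi, S):
    """Density f(t) = pi^T exp(S t) S0, where S0 = -S 1 holds the exit rates."""
    S0 = -S @ np.ones(S.shape[0])
    return pi @ expm(S * t) @ S0


def phasetype_survival(t, pi, S):
    """Survivor function S(t) = pi^T exp(S t) 1."""
    return pi @ expm(S * t) @ np.ones(S.shape[0])


# Example: a 2-phase distribution started in phase 1
pi = np.array([1.0, 0.0])
S = np.array([[-1.5, 1.0],
              [0.0, -0.5]])
print(phasetype_pdf(2.0, pi, S), phasetype_survival(2.0, pi, S))
```

With a single phase, S = [−λ], both functions reduce to the exponential density and survivor function.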

Many parametric failure time distributions can be expressed in terms of a rate (or scale) parameter, λ, and an additional parameter, α, typically referred to as the shape parameter. Examples include the Weibull, Gamma and log-logistic distributions and the two-parameter Birnbaum-Saunders distribution. The key property we will exploit is that the survivor and hazard functions for a distribution with rate parameter λ can be related to those for rate parameter 1 via \(S_{\alpha,\lambda}(t) = S_{\alpha,1}(\lambda t)\) and \(q_{\alpha,\lambda}(t) = \lambda q_{\alpha,1}(\lambda t)\). Here \(S_{\alpha,\lambda}(t)\) and \(q_{\alpha,\lambda}(t)\) denote the survivor and hazard functions for a parametric distribution with parameters (α,λ).
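The scaling property can be checked numerically. The sketch below assumes SciPy's `weibull_min` and `gamma` distributions with `scale = 1/λ`, which reproduces the Weibull and Gamma parametrizations of Sect. 3.1. The helper names and the values α = 1.4 and λ = 0.3 are illustrative.

```python
import numpy as np
from scipy import stats


def sojourn_dist(name, alpha, lam):
    """Frozen SciPy distribution with shape alpha and rate lam (scale = 1/lam)."""
    if name == "weibull":
        return stats.weibull_min(c=alpha, scale=1.0 / lam)
    return stats.gamma(a=alpha, scale=1.0 / lam)


def hazard(dist, t):
    """Hazard q(t) = f(t) / S(t)."""
    return dist.pdf(t) / dist.sf(t)


t = np.linspace(0.1, 5.0, 50)
alpha, lam = 1.4, 0.3
for name in ("weibull", "gamma"):
    d_lam, d_one = sojourn_dist(name, alpha, lam), sojourn_dist(name, alpha, 1.0)
    # S_{alpha,lam}(t) = S_{alpha,1}(lam t)
    surv_ok = np.allclose(d_lam.sf(t), d_one.sf(lam * t))
    # q_{alpha,lam}(t) = lam q_{alpha,1}(lam t)
    haz_ok = np.allclose(hazard(d_lam, t), lam * hazard(d_one, lam * t))
    print(name, surv_ok, haz_ok)
```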

The use of phase-type approximations to parametric distributions is common in stochastic control problems, for instance the analysis of queues with general interarrival and service time distributions (Neuts 1981). As a result, there is a reasonably wide literature on developing phase-type approximations. In particular, Asmussen et al. (1996) developed an EM algorithm for fitting phase-type distributions to data or to known distributions. The algorithm is based on minimizing the Kullback-Leibler distance between the target distribution and the phase-type approximation, and it is implemented in the C program EMpht (Olsson 1998).

We adapt this approach to the particular needs of approximating a semi-Markov system. First, we consider approximations based on the class of Coxian phase-type distributions. This class is desirable because it requires relatively few parameters compared with more general phase-type distributions while offering comparable flexibility. The Coxian constraint \(\pi=(1,0,0,\ldots,0)^{T}\) will also be useful when approximating a semi-Markov system. The accuracy of a phase-type approximation increases with the number of phases chosen for the approximating distribution. However, a large number of phases makes the one-off optimization much larger and also increases routine computation times once the approximation is used within a semi-Markov system. Approximations with between 3 and 6 phases were compared. For the parametric distributions considered in this paper, the 5-phase approximation was substantially better than the 3- or 4-phase approximations and quite similar to the 6-phase approximation. For this reason we concentrate on approximations based on 5-phase Coxian phase-type distributions. These have \(\pi=(1,0,0,0,0)^{T}\), and S is constrained so that, when in phase k, the process can only proceed to phase k+1 or else enter the absorbing state. This distribution has
$$\mathbf{S} = \left [\!\! \begin{array}{c@{\quad\!}c@{\quad\!}c@{\quad\!}c@{\quad\!}c} -\xi_1 - \mu_1 & \xi_1 & 0 & 0 & 0 \\ 0 & -\xi_2 - \mu_2 & \xi_2 & 0 & 0 \\ 0 & 0 & -\xi_3 - \mu_3 & \xi_3 & 0 \\ 0 & 0 & 0 & -\xi_4 - \mu_4 & \xi_4 \\ 0 & 0 & 0 & 0 & -\mu_5\\ \end{array} \!\right ] $$
and \(\mathbf{S}_{0} = (\mu_1,\mu_2,\mu_3,\mu_4,\mu_5)^{T}\).
A representation of the latent Markov process that induces the phase-type distribution is shown in Fig. 1.
Fig. 1 A 5-phase Coxian phase-type distribution
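The Coxian structure of S can be built programmatically. The minimal Python sketch below, assuming NumPy and SciPy, constructs S from the progression rates ξ and absorption rates μ. It then confirms that the exit-rate vector \(\mathbf{S}_0 = -\mathbf{S}\mathbf{1}\) recovers \((\mu_1,\ldots,\mu_5)\). The function names and the parameter values are hypothetical.

```python
import numpy as np
from scipy.linalg import expm


def coxian_subgenerator(xi, mu):
    """Sub-generator S of a k-phase Coxian distribution.

    xi : progression rates xi_1, ..., xi_{k-1} (phase j -> phase j + 1)
    mu : absorption rates mu_1, ..., mu_k (phase j -> absorbing state)
    """
    xi, mu = np.asarray(xi, dtype=float), np.asarray(mu, dtype=float)
    if len(xi) != len(mu) - 1:
        raise ValueError("need k - 1 progression rates for k absorption rates")
    # Diagonal: minus the total outflow from each phase (last phase only absorbs)
    S = np.diag(-(np.append(xi, 0.0) + mu))
    # Super-diagonal: progression from phase j to phase j + 1
    S += np.diag(xi, k=1)
    return S


def coxian_survival(t, S):
    """Survivor function with pi = (1, 0, ..., 0): first row of exp(S t), summed."""
    return expm(S * t)[0].sum()


# Hypothetical 5-phase parameter values, for illustration only
S = coxian_subgenerator(xi=[2.0, 1.5, 1.2, 0.8], mu=[0.3, 0.2, 0.4, 0.5, 1.0])
S0 = -S @ np.ones(5)          # exit-rate vector, equal to (mu_1, ..., mu_5)
print(S0, coxian_survival(1.0, S))
```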

We apply an approach similar to that of Asmussen et al. (1996), in that we seek to minimize the Kullback-Leibler distance from the desired target distribution to the phase-type distribution. Specifically, at a particular value of α we seek the phase-type sub-generator S that minimizes the following approximation to the Kullback-Leibler distance:
$$\mathit{KL}(f_{\alpha,\lambda},f_{\mathbf{S}}) = \sum_{i=1}^{N+1} \{S_{\alpha,\lambda}(t_{i-1}) - S_{\alpha,\lambda}(t_{i})\} \log \left( \frac{S_{\alpha,\lambda}(t_{i-1}) - S_{\alpha,\lambda}(t_{i})}{S_{\pi,\mathbf{S}}(t_{i-1}) - S_{\pi,\mathbf{S}}(t_{i})} \right), \tag{1}$$
where t0=0, tN=tmax and tN+1=∞. The points t1,…,tN−1 are chosen to correspond to quantiles of the target parametric distribution, so that each interval between 0 and tmax has the same probability mass. We therefore seek the distribution that matches the shape of fα,λ up to tmax and also has the correct mass above tmax. Unlike existing approaches to obtaining phase-type approximations, we require a good approximation to fα,λ for a range of values of both λ and α. Given the assumed scaling property of λ, if fS is a good approximation to fα,1, then fλS is a good approximation to fα,λ. Specifically, if fS minimizes the KL distance for fα,1 with truncation at tmax, then fλS minimizes the KL distance for fα,λ with truncation at tmax/λ. It is therefore only necessary to find approximations for a single value of λ, e.g. λ=1.
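The interval-based KL approximation can be sketched in Python. The sketch assumes that the approximation is a sum, over the intervals described above, of the target mass times the log ratio of target mass to phase-type mass. Three further choices are assumptions: 50 intervals, the function names, and the use of SciPy frozen distributions. The usage lines check the scaling argument: multiplying S by λ and dividing the truncation point by λ should leave the approximate distance unchanged.

```python
import numpy as np
from scipy import stats
from scipy.linalg import expm


def coxian_survival(t, S):
    """Phase-type survivor function with pi = (1, 0, ..., 0)."""
    return expm(S * t)[0].sum()


def discretized_kl(target, S, t_max, n_int=50):
    """Approximate KL distance from `target` (a frozen SciPy distribution)
    to the phase-type distribution with sub-generator S.

    The n_int intervals on [0, t_max] carry equal target mass; a final
    interval (t_max, inf) carries the remaining tail mass.
    """
    F_max = target.cdf(t_max)
    inner = target.ppf(F_max * np.arange(1, n_int) / n_int)   # t_1 .. t_{N-1}
    grid = np.concatenate(([0.0], inner, [t_max]))            # t_0 .. t_N
    # Target masses: equal on [0, t_max], plus the tail above t_max
    p = np.append(np.full(n_int, F_max / n_int), 1.0 - F_max)
    # Phase-type masses from survivor differences, with S(inf) = 0
    surv = np.append([coxian_survival(t, S) for t in grid], 0.0)
    q = surv[:-1] - surv[1:]
    return float(np.sum(p * np.log(p / q)))


# Hypothetical 5-phase Coxian and a Weibull(alpha = 0.8) target
xi, mu = np.array([2.0, 1.5, 1.2, 0.8]), np.array([0.3, 0.2, 0.4, 0.5, 1.0])
S = np.diag(-(np.append(xi, 0.0) + mu)) + np.diag(xi, k=1)
lam = 0.3
kl_rate1 = discretized_kl(stats.weibull_min(c=0.8), S, t_max=2.0)
kl_rate_lam = discretized_kl(stats.weibull_min(c=0.8, scale=1 / lam), lam * S, t_max=2.0 / lam)
print(kl_rate1, kl_rate_lam)    # equal up to rounding error
```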

In principle, separate phase-type fits, minimizing (1) with respect to the parameters θ of S(θ), could be performed for each α required in optimizing the likelihood of a multi-state model. However, this approach is unattractive, firstly because the requirement to fit a new phase-type approximation at each iteration would be excessively time consuming and also because sensitivity to the convergence criterion used may lead to a discontinuous likelihood surface.

We define \(\hat{\theta}(\alpha)\) to be the set of parameters of a 5-phase Coxian phase-type distribution that satisfy
$$ \hat{\theta}(\alpha) = \arg\min_{\theta} \mathit{KL}(f_{\alpha ,1},f_{\mathbf{S}(\theta)}) , \tag{2} $$
where θ=(ξ1,ξ2,ξ3,ξ4,μ1,μ2,μ3,μ4,μ5). We expect \(\hat{\theta}(\alpha)\) to be smooth with respect to α. However, for certain values of α there may be multiple solutions to (2), corresponding to different representations of the same distribution. This rules out finding \(\hat{\theta}(\alpha_i)\) on a grid of points αi and smoothing the results directly. Instead we adopt the approach of solving a functional optimization problem: we seek the vector function \(\hat{\theta}(\alpha)\) that minimizes
$$ \int_{\alpha_{l}}^{\alpha_{u}} \mathit{KL}(f_{\alpha ,1},f_{\mathbf{S}(\theta(\alpha))}) \, d\alpha , \tag{3} $$
where αl and αu are the lower and upper limits of the range of α of interest. We seek an approximate solution to this problem using the Ritz method of variational optimization (Smith 1974). The method assumes that the components of \(\hat{\theta}(\alpha)\) can be expressed as B-splines of some fixed order. Each element of θ then takes the form \(\theta_i(\alpha) = \sum_k u_{ik} B_k(\alpha)\), where the \(B_k\) are B-spline basis functions and the same knot points are assumed for all parameters. Restricting the number of knot points avoids the problems caused by multiple solutions of (2) at a particular value of α.
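The B-spline parametrization \(\theta_i(\alpha) = \sum_k u_{ik} B_k(\alpha)\) can be sketched with SciPy's `BSpline`, which accepts a coefficient array with one column per parameter. The following assumptions are not from the source: a clamped cubic knot vector, interior knots at 0.6 and 0.8 on [0.4,1], random coefficients, and the function name `make_theta`. In the article, the coefficients \(u_{ik}\) are what the constrained Fisher scoring-type algorithm estimates.

```python
import numpy as np
from scipy.interpolate import BSpline


def make_theta(interior_knots, alpha_l, alpha_u, U, degree=3):
    """Vector-valued B-spline alpha -> theta(alpha), with
    theta_i(alpha) = sum_k u_ik B_k(alpha) and one knot vector for all i.

    U has shape (n_basis, n_params): column i holds the coefficients u_ik.
    """
    # Clamped knot vector: boundary knots repeated degree + 1 times
    t = np.concatenate(([alpha_l] * (degree + 1), interior_knots,
                        [alpha_u] * (degree + 1)))
    n_basis = len(t) - degree - 1
    if U.shape[0] != n_basis:
        raise ValueError(f"expected {n_basis} coefficient rows, got {U.shape[0]}")
    return BSpline(t, U, degree)


# Hypothetical setup: cubic splines on [0.4, 1] with interior knots 0.6 and 0.8,
# so 6 basis functions; 9 parameters (xi_1..xi_4, mu_1..mu_5)
rng = np.random.default_rng(0)
U = rng.uniform(0.5, 2.0, size=(6, 9))
theta = make_theta([0.6, 0.8], 0.4, 1.0, U)
xi, mu = theta(0.73)[:4], theta(0.73)[4:]
```

Because the basis functions are non-negative and sum to one on [αl,αu], positive coefficients give positive rates at every α in the range.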

Up to a constant, the Kullback-Leibler distance defined in (1) equals minus a multinomial log-likelihood. We can therefore apply a constrained Fisher scoring-type algorithm to fit the parameters of the B-spline approximation.
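The equivalence can be made explicit. Let \(p_i\) denote the target probability mass in the i-th interval and \(q_i(\theta)\) the corresponding phase-type mass. The discretized distance then splits into a term that does not involve θ and minus a multinomial log-likelihood with cell probabilities \(q_i(\theta)\) and (fractional) counts \(p_i\):

```latex
\mathit{KL}(f_{\alpha,1}, f_{\mathbf{S}(\theta)})
  = \underbrace{\sum_{i=1}^{N+1} p_i \log p_i}_{\text{constant in } \theta}
  \;-\; \sum_{i=1}^{N+1} p_i \log q_i(\theta),
  \qquad \sum_{i=1}^{N+1} p_i = \sum_{i=1}^{N+1} q_i(\theta) = 1 .
```

Minimizing the distance over θ is therefore the same as maximizing \(\sum_i p_i \log q_i(\theta)\), which is why likelihood machinery such as Fisher scoring applies.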

3.1 Weibull and Gamma approximations

In this article we develop approximations for the Weibull distribution, with probability density function
$$f(t;\alpha,\lambda) = \alpha\lambda(\lambda t)^{(\alpha- 1)} \exp \bigl(-( \lambda t)^\alpha\bigr) $$
and the Gamma distribution with probability density function
$$f(t;\alpha,\lambda) = \frac{\lambda^{\alpha}}{\varGamma(\alpha)} t^{(\alpha- 1)} \exp(-\lambda t). $$
These distributions are particularly well suited to phase-type approximation. First, both reduce to an exponential distribution when α=1, so the Markov model is nested within the semi-Markov model. Second, the Weibull distribution is attractive because of its extensive use as a failure time model in medical and reliability contexts. Third, a Gamma distribution with positive integer α coincides with an Erlang distribution, which has an exact phase-type representation. This makes the Gamma distribution particularly well suited to phase-type approximation.
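The Erlang connection can be checked directly. Consider a 5-phase Coxian with \(\xi_1 = \cdots = \xi_{n-1} = \lambda\), \(\mu_n = \lambda\) and all other rates zero. It passes through exactly n exponential(λ) phases before absorption, so its density coincides with that of a Gamma distribution with integer shape n ≤ 5 and rate λ. The Python sketch below, assuming NumPy and SciPy with illustrative function names, confirms this numerically.

```python
import numpy as np
from scipy import stats
from scipy.linalg import expm


def coxian_erlang(n, lam, k=5):
    """k-phase Coxian sub-generator reproducing an Erlang(n, lam) law, i.e. a
    Gamma distribution with integer shape n <= k and rate lam.

    Phases 1..n-1 move on to the next phase at rate lam, phase n is absorbed
    at rate lam, and phases n+1..k are never reached.
    """
    xi, mu = np.zeros(k - 1), np.zeros(k)
    xi[:n - 1] = lam
    mu[n - 1] = lam
    return np.diag(-(np.append(xi, 0.0) + mu)) + np.diag(xi, k=1)


def coxian_pdf(t, S):
    """Density pi^T exp(S t) S0 with pi = (1, 0, ..., 0)."""
    S0 = -S @ np.ones(S.shape[0])
    return expm(S * t)[0] @ S0


lam, t = 0.4, np.linspace(0.1, 10.0, 25)
for n in range(1, 6):
    ph = np.array([coxian_pdf(ti, coxian_erlang(n, lam)) for ti in t])
    print(n, np.allclose(ph, stats.gamma(a=n, scale=1.0 / lam).pdf(t)))
```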

For the Weibull distribution, we choose as our range of values for α the interval [0.4,2]. This covers a fairly wide range of possible hazard shapes, which is sufficient for the BOS application. The dynamics of the Weibull distribution are somewhat different for α<1 than α>1. In particular, the hazard is 0 at t=0 for α>1 whereas it is unbounded for α<1. Therefore rather than seek a single B-spline solution for the full range [0.4,2], we consider separate functions for [0.4,1] and [1,2].

For the Gamma distribution, we choose the range [0.4,5] for α. The higher upper limit is chosen because the relationship with the Erlang distribution means that near-perfect fits can be achieved for α∈[1,5]. Again, separate B-spline fits are applied to [0.4,1] and [1,5]. In each case, preliminary fits were performed to determine good locations for the knot points. Once these were determined, finding the phase-type approximations took around 8 hours of computation time on a terminal with 3.9 GB of RAM and four 2.4 GHz processors. Using more knot points has little effect on the time needed to evaluate KL. However, it increases both the number of evaluations needed to calculate the gradient of KL at each iteration and the number of iterations required for convergence. At best we might expect a quadratic increase in computation time with the number of knots. Increasing the order of the phase-type distribution gives only a linear increase in the number of parameters. However, it also enlarges the matrix exponential computed at each evaluation of KL, so it is likely to increase computation time by more.

Full details of the optimization procedure for the phase-type fit are given in the Supplementary Materials.

4 Approximation of a semi-Markov system

4.1 Likelihood computation for a semi-Markov model with phase-type sojourn distributions

In this section, we establish that the likelihood for a panel-observed semi-Markov process with phase-type sojourn distributions equals that of an aggregated Markov model. Suppose {X(t);t>0} is a semi-Markov model with state space {1,2,…,R}. Let \(\pi=(1,0,\ldots,0)^{T}\) be a vector of length k. Further suppose the transition intensities are given by
$$q_{rs}(t,\mathcal{F}_{t}) = \frac{\pi^{T}\exp\{\mathbf{S}^{(r)}(t - T^{*}_{r})\}\mathbf{S}^{(rs)}_{0}}{\pi^{T}\exp\{\mathbf{S}^{(r)}(t - T^{*}_{r})\}\mathbf{1}}$$
for t>0. Here \(T^{*}_{r}\) is the time of entry into state r during the current sojourn, and \(\mathbf{S}^{(r)}\) is the k×k sub-generator matrix of a k-phase phase-type distribution. The exit vectors satisfy \(\sum_{j \neq r} \mathbf{S}^{(rj)}_{0} = -\mathbf{S}^{(r)}\mathbf{1}\), where \(\mathbf{1}\) is a vector of 1s of length k.
Now let \(\{\tilde{X}(t), t>0\}\) be an Rk-state time homogeneous Markov process with states indexed by \(1_1,\ldots,1_k,2_1,\ldots,2_k,\ldots,R_k\) and Rk×Rk generator matrix
$$\mathbf{Q} = \left [ \begin{array}{c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c}\multicolumn{2}{c}{\mathbf{S}^{(1)}}&\mathbf {S}^{(12)}_{0}&0&\ldots&\mathbf{S}^{(1R)}_0&0\\[1mm] \mathbf{S}^{(21)}_{0}&0&\multicolumn{2}{c}{\mathbf{S}^{(2)}}&\ldots &\mathbf{S}^{(2R)}_{0}&0\\ \multicolumn{2}{c}{\vdots}&\multicolumn{2}{c}{\vdots}&\ddots &\multicolumn{2}{c}{\vdots}\\ \mathbf{S}^{(R1)}_{0}&0&\mathbf{S}^{(R2)}_{0}&0&\ldots&\multicolumn {2}{c}{\mathbf{S}^{(R)}} \end{array} \right ], $$
where each block of zeroes is k×(k−1) in size. Finally, define {O(t),t∈{t1,…,tn}} to be an aggregated Markov model whose realizations O(t) are related to the latent Markov process \(\tilde{X}(t)\) by \(P\{O(t)=r \mid \tilde{X}(t)=s_l\} = \delta_{rs}\) for l=1,…,k, where δrs=1 if r=s and 0 otherwise. Suppose the semi-Markov process X(t) is initiated in state 1 at time 0 and is observed at time points t1,t2,…,tn. Then
$$P\{X(t_1)=x_1,\ldots,X(t_n)=x_n\} = P\{O(t_1)=x_1,\ldots,O(t_n)=x_n\}$$
for all sets of observation points 0<t1<t2<⋯<tn. The aggregated Markov model is a special case of a hidden Markov model. Methods for likelihood calculation in continuous-time hidden Markov models, such as the forward algorithm, can therefore be used to calculate the likelihood of phase-type semi-Markov models.
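Because of this equivalence, a standard forward recursion over the Rk latent states with an indicator emission matrix yields the semi-Markov likelihood. The minimal Python sketch below assumes NumPy and SciPy and stores latent state \(r_l\) at 0-based index r·k + l. It also assumes that the process enters the first observed state in its first phase at \(t_0\), as in the result above where X(t) starts in state 1 at time 0. The function name and the 2-state example generator are hypothetical.

```python
import numpy as np
from scipy.linalg import expm


def aggregated_loglik(Q, k, times, obs):
    """Log-likelihood of panel-observed states under an aggregated Markov model.

    Q    : (R*k) x (R*k) generator of the latent Markov process; latent index
           r*k + l stands for phase l of observable state r (all 0-based)
    obs  : observed states x_0, ..., x_n; the process is assumed to enter
           x_0 in its first phase at time t_0
    """
    n_lat = Q.shape[0]
    R = n_lat // k
    # Indicator emission: B[r*k + l, s] = 1 if r == s else 0 (the delta_rs above)
    B = np.repeat(np.eye(R), k, axis=0)
    phi = np.zeros(n_lat)
    phi[obs[0] * k] = 1.0                       # pi = (1, 0, ..., 0) in state x_0
    loglik = 0.0
    for i in range(1, len(times)):
        P = expm(Q * (times[i] - times[i - 1]))
        phi = (phi @ P) * B[:, obs[i]]          # propagate, keep phases of x_i
        c = phi.sum()
        loglik += np.log(c)
        phi = phi / c
    return loglik


# Hypothetical: two observable states with recovery, each with a 2-phase
# Coxian sojourn; exits always enter phase 1 of the other state
Q = np.array([[-1.3, 1.0, 0.3, 0.0],
              [0.0, -0.8, 0.8, 0.0],
              [0.5, 0.0, -1.2, 0.7],
              [0.9, 0.0, 0.0, -0.9]])
print(aggregated_loglik(Q, k=2, times=[0.0, 1.0, 2.5], obs=[0, 1, 0]))
```

With k = 1 the function reduces to the ordinary Markov panel likelihood.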

4.2 Phase-type approximations to parametric semi-Markov models

We now detail how the phase-type approximations to the distributions of Sect. 3 can be used to provide approximate likelihoods for parametric semi-Markov multi-state models. We consider semi-Markov models for which the sojourn time in each state, r, has a parametric distribution with related hazard function qr(u;λr,αr) and the associated transition intensity from state r to state s is
$$q_{rs}(u ; \lambda_{r}, \alpha_{r}) = \lambda_{rs}/\lambda_{r} q_{r}(u ; \lambda_{r}, \alpha_{r}) $$
where \(\lambda_r = \sum_{j \neq r} \lambda_{rj}\). Hence the probability that a transition out of state r is to state s is fixed at λrs/λr, independently of the sojourn time in state r. Let \(\mathbf{S}^{(r)}\) be the phase-type approximation to \(f_{\alpha _{r},\lambda_{r}}\) obtained using the methods of Sect. 3. The generator, Q, of the underlying Markov process in the corresponding aggregated Markov model can then be obtained as a block matrix, with 5×5 blocks corresponding to the observable states. It has \(\mathbf{S}^{(r)}\) in the (r,r) block and
$$\bigl[ \mathbf{S}_{0}^{(rs)} \ \mathbf{0} \bigr] $$
in the (r,s) block, where \(\mathbf{S}_{0}^{(rs)} = \mathbf {S}_{0}^{(r)} \lambda_{rs}/\lambda_{r}\) and 0 is a 5×4 matrix of zeroes.
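The block construction can be sketched in Python with NumPy only. Each transient state r contributes its 5-phase sub-generator to the diagonal block. Its exit-rate vector \(\mathbf{S}_0^{(r)}\) is split across destinations in proportion λrs/λr, and each exit enters the first phase of the destination state. The function names and input layout are assumptions. By the scaling argument of Sect. 3, \(\mathbf{S}^{(r)}\) would in practice be λr times the rate-1 approximation evaluated at αr. Here, hypothetical Coxian matrices stand in for those approximations, while the λrs values are those of the simulation study in Sect. 5.2.

```python
import numpy as np


def build_latent_generator(S_list, lam, k=5):
    """Assemble the (R*k) x (R*k) generator Q of the approximating Markov model.

    S_list : S_list[r] is the k x k Coxian sub-generator approximating the
             sojourn distribution of state r, or None if r is absorbing
    lam    : R x R array of rates lambda_rs (diagonal ignored)
    """
    R = len(S_list)
    Q = np.zeros((R * k, R * k))
    for r, S in enumerate(S_list):
        if S is None:                               # absorbing: zero rows
            continue
        rows = slice(r * k, (r + 1) * k)
        Q[rows, rows] = S                           # (r, r) block
        S0 = -S @ np.ones(k)                        # total exit rates by phase
        lam_r = sum(lam[r, s] for s in range(R) if s != r)
        for s in range(R):
            if s != r and lam[r, s] > 0:
                # (r, s) block = [S0^(rs) 0]: exits enter phase 1 of state s
                Q[rows, s * k] = S0 * lam[r, s] / lam_r
    return Q


def coxian(xi, mu):
    """Coxian sub-generator from progression rates xi and absorption rates mu."""
    xi, mu = np.asarray(xi, float), np.asarray(mu, float)
    return np.diag(-(np.append(xi, 0.0) + mu)) + np.diag(xi, k=1)


# Illness-death model with recovery; lambda_rs as in Sect. 5.2
lam = np.array([[0.0, 0.25, 0.05],
                [0.04, 0.0, 0.10],
                [0.0, 0.0, 0.0]])
S_healthy = coxian([0.6, 0.5, 0.4, 0.3], [0.05, 0.1, 0.2, 0.3, 0.4])   # hypothetical
S_ill = coxian([0.2, 0.2, 0.2, 0.2], [0.3, 0.2, 0.15, 0.1, 0.1])       # hypothetical
Q = build_latent_generator([S_healthy, S_ill, None], lam)
```

Every row of the resulting Q sums to zero, because the split exit rates add back up to \(\mathbf{S}_0^{(r)}\).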

4.3 Incorporation of classification error

A key advantage of using phase-type approximations is that the addition of classification error, at least in terms of likelihood computation, is straightforward.

We assume \(P\{O(t)=s \mid X(t)=r\} = e_{rs}\) and that O(t)|X(t) is independent of O(t′)|X(t′) for t≠t′. In the framework of Sect. 4.1, the observed states are then related to the latent Markov process \(\tilde{X}(t)\) by \(P\{O(t)=r \mid \tilde{X}(t)=s_l\} = e_{sr}\) for l=1,…,k. That is, these probabilities can take any value in [0,1] rather than being exactly 0 or 1.
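In code, the only change from the error-free case is the emission matrix used in the forward recursion. The 0/1 indicator is replaced by rows of the classification matrix, each repeated once per phase. The minimal NumPy sketch below uses the convention `E[s, r]` = \(e_{sr}\) and an illustrative function name. It also assumes, for the example only, that death is recorded without error.

```python
import numpy as np


def latent_emission(E, k):
    """Emission matrix over the R*k latent states.

    E[s, r] = e_sr = P(O = r | X = s). Every phase s_l of state s inherits
    row s of E, so B[s*k + l, r] = e_sr; E = identity recovers delta_sr.
    """
    return np.repeat(np.asarray(E, dtype=float), k, axis=0)


# Mild misclassification scenario of Sect. 5.2 (e_12 = 0.04, e_21 = 0.1),
# assuming death is recorded without error
E = np.array([[0.96, 0.04, 0.0],
              [0.10, 0.90, 0.0],
              [0.00, 0.00, 1.0]])
B = latent_emission(E, k=5)
```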

4.4 Likelihood maximization

The approximate likelihood is continuous and differentiable everywhere in the range α∈[αl,αu] except at α=1, where the two separate B-spline approximations meet. Standard numerical maximization procedures, including those based on numerical gradients, can therefore be used to maximize the likelihood, provided a starting value away from α=1 is chosen. However, the approximation is only defined within [αl,αu], so problems will occur if the true maximum lies outside this range or near its boundary. In such cases an approximation valid on a wider range of α is needed. This requires either more knot points, and hence a larger one-off optimization to establish the B-spline approximation, or some loss of accuracy in the approximation. For both the Gamma and Weibull cases, we also developed B-spline approximations with the same number of knot points but over the range [0.2,4]. A pragmatic approach is to first optimize the likelihood based on the original approximation range. If the boundary is reached, the optimization is terminated and restarted using the likelihood based on the wider approximation range.
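The restart strategy can be sketched in Python, simplified to a single shape parameter. The callable `negloglik(alpha, approx_range)` is hypothetical: it stands for the approximate negative log-likelihood computed with the B-spline approximation valid on `approx_range`. SciPy's bounded scalar minimizer is used because it needs no derivatives, which suits a likelihood that is not differentiable at α = 1. With several shape parameters, a bounded multivariate optimizer would play the same role.

```python
from scipy.optimize import minimize_scalar


def fit_alpha_with_fallback(negloglik, ranges=((0.4, 2.0), (0.2, 4.0)), tol=1e-3):
    """Minimize negloglik(alpha, approx_range) over alpha, trying each
    approximation range in turn and moving to the next (wider) one whenever
    the optimum lands within tol of the current range's boundary."""
    for lo, hi in ranges:
        res = minimize_scalar(lambda a: negloglik(a, (lo, hi)),
                              bounds=(lo, hi), method="bounded")
        if lo + tol < res.x < hi - tol:          # interior optimum: accept it
            return res.x, (lo, hi)
    return res.x, (lo, hi)                       # still on the widest boundary


# Toy objective whose optimum (alpha = 2.5) lies outside [0.4, 2]
alpha_hat, used_range = fit_alpha_with_fallback(lambda a, rng: (a - 2.5) ** 2)
print(alpha_hat, used_range)
```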

5 Performance on simulated data

5.1 Direct comparison of likelihood curves

To assess the accuracy of the approximate likelihood, we first consider a scenario where recovery is possible but where simulation-based methods can compute the likelihood curve to high accuracy. Because the exact likelihood is computationally difficult, we restrict ourselves to a simple case with two states, 0 and 1 (e.g. healthy and ill). The sojourn distribution in the healthy state is known to be exponential with rate λ0. The sojourn distribution in the illness state is Weibull or Gamma, with a known rate but an unknown shape parameter. We further assume that all subjects start in the healthy state at time 0 and are observed at a common set of 5 equally spaced examination times. The exponential sojourn in state 0 is memoryless, so each observation of state 0 effectively restarts the process. Together with the equally spaced examination times, this means the likelihood can be expressed in terms of just 6 probabilities. These correspond to different lengths of time without an observed state 0, i.e. the patterns 0, 10, 110, 1110, 11110 and 11111.

Figure 2 compares the approximate log-likelihood curves based on the phase-type approximations with a simulation-based approximation to the log-likelihood. The latter estimates the 6 probabilities from 10^6 processes simulated from initiation in state 0 until time 5. State 0 has an exponential sojourn distribution with rate λ0=0.7. In panel (a), state 1 has a Weibull distribution with rate λ1=0.4 and shape parameter α, and in panel (b) it has a Gamma distribution with rate λ1=0.4 and shape parameter α. For the simulated likelihood, α is assessed on a grid of points from 0.62 to 1.2, using the corresponding distribution for each panel. The likelihood is for a dataset of 1000 subjects simulated from a process with α=0.8, again using the Weibull distribution for panel (a) and the Gamma distribution for panel (b). In each case the phase-type approximation and the simulated log-likelihood curves agree very closely, both in the location of the maxima and in absolute value. The likelihood depends more strongly on α for the Gamma model. This is because the shape parameter of a Gamma distribution has a stronger influence on its mean than the shape parameter of a Weibull distribution does.
Fig. 2 Comparison of the phase-type approximation to the log-likelihood (dashed line) and the simulation-based approximation to the log-likelihood (circles) for a two-state process
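The simulation-based benchmark can be sketched as a Monte Carlo estimate of the six pattern probabilities. This sketch is an assumption about how such a simulation might be organized and is not the author's code. Each replicate starts in state 0 at time 0 and alternates exponential (rate λ0) sojourns in state 0 with Weibull or Gamma (shape α, rate λ1) sojourns in state 1. The process is observed at times 1 to 5, and each replicate is classified by the position of the first observed 0, or as 11111 if state 0 is never observed. NumPy's `Generator.weibull(a)` draws from a Weibull distribution with survivor function exp(−x^a), so dividing the draw by λ1 gives rate λ1.

```python
import numpy as np


def simulate_pattern_probs(lam0, alpha, lam1, dist="weibull", n_sim=100_000,
                           n_obs=5, seed=1):
    """Monte Carlo estimate of P(pattern) for the patterns 0, 10, 110, 1110,
    11110 and 11111 (entries 0..5 of the returned array)."""
    rng = np.random.default_rng(seed)
    obs_times = np.arange(1, n_obs + 1)
    counts = np.zeros(n_obs + 1)
    for _ in range(n_sim):
        t, state, jumps = 0.0, 0, []
        while t <= n_obs:                       # simulate past the last visit
            if state == 0:
                t += rng.exponential(1.0 / lam0)
            elif dist == "weibull":
                t += rng.weibull(alpha) / lam1
            else:
                t += rng.gamma(alpha, 1.0 / lam1)
            jumps.append(t)
            state = 1 - state
        # State at time u is the parity of the number of jumps at or before u
        observed = np.searchsorted(jumps, obs_times, side="right") % 2
        zeros = np.flatnonzero(observed == 0)
        counts[zeros[0] if zeros.size else n_obs] += 1
    return counts / n_sim


print(simulate_pattern_probs(lam0=0.7, alpha=0.8, lam1=0.4, n_sim=20_000))
```

With α = 1 the process is a two-state Markov chain. The estimated probability of pattern 0 can then be checked against \(P_{00}(1) = \lambda_1/(\lambda_0+\lambda_1) + \lambda_0/(\lambda_0+\lambda_1)\exp\{-(\lambda_0+\lambda_1)\}\).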

5.2 Realistic data scenario

To further assess the performance of the phase-type approximation method, we apply it to simulated datasets similar in nature to the lung transplantation dataset introduced in Sect. 6. We consider samples of either 500 or 1000 patients, each observed up to 5 or 8 times. For the cases with 8 observations per patient, we also consider additional scenarios with mild and moderate state misclassification. This gives a total of 8 simulation scenarios, each run for both Weibull and Gamma models. When patients are observed up to 8 times, the time intervals between observations are U(0.5,2); when they are observed up to 5 times, the intervals are U(1,2.5). In both cases the time interval between observations varies within patient sequences as well as between patients. Data are generated from a model with three states, healthy, ill and dead, in which recovery from the illness state is possible. All patients start in state 1 (healthy) at time 0. The parameters of the semi-Markov process in both the Weibull and Gamma cases are (λ12,α1,λ13,λ21,α2,λ23)=(0.25,1.4,0.05,0.04,0.7,0.1). Under these values, the hazard of a transition increases with time spent in the healthy state and decreases with time spent in the illness state. A healthy subject is more likely to progress to the illness state than to die, and an ill subject is more likely to die than to recover. In the mild misclassification scenario e12=0.04 and e21=0.1, while in the moderate misclassification scenario e12=0.1 and e21=0.2.

Exact approaches to likelihood computation are not computationally feasible here, so the assessment of performance is restricted to checking the consistency of the estimates obtained by maximizing the approximate likelihood. To assess the likely performance of (approximate) likelihood-ratio-based confidence intervals, we also consider the difference in log-likelihood between the maximum likelihood estimate and the true parameter values.

The phase-type approximations used for the simulations were obtained for λ=1 and tmax=2. This is appropriate because the true rates are λ1=λ12+λ13=0.30 and λ2=λ21+λ23=0.14. The approximations therefore match the target distributions up to 2/0.30≈6.7 and 2/0.14≈14.3 years respectively, while the maximum follow-up time is around 12 years. In a small proportion of cases, the MLE for α1 was not in the range [0.4,2]. In these cases an analogous B-spline approximation on the range [0.2,4] was used to approximate the likelihood.

In general the parameter estimates are close to being unbiased (Table 1). As might be expected, bias is typically slightly higher for the smaller sample size and is higher for models with moderate levels of misclassification. The estimates for the Weibull model perform slightly better than those for the Gamma model.
Table 1 Bias in parameter estimates for simulated data. Presented values are bias ×10^3. N is the number of patients and m the maximum number of observations per patient. Misc refers to whether the model included state misclassification

Scenario                         Parameter
Distribution  N     m  Misc           λ12      α1     λ13     λ21      α2     λ23     e12     e21
Weibull       1000  5  No            2.16   10.26   −1.40    0.84    4.24    2.21
Weibull       1000  8  No            0.82    8.88   −0.59    0.13    1.70    0.77
Weibull       1000  8  Mild          0.66   10.80   −0.61    0.36    6.11    1.60    0.25   −0.11
Weibull       1000  8  Moderate      2.05   10.43   −1.57    0.16    6.16    2.66    0.44    0.29
Weibull       500   5  No            4.47    9.14   −2.75    1.67    2.43    3.94
Weibull       500   8  No            2.40   13.15   −1.42    0.44    5.04    2.28
Weibull       500   8  Mild          3.44   13.92   −2.10    1.50    7.33    3.43   −0.21   −0.23
Weibull       500   8  Moderate      4.17   16.09   −2.77    1.46    8.16    4.40    0.26    0.40
Gamma         1000  5  No            2.38    7.08    0.77    1.14   21.62    2.80
Gamma         1000  8  No            6.72    9.91   −2.50    1.45   16.49    3.47
Gamma         1000  8  Mild          4.60    9.23   −0.87    0.92   22.10    3.85    0.34    0.28
Gamma         1000  8  Moderate      6.81    9.93   −1.75    0.39   22.48    3.19   −0.21    0.47
Gamma         500   5  No            4.90   10.34   −0.11    1.55   28.28    4.32
Gamma         500   8  No           10.17   21.11   −2.84    2.46   36.24    6.27
Gamma         500   8  Mild          8.68   18.88   −1.64    1.54   35.36    5.39    0.43   −0.05
Gamma         500   8  Moderate     12.18   23.06   −3.90    0.38   30.89    5.74    0.44   −0.60

The likelihood ratio statistics from the simulated data were close to having their nominal \(\chi^{2}\) distributions for all scenarios (Table 1 in the Supplementary Materials document).

6 Illustration: BOS dataset

Bronchiolitis obliterans is an irreversible, progressive airway obstruction leading to impairment of lung function. It is the major limiting factor for long-term survival in lung transplant recipients. Bronchiolitis obliterans can only be reliably assessed histologically. In practice, however, bronchiolitis obliterans syndrome (BOS), defined as a decline in forced expiratory volume in 1 second (FEV1, in litres), is used as a surrogate measure. Interest lies in determining the rate at which patients develop the disease, as well as the effect BOS has on survival. However, disease assessment through FEV1 only occurs at clinic visits and is subject to classification error.

The dataset analyzed includes 364 post-lung-transplant patients who received transplants at Papworth Hospital between 1984 and 2006. Of these, 242 were heart-lung transplant patients and the remaining 122 were double lung transplant patients. The dataset contains 2654 assessments of lung function, 193 deaths and 171 administrative censoring times. Observed transitions are shown in Table 2.
Table 2

Observed transitions for the BOS data. States are 1 = disease-free, 2 = BOS, 3 = dead; C = final observed state when mortality is censored; HL = heart-lung transplant, DL = double-lung transplant

| Transplant type | From state | To 1 | To 2 | To 3 | C |
|---|---|---|---|---|---|
| HL | 1 | 1190 | 198 | 36 | 55 |
| HL | 2 | 47 | 773 | 113 | 38 |
| DL | 1 | 198 | 68 | 23 | 51 |
| DL | 2 | 20 | 160 | 21 | 27 |
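As a quick arithmetic check, the counts in Table 2 reconcile with the totals quoted in the text. Within each transplant type, deaths plus censorings equal the number of patients (242 HL, 122 DL). Transitions ending in state 1 or 2 sum to the 2654 lung function assessments.

```python
# Counts transcribed from Table 2: for each transplant type and observed "from" state,
# the number of transitions to states 1, 2, 3 and to censoring (C).
TABLE2 = {
    "HL": {1: (1190, 198, 36, 55), 2: (47, 773, 113, 38)},
    "DL": {1: (198, 68, 23, 51), 2: (20, 160, 21, 27)},
}
PATIENTS = {"HL": 242, "DL": 122}

rows = [row for by_from in TABLE2.values() for row in by_from.values()]
deaths = sum(r[2] for r in rows)
censored = sum(r[3] for r in rows)
assessments = sum(r[0] + r[1] for r in rows)  # transitions ending in state 1 or 2
print(deaths, censored, assessments)  # 193 171 2654
```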

We seek to fit models where the sojourn distributions in the transient states are Weibull or Gamma distributed. In each case, the underlying model has three states: Healthy (BOS free), Ill (BOS present), Dead. A diagrammatic representation of the model is shown in Fig. 3. The approximate representation of this model has 11 states, the first five corresponding to the healthy state, the next five to the ill state and the final (absorbing) state to death.
Fig. 3

Underlying three state semi-Markov process in the model for the BOS dataset
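The 11-state approximate representation can be sketched as follows. Two 5-phase Coxian blocks, each scaled by its state's overall rate, are merged into one latent generator. Because the sojourn time is assumed independent of the destination (see Sect. 7), each exit rate is split between destinations in proportion λ_rs/λ_r. Each exit leads to the first phase of the destination block. The Coxian rates below are hypothetical placeholders; in the method they would come from the B-spline approximation at the current shape parameter. The overall rates are borrowed from the simulation study described above purely for illustration.

```python
def merged_generator(cox1, cox2, lam, dest_prob, phases=5):
    """Latent generator for the illness-death model built from two Coxian blocks.

    cox_r = (exit_rates, next_rates): a rate-1 Coxian approximation for state r.
    lam[r] = overall sojourn rate of state r; dest_prob[r][s] = lambda_rs / lam[r].
    Latent states 0..4 -> state 1, 5..9 -> state 2, 10 -> state 3 (dead).
    """
    n = 2 * phases + 1
    first = {1: 0, 2: phases, 3: 2 * phases}  # entry phase of each observable state
    Q = [[0.0] * n for _ in range(n)]
    for r, (exit_rates, next_rates) in ((1, cox1), (2, cox2)):
        for i in range(phases):
            k = first[r] + i
            if i < phases - 1:
                Q[k][k + 1] = lam[r] * next_rates[i]             # move to the next phase
            for s, prob in dest_prob[r].items():                 # leave state r; the
                Q[k][first[s]] += lam[r] * exit_rates[i] * prob  # split ignores the phase
            Q[k][k] = -sum(Q[k][j] for j in range(n) if j != k)
    return Q


OBSERVED_STATE = [1] * 5 + [2] * 5 + [3]

# Hypothetical 5-phase Coxian rates (rate-1 scale) and illustrative overall rates.
cox = ([0.1, 0.3, 0.5, 0.8, 1.2], [2.0, 1.5, 1.0, 0.6])
lam = {1: 0.25 + 0.05, 2: 0.04 + 0.1}
dest = {1: {2: 0.25 / lam[1], 3: 0.05 / lam[1]}, 2: {1: 0.04 / lam[2], 3: 0.1 / lam[2]}}
Q = merged_generator(cox, cox, lam, dest)
```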

BOS is expressed in terms of decline in FEV1 relative to a post-transplantation baseline measure. BOS is not defined until at least six months after transplantation, with measurements before this time used to establish the patient’s baseline measure. Time in our models is therefore measured from six months after transplant. At this time the majority of patients should be BOS free. However, we allow for the possibility that some patients’ lung function began to decline before six months, so that they are already in the BOS state. We therefore consider 11-parameter Weibull and Gamma models with shape parameters α1, α2 and rate parameters λ12, λ13, λ21 and λ23. \(p_{2}^{\mathrm{DL}}\) and \(p_{2}^{\mathrm{HL}}\) represent the probability of the process initiating in state 2 for double lung and heart-lung patients respectively. Similarly, \(e_{12}^{\mathrm{DL}}\) and \(e_{12}^{\mathrm{HL}}\) are the probabilities of being misclassified to the BOS state given that the patient is truly in the healthy state. Finally, e21 represents the probability of being misclassified to the healthy state given that the patient is truly in the BOS state; this value is taken to be the same for both transplantation types.
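The hidden Markov likelihood for one subject can be sketched with the forward algorithm. It combines three ingredients: the latent generator, an initial distribution built from p2, and a misclassification (emission) matrix built from e12 and e21 for the subject's transplant type. This is a sketch under simplifying assumptions, not the paper's implementation. First, a subject starting in BOS is assumed to enter the first BOS phase. Second, death is treated as an error-free panel observation; the modification usually made for exactly observed death times is omitted. All names are hypothetical.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def expm(A, terms=30):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(A)
    norm = max(sum(abs(x) for x in row) for row in A)
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    B = [[x / 2 ** s for x in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, B)]
        result = [[r + t for r, t in zip(rr, tt)] for rr, tt in zip(result, term)]
    for _ in range(s):
        result = matmul(result, result)
    return result


def initial_distribution(p2, phases=5):
    """Assumption: a subject starting in BOS enters the first phase of the BOS block."""
    init = [0.0] * (2 * phases + 1)
    init[0], init[phases] = 1.0 - p2, p2
    return init


def emission_matrix(observed_of_latent, e12, e21):
    """E[k][o - 1] = P(observed state o | latent state k); death is never misclassified."""
    table = {1: [1.0 - e12, e12, 0.0], 2: [e21, 1.0 - e21, 0.0], 3: [0.0, 0.0, 1.0]}
    return [table[state] for state in observed_of_latent]


def panel_likelihood(Q, init, E, times, states):
    """Forward algorithm: P(observed states at the given times) under the latent chain."""
    alpha = [init[k] * E[k][states[0] - 1] for k in range(len(init))]
    for j in range(1, len(times)):
        P = expm([[q * (times[j] - times[j - 1]) for q in row] for row in Q])
        alpha = [sum(alpha[i] * P[i][k] for i in range(len(alpha))) * E[k][states[j] - 1]
                 for k in range(len(alpha))]
    return sum(alpha)
```

For a heart-lung patient, the call would use `initial_distribution(p2_HL)` and `emission_matrix([1] * 5 + [2] * 5 + [3], e12_HL, e21)`. The total log-likelihood sums the log of `panel_likelihood` over patients.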

As the mean sojourn time in each state is known to be around 5 years and follow-up is less than 10 years for most patients, the phase-type approximations used are based on optimizing for tmax=2 when λ=1, implying an upper time of 10 years for λ=0.2. Table 3 gives the parameter estimates and bootstrap 95 % confidence intervals for the Weibull and Gamma hidden semi-Markov models. There is strong evidence in favour of the semi-Markov models over a time homogeneous Markov model. The Markov model is the special case α1 = α2 = 1, so each comparison is with a \(\chi^{2}_{2}\) distribution; the likelihood ratio statistics are 25.4 and 22.9 for the Weibull and Gamma models respectively. Note that for several parameters the bootstrap confidence intervals have 0 as the lower limit. These limits correspond to boundary cases: no instantaneous risk of death from state 1 (λ13 = 0), no recovery from state 2 (λ21 = 0), no misclassification from state 2 to state 1 (e21 = 0), or all subjects initiated in state 1 (p2 = 0).
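The p-values implied by these likelihood ratio statistics can be computed directly. For a \(\chi^{2}_{2}\) reference distribution, the upper tail is simply exp(−x/2). The sketch below uses the general closed form for even degrees of freedom; the function name is hypothetical.

```python
import math


def chi2_sf_even(x, df):
    """Upper tail P(X > x) of a chi-squared distribution with an even number of df."""
    half, term, total = x / 2.0, 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Markov model = semi-Markov model with alpha1 = alpha2 = 1, hence 2 degrees of freedom.
for name, stat in (("Weibull", 25.4), ("Gamma", 22.9)):
    print(name, f"{chi2_sf_even(stat, 2):.1e}")
# Weibull 3.1e-06
# Gamma 1.1e-05
```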
Table 3

Parameter estimates for Weibull and Gamma hidden semi-Markov models for the BOS dataset with bootstrap 95 % confidence intervals

| Parameter | Weibull estimate | Weibull CI | Gamma estimate | Gamma CI |
|---|---|---|---|---|
| α1 | 0.724 | (0.602, 0.843) | 0.604 | (0.485, 0.791) |
| λ12 | 0.258 | (0.196, 0.338) | 0.137 | (0.099, 0.192) |
| λ13 | 0.010 | (0, 0.032) | 0.002 | (0, 0.018) |
| α2 | 0.717 | (0.600, 0.893) | 0.583 | (0.463, 0.846) |
| λ21 | 0.050 | (0, 0.106) | 0.027 | (0.002, 0.052) |
| λ23 | 0.227 | (0.177, 0.290) | 0.117 | (0.084, 0.175) |
| \(e_{12}^{\mathrm{DL}}\) | 0.026 | (0.009, 0.045) | 0.026 | (0.009, 0.045) |
| \(e_{12}^{\mathrm{HL}}\) | 0.092 | (0.034, 0.171) | 0.091 | (0.038, 0.174) |
| e21 | 0.006 | (0, 0.017) | 0.006 | (0, 0.017) |
| \(p_{2}^{\mathrm{DL}}\) | 0.033 | (0, 0.097) | 0.023 | (0, 0.090) |
| \(p_{2}^{\mathrm{HL}}\) | 0.136 | (0, 0.268) | 0.125 | (0.039, 0.174) |
| −2×LL | 2979.7 | | 2982.2 | |
| No. pars | 11 | | 11 | |

Titman and Sharples (2010) fitted a time homogeneous hidden semi-Markov model with 2-phase Coxian phase-type distributions to the same dataset. This requires 3 parameters for each state, resulting in a model with 13 parameters and −2×LL=2976.5. This model is not nested with the Weibull and Gamma models, so a formal likelihood ratio comparison is not possible. In terms of AIC, however, the Weibull model (11 parameters, −2×LL=2979.7) is slightly preferable to the 2-phase model, whereas the Gamma model is slightly inferior. Estimated quantities of interest, such as state occupancy probabilities, are very close across the three models. As might be expected, the bootstrap standard errors of quantities of interest are generally slightly lower for the Weibull and Gamma models than for the 2-phase model. The biggest apparent gain in efficiency is for the estimates of survival conditional on having spent specific times in the BOS state. Further details of the comparison of models are given in the Supplementary Materials.
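The AIC comparison follows directly from the reported −2×LL values and parameter counts, since AIC = −2×LL + 2k.

```python
# -2 x log-likelihood and number of parameters reported for the three BOS models.
models = {
    "Weibull": (2979.7, 11),
    "Gamma": (2982.2, 11),
    "2-phase Coxian": (2976.5, 13),
}

aic = {name: m2ll + 2 * k for name, (m2ll, k) in models.items()}
for name, value in sorted(aic.items(), key=lambda kv: kv[1]):
    print(f"{name}: AIC = {value:.1f}")
# Weibull: AIC = 3001.7
# 2-phase Coxian: AIC = 3002.5
# Gamma: AIC = 3004.2
```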

7 Discussion

In this article we have detailed a method for approximating the likelihood for parametric semi-Markov models. After a one-off optimization to establish approximations to the parametric distribution, the approach allows semi-Markov models to be fitted in computational times comparable with existing methods for non-homogeneous Markov models. While in this paper approximations have only been established for the Weibull and Gamma distributions, the idea is directly transferable to any two-parameter failure time distribution in which one parameter is a rate (or equivalently scale) parameter. Potentially, B-splines of more than one dimension could be used to get approximation surfaces for distributions requiring two or more non-scaling parameters although this would significantly increase the one-off optimization required to establish the approximation. For distributions with uni- or multi-modal hazard functions, phase-type distributions with a greater number of phases may be required to maintain a good approximation. However, the real advantage of using the Weibull or Gamma distributions is that they allow a reasonable degree of flexibility with few additional parameters.

The method in this article contrasts with the method proposed by Titman and Sharples (2010), in which phase-type distributions are used directly as the sojourn distributions in the semi-Markov model. It is unlikely that panel data provide sufficient information to adequately fit more than the simplest phase-type distributions via this direct approach. There may be more scope in cases where state misclassification does not occur, where recovery is not possible in the multi-state model, or where the data contain a mixture of interval censored observations and exactly observed transition times. In practice, a strategy for model building would be to first fit a model using two-phase Coxian distributions to informally assess whether there is evidence against the Markov assumption. If there is, the methods in this paper can then be used to provide a more interpretable model. In smaller data sets it might only be possible to fit a model in which one of the states has a non-exponential sojourn distribution.

A limitation of the proposed method is that it assumes the sojourn time in state r is independent of the transition (e.g. r→s) that occurs. This restriction was necessary for the BOS dataset, in which there are relatively few 1→3 or 2→1 transitions. In the semi-Markov literature, it is common to parameterize a model in terms of the transition probabilities of the embedded Markov process and the conditional sojourn times. The phase-type approximations can be used to fit models in which the conditional sojourn times have Weibull or Gamma distributions, by using a latent generator matrix Q whose blocks correspond to specific transitions rather than states. However, multi-state models in medical contexts are usually parameterized in terms of the transition intensities. A more natural Weibull model would then allow each intensity to have a separate Weibull form, e.g. \(q_{rs}(u) = \lambda_{rs} \alpha_{rs} (\lambda_{rs} u)^{\alpha_{rs}-1}\). Here the overall sojourn in state r is poly-Weibull. Such models can still be represented using the underlying phase-type approximations to the constituent competing Weibull distributions, but they require a large number of latent states: e.g. a 5-phase approximation with R−1 competing transitions requires \(5^{R-1}\) latent states. The need to compute large matrix exponentials in this case makes the method less computationally appealing. A more feasible compromise is a model in which only one of the transitions out of state r has a Weibull intensity and the others are constant. More details of these model extensions are given in the Supplementary Materials. Similarly, in this paper we have assumed that all subjects are initiated in state 1 at time 0. Often the initial waiting time in the first state is unknown. Some discussion of how this could be incorporated into the current modelling framework is given in the Supplementary Materials.
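The growth in latent states for competing Weibull intensities can be made concrete with a standard construction. The minimum of independent phase-type times is itself phase-type. Its transient sub-generator is the Kronecker sum A ⊕ B = A ⊗ I + I ⊗ B, so the number of phases multiplies across the competing transitions. The sketch below builds only that transient part; routing each component's absorption to its own destination state is left out. It is an illustration of the general construction, not the code used for the extensions in the Supplementary Materials.

```python
def kron(A, B):
    """Kronecker product of two square matrices given as lists of lists."""
    n, m = len(A), len(B)
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(n * m)] for i in range(n * m)]


def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]


def kron_sum(A, B):
    """A (+) B = A x I + I x B: sub-generator of min(T_A, T_B) for independent PH times."""
    left, right = kron(A, identity(len(B))), kron(identity(len(A)), B)
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(left, right)]


def competing_subgenerator(blocks):
    """Fold the Kronecker sum over the sub-generators of the competing transitions."""
    Q = blocks[0]
    for B in blocks[1:]:
        Q = kron_sum(Q, B)
    return Q
```

In the combined chain, the total exit rate from the phase pair (i, j) is the sum of the two components' exit rates. Three competing 5-phase approximations already give 5 × 5 × 5 = 125 latent phases.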

8 Supplementary materials

Further details on additional model extensions, fitting the phase-type distributions, assessing the fit of the approximations, and a comparison of the Weibull and Gamma models with the 2-phase Coxian phase-type model for the BOS dataset are given in the Supplementary Materials document.

Supplementary material

11222_2012_9360_MOESM1_ESM.pdf (297 kb)

Copyright information

© Springer Science+Business Media New York 2012

Authors and Affiliations

Department of Mathematics and Statistics, Lancaster University, Lancaster, UK
