Estimating parametric semi-Markov models from panel data using phase-type approximations
DOI: 10.1007/s11222-012-9360-6
- Cite this article as:
- Titman, A.C. Stat Comput (2014) 24: 155. doi:10.1007/s11222-012-9360-6
Abstract
Inference for semi-Markov models under panel data presents considerable computational difficulties. In general the likelihood is intractable, but a tractable likelihood with the form of a hidden Markov model can be obtained if the sojourn times in each of the states are assumed to have phase-type distributions. However, using phase-type distributions directly may be undesirable as they require estimation of parameters which may be poorly identified. In this article, an approach to fitting semi-Markov models with standard parametric sojourn distributions is developed. The method involves establishing a family of Coxian phase-type distribution approximations to the parametric distribution and merging approximations for different states to obtain an approximate semi-Markov process with a tractable likelihood. Approximations are developed for Weibull and Gamma distributions and demonstrated on data relating to post-lung-transplantation patients.
Keywords
B-splines, Gamma distribution, Hidden Markov model, Misclassification, Panel data, Phase-type distribution, Semi-Markov, Weibull
1 Introduction
Processes from a wide range of fields may be modelled as multi-state stochastic processes on a finite discrete state space in continuous time. Often continuous monitoring of the process is not possible and instead data consist of a series of snapshots of the process at potentially irregular and subject specific time points, with no information on the trajectory of the process between these times. Such data are referred to as panel data. The term is sometimes restricted to the case where all subjects are observed at a common set of observation times, but is used here more broadly to include cases where the observation times may be irregularly spaced and subject specific. We also allow for the possibility that the observed state may be subject to classification error.
Multi-state models under panel observation have application in a wide range of fields, for instance in financial applications such as credit risk scoring (Bladt and Sorensen 2009), social science applications such as monitoring spells of unemployment (Lancaster and Nickell 1980) and wide uses in medical and biostatistical applications for modelling spells of infection (Crespi et al. 2005) or the progression of diseases (Gentleman et al. 1994). An advantage of multi-state models is that they allow estimation of many outcomes of interest including sojourn times in states, first hitting times and mean time to terminal event (Mandel 2010). However, the accuracy of these estimates is somewhat reliant on correctly specifying the model. Analysis of panel data, with or without classification error, is generally performed using a Markov or hidden Markov model. In practice, processes can act on several different time scales. In particular, if the transition intensities between states of the process depend on the length of time already spent in that state, the process is semi-Markov.
Inference for semi-Markov models under panel data presents considerable computational difficulties. Such models can be categorized into those without recovery, i.e. where once a state has been exited it cannot be re-entered, and those with recovery. When recovery to previous states is not possible, the likelihood can be computed using numerical integration (Foucher et al. 2010), although this approach becomes difficult for models with many states. For more general semi-Markov models under panel observation, allowing recovery, the likelihood is somewhat intractable and as a result there has been relatively little work in this area. Kang and Lagakos (2007) developed methods based on numerical solution of integral equations appropriate when at least one of the transient states of the process has an exponential distribution and the non-exponential states have a guarantee time, i.e. minimum sojourn length, in each state.
Titman and Sharples (2010) developed methods for fitting semi-Markov models with phase-type sojourn distributions to panel data. The advantage of using phase-type distributions is that the model can be represented as an aggregated Markov model where occupancy in a particular state of the semi-Markov model corresponds to occupancy in a set of states in a latent Markov model. This greatly simplifies computation of the likelihood. Using 2-phase Coxian phase-type distributions, each transition intensity has three parameters which correspond to an initial intensity, a limiting intensity as the time spent in the state tends to infinity and the rate at which the intensity evolves between the two values. While this formulation provides a fairly flexible class of models, it has some disadvantages. First, the use of phase-type distributions is for computational convenience and makes interpretation harder than if more familiar distributions were used. Second, there is often a need for parsimony in multi-state models from panel data, but the 2-phase Coxian phase-type distribution requires two additional parameters compared to the exponential distribution. The inter-phase rate parameter, in particular, is often difficult to estimate. Third, the inter-phase rate parameter is not identifiable under the null (Markov) model. This makes testing the Markov assumption using the phase-type approach difficult.
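As a rough illustration of how a 2-phase Coxian sojourn distribution moves from an initial to a limiting intensity, the sketch below evaluates the overall exit hazard in closed form. The parameter names (`mu1`, `lam1`, `mu2`) are hypothetical labels and not the parameterization of Titman and Sharples (2010). The sketch assumes a single exit destination and that `mu2` differs from `lam1 + mu1`. When `mu2 < lam1 + mu1`, the hazard starts at `mu1` and tends to `mu2`.

```python
import math

def coxian2_hazard(t, mu1, lam1, mu2):
    """Exit hazard at time t for a 2-phase Coxian sojourn (illustrative names).

    mu1  : exit rate from phase 1
    lam1 : rate of moving from phase 1 to phase 2
    mu2  : exit rate from phase 2
    Assumes lam1 + mu1 != mu2.
    """
    a = lam1 + mu1                       # total rate of leaving phase 1
    p1 = math.exp(-a * t)                # P(still in phase 1 at time t)
    # P(in phase 2 at time t), from the Kolmogorov forward equations
    p2 = lam1 * (math.exp(-mu2 * t) - math.exp(-a * t)) / (a - mu2)
    return (mu1 * p1 + mu2 * p2) / (p1 + p2)

# Hypothetical values: hazard decreases from 1.0 towards 0.2
h_start = coxian2_hazard(0.0, mu1=1.0, lam1=0.5, mu2=0.2)
h_late = coxian2_hazard(50.0, mu1=1.0, lam1=0.5, mu2=0.2)
```

The speed of the move between the two values is governed by the inter-phase rate, which is the parameter described above as hard to estimate.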
In this article, an alternative application of phase-type distributions for semi-Markov models is developed, which addresses these problems. Rather than fitting models with phase-type distributions, models with standard parametric sojourn distributions are used, with phase-type distributions applied to provide a computationally tractable approximate likelihood. For a particular parametric distribution, a family of phase-type distribution approximations is developed. Establishing this family of approximations involves a relatively large optimization problem but only needs to be done once and then allows semi-Markov models with a wide range of state spaces to be fitted very quickly. Using well known survival distributions such as the Weibull or Gamma makes interpretation simpler and requires fewer parameters than the 2-phase Coxian distribution.
The remainder of the article is organized as follows. Section 2 outlines likelihood computation for panel data. In Sect. 3 a method of obtaining functional approximations to families of distributions via phase-type distributions is developed. In Sect. 4, phase-type approximations to parametric semi-Markov multi-state models are developed. Section 5 is a simulation study investigating the performance of estimates based on the phase-type approximations. Section 6 gives an illustrative example of the methodology on the BOS data from post-lung-transplantation patients. The article concludes with a discussion.
2 Likelihood computation for panel data
We consider a continuous time process {X(t),t≥0} on a finite set of states indexed 1,…,R. The observed data for an individual subject consist of observed states x_{0},x_{1},…,x_{N} at times t_{0}<t_{1}<⋯<t_{N}, where these time points and the number of times N may be subject specific. The key aspect of panel data is that while the state is known at these time points, nothing is known about the trajectory of the process between these times.
2.1 Semi-Markov models
In progressive semi-Markov models, where there is a finite number of possible paths that an individual can take conditional on their observed states, computation of the likelihood requires considering each path and integrating over the possible sojourn times in each state of the path (Foucher et al. 2010). Numerical quadrature methods can be applied to compute the likelihood but become unattractive for models with more than 3 or 4 states because of the increasing dimension of the integrals.
For more general models where backward transitions are possible, direct integration is not possible because the number of possible state visits is unbounded. Computation of the transition probabilities defined as p_{rs}(u,t)=P{X(t)=s|X(u)=r,T^{∗}=u}, i.e. the transition probability from state r to state s from time u to time t given entry into the state at time u, requires the solution of a system of integral equations (Howard 1964; De Dominicis and Manca 1984). However, the likelihood for a set of panel observed states cannot be expressed simply as the product of transition probabilities, as the observation times will not correspond to the entry time into the observed state.
2.2 State classification error
However, for a general hidden semi-Markov model, the forward algorithm cannot be applied meaning more general summation over all possible true state sequences is required. If backwards transitions are possible this means the number of summation terms increases exponentially with sequence length.
3 Phase-type approximations to parametric distributions
In this section, phase-type distributions are introduced and an approach to developing functional phase-type approximations to parametric distributions is developed.
Many parametric failure time distributions can be expressed in terms of a rate (or scale) parameter, λ, and an additional parameter, α, typically referred to as the shape parameter. Examples include the Weibull, Gamma and log-logistic distributions and the two-parameter Birnbaum-Saunders distribution. The key property which we will exploit is that the survivor function and hazard functions for a distribution with rate parameter λ can be equated with the survivor function and hazard functions for rate parameter 1 via S_{α,λ}(t)=S_{α,1}(λt) and q_{α,λ}(t)=λq_{α,1}(λt), where S_{α,λ}(t) and q_{α,λ}(t) denote the survivor and hazard functions for a parametric distribution with parameters (α,λ).
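The scaling property can be checked numerically. The sketch below does so for the Weibull distribution, assuming the parameterization S_{α,λ}(t)=exp(−(λt)^α) with hazard λα(λt)^{α−1}. The function names and the test values are chosen purely for illustration.

```python
import math

def weibull_survivor(t, alpha, lam=1.0):
    """Weibull survivor function S(t) = exp(-(lam * t) ** alpha)."""
    return math.exp(-((lam * t) ** alpha))

def weibull_hazard(t, alpha, lam=1.0):
    """Weibull hazard q(t) = lam * alpha * (lam * t) ** (alpha - 1)."""
    return lam * alpha * (lam * t) ** (alpha - 1.0)

alpha, lam, t = 1.4, 0.3, 2.5   # illustrative values only

# S_{alpha,lam}(t) should equal S_{alpha,1}(lam * t)
lhs_S = weibull_survivor(t, alpha, lam)
rhs_S = weibull_survivor(lam * t, alpha, 1.0)

# q_{alpha,lam}(t) should equal lam * q_{alpha,1}(lam * t)
lhs_q = weibull_hazard(t, alpha, lam)
rhs_q = lam * weibull_hazard(lam * t, alpha, 1.0)
```

Because of this property, an approximation built for rate 1 can be reused for any rate by rescaling time, so only the shape parameter α needs a family of approximations.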
The use of phase-type distribution approximations to parametric distributions is common in stochastic control problems, for instance the analysis of queues with general interarrival and service time distributions (Neuts 1981). As a result there is a reasonably wide literature on developing phase-type approximations. In particular, Asmussen et al. (1996) developed an EM algorithm to fit phase-type distributions to data or known distributions and developed a C program EMpht (Olsson 1998), based on minimizing the Kullback-Leibler distance between the target distribution and the phase-type approximation.
In principle, separate phase-type fits, minimizing (1) with respect to the parameters θ of S(θ), could be performed for each α required in optimizing the likelihood of a multi-state model. However, this approach is unattractive, firstly because the requirement to fit a new phase-type approximation at each iteration would be excessively time consuming and also because sensitivity to the convergence criterion used may lead to a discontinuous likelihood surface.
Noting that, up to a constant, the Kullback-Leibler distance as defined in (1) is equivalent to a multinomial log-likelihood, we can apply a constrained Fisher scoring-type algorithm to fit the parameters for the B-spline approximation.
3.1 Weibull and Gamma approximations
For the Weibull distribution, we choose as our range of values for α the interval [0.4,2]. This covers a fairly wide range of possible hazard shapes, which is sufficient for the BOS application. The dynamics of the Weibull distribution are somewhat different for α<1 than α>1. In particular, the hazard is 0 at t=0 for α>1 whereas it is unbounded for α<1. Therefore rather than seek a single B-spline solution for the full range [0.4,2], we consider separate functions for [0.4,1] and [1,2].
For the Gamma distribution, we choose a range of values of α of [0.4,5]. A higher upper limit for α is chosen because the relationship with the Erlang distribution means that near perfect fits can be achieved for α∈[1,5]. Again separate B-spline fits are applied to [0.4,1] and [1,5]. In each case, preliminary fits were performed to determine good choices for the location of the knot points. Once these were determined, finding the phase-type approximations took around 8 hours of computation time on a terminal with 3.9 GB of RAM and four 2.4 GHz processors. If a greater number of knot points is used, this has little bearing on the time it takes to evaluate KL but will increase the number of evaluations needed to calculate the gradient of KL at each iteration and the number of iterations required for convergence. At best we might expect a quadratic increase in computation time with the number of knots. Increasing the order of the phase-type distribution leads to a linear increase in the number of parameters but also increases the size of the matrix exponential to be computed for each evaluation of KL and so is likely to lead to a greater increase in computation time.
Full details of the optimization procedure for the phase-type fit are given in the Supplementary Materials.
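The link with the Erlang distribution can be illustrated with a small check: a Gamma distribution with integer shape is exactly a phase-type distribution with that many sequential phases. The sketch below evaluates a phase-type survivor function π exp(Tt) 1 by uniformization, which is a swapped-in technique chosen here to avoid external libraries and not necessarily how the matrix exponential was computed in the article. The function name is hypothetical, and the routine is only intended for moderate values of rate × t.

```python
import math

def phase_type_survivor(t, T, pi, tol=1e-12):
    """S(t) = pi * exp(T t) * 1 for sub-generator T, computed by uniformization."""
    n = len(T)
    rate = max(-T[i][i] for i in range(n))          # uniformization rate
    # Sub-stochastic jump matrix P = I + T / rate
    P = [[(1.0 if i == j else 0.0) + T[i][j] / rate for j in range(n)]
         for i in range(n)]
    v = list(pi)                                    # row vector pi * P^k
    weight = math.exp(-rate * t)                    # Poisson(rate * t) mass at k
    total, cum, k = 0.0, 0.0, 0
    while cum < 1.0 - tol and k < 10000:
        total += weight * sum(v)                    # mass still in a transient phase
        cum += weight
        k += 1
        weight *= rate * t / k
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
    return total

# Erlang with 2 phases and rate 1, i.e. Gamma with alpha = 2, lambda = 1
erlang_T = [[-1.0, 1.0],
            [0.0, -1.0]]
erlang_pi = [1.0, 0.0]
approx = phase_type_survivor(1.5, erlang_T, erlang_pi)
exact = math.exp(-1.5) * (1.0 + 1.5)                # Gamma(2, 1) survivor at t = 1.5
```

For non-integer α no such exact representation exists, which is where the fitted approximations are needed.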
4 Approximation of a semi-Markov system
4.1 Likelihood computation for a semi-Markov model with phase-type sojourn distributions
4.2 Phase-type approximations to parametric semi-Markov models
4.3 Incorporation of classification error
A key advantage of using phase-type approximations is that the addition of classification error, at least in terms of likelihood computation, is straightforward.
We assume P{O(t)=s|X(t)=r}=e_{rs} and that O(t)|X(t) is independent of O(t′)|X(t′) for t≠t′. In the framework set up in Sect. 4.1, the observed states then relate to the latent Markov process by P{O(t)=s|X^{∗}(t)=r_{l}}=e_{rs}, i.e. the probabilities are in the interval [0,1] rather than exactly 0 or 1.
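In practical terms, every latent phase belonging to a true state r shares row r of the misclassification matrix. A minimal sketch of building the latent emission matrix is given below. The function name, the choice of five phases per transient state and the assumption that death is observed without error are illustrative only; the error probabilities are those of the mild misclassification scenario in Sect. 5.2.

```python
def expand_emissions(e, phases_per_state):
    """Repeat row r of the R x R misclassification matrix once per latent phase of state r."""
    rows = []
    for r, n_phases in enumerate(phases_per_state):
        for _ in range(n_phases):
            rows.append(list(e[r]))     # each phase of state r emits like state r
    return rows

# Rows: true state (healthy, ill, dead); columns: observed state
e = [[0.96, 0.04, 0.0],   # e_12 = 0.04
     [0.10, 0.90, 0.0],   # e_21 = 0.10
     [0.0, 0.0, 1.0]]     # assumption: death is never misclassified

# Assumption: 5 latent phases for each transient state, 1 for the absorbing state
latent_e = expand_emissions(e, [5, 5, 1])
```

The resulting matrix plays the role of the emission matrix in a standard hidden Markov forward recursion over the latent process.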
4.4 Likelihood maximization
Since the approximate likelihood is continuous and differentiable everywhere in the range α∈[α_{l},α_{u}], except at α=1, standard numerical maximization procedures including those based on numerical gradients can be used to maximize the likelihood provided a starting value away from α=1 is chosen. However, since the approximation is only defined within the range [α_{l},α_{u}], problems will occur if the true maximum is outside of this range or near to the boundary. Clearly for these cases an approximation valid on a wider range for α would be needed. This either requires a greater number of knot points, in which case a larger one-off optimization is needed to establish the B-spline approximation, or some compromise on the accuracy of the approximation. For both the Gamma and Weibull cases, we also developed B-spline approximations with the same number of knot points but over the range [0.2,4]. A pragmatic approach is then to attempt to optimize the likelihood based on the original approximation range, but terminate if the boundary is reached and restart the optimization using the likelihood based on the wider approximation range.
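The restart strategy can be sketched for a single shape parameter as follows. This is an illustration under strong simplifications: the two log-likelihood functions are hypothetical stand-ins for the approximate likelihoods on the original and wider ranges, only α is optimized, and a derivative-free golden-section search is used in place of whatever optimizer was actually employed. The ranges [0.4,2] and [0.2,4] are the Weibull ranges given in the text.

```python
def golden_section_max(f, lo, hi, tol=1e-6):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):                 # maximum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                           # maximum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

def maximize_with_fallback(loglik_narrow, loglik_wide,
                           narrow=(0.4, 2.0), wide=(0.2, 4.0), edge=1e-3):
    """Optimize on the narrow range; restart on the wide range if a boundary is hit."""
    alpha_hat = golden_section_max(loglik_narrow, *narrow)
    if alpha_hat - narrow[0] < edge or narrow[1] - alpha_hat < edge:
        return golden_section_max(loglik_wide, *wide), "wide"
    return alpha_hat, "narrow"

# Toy profile log-likelihood peaked outside the narrow range (hypothetical)
toy = lambda a: -(a - 2.7) ** 2
alpha_hat, used = maximize_with_fallback(toy, toy)
```

In a real fit the rate and misclassification parameters would be optimized jointly with α, so the boundary check would be applied to the shape components of the parameter vector.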
5 Performance on simulated data
5.1 Direct comparison of likelihood curves
To assess the accuracy of the approximate likelihood we first consider a scenario where recovery is possible but where simulation based methods can be used to compute the likelihood curve to high accuracy. Due to the computational difficulty of computing the exact likelihood, we constrain ourselves to a simple case where there are two states (e.g. healthy and ill), the sojourn distribution in the healthy state is known to be exponential with rate λ_{0}, while the sojourn distribution in the illness state has a Weibull or Gamma distribution with a known rate, but unknown shape parameter. Moreover, we assume that all subjects are initiated in the healthy state at time 0 and are observed at a common set of 5 equally spaced examination times. The assumptions of an exponential sojourn distribution for state 1 and equally spaced examination times mean that the likelihood can be expressed in terms of just 6 probabilities corresponding to different lengths of time without an observed state 0 (i.e. patterns of 0, 10, 110, 1110, 11110 and 11111).
5.2 Realistic data scenario
To further assess the performance of the phase-type approximation method, we additionally apply the method to simulated datasets of a similar nature to the lung transplantation dataset to be introduced in Sect. 6. We consider samples of either 500 or 1000 patients, where patients are observed up to 5 or 8 times and for the 8 observations per patient cases we also consider additional scenarios where there is mild and moderate state misclassification. This leads to a total of 8 different simulation scenarios which are repeated for both Weibull and Gamma models. For scenarios where patients are observed up to 8 times, the time intervals between observations are U(0.5,2), while if they are observed up to 5 times they are U(1,2.5). In both cases the time interval between observations varies within patient sequences as well as between patients. Data are generated from a model with three states: healthy, ill and dead, where recovery from the illness state is possible. All patients are initialized in state 1 (healthy) at time 0. The parameters of the semi-Markov process in both the Weibull and Gamma cases are (λ_{12},α_{1},λ_{13},λ_{21},α_{2},λ_{23})=(0.25,1.4,0.05,0.04,0.7,0.1). This corresponds to a process where the hazard of a transition increases with time spent in the healthy state and decreases with time spent in the illness state, where a subject is more likely to progress to the illness state than to die from the healthy state, and where a subject is more likely to die from the illness state than to recover to the healthy state. In the mild misclassification scenario e_{12}=0.04 and e_{21}=0.1 while in the moderate misclassification scenario e_{12}=0.1 and e_{21}=0.2.
Exact approaches to likelihood computation are not computationally feasible, so the assessment of performance is restricted to checking the consistency of the estimates obtained by maximizing the approximate likelihood. To assess the likely performance of (approximate) likelihood ratio based confidence intervals, we also consider the difference in log-likelihood at the maximum likelihood estimate and at the true parameter values.
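A data-generating process of this kind could be sketched as below for the Weibull case. Several details are assumptions made for illustration and are not stated in the text: the sojourn in state r is taken to be Weibull with shape α_r and rate equal to the sum of the exit rates, the destination is drawn with probability proportional to λ_{rs}, the first observation is at time 0, death is observed without error, and observation stops once death is recorded. The function name is hypothetical.

```python
import math
import random

def simulate_patient(rng, max_obs=8, gap=(0.5, 2.0), e12=0.0, e21=0.0):
    """Simulate one panel-observed patient; returns a list of (time, observed state)."""
    lam12, a1, lam13, lam21, a2, lam23 = 0.25, 1.4, 0.05, 0.04, 0.7, 0.1
    exits = {1: (a1, [(2, lam12), (3, lam13)]),
             2: (a2, [(1, lam21), (3, lam23)])}

    # Observation times: time 0, then gaps drawn from U(gap)
    times = [0.0]
    for _ in range(max_obs - 1):
        times.append(times[-1] + rng.uniform(*gap))

    # True path as (entry time, state), simulated past the last observation time
    path, t, state = [(0.0, 1)], 0.0, 1
    while state != 3 and t <= times[-1]:
        alpha, dests = exits[state]
        lam = sum(rate for _, rate in dests)
        # Inverse-transform draw from S(t) = exp(-(lam * t) ** alpha)
        t += (-math.log(1.0 - rng.random())) ** (1.0 / alpha) / lam
        state = dests[0][0] if rng.random() * lam < dests[0][1] else dests[1][0]
        path.append((t, state))

    # Observe the state at each visit, with optional misclassification
    obs = []
    for s_time in times:
        true_state = [s for entry, s in path if entry <= s_time][-1]
        if true_state == 1 and rng.random() < e12:
            seen = 2
        elif true_state == 2 and rng.random() < e21:
            seen = 1
        else:
            seen = true_state
        obs.append((s_time, seen))
        if true_state == 3:
            break
    return obs

example = simulate_patient(random.Random(1), e12=0.04, e21=0.1)
```

Calling the function with `max_obs=5` and `gap=(1.0, 2.5)` would give the sparser observation scheme.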
The approximate likelihood used for the simulations was optimized for the case λ=1 and t_{max}=2, which is appropriate given the true λ are 0.3 and 0.14 and the maximum follow-up time is around 12 years. In a small proportion of cases, the MLE for α_{1} was not in the range [0.4,2]. In these cases an analogous B-spline approximation on the range [0.2,4] was used to approximate the likelihood.
Bias in parameter estimates for simulated data. Presented values are Bias ×10^{3}. Misc refers to whether the model included state misclassification
Distribution | N | m | Misc | λ_{12} | α_{1} | λ_{13} | λ_{21} | α_{2} | λ_{23} | e_{12} | e_{21} |
---|---|---|---|---|---|---|---|---|---|---|---|
Weibull | 1000 | 5 | No | 2.16 | 10.26 | −1.40 | 0.84 | 4.24 | 2.21 | ||
Weibull | 1000 | 8 | No | 0.82 | 8.88 | −0.59 | 0.13 | 1.70 | 0.77 | ||
Weibull | 1000 | 8 | Mild | 0.66 | 10.80 | −0.61 | 0.36 | 6.11 | 1.60 | 0.25 | −0.11 |
Weibull | 1000 | 8 | Moderate | 2.05 | 10.43 | −1.57 | 0.16 | 6.16 | 2.66 | 0.44 | 0.29 |
Weibull | 500 | 5 | No | 4.47 | 9.14 | −2.75 | 1.67 | 2.43 | 3.94 | ||
Weibull | 500 | 8 | No | 2.40 | 13.15 | −1.42 | 0.44 | 5.04 | 2.28 | ||
Weibull | 500 | 8 | Mild | 3.44 | 13.92 | −2.10 | 1.50 | 7.33 | 3.43 | −0.21 | −0.23 |
Weibull | 500 | 8 | Moderate | 4.17 | 16.09 | −2.77 | 1.46 | 8.16 | 4.40 | 0.26 | 0.40 |
Gamma | 1000 | 5 | No | 2.38 | 7.08 | 0.77 | 1.14 | 21.62 | 2.80 | ||
Gamma | 1000 | 8 | No | 6.72 | 9.91 | −2.50 | 1.45 | 16.49 | 3.47 | ||
Gamma | 1000 | 8 | Mild | 4.60 | 9.23 | −0.87 | 0.92 | 22.10 | 3.85 | 0.34 | 0.28 |
Gamma | 1000 | 8 | Moderate | 6.81 | 9.93 | −1.75 | 0.39 | 22.48 | 3.19 | −0.21 | 0.47 |
Gamma | 500 | 5 | No | 4.90 | 10.34 | −0.11 | 1.55 | 28.28 | 4.32 | ||
Gamma | 500 | 8 | No | 10.17 | 21.11 | −2.84 | 2.46 | 36.24 | 6.27 | ||
Gamma | 500 | 8 | Mild | 8.68 | 18.88 | −1.64 | 1.54 | 35.36 | 5.39 | 0.43 | −0.05 |
Gamma | 500 | 8 | Moderate | 12.18 | 23.06 | −3.90 | 0.38 | 30.89 | 5.74 | 0.44 | −0.60 |
The likelihood ratio statistics from the simulated data were close to having their nominal χ^{2} distributions for all scenarios (Table 1 in the Supplementary Materials document).
6 Illustration: BOS dataset
Bronchiolitis obliterans is the irreversible, progressive airway obstruction leading to impairment of lung function. It is the major limiting factor to long-term survival for lung transplant recipients. Bronchiolitis obliterans can only be reliably assessed histologically. In practice however, Bronchiolitis obliterans syndrome (BOS) is defined as decline in forced expiratory volume in 1 second in litres (FEV_{1}) and this is used as a surrogate measure. Interest lies in determining the rate at which patients develop the disease as well as the effect BOS has on survival. However, disease assessment through FEV_{1} only occurs at clinic visits and is subject to classification error.
Observed transitions for the BOS data. States are 1 = disease-free, 2 = BOS, 3 = dead, C = observed final state mortality censored, HL = Heart-lung transplant, DL = Double-lung transplant
Transplant type | From state | To state 1 | To state 2 | To state 3 | To state C |
---|---|---|---|---|---|
HL | 1 | 1190 | 198 | 36 | 55 |
HL | 2 | 47 | 773 | 113 | 38 |
DL | 1 | 198 | 68 | 23 | 51 |
DL | 2 | 20 | 160 | 21 | 27 |
BOS is expressed in terms of decline in FEV_{1} relative to a post-transplantation baseline measure. BOS is not defined until at least six months after transplantation with measurements before this time used to establish the patient’s baseline measure. Time in our models is therefore measured from six months after transplant. At this time the majority of patients should be BOS free. However, we consider the possibility that some patients’ lung functioning began to decline before 6 months and as a result they are already in the BOS state. We therefore consider 11 parameter Weibull and Gamma models with shape parameters α_{1},α_{2}, rate parameters λ_{12},λ_{13},λ_{21} and λ_{23}. \(p_{2}^{\mathrm{DL}}\) and \(p_{2}^{\mathrm{HL}}\) represent the probability of the process initiating in state 2 for double lung and heart lung patients respectively. Similarly \(e_{12}^{\mathrm{DL}}\) and \(e_{12}^{\mathrm{HL}}\) are the probabilities of being misclassified to the BOS state given being truly in the healthy state. Finally, e_{21} represents the probability of being misclassified to the healthy state given the patient is truly in the BOS state—the value is taken to be the same for both transplantation types.
Parameter estimates for Weibull and Gamma hidden semi-Markov models for the BOS dataset with bootstrap 95 % confidence intervals
Parameter | Weibull estimate | Weibull 95 % CI | Gamma estimate | Gamma 95 % CI |
---|---|---|---|---|
α_{1} | 0.724 | (0.602,0.843) | 0.604 | (0.485,0.791) |
λ_{12} | 0.258 | (0.196,0.338) | 0.137 | (0.099,0.192) |
λ_{13} | 0.010 | (0,0.032) | 0.002 | (0,0.018) |
α_{2} | 0.717 | (0.600,0.893) | 0.583 | (0.463,0.846) |
λ_{21} | 0.050 | (0,0.106) | 0.027 | (0.002,0.052) |
λ_{23} | 0.227 | (0.177,0.290) | 0.117 | (0.084,0.175) |
\(e_{12}^{\mathrm{DL}}\) | 0.026 | (0.009,0.045) | 0.026 | (0.009,0.045) |
\(e_{12}^{\mathrm{HL}}\) | 0.092 | (0.034,0.171) | 0.091 | (0.038,0.174) |
e_{21} | 0.006 | (0,0.017) | 0.006 | (0,0.017) |
\(p_{2}^{\mathrm{DL}}\) | 0.033 | (0,0.097) | 0.023 | (0,0.090) |
\(p_{2}^{\mathrm{HL}}\) | 0.136 | (0,0.268) | 0.125 | (0.039,0.174) |
−2×LL | 2979.7 | | 2982.2 | |
No. pars | 11 | | 11 | |
Titman and Sharples (2010) fitted a time homogeneous hidden semi-Markov model with 2-phase Coxian phase-type distributions to the same dataset. This requires 3 parameters for each state, resulting in a model with 13 parameters with −2×LL=2976.5. This model and the Weibull and Gamma models are not nested so direct comparisons are not possible, but the Weibull model with 11 parameters and −2×LL=2979.7 is slightly preferable in terms of AIC whereas the Gamma model is slightly inferior. Estimated quantities of interest, such as state occupancy probabilities, are very close between the three models. As might be expected, the bootstrap standard errors of quantities of interest are generally slightly lower for the Weibull and Gamma models compared to the analogous bootstrap standard errors for the 2-phase model. The biggest apparent gain in efficiency is found for the estimates of survival conditional on having spent specific times in the BOS state. Further details of the comparison of models are given in the Supplementary Materials.
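The AIC comparison can be reproduced from the reported values using the standard definition AIC = −2×LL + 2 × (number of parameters). The sketch below uses only the figures quoted in the text; the variable names are illustrative.

```python
def aic(minus_two_loglik, n_params):
    """Akaike information criterion from -2 x log-likelihood and parameter count."""
    return minus_two_loglik + 2 * n_params

# (-2 x LL, number of parameters) as reported for the BOS dataset
models = {
    "Weibull": (2979.7, 11),
    "Gamma": (2982.2, 11),
    "2-phase Coxian": (2976.5, 13),
}

aic_values = {name: aic(*spec) for name, spec in models.items()}
ranking = sorted(aic_values, key=aic_values.get)   # smallest AIC first
```

The differences are small (roughly 3001.7, 3002.5 and 3004.2 for the Weibull, 2-phase Coxian and Gamma models respectively), which fits the description of the Weibull model as only slightly preferable.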
7 Discussion
In this article we have detailed a method for approximating the likelihood for parametric semi-Markov models. After a one-off optimization to establish approximations to the parametric distribution, the approach allows semi-Markov models to be fitted in computational times comparable with existing methods for non-homogeneous Markov models. While in this paper approximations have only been established for the Weibull and Gamma distributions, the idea is directly transferable to any two-parameter failure time distribution in which one parameter is a rate (or equivalently scale) parameter. Potentially, B-splines of more than one dimension could be used to get approximation surfaces for distributions requiring two or more non-scaling parameters although this would significantly increase the one-off optimization required to establish the approximation. For distributions with uni- or multi-modal hazard functions, phase-type distributions with a greater number of phases may be required to maintain a good approximation. However, the real advantage of using the Weibull or Gamma distributions is that they allow a reasonable degree of flexibility with few additional parameters.
The method in this article contrasts with the method proposed by Titman and Sharples (2010) in which phase-type distributions are used directly as the sojourn distributions in the semi-Markov model. It is unlikely that panel data provide sufficient information to be able to adequately fit more than the simplest phase-type distributions via this direct approach, although there may be more scope in cases where state misclassification does not occur, where recovery is not possible in the multi-state model, or where the data contain a mixture of interval censored observations and exactly observed transition times. In practice, a strategy for model building would be to fit a model using two-phase Coxian distributions to informally assess whether there is evidence against the Markov assumption. If this appears to be the case, then the methods in this paper can be used to provide a more interpretable model. In smaller data sets it might only be possible to fit a model where one of the states has a non-exponential distribution.
A limitation of the method proposed is that it assumes the sojourn time in state r is independent of the transition, e.g. r→s, that occurs. This restriction was necessary for the BOS dataset where there are relatively few 1→3 or 2→1 transitions. In the semi-Markov literature, it is common to parameterize a model based on the transition probabilities of the embedded Markov process, and the conditional sojourn times. The phase-type approximations can be used to fit models in which the conditional sojourn times have Weibull or Gamma distributions by having a latent generator matrix Q whose blocks correspond to specific transitions rather than states. However, multi-state models in medical contexts usually parameterize in terms of the transition intensities and a more natural Weibull model would allow each intensity to be of a separate Weibull form e.g. \(q_{rs}(u) = \lambda_{rs} \alpha_{rs} (\lambda_{rs} u)^{\alpha_{rs}-1}\). Here the overall sojourn in state r is poly-Weibull. These can still be represented using the underlying phase-type approximations to the constituent competing Weibull distributions, but require a large number of latent states, e.g. a 5 phase approximation with R−1 competing transitions requires 5^{R−1} latent states. The necessity to compute large matrix exponentials in this case makes the method less computationally appealing. A more feasible compromise is where only one of the transitions out of state r has a Weibull intensity and the others are constant. More details of these model extensions are given in the Supplementary Materials. Similarly, in this paper we have assumed that all subjects are initiated in state 1 at time 0. Often the initial waiting time in the first state is unknown. Some discussion on how this could be incorporated into the current modelling framework is given in the Supplementary Materials.
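To make the poly-Weibull extension concrete, the sketch below evaluates the overall hazard and survivor function when each competing transition has its own Weibull intensity of the form given above, and counts the latent states implied by the 5^{R−1} figure. The function names and the numerical values are illustrative assumptions.

```python
import math

def poly_weibull_hazard(u, rates, shapes):
    """Total exit hazard: sum of Weibull intensities lam * alpha * (lam * u) ** (alpha - 1)."""
    return sum(l * a * (l * u) ** (a - 1.0) for l, a in zip(rates, shapes))

def poly_weibull_survivor(u, rates, shapes):
    """Sojourn survivor: exp of minus the summed cumulative hazards (lam * u) ** alpha."""
    return math.exp(-sum((l * u) ** a for l, a in zip(rates, shapes)))

def latent_states_needed(n_phases, n_states):
    """Latent states for one state with n_states - 1 competing phase-type approximations."""
    return n_phases ** (n_states - 1)

# Hypothetical state with two competing transitions
rates, shapes = (0.25, 0.05), (1.4, 0.7)
h = poly_weibull_hazard(2.0, rates, shapes)
S = poly_weibull_survivor(2.0, rates, shapes)
n_latent = latent_states_needed(5, 3)   # 5 phases, R = 3 gives 25
```

The exponential growth of `latent_states_needed` in R is what makes the full extension costly, since the matrix exponential has to be computed on the enlarged latent state space.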
8 Supplementary materials
Further details on additional model extensions, fitting the phase type distributions, assessing the fit of the approximations and a comparison of the Weibull and Gamma model with the 2 Phase Coxian Phase-type model for the BOS dataset are given in the Supplementary Materials document.