Phase-type models for competing risks, with emphasis on identifiability issues

We first review some main results for phase-type distributions, including a discussion of Coxian distributions and their canonical representations. We then consider the extension of phase-type modeling to cover competing risks. This extension involves the consideration of finite state Markov chains with more than one absorbing state, letting each absorbing state correspond to a particular risk. The non-uniqueness of Markov chain representations of phase-type distributions is well known. In the paper we study corresponding issues for the competing risks case with the aim of obtaining identifiable parameterizations. Statistical inference for the Coxian competing risks model is briefly discussed and some real data are analyzed for illustration.


Introduction
A phase-type distribution can be defined to be the distribution of the time to absorption for an absorbing finite state Markov chain in continuous time. Initially, phase-type distributions received most attention in applied probability, in particular in queuing theory, generalizing the Erlang distribution. A much cited introduction to phase-type distributions is Neuts (1981). Important theoretical contributions are found in Cumani (1982) and a series of papers by O'Cinneide (e.g., O'Cinneide 1989). Aalen (1995) and (Aalen et al. 2008, Ch. 10) provide useful introductions aimed at applications in survival analysis.
There have recently appeared a number of articles involving the use of phase-type distributions in statistical modeling and inference. In particular there seems to be a particular interest in the use of so-called Coxian phase-type models, first suggested B Bo Henry Lindqvist bo.lindqvist@ntnu.no 1 Norwegian University of Science and Technology, Trondheim, Norway by Cox (1955). Their usefulness stems from the fact that they are able to model phenomena where the individuals go through stages (phases) in a specified order, and may transit to the absorbing state (corresponding to the event of interest) from any phase. Another reason for using Coxian phase-type models is that they apparently can be modeled by considerably fewer parameters than a general phase-type distribution. They are hence also better suited when covariates are included in the models. Coxian phase-type models have recently been successfully applied in health care studies. For example, Faddy et al. (2009), McGrory et al. (2009, Tang et al. (2012) and Rizk et al. (2021) model hospital length of stay by Coxian phase-type models.
The problem of fitting more general phase-type distributions to lifetime data has been considered in a frequentist setting using the EM-algorithm by Asmussen et al. (1996) and in a Bayesian setting using MCMC by Bladt and Gonzalez (2003), Aslett and Wilson (2011) and Laache (2014). Slud and Suntornchost (2014) advocated the use of parametric models based on phase-type distributions with a low number of parameters. One of their conclusions was that simple phase-type models can do almost as well as nonparametric methods in many lifetime studies.
The main purpose of the present paper is to study how the phase-type methodology can be modified to include competing risks, extending earlier work in this direction by Lindqvist (2013) and Lindqvist and Kjølen (2018). One then considers a finite state Markov chain with more than one absorbing state, each of which corresponds to a particular risk. The relevant observation will be the pair (T , C) where T is the time of absorption, while C denotes the absorbing state. This corresponds exactly to the observation of a classical competing risks model. Standard functions from the theory of competing risks (Lawless 2003, Ch. 9), (Aalen et al. 2008, Ch. 3) can now be given in terms of the transition matrix of the underlying Markov chain, see Sect. 3.1.
We will be particularly concerned with the uniqueness of parameterizations of phase-type models for competing risks. This will extend the study of Lindqvist and Kjølen (2018), who considered models with two transient states. It will also extend results from Cumani (1982), Telek and Horváth (2007) and Rizk et al. (2019) who considered uniqueness properties of parameterizations in the ordinary Coxian phasetype case.
Phase-type models with multiple absorbing states have earlier been considered by, e.g., McClean et al. (2010) and Rizk et al. (2021) in the modeling of patient pathways in hospitals. The former paper is concerned with the calculation of some key performance indicators like absorption probabilities and expected length of stay. The latter paper considers patient movements between a series of stations, each with the option of leaving the hospital. Their approach will be further considered in Sect. 3.4.
A large part of the present paper, particularly Sect. 2, concerns basic theoretical results and approaches on phase-type distributions that are essentially known and treated in several articles and books. They are needed here in order to explain their extensions to the competing risks case. For these purposes, it has also been necessary to handle different approaches and viewpoints in the literature. This applies to, for example, representations in terms of Laplace transforms involving poles, versus representations using transition matrices, involving eigenvalues.
The rest of this paper is organized as follows. In Sect. 2 we review basic results for ordinary phase-type distributions with a view towards uniqueness of their Markov chain representations. Representations for Coxian distributions are considered in particular. We also review briefly the basic functions of classical competing risks theory. In Sect. 3 we consider phase-type modeling for competing risks and show how results for the single absorbing state case are extended to multiple absorbing states. An example of fitting a Coxian competing risks model to real data is given. Some concluding remarks are given in Sect. 4. The paper is ended by an Appendix presenting technical derivations and discussions related to matrices, as well as the proof of one of the theorems.

Background theory
We shall let vectors and matrices be given by bold letters, respectively lowercase and uppercase. Vectors will always be assumed to be column vectors. We shall use I to mean the identity matrix, where the dimension will be clear from the connection. This also applies to the use of the vectors 0 and 1 which are, respectively, vectors of all 0s and all 1s. Transposes of vectors will be marked by , e.g. p . A vector of length r will for short be called an r -vector.

Phase-type distributions
Phase-type distributions can be described in terms of a continuous time Markov process {X (t); t ≥ 0}, where the system moves through some or all of m transient states, or phases, before moving to a single absorbing state m + 1. The time of absorption, T , is then said to have a phase-type distribution.
The infinitesimal transition matrix A of the Markov chain that produces the phasetype distribution, is an (m + 1) × (m + 1) matrix given in block form as Here Q is the m × m matrix corresponding to transitions between the transient states; is the m-vector defining direct transition intensities from the transient states to the absorbing state. Letting P(t) be the matrix of transition probabilities P i j (t) = P(X (t) = j|X (0) = i) it is well known (e.g., Ross 2010, Chapter 5) that and it is then straightforward to show that (1) implies Now let p be the m-vector with entries p i = P(X (0) = i) for i = 1, . . . , m, where m i=1 p i = 1. Thus p defines the initial distribution of the Markov chain, assuming that the chain starts in a transient state.
Note that the transition matrix A is completely determined by Q. This is because all row sums of A are 0 so that = −Q1. Hence a phase type distribution is determined by the pair (p, Q).
By using (2) we are able to express the standard functions describing the probability distribution of T in terms of the given matrix representation. The cumulative distribution function of T is Taking the derivative with respect to t we get the density function The survival function S(t) can be found from S(t) = 1 − F(t), or by noting that This leads in turn to the following expression for the hazard function of T ,

Representations and identifiability
As is well known, representations (p, Q) of phase-type distributions are not unique. The representation (p, Q) where Q is an m × m matrix, is said to be of dimension m. Following standard notation, the order of a phase-type distribution is the minimal dimension of all its representations. The unique way of representing a phase-type distribution is through its Laplace transform, f * (s) = E(e −sT ). It was shown by O'Cinneide (1990) that a distribution on the non-negative real numbers which is not the point mass at zero is a phase-type distribution if and only if (a) it has a strictly positive continuous density on the positive reals, (b) it has a rational Laplace transform with a unique pole of maximal real part.
For a given representation (p, Q), the Laplace transform can be written as which is hence a rational function of s, i.e., of the form f * (s) = N (s)/D(s) for polynomials N (s) and D(s) (see Example 1 below for a simple case). The degree of the denominator polynomial, D(s), after having canceled possible equal factors in the numerator and denominator, is called the degree of the phase-type distribution. It was shown by O' Cinneide (1990) that the order of a phase-type distribution is at least as large as its degree, but that it may be strictly larger. By the non-uniqueness of representations of phase-type distributions, it is of interest to consider the identifiability problem for these distributions, i.e., the problem of determining whether two representations (p (a) , Q (a) ) and (p (b) , Q (b) ) lead to the same phase-type distribution in the sense of having the same Laplace transform. In general, the problem of identifying representations (p, Q) for a given Laplace transform is a difficult one, where general recipes are still lacking. We refer to the concluding remarks in Sect. 4 for a further discussion of this.
We present below a theorem given by Telek and Horváth (2007) which compares two representations (p (a) , Q (a) ) and (p (b) , Q (b) ) of the same dimension. Following Telek and Horváth (2007), we shall say that a representation (p, Q) of dimension m is nonredundant if the degree of the corresponding phase-type distribution is m. For intuition, we may think of nonredundancy as a way of excluding representations of phase-type distributions where there are also representations with a lower dimension. Some simple illustrations are given in Example 1 below.
Example 1 (Redundant phase-type distributions) Let m = 2 and consider the representation (p, Q) given by where 0 ≤ p ≤ 1 and a > 0. Figure 1 illustrates the corresponding Markov chain. A calculation using (6) leads to Putting a = 1, this equals 1/(s + 1) which implies that (p, Q) is redundant for any value of p, with T having the standard exponential distribution. This may in fact be concluded directly from Fig. 1 by letting a = 1 and observing that the transition rate to the absorbing state is 1 from each transient state.
Letting instead a = 5, it can be seen that (p, Q) is redundant for p = 1/2 and p = 0. The resulting Laplace transforms are then, respectively, 1/(s +3) and 1/(s +5), corresponding to exponential distributions with respective rates 3 and 5.

Theorem 1 (Telek and Horváth 2007) Let (p (a) , Q (a) ) and (p (b) , Q (b) ) represent two nonredundant phase-type distributions with the same dimension m. Let their cumulative distribution functions be F (a) (t) and F (b) (t), respectively. Then F (a) (t) = F (b) (t) for all t if and only if there exists a nonsingular m × m matrix
For the proof, we refer to Telek and Horváth (2007). A further discussion of the result and the assumption of nonredundancy is given in Appendix A2. As a corollary to the result, Telek and Horváth (2007) stated the following result, which is also well known from the literature, e.g., O'Cinneide (1999).

Theorem 2 (Telek and Horváth 2007) A general nonredundant phase-type distribution of order m can be fitted by 2m − 1 independent parameters.
This result is indeed of special interest for phase-type modeling, since apparently a representation (p, Q) would need m 2 + m − 1 independent parameters.
The following example illustrates the result of Theorem 1. It shows in particular that B need not be nonnegative.

Canonical representations of Coxian distributions
The class of Coxian phase-type distributions are defined via Markov chains with m transient states, where the possible transitions from a transient state i is either to state i + 1 or to the absorbing state m + 1. More specifically, a Coxian distribution can be represented by the pair (p, Q) with p = (1, 0, . . . , 0) and Q given on the form It is seen that the Coxian distribution is then represented by 2m − 1 parameters (see Theorem 2). Fig. 1 illustrates the special case m = 2. Coxian phase-type distributions are, however, more general than they might look. In fact, O'Cinneide (1990) proved the remarkable result that any probability measure on (0, ∞) with rational Laplace transform with only real poles, and with a continuous positive density on (0, ∞), is a Coxian distribution.
The matrix Q in (10) is in particular upper triangular. The subfamily of phase-type distributions with upper triangular Q are of special interest in applications, since they represent non-cyclic Markov chains X (t). A special treatment of these distributions is found in the much cited paper Cumani (1982) (see also Bobbio and Cumani 1992 for a review of the main results). Working with Laplace transforms, Cumani (1982) proved that any phase-type distribution with an upper triangular Q can be represented uniquely in the form (p,Q) whereQ is given bỹ with λ 1 ≥ λ 2 ≥ . . . , ≥ λ m , and wherep = ( p m , p m−1 , . . . , p 1 ) is a probability vector. Figure 2 illustrates the representation. Observe that there are 2m − 1 free parameters.
The representation (p,Q) suggests a way of calculating the Laplace transform for the absorption time T . The distribution of T given X (0) = m − i + 1 is now a convolution of i independent exponential variables with rates λ 1 , λ 2 , . . . , λ i (see Fig. 2). It hence follows that the Laplace transform of T is where is the Laplace transform of the appropriate convolution. We may hence write f * (s) Consider now an arbitrary phase-type distribution with a given upper triangular Q and Laplace transformN (s)/D(s), say. As demonstrated by Cumani (1982), this model may be put in the above canonical form by letting D(s) =D(s) and deriving the p i in (12) by equating coefficients in the identity m i=1 p i g * (i) (s) =N (s)/D(s). Cumani (1982) noted that the resulting set of equations for the p i can be put in triangular form and is hence easy to solve and gives a unique solution with p i ≥ 0 and i p i = 1. The example below gives a simple illustration.
Example 3 Consider the representation (p, Q) in Example 1 with a = 5, p = 1. The resulting Laplace transform is by (8), .
The Coxian distributions as defined by (10) and having initial vector p = (1, 0, . . . , 0) are apparently still of main interest for statistical modeling and inference, see for example the references given in the Introduction. Actually, the representation using (10) with p = (1, 0, . . . , 0) and assuming the ordering λ 1 ≥ λ 2 ≥ . . . ≥ λ m , is presented by Cumani (1982) as an equivalent canonical version for upper triangular distributions, having a one-to-one explicit connection with the representation (p,Q). Note that the ordering of the λ i on the diagonal are opposite in the two matrices Q andQ. Example 3 (cont.) The canonical Coxian representation of the distribution in the first part of this example is The representation (14) is hence equivalent to the representation from (7), where the diagonal elements are interchanged.
Apparently, the Coxian representation (10) is well defined also when the eigenvalues λ i on the diagonal are ordered differently from the decreasing order. This is true, but it turns out that the set of phase-type distributions represented by the matrix (10), with any other predetermined order of the diagonal elements, may not contain all upper triangular phase-type distributions (see remark at the end of Example 4 below). This fact is inherent in Cumani (1982). Equivalently, the canonical representation of Fig. 2 requires an increasing order of the λ i from left to right.

Redundancy of Coxian distributions
Consider a Coxian distribution given in canonical form using (10). It is clear that if α k,k+1 = 0 for some k ∈ {1, 2, . . . , m − 1}, then this representation is redundant. In fact, the distribution can be represented by a matrix in the form (10) of dimension k. The above is, moreover, equivalent to having p k+1 = p k+2 = . . . = p m = 0 in the representation usingQ. As seen from the following example due to O'Cinneide (1989), the possible redundancy of Coxian phase-type distributions does, however, not always arise from cases where p j = 0 from some j on.
Example 4 (O'Cinneide 1989) Let the canonical representation (p,Q) be given by m = 4,p = (1/2, 0, 0, 1/2) and λ 1 = 4, λ 2 = 3, λ 3 = 2, λ 4 = 1. Hence p m = 0, so we cannot use the above observation to conclude redundancy. A calculation of the Laplace transform shows, however, redundancy because a factor (s + 4) is common in numerator and denominator. O'Cinneide (1989) pointed to the remarkable fact that, despite the redundancy, there is still no Markov chain representation with dimension m = 3 of the underlying phase-type distribution. This can be seen by solving for the p i in (12), where a negative value is found for p 3 , hence implying no proper solution.
As an additional observation from this example, the underlying phase-type distribution cannot be represented by a Coxian representation Q where the ordering of the λ i is reversed.

Identifiability properties of Coxian phase-type distributions
The above shows that when using Coxian models one should be careful about the ordering of the λ i on the main diagonal of (10). Knowing the λ i , there is always a valid and unique version of (10) with the λ i in decreasing order. If the λ i are put in another order, then valid representations may still be found, for example using Theorem 1. As noted above, however, changing the order of the λ i on the diagonal may lead to non-valid representations. Rizk et al. (2019) considered this problem and presented an algorithm that for any matrix of the form (10) finds all permutations of the diagonal entries that lead to valid representations in Coxian form. From Cumani (1982) follows that there is always at least one such representation, which however may be the only one as shown by an example in Rizk et al. (2019).
The algorithm of Rizk et al. (2019) is based on their Theorem 2, which we restate in the following theorem. This theorem gives an often useful condition for identifiability of Coxian phase-type models.  (1, 0, . . . , 0) . Let the corresponding cumulative distribution functions be F (a) (t) and F (b)

Classical competing risks
In competing risks, one observes the time to an event, T , in addition to the cause C ∈ {1, 2, . . . , K } for a positive integer K . The observation is hence a pair (T , C), where the joint distribution is completely specified by the subdistribution functions and the subdensities, given by respectively, for j = 1, 2, . . . , K . The interpretation of F j (t) is as the probability of failing from cause j before time t. The F j (t) are also called cumulative incidence functions.
As an extension of the concept of hazard function of a lifetime distribution, one considers the cause-specific hazard functions, where S(t) = P(T > t). The interpretation is that λ j (t) is the hazard rate from cause j conditional on survival up to time t. The ordinary Coxian phase-type model can now be extended to the competing risks case by allowing transitions to any of the K absorbing states m + 1, . . . , m + K from each of the transient states. The case K = 2 is illustrated in Fig. 3.

Representation of phase-type distributions for competing risks.
By extending the matrix (1) to include K absorbing states, we obtain the infinitesimal transition matrix of the modified Markov process to be the (m + K ) × (m + K ) matrix given in block form as Here Q is the m × m matrix corresponding to transitions between the transient states, while the m-vector is replaced by the m × K matrix L of transition intensities from Similarly to the derivation of (2), we obtain the matrix of transition probabilities P i j (t) given by Let the m-vector p be the initial distribution of the Markov chain. The triple (p, Q, L) thus determines a competing risks distribution. Observe that ≡ L1 is the vector of transition rates from the transient states to the lumped set of absorbing states, and hence corresponds to the vector of the ordinary phase-type model considered in Sect. 2.1. It follows that L1 = −Q1. Thus the m × K matrix L satisfies m conditions given by Q and has hence K m − m = (K − 1)m free parameters. Adding these to the 2m − 1 parameters of (p, Q), we have (K + 1)m − 1 parameters in the representation (p, Q, L).
From (16) we obtain expressions for the subdistribution functions, given by for j = 1, . . . , K , where p is the m-vector defining the initial distribution of the Markov chain and j is the jth column of L. By differentiation we get the subdensities while the cause-specific hazard rates are given by We shall below also need the Laplace transforms of the subdensities f j (t). Similarly to (6) we get

Identifiability of phase-type models for competing risks
Let (p, Q, L) be a phase-type model for competing risks. We shall call it nonredundant if (p, Q) is a nonredundant phase-type model as defined in Sect. 2.1. The next theorem extends the result of Theorem 1 to the competing risks case. The proof is given in Appendix A3.

Coxian phase-type models for competing risks
In the present subsection we specialize to the case of Coxian phase-type distributions for competing risks. These will be defined as the triple (p, Q, L) where p = (1, 0, . . . , 0), Q is given by (10), and L is an m × K matrix defined in the same way as in Sect. 3.1. Recalling that Q has 2m − 1 parameters, it follows from the reasoning of the cited subsection that the Coxian model has (K + 1)m − 1 parameters.
We can now prove the following result which extends Theorem 3 and is a simple consequence of Theorem 4. As for Theorem 3, we build upon the reasoning in Rizk et al. (2019). (1, 0, . . . , 0) . Assume further that the diagonals of Q (a) and Q (b) are ordered in the same way. Then if F (a)

Theorem 5 Consider two non-redundant Coxian phase-type distributions for competing risks, given by (p (a) , Q (a) , L (a) ) and (p
for all t > 0 and j = 1, 2, . . . , K , we have Proof Let B be the invertible matrix obtained by using Theorem 4 for the present situation. Corollary 1 of Rizk et al. (2019) shows that if p (a) = p (b) = (1, 0, . . . , 0) , then the matrix B is lower triangular. Moreover, they show that if the ordering of the diagonal elements of Q (a) and Q (b) are the same, then B is the identity matrix and hence Q (a) = Q (b) . Since B is the identity matrix, we conclude from Theorem 4 that L (b) = L (a) and we are done. (2018) (1, 0) and, respectively,

Example 5 (An example of non-uniqueness) Lindqvist and Kjølen
and It was shown that these different representations lead to the same subdensities, namely Since Q (a) = Q (b) this shows that the condition of equally ordered diagonals in Theorem 5 cannot be removed. Note also that Theorem 4 can be used to show that the two representations of (20) and (21) give rise to the same competing risks model, using the matrix

Canonical representation of Coxian competing risks distributions
The recent article by Rizk et al. (2021) motivates an extension of the canonical model (p,Q) of Sect. 2.3 to the case of multiple absorbing states. These authors model an emergency department with patients moving through a series of service stations, where each station is modeled by a Coxian phase-type distribution. Then in order to model the movement between stations, they include an additional absorbing state in each station. Hence one absorbing state represents patients leaving the hospital, while the other represents movement to the next station. Their clue is to model the case with two absorbing states using a mixture of two series models like the one considered in Fig. 2. As they note, this approach facilitates statistical inference and the inclusion of covariates. For example, matrix exponentials can then be avoided in the likelihood function.
In the following we use their idea to extend the canonical model of Sect. 2.3 to involve an arbitrary number K of absorbing states, thus obtaining a canonical model for the Coxian competing risks situation. We also demonstrate below that properties Fig. 4 The two possible paths of the canonical representation for Coxian competing risks models with K = 2 absorbing states. Here λ 1 ≥ λ 2 ≥ . . . ≥ λ m and i, j p i j = 1 of the single absorbing state case (Sect. 2.3) imply that this is a valid representation for any competing risks representation (p, Q, L) with upper triangular Q.
As indicated above, the idea is essentially to involve one canonical model of the form (p,Q) (or see Fig. 2) for each absorbing state, and then consider a mixture of them to represent the full competing risks situation. This is illustrated in Fig. 4 for the case K = 2. Let p i j be the probability of entering the system in state m − i + 1 (i = 1, 2, . . . , m) and being absorbed in state m + j ( j = 1, 2, . . . , K ). Once entered, the transition rates leading to the absorbing states are identical for each subchain. The parameters of the model are now the p i j , which sum to 1 and hence contribute K m − 1 parameters, in addition to the m transition rates λ i . Altogether, this gives (K +1)m −1 parameters.
In a similar way as for the single absorbing state case, the Laplace transforms of the subdensities f j (t) of the above representation, can be given in the form [recall (12)] where the g * (i) (s) are as given by (13). Consider then an arbitrary representation (p, Q, L) where Q is upper triangular. The corresponding Laplace transforms, defined in (19), are of the form (1989) pointed out that the canonical representation (p,Q) of Cumani (1982) also holds for subdensities (actually, O'Cinneide in his papers has included the possibility of a positive probability for T = 0). This proves that there are nonnegative p i j such that which by (24) implies that the representation presented above is equivalent to the (p, Q, L). Indeed, the nonnegativeness of the p i j is guaranteed by the result by Cumani (1982) and is a consequence of requiring λ 1 ≥ λ 2 ≥ . . . ≥ λ m . That the p i j sum to 1, follows since K j=1f * j (s) is the Laplace transform for the absorption time T , which is given in (12). Validity and uniqueness of the new representation can now be proven in the same manner as for the single absorbing state case as shown in Cumani (1982). Note the requirement that λ 1 ≥ λ 2 ≥ . . . ≥ λ m .
In a way similar to the single absorbing state case, an equivalent version of the above canonical model can be given in the form of a Coxian competing risks model with Q as given in (10) and λ 1 ≥ λ 2 ≥ . . . ≥ λ m . Rizk et al. (2021) present formulas for going from the above canonical representation to the Coxian model representation for the case K = 2. Example 5 (cont.) We show how to derive the canonical representation of the competing risks model considered in the first part of Example 5. First, calculate the Laplace transforms corresponding to f 1 (t) and f 2 (t) in (22) and (23), With notation as above, we further get by letting λ 1 = 5, λ 2 = 4, By equating coefficients on each side of (24) with the above functions, we arrive at p 11 = 0.40, p 21 = 0.25, p 12 = 0.20, p 22 = 0.15.
The alternative canonical Coxian model turns out to be simply (Q (b) , L (b) ), given in the beginning of Example 5.

Maximum likelihood estimation in the canonical model
As noted by Rizk et al. (2021), the representation for Coxian competing risks models given in the previous subsection, turns out to be very well suited for statistical inference. We consider below for simplicity the case of right-censored competing risks data without covariates, while a note on the inclusion of covariates is given in the end, following the suggestion of Rizk et al. (2021).
Let the observed survival time for individual k (k = 1, 2, . . . , N , say) be T k and let D k be the corresponding status variable, where D k = 0 if T k is a (right) censoring time j if T k is the time of absorption in state m + j, j = 1, 2, . . . , K .
Consider the modeling of these data using the model of Sect. 3.4. The parameters are hence p i j for i = 1, . . . , m; j = 1, 2, . . . , K , noting that p m K = 1 − (i, j) =(m,K ) p i j , and λ 1 , . . . , λ m . For the latter parameters, we shall assume strict inequalities in λ 1 > . . . > λ m , which simplifies the likelihood and which seems reasonable when there is no apriori modeled connection between the λ i .
Several authors have suggested using the EM-algorithm for maximum likelihood estimaton in phase-type models (see, e.g., Asmussen et al. 1996). The clue is then to let the states visited by the Markov chain on the way to absorption, define the latent observations. This simplifies the full likelihood and its maximization in the M-step, while the E-step is usually also rather straightforward to perform. As we shall see, the canonical representation considered here allows a very simple and transparent use of the EM-algorithm.
Let then the (latent) starting state for individual k be defined by the vector where X k,i j = 1 if the kth individual starts in state m − i + 1 in the jth subchain, while the other entries in X k equal 0. The likelihood contribution for an item that starts in state m − i + 1 with i ∈ {1, 2, . . . , m} and is absorbed in state m + j for j ∈ {1, 2, . . . , K } after a time t is where p i j is defined in Sect. 3.4, g (i) (t; λ) is the density of U 1 + . . . + U i when U for = 1, 2, . . . , m is the waiting time in state , which is exponentially distributed with rate λ , and λ = (λ 1 , . . . , λ m ). The density g (i) is given by For right censored observations, the likelihood contribution for an item that starts in state m − i + 1 in the jth subchain, and is right censored at time t, is Here, S (i) (t; λ) is the survival function of U 1 + . . . + U i , given by The contribution to the log-likelihood function from a single individual k can then be written (T k ; λ))), so the full log-likelihood function for the data plus latent variables is L = N k=1 L k . For the E-step of the algorithm we get for j = 1, . . . , K , The M-step is a maximization of the log-likelihood function L with respect to the parameters p i j and λ, with the restriction λ 1 > λ 2 > . . . λ m .
Following Rizk et al. (2021), if z k is a covariate vector for individual k, then we may let the covariates influence the rates λ i in the way λ i exp{β z k }. In particular, this will maintain the inequalities between the λ-parameters for each individual.
Example 6 (Real data case) As an illustration we fitted competing risks data from Beyersmann et al. (2012, Ch. 1). The data are observations from an intensive care unit, with the purpose to examine the effect of hospital-acquired infections. We analyzed the data for patients without pneumonia at admission to the unit. The time variable was length of stay at the intensive care unit, with two outcomes of interest, either alive discharge (D = 1) or hospital death (D = 2). There were 650 patients in these data, with 589 being discharged alive, 55 dead in hospital, and 6 right censored.
Using the model of Fig. 4 with m = 3, we obtained the estimatesp 31 = 0.8712, p 21 = 0.0410,p 32 = 0.0878,p 11 =p 22 =p 12 = 0.0000,λ 1 = 1.5147,λ 2 = 0.6938,λ 3 = 0.0946. Figure 5 shows the estimated cumulative incidence functions obtained from the phase-type model, together with nonparametric estimates found by using the Aalen-Johansen estimators (see, e.g., Borgan 1998). With the nonparametric curve serving as a "benchmark", the fit seems very good for the outcome 'hospital death', while the model with m = 3 is seemingly not able to pick up completely the steepness of the first part of the cumulative incidence function for the outcome 'alive discharge'. A seemingly better fit was obtained, on the other hand, using a model with m = 4 in Lindqvist and Kjølen (2018).

Concluding remarks
In this paper we have studied the concept of phase-type distributions and how they can can be extended to cover the modeling of competing risks in survival analysis. For a proper treatment of the competing risks case it has been necessary to review some basic parts of the theory of ordinary phase-type distributions, and in particular the so-called Coxian phase-type distributions. As already indicated in the Introduction, there appears currently to be a considerable research activity in the field of phase-type distributions. Indeed, the increased opportunity of handling large data sets using complicated models, has led to an interest in the use of phase-type models for example in health studies. One of the first papers of this kind was Faddy et al. (2009) who claimed the superiority of Coxian phase-type models over common parametric models like gamma and lognormal.
The new applications have also motivated new theoretical developments and extensions of the classical models. For example, while the present study has been limited to phase-type modeling via homogeneous Markov chains, there have recently been published papers which involve nonhomogenous Markov chains (Bladt and Yslas 2020), semi-Markov models (Garcia-Maya et al. 2021), and models including unobserved heterogeneity (Surya 2016). Guihenneuc-Jouyaux et al. (2000) considered a Markov model of disease progression with a single absorbing state, where noisy biomarker measurements, depending on the state of the Markov chain, were available.
The main emphasis of the present paper has been on identifiability of phasetype models and their extensions to competing risks, with motivation from statistical inference. A basic problem has been to characterize representations (p (a) , Q (a) ) and (p (b) , Q (b) ) which lead to the same phase-type distribution. Most of the discussion has involved representations of the same dimension, which perhaps is of most interest in practical statistical modeling. The case of upper triangular matrices Q, where phasetype distributions can be represented by simple canonical versions, is seemingly well understood, see Cumani (1982) and the more recent paper by He and Zhang (2006). On the other hand, the general case, which may include complex eigenvalues (poles), seems to have less established general results. For example, it is not known whether there exist a kind of canonical representation for such cases (see, e.g., O'Cinneide 1999).
Some remarkable, but seemingly not much developed, results for equivalence of representations (p (a) , Q (a) ) and (p (b) , Q (b) ) of different dimensions were given by Ryden (1996). The results were obtained by specializing more general results from Ito et al. (1992) for discrete time Markov chains. The underlying idea was that a phasetype distribution can be represented as an aggregated Markov chain, where there are two compartments in the aggregated process, {1, 2, . . . , m} and {m + 1}. There is hence a natural extension to the competing risks case. A further study in this direction will certainly be of interest.
Funding Open access funding provided by NTNU Norwegian University of Science and Technology (incl St. Olavs Hospital -Trondheim University Hospital).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
where there is exactly one Jordan block for each eigenvalue. This implies in turn that Q (a) and Q (b) are similar in the sense that there is an invertible matrix B such that see the proof of Theorem 1 in Telek and Horváth (2007). Furthermore, the above property of the Jordan representation is shown by Telek and Horváth (2007) to imply that B can be chosen so that B1 = 1. O'Cinneide (1989) introduced the notion of a simple phase-type generator Q. Let PH (Q) be the set of all phase-type distributions with representation (p, Q) for some p. Then Q is called simple if each distribution in PH (Q) has a unique representation (p, Q). It can be shown (He and Zhang 2006) that Q is simple if and only if the minimal polynomial of Q equals the characteristic polynomial. Thus, nonredundancy of (p, Q) implies that Q is simple. O'Cinneide (1989) showed that for two simple matrices Q (a) and Q (b) of the same dimension, PH (Q (a) ) = PH (Q (b) ) if and only if there is a permutation matrix U such that Q (a) U = UQ (b) . This condition is considerably stronger than the one of Telek and Horváth (2007) provided in our Theorem 1, where however the latter is concerned with comparison of two single distributions instead of two sets of distributions.

A3. Proof of Theorem 4
Suppose F  Theorem 1 then implies that there is an m × m invertible matrix B such that It follows by (17) that Multiplying this from the right by (Q − ρ I) m −1 and using (d) of "Appendix A1" with A = Q, we get γ 0 p (Q − ρ I) m −1 Z ≡ γ 0 c ,m −1 = 0 and hence γ 0 = 0 since c ,m −1 = 0. Putting γ 0 = 0 in (33) and multiplying from the right by (Q − ρ I) m −2 , we get γ 1 = 0. Continuing in this way we prove that γ r = 0 for r = 0, 1, . . . , m − 1. Then this can be done for any = 1, 2, . . . , v and the necessity part of Theorem 4 is proven. To prove sufficiency is, on the other hand, straightforward.