Classification and properties of acyclic discrete phase-type distributions based on geometric and shifted geometric distributions

Acyclic phase-type distributions form a versatile model, serving as approximations to many probability distributions in various circumstances. They exhibit special properties and characteristics that usually make their applications attractive. Compared to acyclic continuous phase-type (ACPH) distributions, acyclic discrete phase-type (ADPH) distributions and their subclasses (ADPH family) have received less attention in the literature. In this paper, we present the definition, properties, characteristics and PH representations of ADPH distributions and their subclasses with finite state space. Based on the definitions of geometric and shifted geometric distributions, we propose a distinct classification for the ADPH subclasses analogous to ACPH family. We develop the PH representation for each ADPH subclass and prove them through their closure properties. The advantage of our proposed classifications is in applying precise representations of each subclass and preventing miscalculation of the probability mass function, by computing the ADPH family based on geometric and shifted geometric distributions.


Introduction
Phase-type (PH) distributions, introduced by Neuts (1975, 1981), form a very general class of distributions that has been successfully applied in a wide variety of stochastic disciplines over the last few decades. Acyclic PH (APH) distributions, continuous and discrete, are an important and interesting subclass of PH distributions with a triangular matrix representation. They constitute a versatile modeling tool for three reasons. Firstly, they admit a unique minimal representation, called the canonical form (Bobbio et al. 2003). Secondly, the canonical form simplifies the computation of the best approximation for a given distribution by not taking redundant parameters into account. Thirdly, the complexity of the overall system model can be controlled by the APH minimal representation (Cumani 1982). The special properties and characteristics of APH distributions usually make the analysis easier, and they are highly applicable in mathematical and analytical approaches. Two important applications of APH distributions in stochastic modeling, namely smaller-sized representations and estimation of the APH distribution parameters, are reviewed in the following paragraphs.
The first application, smaller-sized representations, is one of the most interesting theoretical research questions in the field of APH distributions. The size of the matrix representation has a strong effect on the computational effort needed to analyze this kind of distribution. These representations, however, are not unique, and two representations of the same distribution can differ drastically in size. The systematic study of representations for acyclic continuous phase-type (ACPH) distributions was initiated by Cumani (1982). In particular, he proves that every ACPH representation has a bidiagonal representation of the same or lower order. Aside from the bidiagonal representation, he also provides two other canonical forms and straightforward procedures to transform one into the others. In O'Cinneide (1989), the result of Cumani (1982) is extended and restated: every PH representation with a triangular PH generator has a Coxian representation. O'Cinneide (1991, 1993) identifies minimality conditions without presenting algorithmic considerations. Mocanu and Commault (1999) introduce an extension of the triangular PH distributions (monocyclic distributions). They show that any PH distribution can be represented as a mixture of these simple sparse distributions. Over the last two decades, most researchers have focused their attention on algorithms to construct the minimal representation of any ACPH distribution. He and Zhang (2008) provide an algorithm for computing minimal representations of APH distributions. This algorithm involves converting the given ACPH distribution to a representation that only contains the poles of the distribution and solving a system of nonlinear equations for each additional state. Pulungan and Hermanns (2008a) develop an algorithm to address the same problem considered by He and Zhang (2008). Their algorithm eliminates states from a representation until no further elimination is possible. He et al. (2011) present two new algorithms to find a Coxian representation for any PH representation with only real eigenvalues. Pulungan and Hermanns (2013) develop an algorithm that almost surely (i.e., with probability 1) finds the smallest possible representation of a given ACPH distribution. The algorithm is embedded in a simple, yet expressive calculus of delays, enabling the user to specify complex delay dependencies with the aid of convenient operations.
The first exploration of acyclic discrete phase-type (ADPH) distributions was started by Bobbio et al. (2003), who show that, similar to the continuous case (Cumani 1982), the ADPH class admits a unique minimal representation, called the canonical form. Bobbio et al. (2004) introduce a new parameter for DPH distributions named the scale factor. This new parameter represents the time span associated with each step and can be viewed as a new degree of freedom, since its choice largely impacts the shape and properties of a DPH distribution over the continuous time axis. They show that a strictly positive scale factor results in DPH distributions, whereas a scale factor of zero results in the class of CPH distributions. New results on the canonical representation of DPH with 2 and 3 phases (DPH(2) and DPH(3)) as well as discrete MAP with 2 phases (DMAP(2)) are presented by Meszáros et al. (2014). They provide explicit formulas for parameter matching using these canonical forms, give moments and correlation bounds for these models and show their efficiency in fitting through numerical examples. The canonical representation of DPH distributions with 3 phases has also been investigated. That investigation finds that the problem of canonical representation of DPH distributions with 3 phases is far more complex than that of CPH distributions with 3 phases. As a result, 8 different subclasses of DPH distributions with 3 phases are distinguished, while it is enough to distinguish 3 subclasses of CPH distributions with 3 phases for their canonical representation.
The second application, estimation of the PH distribution parameters, is a critical problem with several numerical limitations in practice. The difficulty of the fitting problem is largely related to the nonlinearity of the model and to the number of the parameters to be estimated (Bobbio and Telek 1994). Considerations of model parsimony have led many authors to constrain many of the PH transition rates to be the same or functionally related such as ACPH subclasses (Slud and Suntornchost 2014). One of the CPH subclass distributions, represented by the so-called Coxian distribution (Cox 1955), can be formally considered as resulting from a series of exponential stages with complex valued transition rates. Fitting a Coxian distribution of order n needs the estimation of 2n parameters. Even with the reduced number of parameters required for the Coxian distribution, estimation can still be problematic. This is due to the nonlinear expression and non-unique representations of the distribution which requires optimizing a number of parameters simultaneously (Marshall and Zenga 2012). In order to overcome these problems, various restrictions of the ACPH representation are defined by many authors.
A simple and popular restricted representation of ACPH distributions consists of mixtures of Erlang or hyper-Erlang distributions (HErD). Bux and Herzog (1977) develop a nonlinear estimation approach based on the matching of the first two moments coupled with the minimization of a distance measure with respect to the mixtures of Erlangs. Singh et al. (1977) consider series/parallel combinations of Erlang stages and estimate parameters by matching an equal number of moments by means of a Newton-Raphson numerical method. In a series of papers, Johnson and Taaffe explore the problem of matching the first three moments to a mixture of two Erlangs (Johnson and Taaffe 1989, 1990a; Johnson 1993). Thümmler et al. (2006) develop a new approach based on an expectation-maximization (EM) algorithm for mixed-type distributions to compute MLEs of hyper-Erlang distributions (mixed-Erlang distributions). Since their approach focuses only on the hyper-Erlang distributions, the computation speed is improved over other algorithms such as Asmussen et al.'s EM algorithm (Asmussen et al. 1996).
Another popular restricted representation of ACPH distributions is the hyper-exponential distribution introduced by Botta and Harris (1986). An ML estimation procedure for the hyper-exponential distribution has been described by Harris and Sykes (1984). A new technique for fitting long-tailed data sets is proposed by Riska et al. (2004). This technique fits data sets with non-monotone densities into a mixture of Erlang and hyper-exponential distributions, and data sets with completely monotone densities into hyper-exponential distributions. Their method partitions the data set in a divide-and-conquer fashion and uses the EM algorithm to fit the data of each partition into a hyper-exponential distribution. Sadre and Haverkort (2008) focus on the EM-based fitting of heavy-tailed distributed data to hyper-exponential distributions. They present a data aggregation algorithm which accelerates the fitting by several orders of magnitude.
The primary attempt to define the subclass of ADPH is given by Bobbio et al. (2003). They propose three canonical forms to introduce the subclasses of ADPH and present the ML estimation algorithm for one of them. Callut and Dupont (2006) mention some example of ADPH such as negative binomial, the mixture of negative binomials and the discrete Coxian distribution. They also present an EM algorithm considered as an adaptation to discrete distributions of the work of Asmussen et al. (1996), which handles CPH distributions. Table 1 summarizes the majority of studies performed on fitting algorithms and smaller-sized representation approaches related to APH distributions. Based on the reviewed literature, listed in Table 1, ACPH distribution and its subclasses have been extensively studied. Conversely, the ADPH distribution and its subclasses (ADPH family) have received very little attention, and most studies presented in the literature are just concentrated on general ADPH. Moreover, the ADPH family takes advantage of the canonical form, minimal representation as well as simplification of computation.
The present paper concentrates on the definition, properties, characteristics, and PH representations of the ADPH family with finite state space. In this research, a distinct classification is developed for the subclasses of ADPH distributions based on two different definitions of the geometric distribution. The advantage of the ADPH classification lies in applying the correct representation of each class and preventing miscalculation of the probability mass function (pmf), by computing the ADPH family based on geometric and shifted geometric distributions. For example, Esparza et al. (2010) define the pmf of the shifted negative binomial distribution while using the PH representation of the negative binomial distribution. In addition, all the subclasses of ADPH analogous to ACPH are introduced, and the properties, characteristics and PH representations related to each subclass are calculated and proven.
The rest of the paper is organized as follows: Sect. 2 describes the basic definitions, notation, and properties of DPH distributions. Section 3 introduces the definition of the ADPH distribution and presents two different representations of the ADPH family based on two different definitions of the geometric distribution. The subclasses of the ADPH distribution are compared with the subclasses of the ACPH distribution, and some properties are proven. Finally, concluding remarks are given in Sect. 4.

Table 1 A review of APH studies based on the fitting algorithm and smaller-sized representation (columns: Distribution, Fitting algorithm, Smaller-sized representation)

Discrete phase-type distribution and their properties
The following subsections summarize the definition and main properties of DPH family of distributions.

Definition and notation
DPH distributions were introduced and formalized by Neuts (1981). They are defined as the distribution of the time until absorption in a discrete-state discrete-time Markov chain (DTMC) with n transient states and one absorbing state. More precisely, let $\{X(n)\}_{n \ge 0}$ denote the DTMC with finite state space $S = \{0, 1, 2, \ldots, n\}$, where the absorbing state is numbered 0 and the transient states are numbered 1, 2, …, n. A DPH distribution is defined by the pair $(\alpha, T)$. The one-step transition probability matrix of the corresponding DTMC can be partitioned as
$$P = \begin{pmatrix} 1 & \mathbf{0} \\ t & T \end{pmatrix},$$
where $T$ is a square matrix of dimension n, $t$ is a column vector and $\mathbf{0}$ is a row vector of dimension n. Since $P$ is a transition probability matrix, we have $T_{ij} \ge 0$ and $t_i \ge 0$ for all transient states $i, j$, and $T\mathbf{1} + t = \mathbf{1}$, where $\mathbf{1}$ is the column vector of ones of the appropriate dimension n. The initial probability for transient and absorbing states is denoted by the row vector $(\alpha, \alpha_0)$ with $\alpha_0 = 1 - \alpha\mathbf{1}$.
The cumulative distribution function of the DPH distribution $Z \sim PH_d(\alpha, T)$ is calculated by
$$F_Z(k) = \Pr(Z \le k) = 1 - \alpha T^{k}\mathbf{1}, \quad k = 0, 1, 2, \ldots,$$
the probability mass function is
$$f_Z(0) = \alpha_0, \qquad f_Z(k) = \alpha T^{k-1} t, \quad k = 1, 2, \ldots,$$
and the factorial moment is
$$E[Z(Z-1)\cdots(Z-k+1)] = k!\,\alpha (I - T)^{-k} T^{k-1}\mathbf{1}.$$
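For $Z \sim PH_d(\alpha, T)$ the pmf is $\alpha T^{k-1} t$ for $k \ge 1$ (and $\alpha_0$ at $k = 0$) and the cdf is $1 - \alpha T^{k}\mathbf{1}$, with $t = \mathbf{1} - T\mathbf{1}$. The following is a minimal sketch of these two formulas in plain Python, assuming the representation is passed as a list `alpha` and a list of rows `T`. The helper names (`vec_mat`, `exit_vector`, `dph_pmf`, `dph_cdf`) are illustrative and are not part of the paper.

```python
def vec_mat(v, M):
    """Return the row vector v multiplied by the square matrix M."""
    n = len(M)
    return [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]


def exit_vector(T):
    """Return t = 1 - T 1, the one-step absorption probabilities."""
    return [1.0 - sum(row) for row in T]


def dph_pmf(alpha, T, k):
    """Pr(Z = k): alpha_0 for k = 0, otherwise alpha T^(k-1) t."""
    if k == 0:
        return 1.0 - sum(alpha)
    v = list(alpha)
    for _ in range(k - 1):
        v = vec_mat(v, T)
    return sum(vi * ti for vi, ti in zip(v, exit_vector(T)))


def dph_cdf(alpha, T, k):
    """Pr(Z <= k) = 1 - alpha T^k 1."""
    v = list(alpha)
    for _ in range(k):
        v = vec_mat(v, T)
    return 1.0 - sum(v)


# One-phase example: geometric distribution with p = 0.3 (assumed value).
alpha, T = [1.0], [[0.7]]
pmf_3 = dph_pmf(alpha, T, 3)  # equals (1 - p)^2 * p
cdf_3 = dph_cdf(alpha, T, 3)  # equals 1 - (1 - p)^3
```

Repeated vector-matrix products are used instead of explicit matrix powers, which keeps the cost at $O(k n^2)$ and avoids any external dependency.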

Closure properties
One of the appealing features of PH distributions is that the class is closed under a number of operations. The closure properties are a main contributing factor to the popularity of these distributions in stochastic modeling. The DPH distributions inherit many properties from the CPH distributions (Maier 1991), and both of them are closed under addition, finite mixtures, and finite order statistics (Esparza et al. 2010). However, one of the most interesting properties of the DPH distributions is that they can represent in an exact way a number of distributions with finite support. Assume that $Z_i \sim PH_d(\alpha^{(i)}, T^{(i)})$ for i = 1, 2 are two independent DPH distributed random variables of order $n_i$.
(1) Convolution of $PH_d$: the sum $Z = Z_1 + Z_2$ has a DPH distribution of order $n = n_1 + n_2$ with representation
$$\alpha = \left(\alpha^{(1)},\ \alpha_0^{(1)}\alpha^{(2)}\right), \qquad T = \begin{pmatrix} T^{(1)} & t^{(1)}\alpha^{(2)} \\ \mathbf{0} & T^{(2)} \end{pmatrix}. \tag{5}$$
Proof See Latouche and Ramaswami (1999), Theorem 2.6.1. □
[…]
Proof See Latouche and Ramaswami (1999), Theorem 2.6.4. □
[…] has a DPH distribution of order $n = n_1 n_2 + n_1 + n_2 + 1$ with representation […]
Shifted $PH_d$: the shifted variable $Z_1 - r$, where $r \in \mathbb{N}$, has a DPH distribution of order $n = n_1$ with representation
$$\alpha = \alpha^{(1)}\left(T^{(1)}\right)^{r}, \qquad T = T^{(1)}. \tag{9}$$
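The convolution property can be illustrated with the standard block construction for the sum of two independent DPH variables: the initial vector is $(\alpha^{(1)}, \alpha_0^{(1)}\alpha^{(2)})$ and the transient matrix has $T^{(1)}$ and $T^{(2)}$ on the diagonal with $t^{(1)}\alpha^{(2)}$ in the upper-right block. The sketch below is a plain-Python illustration under that assumption. The function names and the two geometric components with $p_1 = 0.5$ and $p_2 = 0.25$ are hypothetical choices, not taken from the paper.

```python
def exit_vector(T):
    """Return t = 1 - T 1, the one-step absorption probabilities."""
    return [1.0 - sum(row) for row in T]


def convolve(alpha1, T1, alpha2, T2):
    """Block representation of Z1 + Z2 for independent DPH variables."""
    n1, n2 = len(alpha1), len(alpha2)
    a0 = 1.0 - sum(alpha1)          # mass of Z1 at zero
    t1 = exit_vector(T1)
    alpha = list(alpha1) + [a0 * a for a in alpha2]
    top = [list(T1[i]) + [t1[i] * a for a in alpha2] for i in range(n1)]
    bottom = [[0.0] * n1 + list(T2[i]) for i in range(n2)]
    return alpha, top + bottom


def dph_pmf(alpha, T, k):
    """Pr(Z = k): alpha_0 for k = 0, otherwise alpha T^(k-1) t."""
    if k == 0:
        return 1.0 - sum(alpha)
    n = len(T)
    v = list(alpha)
    for _ in range(k - 1):
        v = [sum(v[i] * T[i][j] for i in range(n)) for j in range(n)]
    return sum(vi * ti for vi, ti in zip(v, exit_vector(T)))


# Sum of two geometric variables with p1 = 0.5 and p2 = 0.25.
alpha, T = convolve([1.0], [[0.5]], [1.0], [[0.75]])
pmf_2 = dph_pmf(alpha, T, 2)  # p1 * p2
pmf_3 = dph_pmf(alpha, T, 3)  # p1 * p2 * ((1 - p1) + (1 - p2))
```

The resulting matrix is upper triangular, so the sum of two acyclic representations stays acyclic.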

Acyclic discrete phase-type distributions and their subclasses
The DPH is defined as an acyclic DPH (ADPH) if its states can be ordered in such a way that the matrix $T$ is an upper triangular matrix (Bobbio et al. 2003). Based on this definition, the matrix representation $(\alpha, T)$ has $(n^2 + n)/2$ parameters for the upper triangular matrix $T$ and n − 1 free parameters for the initial probability vector $\alpha$. As with ACPH, ADPH distributions can be divided into various subclasses depending on the structure of $\alpha$ and $T$, which are shown in Table 2. The continuous analogues of these ADPH distributions are also illustrated in this table. The simplest DPH distribution is the geometric distribution, which is defined in two ways, as the geometric distribution and as the shifted geometric distribution (Kroese et al. 2013). In the following, we give an overview of ADPH distributions based on the geometric distribution and the shifted geometric distribution and show some properties and characteristics related to them.

Subclasses of ADPH distributions based on geometric distribution
The DPH distributions are created by a system of one or more inter-related geometric distributions occurring in sequence or phases. The geometric distribution ($X \sim G(p)$, with $p \in (0, 1)$) is used to describe the time of first success in an infinite sequence of independent Bernoulli trials with success probability p. Then, X is the number of Bernoulli trials needed to get the first success and its probability mass function is $\Pr(X = x) = (1-p)^{x-1}p$, for x = 1, 2, … . The DPH representation of the geometric distribution (Kroese et al. 2013) is given by Eq. (11) and shown in Fig. 1:
$$\alpha = (1), \qquad T = (1-p). \tag{11}$$
In all figures related to DPH representation, the absorbing state is numbered by 0 and the transient states are numbered by 1, 2, …, n.
The mean and variance of the geometric distribution are $E[X] = \frac{1}{p}$ and $Var[X] = \frac{1-p}{p^2}$, respectively. The negative binomial distribution ($X \sim NB(n, p)$) is defined as the number of Bernoulli trials needed to obtain the nth success and is introduced as the sum of n independent geometric random variables with the same parameter p. Its probability mass function is
$$\Pr(X = x) = \binom{x-1}{n-1} p^{n} (1-p)^{x-n}, \quad \text{for } x = n, n+1, \ldots.$$
Based on the definition of the negative binomial distribution and using Eq. (5), the DPH representation of the negative binomial distribution is given by Eq. (12) and illustrated in Fig. 2:
$$\alpha = (1, 0, \ldots, 0), \qquad T = \begin{pmatrix} 1-p & p & & \\ & 1-p & \ddots & \\ & & \ddots & p \\ & & & 1-p \end{pmatrix}. \tag{12}$$
The mean and variance of the negative binomial distribution are $E[X] = \frac{n}{p}$ and $Var[X] = \frac{n(1-p)}{p^2}$, respectively. The generalized negative binomial distribution ($X \sim GNB(n, p_i)$) is considered as the next subclass of ADPH and is the general case of the negative binomial distribution. Consider a set of different geometric distributions whose success probabilities $p_1, p_2, \ldots, p_n$ are not necessarily identical. The GNB distribution is introduced as the sum of n independent geometric random variables with distinct parameters. The probability mass function is given by Eq. (13):
$$\Pr(X = x) = \sum_{i=1}^{n} p_i (1-p_i)^{x-1} \prod_{\substack{j=1 \\ j \ne i}}^{n} \frac{p_j}{p_j - p_i}, \quad \text{for } x = n, n+1, \ldots. \tag{13}$$
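The negative binomial distribution, as the sum of n independent geometric phases with common success probability p, has the bidiagonal representation with $1-p$ on the diagonal, $p$ on the superdiagonal, and all initial mass on phase 1. The sketch below checks numerically that this representation reproduces the closed-form pmf $\binom{x-1}{n-1}p^n(1-p)^{x-n}$. It is a plain-Python illustration; the function names and the values n = 3, p = 0.4 are assumptions made for the example.

```python
from math import comb


def nb_representation(n, p):
    """Bidiagonal DPH representation of NB(n, p), counting trials."""
    alpha = [1.0] + [0.0] * (n - 1)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        T[i][i] = 1.0 - p        # stay in the current phase on a failure
        if i + 1 < n:
            T[i][i + 1] = p      # move to the next phase on a success
    return alpha, T


def dph_pmf(alpha, T, k):
    """Pr(Z = k): alpha_0 for k = 0, otherwise alpha T^(k-1) t."""
    if k == 0:
        return 1.0 - sum(alpha)
    n = len(T)
    v = list(alpha)
    for _ in range(k - 1):
        v = [sum(v[i] * T[i][j] for i in range(n)) for j in range(n)]
    t = [1.0 - sum(row) for row in T]
    return sum(vi * ti for vi, ti in zip(v, t))


def nb_pmf(n, p, x):
    """Closed-form pmf of the number of trials up to the nth success."""
    if x < n:
        return 0.0
    return comb(x - 1, n - 1) * p ** n * (1.0 - p) ** (x - n)


n, p = 3, 0.4
alpha, T = nb_representation(n, p)
max_gap = max(abs(dph_pmf(alpha, T, x) - nb_pmf(n, p, x)) for x in range(1, 15))
mean_estimate = sum(x * dph_pmf(alpha, T, x) for x in range(1, 300))
```

Here `max_gap` measures the largest disagreement between the matrix form and the closed form, and `mean_estimate` is a truncated sum that should be close to $n/p$.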

Derivation of GNB pmf
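Let $X_{G_1}, X_{G_2}, \ldots, X_{G_n}$ be independent geometric random variables with probability mass functions $\Pr(X_{G_i} = x) = (1-p_i)^{x-1}p_i$, for x = 1, 2, …, where the $p_i$ are distinct. We claim that the probability mass function of $S_n = X_{GNB} = \sum_{i=1}^{n} X_{G_i}$ is given by Eq. (13), that is,
$$\Pr(S_n = x) = \sum_{i=1}^{n} p_i (1-p_i)^{x-1} \prod_{\substack{j=1 \\ j \ne i}}^{n} \frac{p_j}{p_j - p_i}.$$
The proof of this equation follows by induction on n based on Sen and Balakrishnan (1999). Equation (13) is trivially true for n = 1, where by definition $\prod_{j=1,\, j \ne i}^{n} \frac{p_j}{p_j - p_i} \equiv 1$. Now suppose that the same equality holds when n = m; we shall show that the equation holds for n = m + 1. Noting that $S_{m+1} = S_m + X_{G_{m+1}}$, we have
$$\Pr(S_{m+1} = x) = \sum_{k=1}^{x-1} \Pr(S_m = k)\Pr(X_{G_{m+1}} = x-k) = \sum_{i=1}^{m} p_i\, p_{m+1} \prod_{\substack{j=1 \\ j \ne i}}^{m} \frac{p_j}{p_j - p_i} \sum_{k=1}^{x-1} (1-p_i)^{k-1}(1-p_{m+1})^{x-k-1}.$$
Using the geometric sum formula $\sum_{x=n_1}^{n_2} a^x = \frac{a^{n_2+1} - a^{n_1}}{a-1}$, $a \ne 1$, and some simplifications, the above equation reduces to the following:
$$\Pr(S_{m+1} = x) = \sum_{i=1}^{m} p_i (1-p_i)^{x-1} \prod_{\substack{j=1 \\ j \ne i}}^{m+1} \frac{p_j}{p_j - p_i} - p_{m+1}(1-p_{m+1})^{x-1} \sum_{i=1}^{m} \frac{p_i}{p_{m+1} - p_i} \prod_{\substack{j=1 \\ j \ne i}}^{m} \frac{p_j}{p_j - p_i}.$$
By adding and subtracting the (m+1)st term of the first sum to the entire expression, we get
$$\Pr(S_{m+1} = x) = \sum_{i=1}^{m+1} p_i (1-p_i)^{x-1} \prod_{\substack{j=1 \\ j \ne i}}^{m+1} \frac{p_j}{p_j - p_i} - (1-p_{m+1})^{x-1} \sum_{i=1}^{m+1} p_i \prod_{\substack{j=1 \\ j \ne i}}^{m+1} \frac{p_j}{p_j - p_i}.$$
Due to the finite sum of Lagrange polynomials (Yang et al. 2005), the second term on the right-hand side of the above equation is equal to zero and the proof is completed. □
By Eq. (5), the convolution of different geometric distributions can be represented as a DPH distribution with Eq. (14), and the graphical representation is demonstrated in Fig. 3:
$$\alpha = (1, 0, \ldots, 0), \qquad T = \begin{pmatrix} 1-p_1 & p_1 & & \\ & 1-p_2 & \ddots & \\ & & \ddots & p_{n-1} \\ & & & 1-p_n \end{pmatrix}. \tag{14}$$
Fig. 3 The DPH representation of GNB(n, p_i)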
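The closed-form pmf of a sum of independent geometric variables with distinct success probabilities, $\sum_i p_i(1-p_i)^{x-1}\prod_{j\ne i} p_j/(p_j-p_i)$, can be checked numerically against a brute-force discrete convolution. The sketch below does this in plain Python with the sample success probabilities 0.2, 0.4 and 0.6. The function names are illustrative, and the mean and variance use the standard facts that means and variances of independent summands add.

```python
def gnb_pmf(x, p):
    """Closed-form pmf of a sum of geometrics with distinct parameters."""
    total = 0.0
    for i, p_i in enumerate(p):
        weight = 1.0
        for j, p_j in enumerate(p):
            if j != i:
                weight *= p_j / (p_j - p_i)
        total += weight * p_i * (1.0 - p_i) ** (x - 1)
    return total


def convolve_geometrics(p, x_max):
    """pmf of the sum on 0..x_max by direct discrete convolution."""
    dist = [1.0] + [0.0] * x_max            # point mass at zero
    for p_i in p:
        new = [0.0] * (x_max + 1)
        for s, mass in enumerate(dist):
            if mass == 0.0:
                continue
            for k in range(1, x_max - s + 1):
                new[s + k] += mass * p_i * (1.0 - p_i) ** (k - 1)
        dist = new
    return dist


p = [0.2, 0.4, 0.6]
brute = convolve_geometrics(p, 30)
max_gap = max(abs(gnb_pmf(x, p) - brute[x]) for x in range(1, 31))
mean = sum(1.0 / p_i for p_i in p)
variance = sum((1.0 - p_i) / p_i ** 2 for p_i in p)
```

Note that the closed form evaluates to zero for $1 \le x < n$, which is the same Lagrange-polynomial cancellation used in the induction step.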
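The mean of the generalized negative binomial distributed random variable is calculated as $E[X] = \sum_{i=1}^{n} \frac{1}{p_i}$. For instance, for $X \sim GNB(3, p_1 = 0.2, p_2 = 0.4, p_3 = 0.6)$, the pmf of X is
$$\Pr(X = x) = 0.6\,(0.8)^{x-1} - 1.2\,(0.6)^{x-1} + 0.6\,(0.4)^{x-1}, \quad x = 3, 4, \ldots.$$
The DPH and diagrammatic representation of X are shown in Fig. 4. The mean and variance of X are E[X] = 9.1667 and Var[X] = 24.8611, respectively.
The mixed geometric distribution ($X \sim MG(n, p_i, \pi_i)$) is a convex mixture of n geometric distributions. The probability mass function is
$$\Pr(X = x) = \sum_{i=1}^{n} \pi_i\, p_i (1-p_i)^{x-1}, \quad x = 1, 2, \ldots.$$
The DPH representation of the mixed geometric distribution, which is calculated by Eq. (6), is given by Eq. (15):
$$\alpha = (\pi_1, \pi_2, \ldots, \pi_n), \qquad T = \mathrm{diag}(1-p_1, 1-p_2, \ldots, 1-p_n). \tag{15}$$
The diagrammatic representation of the mixed geometric distribution is presented in Fig. 5.
The kth factorial moment can be obtained as
$$E[X(X-1)\cdots(X-k+1)] = k! \sum_{i=1}^{n} \pi_i \frac{(1-p_i)^{k-1}}{p_i^{k}}.$$
Thus, the first moment is obtained by $E[X] = \sum_{i=1}^{n} \frac{\pi_i}{p_i}$ and its variance is given by
$$Var[X] = \sum_{i=1}^{n} \pi_i \frac{2-p_i}{p_i^{2}} - \left(\sum_{i=1}^{n} \frac{\pi_i}{p_i}\right)^{2}.$$
A mixed negative binomial distribution ($X \sim MNB(m, n_i, p_i, \pi_i)$) is considered as a mixture of m mutually independent negative binomial distributions weighted with the initial probabilities $\pi_1, \pi_2, \ldots, \pi_m$, where $\pi_i \ge 0$ and the vector $\pi$ is stochastic, i.e., $\sum_{i=1}^{m} \pi_i = 1$. Let $n_i$ denote the number of phases of the ith negative binomial distribution. Then the probability mass function is
$$\Pr(X = x) = \sum_{i=1}^{m} \pi_i \binom{x-1}{n_i-1} p_i^{n_i} (1-p_i)^{x-n_i},$$
where the ith term is zero for $x < n_i$. The state space includes $\sum_{i=1}^{m} n_i$ transient and one absorbing state. For m = 1, a single negative binomial distribution is formed, and the case $n_i = 1$ for all $1 \le i \le m$ represents a mixed geometric distribution. In order to calculate the DPH representation, Eqs. (6) and (14) are applied, which can be described by
$$\alpha = (\pi_1, 0, \ldots, 0,\ \pi_2, 0, \ldots, 0,\ \ldots,\ \pi_m, 0, \ldots, 0), \qquad T = \begin{pmatrix} T_1 & & \\ & \ddots & \\ & & T_m \end{pmatrix},$$
where $T_i$ is calculated based on Eq. (14). The diagrammatic representation of the mixed negative binomial distribution is shown in Fig. 6.
Mixtures of general negative binomial and mixed geometric distributions are considered as discrete Coxian distributions ($X \sim DCo(n_i, p_i, g_i, \pi_i)$). The initial probability vector is given by $\alpha = (1, 0, \ldots, 0)$. It means that the process starts from phase one and then traverses through the n successive phases with different success probabilities $p_i$. From phase i, a transition into the next phase i + 1 can occur with probability $g_i$, or the absorbing state is reached with the complementary probability $1 - g_i$. The DPH representation of the discrete Coxian distribution is given by Eq. (18) and illustrated in Fig. 7:
$$\alpha = (1, 0, \ldots, 0), \qquad T = \begin{pmatrix} 1-p_1 & g_1 p_1 & & \\ & 1-p_2 & \ddots & \\ & & \ddots & g_{n-1} p_{n-1} \\ & & & 1-p_n \end{pmatrix}. \tag{18}$$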

Subclasses of ADPH distributions based on shifted geometric distribution
The shifted geometric distribution ($Y \sim SG(p)$, with $p \in (0, 1)$) is another, nonequivalent, definition of the geometric distribution ($X \sim G(p)$), which describes the number of failures before the first success in an infinite sequence of independent Bernoulli trials. The shifted geometric distribution is completely characterized by its success probability p, and its probability mass function is $\Pr(Y = y) = (1-p)^{y} p$, for y = 0, 1, 2, … . The DPH representation of the shifted geometric distribution is given by Eq. (19) and presented in Fig. 8:
$$\alpha = (1-p), \qquad T = (1-p). \tag{19}$$

Derivation of SG representation
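Based on the definitions of the geometric and shifted geometric distributions, Y = X − 1; that is, the geometric distribution is shifted by one unit. Therefore, by using Eq. (9), we can calculate the parameters of the shifted geometric distribution as follows:
$$\alpha_{SG} = \alpha_{G} T_{G} = (1-p), \qquad T_{SG} = T_{G} = (1-p), \tag{20}$$
so that $\alpha_0 = 1 - \alpha_{SG}\mathbf{1} = p$. □
The shifted negative binomial distribution ($Y \sim SNB(n, p)$) is the sum of n independent shifted geometric random variables with common parameter p, and its initial probability vector is given by Eq. (21). In order to reach Eq. (21), we must determine the value of $T^{n}$ for the matrix $T$ of Eq. (12) and prove the following equation:
$$(T^{n})_{ij} = \begin{cases} \binom{n}{j-i} (1-p)^{n-(j-i)} p^{\,j-i}, & i \le j, \\ 0, & i > j, \end{cases} \tag{22}$$
where $(T^{n})_{ij}$ is the entry in the ith row and the jth column of the matrix $T^{n}$. To prove Eq. (22), induction on n is applied. Equation (22) is clearly true for n = 1. Now suppose that the same equality holds when n = m; we shall show that the equation holds for n = m + 1. Since $T$ is bidiagonal, $(T^{m+1})_{ij} = (1-p)(T^{m})_{ij} + p\,(T^{m})_{i,j-1}$, and the claim follows from $\binom{m}{j-i} + \binom{m}{j-i-1} = \binom{m+1}{j-i}$. Taking the first row of $T^{n}$ gives
$$\alpha_j = \binom{n}{j-1} (1-p)^{n-(j-1)} p^{\,j-1} \quad \text{for } j = 1, \ldots, n. \tag{21}$$
The DPH representation of SNB(n,p)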

Derivation of SNB representation
By the definitions of the negative binomial and shifted negative binomial distributions, the relation between $X \sim NB(n, p)$ and $Y \sim SNB(n, p)$ is Y = X − n. This implies that the negative binomial distribution is shifted by n units to constitute the shifted negative binomial distribution. Therefore, by using Eq. (9), the matrix $T_{SNB}$ is equal to the matrix $T_{NB}$ in Eq. (12) and the vector $\alpha_{SNB}$ is calculated by Eq. (21). □ The conceptual interpretation of the initial probability vector ($\alpha_j$) is the discrete probability distribution of the number of failures before the (n−j+1)st success. In other words, the jth initial probability ($\alpha_j$) states the probability of the j failures before the (n−j+1)th success.
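The shift Y = X − n can be checked numerically: starting from the negative binomial representation (all initial mass on phase 1, bidiagonal matrix with $1-p$ and $p$), multiplying the initial vector n times by the matrix should give the binomial-type entries $\binom{n}{j-1}(1-p)^{n-(j-1)}p^{j-1}$. The sketch below is a plain-Python illustration of that calculation. The helper names and the values n = 3, p = 0.4 are assumptions made for the example.

```python
from math import comb


def nb_matrix(n, p):
    """Bidiagonal transient matrix of NB(n, p)."""
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        T[i][i] = 1.0 - p
        if i + 1 < n:
            T[i][i + 1] = p
    return T


def shift_alpha(alpha, T, r):
    """Return alpha T^r, the initial vector after shifting by r steps."""
    n = len(T)
    v = list(alpha)
    for _ in range(r):
        v = [sum(v[i] * T[i][j] for i in range(n)) for j in range(n)]
    return v


n, p = 3, 0.4
T = nb_matrix(n, p)
alpha_nb = [1.0] + [0.0] * (n - 1)
alpha_snb = shift_alpha(alpha_nb, T, n)

# Closed form of the shifted initial vector, j = 1, ..., n.
closed_form = [
    comb(n, j - 1) * (1.0 - p) ** (n - (j - 1)) * p ** (j - 1)
    for j in range(1, n + 1)
]

alpha_0 = 1.0 - sum(alpha_snb)   # Pr(Y = 0), should equal p^n
pr_y1 = alpha_snb[-1] * p        # Pr(Y = 1): only the last phase can exit
```

The remaining mass `alpha_0` is the probability of absorbing immediately, that is, of n successes with no failure.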
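The factorial moment of the shifted negative binomial distribution is given by Eq. (23),
$$E[Y(Y-1)\cdots(Y-k+1)] = \frac{\Gamma(n+k)}{\Gamma(n)} \left(\frac{1-p}{p}\right)^{k}, \tag{23}$$
where $\Gamma(\cdot)$ is the gamma function defined by
$$\Gamma(z) = \int_{0}^{\infty} u^{z-1} e^{-u}\, du. \tag{24}$$
The mean and variance of the shifted negative binomial distribution are $E[Y] = \frac{n(1-p)}{p}$ and $Var[Y] = \frac{n(1-p)}{p^2}$, respectively.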
The generalized shifted negative binomial distribution ( Y ∼ GSNB(n, p i ) ) is considered as a general case of shifted negative binomial distribution. Its probability mass function is given by Eq. (25).

Derivation of GSNB pmf
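Let $Y_{SG_1}, Y_{SG_2}, \ldots, Y_{SG_n}$ be independent shifted geometric random variables with probability mass functions $\Pr(Y_{SG_i} = y) = (1-p_i)^{y} p_i$, for y = 0, 1, 2, …, where the $p_i$ are distinct. The probability mass function (pmf) of the generalized shifted negative binomial distribution, $S_n = Y_{GSNB} = \sum_{i=1}^{n} Y_{SG_i}$, is given by Eq. (25):
$$\Pr(S_n = y) = \sum_{i=1}^{n} p_i (1-p_i)^{y+n-1} \prod_{\substack{j=1 \\ j \ne i}}^{n} \frac{p_j}{p_j - p_i}, \quad y = 0, 1, 2, \ldots. \tag{25}$$
As in the proof of the pmf for the generalized negative binomial distribution, the proof of Eq. (25) follows by induction on n. Equation (25) is trivially true for n = 1. Suppose that it holds for n = m. Noting that $S_{m+1} = S_m + Y_{SG_{m+1}}$, we have
$$\Pr(S_{m+1} = y) = \sum_{k=0}^{y} \Pr(S_m = k)\Pr(Y_{SG_{m+1}} = y-k) = \sum_{i=1}^{m} p_i\, p_{m+1} \prod_{\substack{j=1 \\ j \ne i}}^{m} \frac{p_j}{p_j - p_i} \sum_{k=0}^{y} (1-p_i)^{k+m-1}(1-p_{m+1})^{y-k}.$$
Using the geometric sum formula, and some simplifications, the above equation reduces to the following:
$$\Pr(S_{m+1} = y) = \sum_{i=1}^{m} p_i (1-p_i)^{y+m} \prod_{\substack{j=1 \\ j \ne i}}^{m+1} \frac{p_j}{p_j - p_i} - p_{m+1}(1-p_{m+1})^{y+1} \sum_{i=1}^{m} \frac{p_i (1-p_i)^{m-1}}{p_{m+1} - p_i} \prod_{\substack{j=1 \\ j \ne i}}^{m} \frac{p_j}{p_j - p_i}.$$
By adding and subtracting the (m+1)th term of the first sum to the entire expression, we get
$$\Pr(S_{m+1} = y) = \sum_{i=1}^{m+1} p_i (1-p_i)^{y+m} \prod_{\substack{j=1 \\ j \ne i}}^{m+1} \frac{p_j}{p_j - p_i} - (1-p_{m+1})^{y+1} \sum_{i=1}^{m+1} p_i (1-p_i)^{m-1} \prod_{\substack{j=1 \\ j \ne i}}^{m+1} \frac{p_j}{p_j - p_i}.$$
Due to the finite sum of Lagrange polynomials, the second term on the right-hand side of the above equation is equal to zero. Therefore, the proof is completed. □
The DPH representation of the generalized shifted negative binomial distribution is calculated by Eq. (9) and given by
$$\alpha_{GSNB} = \alpha_{GNB}\,(T_{GNB})^{n}, \qquad T_{GSNB} = T_{GNB},$$
where $(\alpha_{GNB}, T_{GNB})$ is the representation in Eq. (14). The mean of the generalized shifted negative binomial distributed random variable is calculated as $E[Y] = \sum_{i=1}^{n} \frac{1-p_i}{p_i}$. Figure 10 shows the DPH representation of the generalized shifted negative binomial distribution.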
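The mixed shifted geometric distribution ($Y \sim MSG(n, p_i, \pi_i)$) is a convex mixture of n shifted geometric distributions. The probability mass function is
$$\Pr(Y = y) = \sum_{i=1}^{n} \pi_i\, p_i (1-p_i)^{y}, \quad y = 0, 1, 2, \ldots.$$
This distribution is also the mixed geometric distribution shifted by one unit (Y = X − 1), and its DPH representation and factorial moment are given by Eqs. (27) and (28) and shown in Fig. 11:
$$\alpha = \left(\pi_1(1-p_1), \ldots, \pi_n(1-p_n)\right), \qquad T = \mathrm{diag}(1-p_1, \ldots, 1-p_n), \tag{27}$$
$$E[Y(Y-1)\cdots(Y-k+1)] = k! \sum_{i=1}^{n} \pi_i \left(\frac{1-p_i}{p_i}\right)^{k}. \tag{28}$$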
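A mixture of shifted geometric distributions has pmf $\sum_i \pi_i p_i (1-p_i)^y$ and, shifting the mixed geometric representation by one step, initial vector $(\pi_i(1-p_i))_i$ with diagonal matrix $\mathrm{diag}(1-p_i)$. The sketch below compares the two in plain Python, using the parameter values of the MSG(3, …) example of Fig. 12 ($p = (0.2, 0.3, 0.7)$, $\pi = (0.1, 0.5, 0.4)$). The function names are illustrative and not taken from the paper.

```python
def msg_pmf(y, p, pi):
    """Closed-form pmf of a mixture of shifted geometric distributions."""
    return sum(w * p_i * (1.0 - p_i) ** y for p_i, w in zip(p, pi))


def msg_representation(p, pi):
    """DPH representation obtained by shifting the mixed geometric one step."""
    n = len(p)
    alpha = [w * (1.0 - p_i) for p_i, w in zip(p, pi)]
    T = [[(1.0 - p[i]) if i == j else 0.0 for j in range(n)] for i in range(n)]
    return alpha, T


def dph_pmf(alpha, T, k):
    """Pr(Z = k): alpha_0 for k = 0, otherwise alpha T^(k-1) t."""
    if k == 0:
        return 1.0 - sum(alpha)
    n = len(T)
    v = list(alpha)
    for _ in range(k - 1):
        v = [sum(v[i] * T[i][j] for i in range(n)) for j in range(n)]
    t = [1.0 - sum(row) for row in T]
    return sum(vi * ti for vi, ti in zip(v, t))


p = [0.2, 0.3, 0.7]
pi = [0.1, 0.5, 0.4]
alpha, T = msg_representation(p, pi)
max_gap = max(abs(dph_pmf(alpha, T, y) - msg_pmf(y, p, pi)) for y in range(0, 20))
mean = sum(w * (1.0 - p_i) / p_i for p_i, w in zip(p, pi))
```

The mass at zero, $\alpha_0 = \sum_i \pi_i p_i$, is carried by the initial vector rather than by an extra phase, which is what distinguishes this representation from the mixed geometric one.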

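Fig. 11 The DPH representation of MSG(n, p_i, π_i)
Fig. 12 The DPH representation of MSG(3, p_1 = 0.2, p_2 = 0.3, p_3 = 0.7, π_1 = 0.1, π_2 = 0.5, π_3 = 0.4)
The first moment is obtained as $E[Y] = \sum_{i=1}^{n} \pi_i \frac{1-p_i}{p_i}$, and its variance is given by
$$Var[Y] = \sum_{i=1}^{n} \pi_i \frac{(1-p_i)(2-p_i)}{p_i^{2}} - \left(\sum_{i=1}^{n} \pi_i \frac{1-p_i}{p_i}\right)^{2}.$$
For instance, we assume that $Y \sim MSG(3, p_1 = 0.2, p_2 = 0.3, p_3 = 0.7, \pi_1 = 0.1, \pi_2 = 0.5, \pi_3 = 0.4)$; its DPH representation is shown in Fig. 12.
The mixed shifted negative binomial distribution ($Y \sim MSNB(m, n_i, p_i, \pi_i)$) is considered a mixture of m mutually independent shifted negative binomial distributions weighted with the probabilities $\pi_1, \pi_2, \ldots, \pi_m$, where $\pi_i \ge 0$ and the vector $\pi$ is stochastic, i.e., $\sum_{i=1}^{m} \pi_i = 1$. Let $n_i$ denote the number of phases of the ith shifted negative binomial distribution. Then the probability mass function is
$$\Pr(Y = y) = \sum_{i=1}^{m} \pi_i \binom{y+n_i-1}{n_i-1} p_i^{n_i} (1-p_i)^{y}, \quad y = 0, 1, \ldots.$$
The state space includes $\sum_{i=1}^{m} n_i$ transient and one absorbing state. The DPH representation of the mixed shifted negative binomial distribution can be described by Eq. (29):
$$\alpha = \left(\pi_1 \alpha^{(1)}, \ldots, \pi_m \alpha^{(m)}\right), \qquad T = \begin{pmatrix} T_1 & & \\ & \ddots & \\ & & T_m \end{pmatrix}, \tag{29}$$
where $\alpha^{(i)}$ is the initial probability vector of the ith shifted negative binomial distribution, obtained from Eq. (21) with parameters $(n_i, p_i)$, and $T_i$ is calculated based on Eq. (14).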

Conclusions and suggestions for future research
In this paper, we presented the definition, properties, characteristics and PH representations of acyclic discrete phase-type (ADPH) distributions and their subclasses (ADPH family). The simplest ADPH distribution is the geometric distribution, defined by either of two discrete probability distributions, the geometric or the shifted geometric distribution. Based on the two definitions of the geometric distribution, we proposed a distinct classification for the ADPH subclasses and introduced their definitions. The advantage of our proposed classification lies in applying precise representations of each subclass and preventing miscalculation of the probability mass function, by computing the ADPH family based on geometric and shifted geometric distributions. To this end, we developed the PH representation for each subclass and proved them by using the closure properties of ADPH, especially "shifted DPH." In addition, all the subclasses of ADPH analogous to ACPH are considered and their properties and characteristics are discussed. For further research, applying the proposed classification in real stochastic modeling and developing fitting algorithms based on the ADPH subclasses are suggested.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.