Reward Algorithms for Semi-Markov Processes

New algorithms for computing power moments of hitting times and accumulated rewards of hitting type for semi-Markov processes are developed. The algorithms are based on special techniques of sequential phase space reduction and recurrence relations connecting moments of rewards. Applications are discussed, as well as possible generalizations of the presented results and examples.


Introduction
In this paper, we study recurrence relations for power moments of hitting times and accumulated rewards of hitting type for semi-Markov processes and present effective algorithms for computing these moments. These algorithms are based on procedures of sequential phase space reduction for semi-Markov processes.
Hitting times are often interpreted as transition times for various stochastic systems described by Markov-type processes, for example, occupation times or waiting times in queueing systems, life times in reliability models, extinction times in population dynamics models, etc. We refer to the works by Korolyuk, Brodi and Turbin (1974), Kovalenko (1975), Korolyuk and Turbin (1976, 1978), Courtois (1977), Silvestrov (1980b), Anisimov, Zakusilo and Donchenko (1987), Ciardo, Raymond, Sericola and Trivedi (1990), Kovalenko, Kuznetsov and Pegg (1997), Korolyuk, V.S. and Korolyuk, V.V. (1999), Limnios and Oprişan (2001, 2003), Barbu, Boussemart and Limnios (2004), and references therein. In Section 2, we introduce Markov renewal processes and semi-Markov processes, define hitting times and accumulated rewards of hitting type, and give the basic recurrent systems of linear equations for their power moments. In Section 3, we describe a one-step procedure of phase space reduction and give formulas for transition characteristics for reduced semi-Markov processes. We also prove invariance of hitting times and their moments with respect to the above procedure of phase space reduction. In Section 4, we describe a procedure of sequential phase space reduction for semi-Markov processes and derive recurrent formulas for computing power moments of hitting times for semi-Markov processes. In Section 5, we present useful generalizations of the above results to real-valued and vector accumulated rewards of hitting type, general hitting times with hitting state indicators, place-dependent and time-dependent hitting times and accumulated rewards of hitting type, and give a numerical example for the corresponding recurrent algorithms for computing power moments of hitting times and accumulated rewards of hitting type for semi-Markov processes.

Semi-Markov processes and hitting times
In this section, we introduce Markov renewal processes and semi-Markov processes. We also define hitting times and accumulated rewards of hitting type, and give the basic recurrent systems of linear equations for their power moments, which are the main objects of our studies.
2.1. Markov renewal processes and semi-Markov processes. Let $\mathbb{X} = \{0, \ldots, m\}$ and let $(J_n, X_n)$, $n = 0, 1, \ldots$ be a Markov renewal process, i.e., a homogeneous Markov chain with the phase space $\mathbb{X} \times [0, \infty)$, an initial distribution $\bar{p} = \langle p_i = \mathsf{P}\{J_0 = i, X_0 = 0\} = \mathsf{P}\{J_0 = i\},\ i \in \mathbb{X} \rangle$ and transition probabilities,
$$Q_{ij}(t) = \mathsf{P}\{J_1 = j, X_1 \le t \,/\, J_0 = i, X_0 = s\},\ (i, s), (j, t) \in \mathbb{X} \times [0, \infty). \qquad (1)$$
In this case, the random sequence $J_n$ is also a homogeneous (embedded) Markov chain with the phase space $\mathbb{X}$ and the transition probabilities,
$$p_{ij} = Q_{ij}(\infty) = \mathsf{P}\{J_1 = j \,/\, J_0 = i\},\ i, j \in \mathbb{X}.$$
As far as the random variable $X_n$ is concerned, it can be interpreted as the sojourn time of the semi-Markov process in the state $J_{n-1}$, i.e., the time between the $(n-1)$-th and $n$-th jumps, for $n = 1, 2, \ldots$. We assume that the following communication condition holds:
A: $\mathbb{X}$ is a communicating class of states for the embedded Markov chain $J_n$.
We also assume that the following condition excluding instant transitions holds:
B: $Q_{ij}(0) = 0$, $i, j \in \mathbb{X}$.
Let us now introduce the semi-Markov process,
$$J(t) = J_{N(t)},\ t \ge 0,$$
where $N(t) = \max(n \ge 0 : T_n \le t)$ is the number of jumps in the time interval $[0, t]$, for $t \ge 0$, and $T_n = X_1 + \cdots + X_n$, $n = 0, 1, \ldots$, are the sequential moments of jumps for the semi-Markov process $J(t)$. This process has the phase space $\mathbb{X}$, the initial distribution $\bar{p} = \langle p_i = \mathsf{P}\{J(0) = i\},\ i \in \mathbb{X} \rangle$ and the transition probabilities $Q_{ij}(t)$, $t \ge 0$, $i, j \in \mathbb{X}$.
2.2. Hitting times and accumulated rewards of hitting type. Let us also introduce the moments of sojourn times,
$$e^{(r)}_{ij} = \mathsf{E}_i X_1^r\, \mathrm{I}(J_1 = j) = \int_0^\infty t^r\, Q_{ij}(dt),\qquad e^{(r)}_{i} = \mathsf{E}_i X_1^r = \sum_{j \in \mathbb{X}} e^{(r)}_{ij},\ r = 1, 2, \ldots,\ i, j \in \mathbb{X}.$$
Here and henceforth, the notations $\mathsf{P}_i$ and $\mathsf{E}_i$ are used for conditional probabilities and expectations under the condition $J(0) = i$.
Note that $e^{(0)}_{ij} = p_{ij}$ and $e^{(0)}_{i} = 1$, $i, j \in \mathbb{X}$. We assume that the following condition holds, for some integer $d \ge 1$:
C$_d$: $e^{(d)}_{i} < \infty$, $i \in \mathbb{X}$.
The first hitting time to state 0 for the semi-Markov process $J(t)$ can be defined as,
$$W_0 = \sum_{n=1}^{U_0} X_n = T_{U_0},$$
where $U_0 = \min(n \ge 1 : J_n = 0)$ is the first hitting time to state 0 for the Markov chain $J_n$.
The random variable $W_0$ can also be interpreted as a reward accumulated on the trajectories of the Markov chain $J_n$ up to its first hitting of state 0.
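To make the above quantities concrete, the following minimal sketch (with hypothetical numerical values chosen only for illustration) represents the transition probabilities $p_{ij}$ and the moments $e^{(r)}_{ij}$ for a three-state chain whose sojourn times are exponentially distributed and do not depend on the next state; in that special case $\mathsf{E}_i X_1^r = r!/\lambda_i^r$, so $e^{(r)}_{ij} = p_{ij}\, r!/\lambda_i^r$. The array layout is our own convention, reused by the computational sketches below.

```python
import numpy as np
from math import factorial

# Hypothetical 3-state example: X = {0, 1, 2}.
# p[i, j] = P{J_1 = j | J_0 = i} (embedded chain transition probabilities).
p = np.array([[0.0, 0.6, 0.4],
              [0.5, 0.0, 0.5],
              [0.3, 0.7, 0.0]])

# Sojourn time in state i ~ Exp(lam[i]), independent of the next state,
# so e^{(r)}_{ij} = E_i[X_1^r I(J_1 = j)] = p[i, j] * r! / lam[i]**r.
lam = np.array([2.0, 1.0, 0.5])

d = 3  # highest moment order used below
# e[r, i, j] stores e^{(r)}_{ij}; by convention e[0] = p (zero-order moments).
e = np.empty((d + 1, 3, 3))
e[0] = p
for r in range(1, d + 1):
    e[r] = p * (factorial(r) / lam[:, None] ** r)

# e^{(r)}_i = E_i[X_1^r] is the row sum over j.
e_i = e.sum(axis=2)
print(e_i[1])  # first moments of sojourn times: [0.5, 1.0, 2.0]
```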
The main objects of our studies are the power moments of the first hitting times,
$$\mathsf{E}^{(r)}_{i0} = \mathsf{E}_i W_0^r,\ r = 1, \ldots, d,\ i \in \mathbb{X}.$$
Note that condition A implies that $\mathsf{P}_i\{W_0 < \infty\} = 1$, $i \in \mathbb{X}$. As is well known, conditions A, B and C$_d$ imply that $\mathsf{E}^{(r)}_{i0} < \infty$, for $r = 1, \ldots, d$, $i \in \mathbb{X}$. In what follows, the symbol $Y \stackrel{d}{=} Z$ is used to denote that the random variables or vectors $Y$ and $Z$ have the same distribution.
The Markov property of the Markov renewal process $(J_n, X_n)$ implies that the following system of stochastic equalities takes place for the hitting times,
$$W_{i,0} \stackrel{d}{=} X_{i,1} + \mathrm{I}(J_{i,1} \neq 0)\, W_{J_{i,1},0},\ i \in \mathbb{X}, \qquad (10)$$
where: (a) $W_{i,0}$ is a random variable which has the distribution $\mathsf{P}\{W_{i,0} \le t\} = \mathsf{P}\{W_0 \le t \,/\, J(0) = i\}$, $t \ge 0$, for every $i \in \mathbb{X}$; (b) $(J_{i,1}, X_{i,1})$ is a random vector, which takes values in the space $\mathbb{X} \times [0, \infty)$ and has the distribution $\mathsf{P}\{J_{i,1} = j, X_{i,1} \le t\} = Q_{ij}(t)$, $j \in \mathbb{X}$, $t \ge 0$, for every $i \in \mathbb{X}$; (c) the random variables $W_{j,0}$, $j \in \mathbb{X}$ and the random vector $(J_{i,1}, X_{i,1})$ are independent, for every $i \in \mathbb{X}$. By taking expectations in the stochastic relations (10) we get the following system of linear equations for the expectations of hitting times,
$$\mathsf{E}^{(1)}_{i0} = e^{(1)}_{i} + \sum_{j \in \mathbb{X},\, j \neq 0} p_{ij}\, \mathsf{E}^{(1)}_{j0},\ i \in \mathbb{X}. \qquad (11)$$
In general, taking moments of the order $r$ in the stochastic relations (10), we get the following system of linear equations for the moments $\mathsf{E}^{(r)}_{i0}$,
$$\mathsf{E}^{(r)}_{i0} = \sum_{j \in \mathbb{X},\, j \neq 0} p_{ij}\, \mathsf{E}^{(r)}_{j0} + D^{(r)}_{i},\ i \in \mathbb{X}, \qquad (12)$$
where
$$D^{(r)}_{i} = e^{(r)}_{i} + \sum_{1 \le l < r} \binom{r}{l} \sum_{j \in \mathbb{X},\, j \neq 0} e^{(r-l)}_{ij}\, \mathsf{E}^{(l)}_{j0},\ i \in \mathbb{X}.$$
The system of linear equations given in (12) has, for $r = 1, \ldots, d$, the same matrix of coefficients $\mathbf{I} - \mathbf{P}_0$, where $\mathbf{I} = \|\mathrm{I}(i = j)\|$ is the unit matrix and $\mathbf{P}_0 = \|p_{ij}\,\mathrm{I}(j \neq 0)\|$. It is readily seen that $\mathbf{P}^n_0 = \|\mathsf{P}_i\{U_0 > n, J_n = j\}\|$. Condition A implies that $\mathsf{P}_i\{U_0 > n, J_n = j\} \to 0$ as $n \to \infty$, for $i, j \in \mathbb{X}$ and, thus, $\det(\mathbf{I} - \mathbf{P}_0) \neq 0$. Therefore, the moments $\mathsf{E}^{(r)}_{i0}$, $i \in \mathbb{X}$ are the unique solution of the system of linear equations (12), for every $r = 1, \ldots, d$.
It is useful to note that the above remarks imply that condition A can be replaced by the simpler hitting condition that $\mathsf{P}_i\{U_0 < \infty\} = 1$, for $i \in \mathbb{X}$. Let us denote $[\mathbf{I} - \mathbf{P}_0]^{-1} = \|g_{i0j}\|$. The elements of this matrix have the following probabilistic sense: $g_{i0j} = \mathsf{E}_i \sum_{n=0}^{U_0 - 1} \mathrm{I}(J_n = j)$ is the expected number of visits of the embedded Markov chain $J_n$ to the state $j$ before its first hitting of state 0. The recurrent formulas for the moments $\mathsf{E}^{(r)}_{i0}$, $i \in \mathbb{X}$ have the following form, for $r = 1, \ldots, d$,
$$\mathsf{E}^{(r)}_{i0} = \sum_{j \in \mathbb{X}} g_{i0j}\, D^{(r)}_{j},\ i \in \mathbb{X}.$$
This method of computing the moments $\mathsf{E}^{(r)}_{i0}$, $i \in \mathbb{X}$ requires computing the inverse matrix $[\mathbf{I} - \mathbf{P}_0]^{-1}$.
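The direct method described above is easy to implement with standard linear algebra. The sketch below (our own helper, not the paper's) assumes the array layout introduced earlier and solves the systems (12) sequentially in $r = 1, \ldots, d$.

```python
import numpy as np
from math import comb

def hitting_time_moments(p, e, d, target=0):
    """Solve the systems (12): for r = 1..d, (I - P_0) E^{(r)} = D^{(r)}, where
    P_0 = || p_ij I(j != target) || and
    D^{(r)}_i = e^{(r)}_i + sum_{1<=l<r} C(r,l) sum_{j != target} e^{(r-l)}_{ij} E^{(l)}_{j0}.
    p: (m+1, m+1) embedded transition matrix; e: (d+1, m+1, m+1) with e[0] == p.
    Returns E with E[r, i] = E_i[W_0^r] (row 0 is identically 1)."""
    n = p.shape[0]
    P0 = p.copy()
    P0[:, target] = 0.0                 # matrix P_0 zeroes the column of the target state
    A = np.eye(n) - P0                  # the same coefficient matrix for every r
    E = np.zeros((d + 1, n))
    E[0] = 1.0
    mask = (np.arange(n) != target)
    for r in range(1, d + 1):
        D = e[r].sum(axis=1)            # e^{(r)}_i
        for l in range(1, r):
            D += comb(r, l) * (e[r - l] @ np.where(mask, E[l], 0.0))
        E[r] = np.linalg.solve(A, D)
    return E
```

For the hypothetical three-state example above, `hitting_time_moments(p, e, d)[1]` gives the expected hitting times $\mathsf{E}_i W_0$, $i = 0, 1, 2$.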
In this paper, we propose an alternative method, which can be considered as a stochastic analogue of the Gaussian elimination method for solving the recurrent systems of linear equations (12).

Semi-Markov processes with reduced phase spaces
In this section, we describe a one-step algorithm for reduction of the phase space of semi-Markov processes. We also give recurrent systems of linear equations for power moments of hitting times for reduced semi-Markov processes.
3.1. Reduced semi-Markov processes. Let us choose some state $k \in \mathbb{X}$ and consider the reduced phase space $_k\mathbb{X} = \mathbb{X} \setminus \{k\}$, with the state $k$ excluded from the phase space $\mathbb{X}$.
Let us define the sequential moments of hitting the reduced space $_k\mathbb{X}$ by the embedded Markov chain $J_n$,
$$_kV_n = \min(r > {}_kV_{n-1} : J_r \in {}_k\mathbb{X}),\ n = 1, 2, \ldots,\quad {}_kV_0 = 0.$$
Now, let us define the random sequence,
$$({}_kJ_n, {}_kX_n) = \big(J_{{}_kV_n},\ T_{{}_kV_n} - T_{{}_kV_{n-1}}\big),\ n = 1, 2, \ldots,\quad ({}_kJ_0, {}_kX_0) = (J_0, 0).$$
This sequence is also a Markov renewal process with the phase space $\mathbb{X} \times [0, \infty)$, the initial distribution $\bar{p}$ and the transition probabilities,
$$_kQ_{ij}(t) = Q_{ij}(t) + \sum_{n=0}^{\infty} Q_{ik} * Q^{*n}_{kk} * Q_{kj}(t),\ t \ge 0,\ i \in \mathbb{X},\ j \in {}_k\mathbb{X}. \qquad (18)$$
Here, the symbol $*$ is used to denote the convolution of distribution functions (possibly improper), and $Q^{*n}_{kk}(t)$ is the $n$-fold convolution of the distribution function $Q_{kk}(t)$. In this case, the Markov chain $_kJ_n$ has the transition probabilities,
$$_kp_{ij} = {}_kQ_{ij}(\infty) = p_{ij} + p_{ik}\frac{p_{kj}}{1 - p_{kk}},\ i \in \mathbb{X},\ j \in {}_k\mathbb{X}.$$
Note that condition A implies that the probabilities $p_{kk} \in [0, 1)$, $k \in \mathbb{X}$.
The transition distributions of the Markov chain $_kJ_n$ are concentrated on the reduced phase space $_k\mathbb{X}$, i.e., for every $i \in \mathbb{X}$,
$$\sum_{j \in {}_k\mathbb{X}} {}_kp_{ij} = 1.$$
If the initial distribution $\bar{p}$ is concentrated on the phase space $_k\mathbb{X}$, i.e., $p_k = 0$, then the random sequence $({}_kJ_n, {}_kX_n)$, $n = 0, 1, \ldots$ can be considered as a Markov renewal process with the reduced phase space $_k\mathbb{X} \times [0, \infty)$. If the initial distribution $\bar{p}$ is not concentrated on the phase space $_k\mathbb{X}$, i.e., $p_k > 0$, then the random sequence $({}_kJ_n, {}_kX_n)$, $n = 0, 1, \ldots$ can be interpreted as a Markov renewal process with a so-called transition period.
Let us now introduce the semi-Markov process,
$$_kJ(t) = {}_kJ_{{}_kN(t)},\ t \ge 0,$$
where $_kN(t) = \max(n \ge 0 : {}_kT_n \le t)$ is the number of jumps in the time interval $[0, t]$ and $_kT_n = {}_kX_1 + \cdots + {}_kX_n$, $n = 0, 1, \ldots$ are the sequential moments of jumps for the semi-Markov process $_kJ(t)$.
As follows from the above remarks, the semi-Markov process $_kJ(t)$, $t \ge 0$ has the transition probabilities $_kQ_{ij}(t)$, $t \ge 0$, $i, j \in \mathbb{X}$ concentrated on the reduced phase space $_k\mathbb{X}$, which can be interpreted as the actual "reduced" phase space of this semi-Markov process $_kJ(t)$.
If the initial distribution $\bar{p}$ is concentrated on the phase space $_k\mathbb{X}$, then the process $_kJ(t)$, $t \ge 0$ can be considered as a semi-Markov process with the reduced phase space $_k\mathbb{X}$, the initial distribution $\bar{p} = \langle p_i, i \in {}_k\mathbb{X} \rangle$ and the transition probabilities $_kQ_{ij}(t)$, $t \ge 0$, $i, j \in {}_k\mathbb{X}$. According to the above remarks, we can refer to the process $_kJ(t)$ as a reduced semi-Markov process.
If the initial distribution $\bar{p}$ is not concentrated on the phase space $_k\mathbb{X}$, then the process $_kJ(t)$, $t \ge 0$ can be interpreted as a reduced semi-Markov process with transition period.
3.2. Transition characteristics for reduced semi-Markov processes. Relation (18) implies the following formulas for the probabilities $_kp_{kj}$ and $_kp_{ij}$, $i, j \in {}_k\mathbb{X}$,
$$_kp_{kj} = \frac{p_{kj}}{1 - p_{kk}},\qquad {}_kp_{ij} = p_{ij} + p_{ik}\frac{p_{kj}}{1 - p_{kk}},\ i, j \in {}_k\mathbb{X}. \qquad (21)$$
It is useful to note that the second formula in relation (21) reduces to the first one, if one assigns $i = k$ in this formula. Taking into account that $_kV_1$ is a Markov time for the Markov renewal process $(J_n, X_n)$, we can write down the following system of stochastic equalities, for every $i \in \mathbb{X}$, $j \in {}_k\mathbb{X}$,
$$_kX_{i,1}\,\mathrm{I}({}_kJ_{i,1} = j) \stackrel{d}{=} X_{i,1}\,\mathrm{I}(J_{i,1} = j) + \big(X_{i,1} + {}_kX_{k,1}\big)\,\mathrm{I}(J_{i,1} = k)\,\mathrm{I}({}_kJ_{k,1} = j), \qquad (22)$$
where: (a) $(J_{i,1}, X_{i,1})$ is a random vector, which takes values in the space $\mathbb{X} \times [0, \infty)$ and has the distribution $\mathsf{P}\{J_{i,1} = j, X_{i,1} \le t\} = Q_{ij}(t)$, $j \in \mathbb{X}$, $t \ge 0$, for every $i \in \mathbb{X}$; (b) $({}_kJ_{i,1}, {}_kX_{i,1})$ is a random vector, which takes values in the space $_k\mathbb{X} \times [0, \infty)$ and has the distribution $\mathsf{P}\{{}_kJ_{i,1} = j, {}_kX_{i,1} \le t\} = {}_kQ_{ij}(t)$, $j \in {}_k\mathbb{X}$, $t \ge 0$, for every $i \in \mathbb{X}$; (c) $(J_{i,1}, X_{i,1})$ and $({}_kJ_{k,1}, {}_kX_{k,1})$ are independent random vectors, for every $i \in \mathbb{X}$.
Let us denote,
$$_ke^{(r)}_{ij} = \mathsf{E}_i\, {}_kX_1^r\, \mathrm{I}({}_kJ_1 = j),\ r = 0, 1, \ldots, d,\ i \in \mathbb{X},\ j \in {}_k\mathbb{X}.$$
Note that $_ke^{(0)}_{ij} = {}_kp_{ij}$ and, similarly, $e^{(0)}_{ij} = p_{ij}$, $i, j \in \mathbb{X}$. By taking moments of the order $r$ in the stochastic relations (22) we get, for every $i, j \in {}_k\mathbb{X}$ and $r = 1, \ldots, d$, the following formulas for the moments $_ke^{(r)}_{kj}$ and $_ke^{(r)}_{ij}$,
$$_ke^{(r)}_{kj} = \frac{1}{1 - p_{kk}}\Big( e^{(r)}_{kj} + \sum_{0 \le l < r} \binom{r}{l}\, e^{(r-l)}_{kk}\, {}_ke^{(l)}_{kj} \Big),\qquad {}_ke^{(r)}_{ij} = e^{(r)}_{ij} + \sum_{0 \le l \le r} \binom{r}{l}\, e^{(r-l)}_{ik}\, {}_ke^{(l)}_{kj}. \qquad (26)$$
It is useful to note that the second formula in relation (26) reduces to the first one, if one assigns $i = k$ in this formula.
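A direct computational reading of the one-step reduction is given in the sketch below; it assumes the same array layout as before (p[i, j] and e[r, i, j] with e[0] = p) and implements formulas of the form (21) and (26): the row of the excluded state $k$ is recomputed first, recursively in $r$, and then all other rows are updated; the updated row $i = k$ describes the transition period. The helper name and conventions are ours.

```python
import numpy as np
from math import comb

def reduce_state(p, e, k, d):
    """One-step phase space reduction excluding state k, cf. relations (21) and (26).
    Returns arrays (kp, ke) of the same shapes as (p, e); entries in column k
    are no longer meaningful after the reduction and are set to zero."""
    n = p.shape[0]
    ke = np.zeros_like(e)
    denom = 1.0 - p[k, k]
    # Row k: _k e^{(r)}_{kj} = (e^{(r)}_{kj} + sum_{0<=l<r} C(r,l) e^{(r-l)}_{kk} _k e^{(l)}_{kj}) / (1 - p_kk).
    for r in range(0, d + 1):
        acc = e[r, k, :].copy()
        for l in range(0, r):
            acc += comb(r, l) * e[r - l, k, k] * ke[l, k, :]
        ke[r, k, :] = acc / denom
    # Other rows: _k e^{(r)}_{ij} = e^{(r)}_{ij} + sum_{0<=l<=r} C(r,l) e^{(r-l)}_{ik} _k e^{(l)}_{kj}.
    for r in range(0, d + 1):
        for i in range(n):
            if i == k:
                continue
            acc = e[r, i, :].copy()
            for l in range(0, r + 1):
                acc += comb(r, l) * e[r - l, i, k] * ke[l, k, :]
            ke[r, i, :] = acc
    ke[:, :, k] = 0.0          # transitions into the excluded state are no longer defined
    kp = ke[0].copy()          # zero-order moments are the reduced transition probabilities (21)
    return kp, ke
```

In exact arithmetic, applying the direct method of Section 2 to the arrays `(kp, ke)` returned for any $k \neq 0$ reproduces the hitting-time moments of the original process, in line with the invariance property established in the next subsection.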
3.3. Hitting times for reduced semi-Markov processes. Let us assume that $k \neq 0$ and introduce the first hitting time to state 0 for the reduced semi-Markov process $_kJ(t)$,
$$_kW_0 = \sum_{n=1}^{{}_kU_0} {}_kX_n,$$
where $_kU_0 = \min(n \ge 1 : {}_kJ_n = 0)$ is the first hitting time to state 0 for the reduced Markov chain $_kJ_n$. Let us also introduce the moments,
$$_k\mathsf{E}^{(r)}_{i0} = \mathsf{E}_i\, {}_kW_0^r,\ r = 1, \ldots, d,\ i \in \mathbb{X}.$$
The following theorem plays the key role in what follows.
Theorem 1. Conditions A, B and C$_d$, assumed to hold for the semi-Markov process $J(t)$, also hold for the reduced semi-Markov process $_kJ(t)$, for any state $k \neq 0$. Moreover, the hitting times $W_0$ and $_kW_0$ to the state 0, respectively, for the semi-Markov processes $J(t)$ and $_kJ(t)$, coincide, and, thus, for every $r = 1, \ldots, d$ and $i \in \mathbb{X}$,
$$_k\mathsf{E}^{(r)}_{i0} = \mathsf{E}^{(r)}_{i0}.$$
Proof. Holding of conditions A and B for the semi-Markov process $_kJ(t)$ is obvious. Holding of condition C$_d$ for the semi-Markov process $_kJ(t)$ follows from relation (26).
The first hitting times to state 0 for the Markov chains $J_n$ and $_kJ_n$ are connected by the following relation,
$$U_0 = {}_kV_{{}_kU_0},$$
where $_kU_0 = \min(n \ge 1 : {}_kJ_n = 0)$. The above relations imply that the following relation holds for the first hitting times to state 0 for the semi-Markov processes $J(t)$ and $_kJ(t)$,
$$W_0 = \sum_{n=1}^{U_0} X_n = \sum_{n=1}^{{}_kU_0} {}_kX_n = {}_kW_0. \qquad (32)$$
The equality for the moments of the first hitting times is an obvious corollary of relation (32).
We can write down the recurrent systems of linear equations, analogous to (12), for the moments $_k\mathsf{E}^{(r)}_{k0}$ and $_k\mathsf{E}^{(r)}_{i0}$, $i \in {}_k\mathbb{X}$ of the reduced semi-Markov process $_kJ(t)$, which should be solved recurrently, for $r = 1, \ldots, d$,
$$_k\mathsf{E}^{(r)}_{i0} = \sum_{j \in {}_k\mathbb{X},\, j \neq 0} {}_kp_{ij}\, {}_k\mathsf{E}^{(r)}_{j0} + {}_kD^{(r)}_{i},\ i \in \mathbb{X}, \qquad (33)$$
where
$$_kD^{(r)}_{i} = {}_ke^{(r)}_{i} + \sum_{1 \le l < r} \binom{r}{l} \sum_{j \in {}_k\mathbb{X},\, j \neq 0} {}_ke^{(r-l)}_{ij}\, {}_k\mathsf{E}^{(l)}_{j0},\qquad {}_ke^{(r)}_{i} = \sum_{j \in {}_k\mathbb{X}} {}_ke^{(r)}_{ij},\ i \in \mathbb{X}.$$
Theorem 1 makes it possible to compute the moments $\mathsf{E}^{(r)}_{i0}$, $i \in \mathbb{X}$, $r = 1, \ldots, d$ in a way alternative to solving the recurrent systems of linear equations (12).
Instead, we can, first, compute the transition probabilities and the moments of transition times for the reduced semi-Markov process $_kJ(t)$ using, respectively, relations (21) and (26), and, then, solve the systems of linear equations (33) sequentially, for $r = 1, \ldots, d$.
Note that every system given in (12) has $m$ equations for the moments $\mathsf{E}^{(r)}_{i0}$, $i \in \mathbb{X}$, $i \neq 0$, plus the explicit formula for computing the moment $\mathsf{E}^{(r)}_{00}$ as a function of the moments $\mathsf{E}^{(r)}_{i0}$, $i \in \mathbb{X}$, $i \neq 0$, while every system given in (33) has, in fact, $m - 1$ equations for the moments $_k\mathsf{E}^{(r)}_{i0}$, $i \in {}_k\mathbb{X}$, $i \neq 0$, plus the explicit formulas for computing the moments $_k\mathsf{E}^{(r)}_{00}$ and $_k\mathsf{E}^{(r)}_{k0}$.

Algorithms of sequential phase space reduction
In this section, we present a multi-step algorithm for sequential reduction of the phase space of semi-Markov processes. We also present the recurrent algorithm for computing power moments of hitting times for semi-Markov processes, which is based on the above algorithm of sequential reduction of the phase space.
Let us assume that $p_0 + p_i = 1$. Denote by $_{\bar{k}_{i,0}}J(t) = J(t)$ the initial semi-Markov process, and let $\bar{k}_{i,n} = \langle k_{i,1}, \ldots, k_{i,n} \rangle$ be a sequence of distinct states from $\mathbb{X} \setminus \{0\}$. Let us exclude the state $k_{i,1}$ from the phase space $_{\bar{k}_{i,0}}\mathbb{X} = \mathbb{X}$ of the semi-Markov process $_{\bar{k}_{i,0}}J(t)$ using the time-space screening procedure described in Section 3. Let $_{\bar{k}_{i,1}}J(t)$ be the corresponding reduced semi-Markov process. The above procedure can be repeated. The state $k_{i,2}$ can be excluded from the phase space of the semi-Markov process $_{\bar{k}_{i,1}}J(t)$. Let $_{\bar{k}_{i,2}}J(t)$ be the corresponding reduced semi-Markov process. By continuing the above procedure for the states $k_{i,3}, \ldots, k_{i,n}$, we construct the reduced semi-Markov process $_{\bar{k}_{i,n}}J(t)$.
The process $_{\bar{k}_{i,n}}J(t)$ has, for every $n = 1, \ldots, m$, the actual "reduced" phase space,
$$_{\bar{k}_{i,n}}\mathbb{X} = \mathbb{X} \setminus \{k_{i,1}, \ldots, k_{i,n}\}.$$
The transition probabilities $_{\bar{k}_{i,n}}p_{k_{i,n} j'}$, $_{\bar{k}_{i,n}}p_{i'j'}$, $i', j' \in {}_{\bar{k}_{i,n}}\mathbb{X}$, and the moments $_{\bar{k}_{i,n}}e^{(r)}_{k_{i,n} j'}$, $_{\bar{k}_{i,n}}e^{(r)}_{i'j'}$, $i', j' \in {}_{\bar{k}_{i,n}}\mathbb{X}$, $r = 1, \ldots, d$ are determined for the semi-Markov process $_{\bar{k}_{i,n}}J(t)$ by the transition probabilities and the moments of transition times of the semi-Markov process $_{\bar{k}_{i,n-1}}J(t)$ via relations (21) and (26), applied with the excluded state $k = k_{i,n}$, for $i', j' \in {}_{\bar{k}_{i,n}}\mathbb{X}$, $r = 1, \ldots, d$ and $n = 1, \ldots, m$.
4.2. Recurrent algorithms for computing of moments of hitting times. Let $_{\bar{k}_{i,n}}W_0$ be the first hitting time to state 0 for the reduced semi-Markov process $_{\bar{k}_{i,n}}J(t)$ and let $_{\bar{k}_{i,n}}\mathsf{E}^{(r)}_{i'0}$, $i' \in {}_{\bar{k}_{i,n}}\mathbb{X}$, $r = 1, \ldots, d$ be the moments of these random variables.
By Theorem 1, the above moments of hitting times coincide for the semi-Markov processes $_{\bar{k}_{i,0}}J(t), {}_{\bar{k}_{i,1}}J(t), \ldots, {}_{\bar{k}_{i,n}}J(t)$, i.e., for $n' = 0, \ldots, n$,
$$_{\bar{k}_{i,n'}}\mathsf{E}^{(r)}_{i'0} = \mathsf{E}^{(r)}_{i'0},\ i' \in {}_{\bar{k}_{i,n'}}\mathbb{X},\ r = 1, \ldots, d.$$
Moreover, the moments of hitting times $_{\bar{k}_{i,n}}\mathsf{E}^{(r)}_{i'0}$, $i' \in {}_{\bar{k}_{i,n}}\mathbb{X}$, $r = 1, \ldots, d$ resulting from the recurrent algorithm of sequential phase space reduction described above are invariant with respect to any permutation $\bar{k}'_{i,n} = \langle k'_{i,1}, \ldots, k'_{i,n} \rangle$ of the sequence $\bar{k}_{i,n} = \langle k_{i,1}, \ldots, k_{i,n} \rangle$. Indeed, for every permutation $\bar{k}'_{i,n}$ of the sequence $\bar{k}_{i,n}$, the corresponding reduced semi-Markov process $_{\bar{k}'_{i,n}}J(t)$ is constructed from the sequence of states of the initial semi-Markov process $J(t)$ at the sequential moments of its hitting into the same reduced phase space $_{\bar{k}'_{i,n}}\mathbb{X} = {}_{\bar{k}_{i,n}}\mathbb{X}$. The times between sequential jumps of the reduced semi-Markov process $_{\bar{k}'_{i,n}}J(t)$ are the times between the sequential hittings of the above reduced phase space by the initial semi-Markov process $J(t)$.
This implies that the transition probabilities $_{\bar{k}_{i,n}}p_{k_{i,n} j'}$, $_{\bar{k}_{i,n}}p_{i'j'}$, $i', j' \in {}_{\bar{k}_{i,n}}\mathbb{X}$ and the moments $_{\bar{k}_{i,n}}e^{(r)}_{i'j'}$, $i', j' \in {}_{\bar{k}_{i,n}}\mathbb{X}$, $r = 1, \ldots, d$ are, for every $n = 1, \ldots, m$, invariant with respect to any permutation $\bar{k}'_{i,n}$ of the sequence $\bar{k}_{i,n}$.
Let us now choose $n = m$. In this case, the reduced semi-Markov process $_{\bar{k}_{i,m}}J(t)$ has the one-state phase space $_{\bar{k}_{i,m}}\mathbb{X} = \{0\}$ and the state $k_{i,m} = i$.
In this case, the reduced semi-Markov process $_{\bar{k}_{i,m}}J(t)$ returns to state 0 after every jump, and the hitting time to state 0 coincides with the sojourn time in the initial state $_{\bar{k}_{i,m}}J(0)$.
Also, by Theorem 1, the moments
$$\mathsf{E}^{(r)}_{i0} = {}_{\bar{k}_{i,m}}\mathsf{E}^{(r)}_{i0} = {}_{\bar{k}_{i,m}}e^{(r)}_{i0},\ r = 1, \ldots, d.$$
The above remarks can be summarized in the following theorem, which presents the recurrent algorithm for computing power moments of hitting times.
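The complete recurrent algorithm thus amounts to applying the one-step reduction repeatedly until only state 0 remains; at that point the row of any initial state $i$ contains the moments $\mathsf{E}^{(r)}_{i0}$ directly, since the hitting time then coincides with the first transition time of the fully reduced process. A minimal sketch, reusing the hypothetical `reduce_state` helper and array layout introduced earlier:

```python
import numpy as np

def moments_by_reduction(p, e, d, target=0):
    """Sequentially exclude all states except `target`; by the invariance
    discussed above, the order of exclusion does not affect the result.
    Returns E with E[r, i] = E_i[W_0^r] for every initial state i."""
    kp, ke = p.copy(), e.copy()
    for k in range(p.shape[0]):
        if k != target:
            kp, ke = reduce_state(kp, ke, k, d)   # one-step reduction from Section 3
    # Only `target` remains: every first transition of the reduced process leads to
    # `target`, so E_i[W_0^r] equals the r-th moment of that transition time.
    return ke[:, :, target]
```

On the hypothetical three-state data above, `moments_by_reduction(p, e, d)` agrees (up to rounding) with `hitting_time_moments(p, e, d)`, which solves the systems (12) directly.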

Generalizations and examples
In this section, we describe several variants of generalization of the results concerning recurrent algorithms for computing power moments of hitting times and accumulated rewards of hitting type.
5.1. Real-valued accumulated rewards of hitting type. First, we would like to mention that Theorems 1 and 2 can be generalized to the model, where the Markov renewal process $(J_n, X_n)$, $n = 0, 1, \ldots$ has the phase space $\mathbb{X} \times \mathbb{R}_1$, an initial distribution $\bar{p} = \langle p_i = \mathsf{P}\{J_0 = i, X_0 = 0\} = \mathsf{P}\{J_0 = i\},\ i \in \mathbb{X} \rangle$ and transition probabilities,
$$Q_{ij}(t) = \mathsf{P}\{J_1 = j, X_1 \le t \,/\, J_0 = i, X_0 = s\},\ (i, s), (j, t) \in \mathbb{X} \times \mathbb{R}_1.$$
In this case, the random variable,
$$W_0 = \sum_{n=1}^{U_0} X_n,$$
can be interpreted as a real-valued reward accumulated on the trajectories of the Markov chain $J_n$ up to its first hitting time $U_0 = \min(n \ge 1 : J_n = 0)$ of this Markov chain to the state 0. Condition C$_d$ should be replaced by the condition that $\mathsf{E}_i |X_1|^d < \infty$, for $i \in \mathbb{X}$. As is well known, in this case the moments $\mathsf{E}_i |W_0|^r$, $r = 1, \ldots, d$, $i \in \mathbb{X}$ are finite, and the recurrent relations given in Sections 3--4, as well as Theorems 1 and 2, take the same forms as in the case of nonnegative rewards.
5.2. Vector accumulated rewards of hitting type. Second, we would like to show how the above results can be generalized to the case of vector accumulated rewards.
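In the simplest bivariate case, one considers a vector reward $(W^{(1)}_0, W^{(2)}_0)$ accumulated up to the hitting time $U_0$, its mixed power moments $\mathsf{E}^{(q,r)}_i = \mathsf{E}_i (W^{(1)}_0)^{r-q} (W^{(2)}_0)^{q}$, $0 \le q \le r$, and the moments of linear combinations $\mathsf{E}^{(r)}_i(a) = \mathsf{E}_i (W^{(1)}_0 + a\,W^{(2)}_0)^r$, which are moments of a scalar accumulated reward and can therefore be computed by the algorithms described above. The notation in this paragraph is ours, given as a sketch of the argument. The two families of moments are connected by the binomial identity
$$\mathsf{E}^{(r)}_i(a) = \sum_{q=0}^{r} \binom{r}{q}\, a^{q}\, \mathsf{E}^{(q,r)}_i,$$
so evaluating the left-hand side at $r + 1$ distinct values $a = a_0, \ldots, a_r$ yields a system of linear equations for the $r + 1$ mixed moments of order $r$, with the matrix $\|\binom{r}{q} a_p^q\|$ of Vandermonde type.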
It can be shown that the above linear system has a non-zero determinant. Thus, the moments $\mathsf{E}^{(r)}_{i}(a_p)$, computed for $r + 1$ distinct values $a_p$, $p = 0, \ldots, r$, uniquely determine the mixed moments $\mathsf{E}^{(q,r)}_{i}$, $0 \le q \le r$, for every $r = 1, \ldots, d$, $i \in \mathbb{X}$.
5.3. General hitting times with hitting state indicators. Third, the above results can be generalized to the case of more general hitting times,
$$W_D = \sum_{n=1}^{U_D} X_n,$$
where $U_D = \min(n \ge 1 : J_n \in D)$, for some nonempty set $D \subset \mathbb{X}$.
In this case, the main objects of studies are the power moments of the first hitting times with hitting state indicators,
$$\mathsf{E}^{(r)}_{D,ij} = \mathsf{E}_i\, W_D^r\, \mathrm{I}(J_{U_D} = j),\ r = 1, \ldots, d,\ i \in \mathbb{X},\ j \in D.$$
As is well known, conditions A, B and C$_d$ imply that, for any nonempty set $D \subset \mathbb{X}$, $\mathsf{E}^{(r)}_{D,ij} < \infty$, $r = 1, \ldots, d$, $i \in \mathbb{X}$, $j \in D$. Note that condition A can, in fact, be replaced by the simpler condition that $\mathsf{P}_i\{U_D < \infty\} = 1$, for $i \in \mathbb{X}$. In this case, theorems analogous to Theorems 1 and 2 take place, and recurrent systems of linear equations and recurrent formulas analogous to those given in Sections 2--4 can be written down.
For example, let $_k\mathsf{E}^{(r)}_{D,ij}$, $r = 1, \ldots, d$, $i \in \mathbb{X}$, $j \in D$ be the moments analogous to the moments $\mathsf{E}^{(r)}_{D,ij} < \infty$, $r = 1, \ldots, d$, $i \in \mathbb{X}$, $j \in D$, computed for the reduced semi-Markov process $_kJ(t)$, for some $k \notin D$. The key recurrent systems of linear equations analogous to (33) take, for every $j \in D$, nonempty set $D \subset \mathbb{X}$ and $k \notin D$, the following form, for $r = 0, \ldots, d$,
$$_k\mathsf{E}^{(r)}_{D,ij} = \sum_{l \in {}_k\mathbb{X} \setminus D} {}_kp_{il}\, {}_k\mathsf{E}^{(r)}_{D,lj} + {}_kD^{(r)}_{D,ij},\ i \in {}_k\mathbb{X}, \qquad (52)$$
where
$$_kD^{(r)}_{D,ij} = {}_ke^{(r)}_{ij} + \sum_{0 \le q < r} \binom{r}{q} \sum_{l \in {}_k\mathbb{X} \setminus D} {}_ke^{(r-q)}_{il}\, {}_k\mathsf{E}^{(q)}_{D,lj},\ i \in {}_k\mathbb{X}.$$
The difference from the recurrent systems of linear equations (33) is that, in this case, the corresponding system of linear equations for the hitting probabilities $_k\mathsf{E}^{(0)}_{D,ij}$, $i \in \mathbb{X}$ should also be solved. Also, the corresponding changes caused by the replacement of the hitting state 0 by the state $j \in D$ and of the set $_k\mathbb{X} \setminus \{0\}$ by the set $_k\mathbb{X} \setminus D$ should be taken into account when writing down the systems of linear equations (52) instead of the systems of linear equations (33).
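As a small illustration of the extra step mentioned above (solving for the hitting probabilities before the higher moments), the following sketch computes, in our usual array layout and with hypothetical helper names, the probabilities $\mathsf{E}^{(0)}_{D,ij} = \mathsf{P}_i\{J_{U_D} = j\}$ and the first moments $\mathsf{E}^{(1)}_{D,ij} = \mathsf{E}_i W_D\, \mathrm{I}(J_{U_D} = j)$ directly for the original (non-reduced) process.

```python
import numpy as np

def hitting_indicator_moments(p, e1, D):
    """For each j in D, solve
       G0[i, j] = p[i, j] + sum_{l not in D} p[i, l] G0[l, j],
       G1[i, j] = e1[i, j] + sum_{l not in D} (e1[i, l] G0[l, j] + p[i, l] G1[l, j]),
    where e1[i, j] = E_i[X_1 I(J_1 = j)].  Returns (G0, G1)."""
    n = p.shape[0]
    outside = np.array([l not in D for l in range(n)], dtype=float)
    P_out = p * outside                 # transitions to states outside D only
    A = np.eye(n) - P_out               # common coefficient matrix for all j and all orders
    cols = sorted(D)
    G0 = np.zeros((n, n))
    G1 = np.zeros((n, n))
    G0[:, cols] = np.linalg.solve(A, p[:, cols])
    G1[:, cols] = np.linalg.solve(A, e1[:, cols] + (e1 * outside) @ G0[:, cols])
    return G0, G1
```

For instance, with the three-state data from Section 2 (taking `e1 = e[1]`) and `D = {1, 2}`, the row `G0[0]` gives the probabilities that, starting from state 0, the set $D$ is first entered at state 1 or at state 2.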
5.4. Place-dependent hitting times. Fourth, the above results can be generalized to so-called place-dependent hitting times,
$$Y_G = \sum_{n=1}^{U_G} X_n,$$
where $U_G = \min(n \ge 1 : (J_{n-1}, J_n) \in G)$, for some nonempty domain $G \subseteq \mathbb{X} \times \mathbb{X}$, i.e., the first moment at which the embedded Markov chain performs a transition belonging to the domain $G$. This representation explains the use of the term "place-dependent hitting time".
In fact, the above model can be embedded in the previous one, if one considers the new Markov renewal process $(\tilde{J}_n, X_n) = ((J_{n-1}, J_n), X_n)$, $n = 0, 1, \ldots$ constructed from the initial Markov renewal process $(J_n, X_n)$, $n = 0, 1, \ldots$ by aggregating sequential states of the initial embedded Markov chain $J_n$.
The Markov renewal process $(\tilde{J}_n, X_n)$ has the phase space $(\mathbb{X} \times \mathbb{X}) \times [0, \infty)$. For simplicity, we can take the initial state $\tilde{J}_0 = (J_{-1}, J_0)$, where $J_{-1}$ is a random variable taking values in the space $\mathbb{X}$ and independent of the Markov renewal process $(J_n, X_n)$.
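The embedding is straightforward to realize computationally: the aggregated chain moves from a pair $(i, j)$ to a pair $(j, l)$ with probability $p_{jl}$, and a domain $G \subset \mathbb{X} \times \mathbb{X}$ becomes an ordinary subset of the new phase space, so the machinery of Subsection 5.3 applies verbatim. A minimal sketch (our own helper, not taken from the paper):

```python
import numpy as np
from itertools import product

def pair_chain(p):
    """Transition matrix of the aggregated chain (J_{n-1}, J_n):
    the pair (i, j) moves to the pair (j, l) with probability p[j, l]."""
    n = p.shape[0]
    states = list(product(range(n), range(n)))      # all pairs (i, j)
    index = {s: a for a, s in enumerate(states)}
    P = np.zeros((n * n, n * n))
    for (i, j) in states:
        for l in range(n):
            P[index[(i, j)], index[(j, l)]] = p[j, l]
    return P, states

# A place-dependent hitting event such as "a transition from state 1 to state 0
# occurs" corresponds to the ordinary subset {(1, 0)} of the pair phase space.
```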
Note that condition A can, in fact, be replaced by the simpler condition that the domain $G$ is hittable, i.e., that $\mathsf{P}_i\{U_G < \infty\} = 1$, for $i \in \mathbb{X}$. The above assumption, that the domain $G$ is hittable, is implied by condition A, for any domain $G$ containing a pair of states $(i, j)$ such that $p_{ij} > 0$.
The results concerning moments of the usual accumulated rewards $W_D$ can be extended to the place-dependent accumulated rewards $Y_G$ for hittable domains, using the above embedding procedure.