Laws of Large Numbers for Non-Homogeneous Markov Systems

In the present paper we establish Laws of Large Numbers for Non-Homogeneous Markov Systems (NHMS) and Cyclic Non-Homogeneous Markov Systems (Cycl-NHMS). We start with a theorem establishing that, for an NHMS under certain conditions, the fraction of time that a membership spends in a given state converges asymptotically in mean square to the limit of the relative population structure of memberships in that state. We continue by proving a theorem which provides the conditions under which the convergence is almost sure. We then prove under which conditions a Cyclic NHMS is Cesaro strongly ergodic, and we show that, for a Cyclic NHMS under certain conditions, the fraction of time that a membership spends in a given state converges asymptotically in mean square to the limit, in the strong Cesaro sense, of the relative population structure of memberships in that state. We then establish a founding theorem, which provides the conditions under which the relative population structure converges asymptotically in the strong Cesaro sense at a geometric rate. This theorem is the missing basic instrument for proving under what conditions the Law of Large Numbers for a Cycl-NHMS holds with almost sure convergence. Finally, we present two applications: first to geriatric and stroke patients in a hospital, and second to the population of students in a University system.


Introductory Notes
One of the most celebrated theorems in probability theory is the Law of Large Numbers (Grimmett and Stirzaker 2001). Laws of Large Numbers have also been studied for finite Markov chains (Kemeny and Snell 1981). The Law of Large Numbers for a regular homogeneous Markov chain states that, if $\pi_j$ is the limiting probability of being in state $j$, independent of the initial state, then $\pi_j$ also represents the fraction of time that the process can be expected to spend in state $j$ over a large number of steps. The Law of Large Numbers for Markov chains is also linked with the Martingale Convergence Theorem (Kemeny et al. 1976). Laws of Large Numbers were also studied for non-homogeneous semi-Markov processes by Vadori and Swishchuk (2015). For Markov chains on general state spaces, Meyn and Tweedie (2009) devote a chapter to Laws of Large Numbers, where the theory of martingales is the main instrument for proving various types of LLN. These laws are as valuable for Markov chains as they are for all stochastic processes: the LLN and the Central Limit Theorem (CLT), in particular, provide the theoretical basis for many results in the statistical analysis of chains, as they do in related fields. For this and other applications, the reader is referred to Hall and Heyde (1980).
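As a minimal illustration of this statement for a regular homogeneous chain (a sketch, not taken from the cited works), the Python code below uses a hypothetical 3-state transition matrix, approximates the limiting vector $\pi$ by repeated multiplication, and compares it with the fraction of time spent in each state along one long simulated path. The helper names `limiting_vector` and `occupation_fractions` are ours.

```python
import random

def limiting_vector(P, iters=500):
    """Approximate pi for a regular chain by iterating pi <- pi P from the uniform vector."""
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
    return pi

def occupation_fractions(P, x0, steps, rng):
    """Fraction of the steps 1..steps that one simulated path spends in each state."""
    k = len(P)
    counts = [0] * k
    x = x0
    for _ in range(steps):
        x = rng.choices(range(k), weights=P[x])[0]  # one transition from state x
        counts[x] += 1
    return [c / steps for c in counts]

# Hypothetical regular transition matrix (all entries positive, rows sum to 1).
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]

pi = limiting_vector(P)
nu = occupation_fractions(P, x0=0, steps=200_000, rng=random.Random(7))
print("pi =", [round(p, 3) for p in pi])
print("nu =", [round(v, 3) for v in nu])
```

The two printed vectors should agree to about two decimal places; increasing `steps` tightens the agreement, which is exactly the content of the Law of Large Numbers for this chain.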
In the present paper we study the Laws of Large Numbers for Non-Homogeneous Markov Systems and for Cyclic Non-Homogeneous Markov Systems. The theory of NHMSs has its roots in the use of Markov models in manpower systems, which started with the work of Young and Almond (1961) and Bartholomew (1963). Young's motive was the application of homogeneous Markov chain models to the British University system. Bartholomew created important multiple renewal theory models for various social processes, and his first related book, Bartholomew (1967), among other things provided an important theoretical reference in the applied probability style for everyone. The concept of Non-Homogeneous Markov Systems was first introduced in Vassiliou (1982). Since then a vast literature has been created by many authors in a great variety of journals; a sample can be found in the review papers by Vassiliou (1997) and Ugwuowo and McClean (2000). The motive was to provide a more general framework for a number of non-homogeneous Markov chain models in manpower systems. There is also a great variety of applied probability models that can be accommodated in this general framework. Let us consider a population (system) which is stratified into classes (states) according to various characteristics. The members of the system could be sections of human societies, parts of the animal kingdom, populations of fisheries, biological micro-organisms, particles in a physical phenomenon, various types of machines, various types of cells or viruses of the human body, etc. The members of the system are categorized into various states according to the problem at hand. The set of states is assumed to be exclusive, so that each member of the system may be in one and only one state at a given time. We call the population structure the vector containing the number of members in each state of the system.
Members leave the system in a stochastic way, and new members enter the system in a stochastic way. In fact, a non-homogeneous Markov chain is an NHMS with a single member, which never leaves the system and which no other members enter.
There is a large number of applications of the theory of NHMS, in quite diverse areas, where the present results could have an impact. We will only refer to some of these applications that contribute to the health care of human beings: for example, applications to the evolution of the HIV virus population within human T-cells in Mathiew et al. (2006) and Foucher et al. (2005); to gene expression sequences in McClean et al. (2003); and to hospital and geriatric patient care in McClean et al. (1998a, b), Taylor et al. (2000), Faddy and McClean (2005), Garg et al. (2010), McClean and Millard (2007), Marshall et al. (2002), McClean (2003, 2004), Garg et al. (2009), Lalit et al. (2010), McClean et al. (2014a) and McClean et al. (2014b).
The paper is organized as follows. In Section 2 we provide basic concepts and useful results for an NHMS, which are either known or slightly amended. We also provide some theorems on the various modes of convergence of random variables in a probability space that will be useful in what follows; the results of this section are used repeatedly in the sections that follow. In Section 3 we first prove a Law of Large Numbers for an NHMS: under certain conditions, the fraction of time that a membership spends in a given state converges asymptotically in mean square to the limit of the relative population structure of memberships in that state. In a second theorem in the same section we prove under what conditions the convergence in this basic result is almost sure. In Section 4 we study the important category of Cyclic NHMS, a concept motivated by the work of Gani (1963) on student enrolment at Michigan State University and by Bartholomew (1982). We prove, in two theorems, under what conditions the relative population structure of a Cycl-NHMS converges asymptotically in the strong Cesaro sense. In Section 5 we first prove a Law of Large Numbers for a Cycl-NHMS: under certain conditions, the fraction of time that a membership spends in a given state converges asymptotically in mean square to the limit, in the strong Cesaro sense, of the relative population structure of memberships in that state. We then establish a founding theorem, which provides the conditions under which the relative population structure converges asymptotically in the strong Cesaro sense at a geometric rate. This theorem is the missing basic instrument for proving under what conditions the Law of Large Numbers for a Cycl-NHMS holds with almost sure convergence. In Section 6 we provide applications of the results of Section 3 to geriatric and stroke patients.
We also apply the results of Sections 4 and 5 to the movements of students in a University system.

Basic Concepts and Useful Results for a NHMS in Discrete Time
We first recall the concept of an NHMS and introduce the concepts and known results necessary for the study of the Law of Large Numbers for NHMSs. Consider a population (system) which is stratified into classes (states) according to various characteristics. Let $S = \{1, 2, \dots, k\}$ be the set of states, which are assumed to be exclusive and exhaustive. Let $t = 0, 1, 2, \dots$ be a discrete time scale and let $\{P(t)\}_{t=0}^{\infty}$ be the sequence of transition probability matrices between the states. Assume that there is wastage from the system, and denote by $\omega$ the state representing the external environment of the system, to which the members who leave the system go. Let $\{p_\omega(t)\}_{t=0}^{\infty}$ be the sequence of vectors of probabilities of wastage from the various states of the system. Let $\{T(t)\}_{t=0}^{\infty}$ be the total number of memberships of the system at time $t$, which is assumed to be a realization of a known stochastic process. We assume that each member holds a membership, which is left when the member leaves the system and is taken by new members entering the system, either to replace leavers or to expand the system. Apparently, $T(t) \ge 0$. Let $\{p_0(t)\}_{t=0}^{\infty}$ be the sequence of vectors of probabilities of allocation of replacements and new memberships to the various states of the system; this allocation is made independently of internal movements. Denote $Q(t) = P(t) + p_\omega(t)'\,p_0(t)$, where $p_\omega(t)'$ is the transpose of $p_\omega(t)$. Since row $i$ of $P(t)$ sums to $1 - p_{\omega,i}(t)$ and the entries of $p_0(t)$ sum to one, $Q(t)$ is a stochastic matrix, and the non-homogeneous Markov chain defined by the sequence $\{Q(t)\}_{t=0}^{\infty}$ will be called the imbedded non-homogeneous Markov chain of the NHMS. Denote by $N_i(t)$ the random variable representing the number of memberships in state $i$ at time $t$, and by $N(t) = [N_1(t), N_2(t), \dots, N_k(t)]$ the vector of random variables representing the population structure of the NHMS. Let $q(t) = N(t)/T(t)$ be the relative population structure.
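To make the construction of $Q(t)$ concrete, the following Python sketch builds $Q(t) = P(t) + p_\omega(t)'\,p_0(t)$ from a hypothetical substochastic $P(t)$ and allocation vector $p_0(t)$ (illustrative numbers only, not data from the paper), taking $p_\omega(t)$ as one minus the row sums of $P(t)$, and checks that every row of $Q(t)$ sums to one. The function names are ours.

```python
def wastage(P):
    """p_omega: probability of leaving the system from each state (1 - row sum of P)."""
    return [1.0 - sum(row) for row in P]

def imbedded_matrix(P, p0):
    """Q = P + p_omega' p0: a leaver's membership is re-allocated according to p0."""
    pw = wastage(P)
    return [[P[i][j] + pw[i] * p0[j] for j in range(len(p0))]
            for i in range(len(P))]

# Hypothetical P(t): rows sum to less than one because of wastage.
P_t = [[0.6, 0.2, 0.0],
       [0.0, 0.7, 0.2],
       [0.0, 0.0, 0.9]]
p0_t = [0.7, 0.2, 0.1]

Q_t = imbedded_matrix(P_t, p0_t)
for row in Q_t:
    print([round(x, 3) for x in row], "row sum:", round(sum(row), 12))
```

Each row of the output sums to one: the probability mass lost to wastage in row $i$ is redistributed over the states in the proportions $p_0(t)$.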
Define $q(s,t) = [q_1(s,t), q_2(s,t), \dots, q_k(s,t)]$ (2.2); then from Georgiou and Vassiliou (1992), p. 140, we get (2.4). We also get a relation from which we obtain a recursive expression (see Georgiou and Vassiliou 1992). We set $Q(s,t) = I$, the identity matrix, for $s > t$; note also that we set $q(s,t) = 0$ for $s > t$.
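The expected relative structure can also be propagated numerically. The Python sketch below assumes the standard one-step balance of an NHMS, $E[N(t+1)] = E[N(t)]\,Q(t) + \Delta T(t+1)\,p_0(t)$ with $\Delta T(t+1) = T(t+1) - T(t) \ge 0$, which gives $E[q(t+1)] = \frac{T(t)}{T(t+1)}E[q(t)]\,Q(t) + \frac{\Delta T(t+1)}{T(t+1)}p_0(t)$. The time indexing of $p_0$ is our assumption and may differ from the exact recursion in Georgiou and Vassiliou (1992). The imbedded matrix is hypothetical; the sizes $T(t)$ and $q(0)$ reuse the numbers of the hospital example in Section 6 purely for illustration.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def expected_structures(q0, T, Q, p0):
    """Return [E[q(0)], ..., E[q(len(T)-1)]] under the assumed one-step balance.

    Q(t) and p0(t) are callables giving the imbedded matrix and the allocation
    vector at time t; T is the list of total sizes T(0), T(1), ...
    """
    qs = [list(q0)]
    for t in range(len(T) - 1):
        dT = T[t + 1] - T[t]           # new memberships created at t+1 (assumed >= 0)
        moved = vec_mat(qs[-1], Q(t))  # existing memberships move (leavers replaced)
        qs.append([(T[t] * m + dT * a) / T[t + 1] for m, a in zip(moved, p0(t))])
    return qs

# Hypothetical constant imbedded matrix and allocation vector.
Q_const = [[0.74, 0.24, 0.02],
           [0.07, 0.72, 0.21],
           [0.07, 0.02, 0.91]]
p0_const = [0.7, 0.2, 0.1]
T = [400, 420, 430] + [435] * 47

qs = expected_structures([0.5, 0.25, 0.25], T, lambda t: Q_const, lambda t: p0_const)
print([round(x, 4) for x in qs[-1]])
```

Because $T(t)/T(t+1) + \Delta T(t+1)/T(t+1) = 1$, each step is a convex combination of probability vectors, so every $E[q(t)]$ stays a probability vector.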
We denote by … and, apparently, we have … . Let $M_{n,m}(\mathbb{R})$ be the vector space of all $n \times m$ real matrices and $SM_{n,n}$ the set of all $n \times n$ stochastic matrices. A matrix $Q \in SM_{n,n}$ is regular if its states consist of a single communicating class which is aperiodic or, equivalently, if 1 is the only eigenvalue of $Q$ with modulus 1 and it has geometric multiplicity one. For $A = (a_{ij}) \in M_{n,n}(\mathbb{R})$ we define the norm $\|A\| = \max_i \sum_j |a_{ij}|$. From Vassiliou (1981) we get the following theorem:

Theorem 1 Let an NHMS be given and let $\{Q(t)\}_{t=0}^{\infty}$ be a sequence of transition matrices corresponding to a non-homogeneous Markov chain. If $\lim_{t\to\infty}\|Q(t) - Q\| = 0$, where $Q$ is weakly ergodic, then the chain is strongly ergodic.
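When, as in the definition of regularity given above, all states form a single communicating class, regularity of a finite stochastic matrix coincides with primitivity, and by Wielandt's theorem an $n \times n$ nonnegative matrix is primitive if and only if its power $(n-1)^2 + 1$ has all entries positive. The Python sketch below applies this criterion to the zero pattern of $Q$ (boolean products avoid rounding issues); matrices with transient states fall outside this check, and the helper names are ours.

```python
def bool_matmul(A, B):
    """Boolean matrix product: entry (i, j) is True if some path i -> l -> j exists."""
    n = len(A)
    return [[any(A[i][l] and B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

def is_regular(Q):
    """True if Q^((n-1)^2 + 1) is strictly positive (Wielandt's bound for primitivity)."""
    n = len(Q)
    A = [[q > 0 for q in row] for row in Q]
    M = A                              # M holds the zero pattern of A^1
    for _ in range((n - 1) ** 2):      # after the loop M is the pattern of A^((n-1)^2 + 1)
        M = bool_matmul(M, A)
    return all(all(row) for row in M)

print(is_regular([[0.5, 0.5], [1.0, 0.0]]))   # True: irreducible and aperiodic
print(is_regular([[0.0, 1.0], [1.0, 0.0]]))   # False: period 2
print(is_regular([[1.0, 0.0], [0.0, 1.0]]))   # False: two closed classes
```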
Following the steps of the proof of Theorem 1 in Vassiliou (1981) and using Theorems 2 and 3, we arrive at Theorem 4.
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\{X_n\}_{n=0}^{\infty}$ a sequence of random variables with $X_n : \Omega \to \mathbb{R}$. It is well known that there are various modes of convergence of the sequence $\{X_n\}_{n=0}^{\infty}$ to a random variable $X : \Omega \to \mathbb{R}$. We now provide the formal definitions of three of these modes.
Definition 1 Let $(\Omega, \mathcal{F}, P)$ be a probability space, $\{X_n\}_{n=0}^{\infty}$ a sequence of random variables with $X_n : \Omega \to \mathbb{R}$ and $X : \Omega \to \mathbb{R}$ a random variable. We say that the sequence $\{X_n\}_{n=0}^{\infty}$ converges almost surely to $X$ if the event $\{\omega \in \Omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty\}$ has probability one. We denote this type of convergence by $X_n \xrightarrow{a.s.} X$ or $\lim_{n\to\infty} X_n = X$ a.s.
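Definition 2 Let $(\Omega, \mathcal{F}, P)$ be a probability space, $\{X_n\}_{n=0}^{\infty}$ a sequence of random variables with $X_n : \Omega \to \mathbb{R}$ and $X : \Omega \to \mathbb{R}$ a random variable, and let $1 \le p < \infty$. If $|X_n|$ and $|X|$ are in $L^p$, i.e., $E[|X_n|^p] < \infty$ for all $n$ and $E[|X|^p] < \infty$, we say that $\{X_n\}_{n=0}^{\infty}$ converges to $X$ in $p$-th mean if $E[|X_n - X|^p] \to 0$ as $n \to \infty$, and we denote this by $X_n \xrightarrow{L^p} X$.
One of the most useful modes of convergence is mean square convergence, that is, the case $p = 2$: $E[(X_n - X)^2] \to 0$, denoted $X_n \xrightarrow{L^2} X$.
Definition 3 Let $(\Omega, \mathcal{F}, P)$ be a probability space, $\{X_n\}_{n=0}^{\infty}$ a sequence of random variables with $X_n : \Omega \to \mathbb{R}$ and $X : \Omega \to \mathbb{R}$ a random variable. We say that $X_n \to X$ in probability, and we write $X_n \xrightarrow{P} X$, if $P(|X_n - X| > \varepsilon) \to 0$ as $n \to \infty$ for every $\varepsilon > 0$.
From Grimmett and Stirzaker (2001), p. 311, we get the following theorem.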

Theorem 5 Let a probability space $(\Omega, \mathcal{F}, P)$ and a sequence of random variables
Note that any sequence $\{X_n\}_{n=0}^{\infty}$ which satisfies $X_n \xrightarrow{P} X$ necessarily contains a subsequence $\{X_{n_i} : 1 \le i < \infty\}$ which converges almost surely. From Grimmett and Stirzaker (2001), p. 314, we get the following theorem.
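Theorem 6 Let $(\Omega, \mathcal{F}, P)$ be a probability space, $\{X_n\}_{n=0}^{\infty}$ a sequence of random variables with $X_n : \Omega \to \mathbb{R}$ and $X : \Omega \to \mathbb{R}$ a random variable. If $X_n \xrightarrow{P} X$, there exists a non-random increasing sequence of integers $n_1, n_2, \dots$ such that $X_{n_i} \xrightarrow{a.s.} X$ as $i \to \infty$.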
Also from Grimmett and Stirzaker (2001) p.310 we get the following Theorem.
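Theorem 7 Let $(\Omega, \mathcal{F}, P)$ be a probability space, $\{X_n\}_{n=0}^{\infty}$ a sequence of random variables with $X_n : \Omega \to \mathbb{R}$ and a random variable $X : \Omega \to \mathbb{R}$ …
Theorem 8 (Chebyshev's inequality) Let $(\Omega, \mathcal{F}, P)$ be a probability space and $X : \Omega \to \mathbb{R}$ a random variable. Then $P(|X| \ge a) \le E[X^2]/a^2$ for every $a > 0$.
From Huang et al. (1976), amending the basic theorem slightly, we get the following.
Theorem 9 Let $(\Omega, \mathcal{F}, P)$ be a probability space and let a non-homogeneous Markov chain be defined by the sequence of transition matrices $\{Q(s,t)\}_{s,t}$. Let $\lim_{t\to\infty}\|Q(t) - Q\| = 0$ geometrically fast, with $Q$ a regular stochastic matrix. Then $\lim_{t\to\infty}\|Q(s,t) - Q^{\infty}\| = 0$ geometrically fast, uniformly in $s$, where $Q^{\infty} = \lim_{n\to\infty} Q^n$. That is, for every $s$ there exist constants $c > 0$ and $0 < b < 1$ giving a geometric bound on $\|Q(s,t) - Q^{\infty}\|$.
From Vassiliou and Georgiou (1990), p. 541, we get the following theorem.
Theorem 10 Let an NHMS be given with $\lim_{t\to\infty} P(t) = P$, $\lim_{t\to\infty} p_\omega(t) = p_\omega$ and $\lim_{t\to\infty} p_0(t) = p_0$, where the rate of convergence is geometric in all cases, and let $Q = P + p_\omega'\,p_0$ be regular. Also, let $\{\Delta T(t)/T(t)\}_{t=0}^{\infty}$, where $\Delta T(t) = T(t) - T(t-1)$, converge to zero geometrically fast.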
Then the sequence of relative structures converges to q
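The behavior described in Theorem 9 can be observed numerically. In the Python sketch below, a hypothetical sequence $Q(t) = (1-\varepsilon_t)Q + \varepsilon_t R$, with $\varepsilon_t = 2^{-(t+1)}$ and $R$ a permutation matrix, converges geometrically to a regular $Q$. We take $Q(s,t)$ to be the product $Q(s)Q(s+1)\cdots Q(t-1)$, approximate $Q^{\infty}$ by a high power of $Q$ and measure distances with the maximum absolute row sum norm; all of these choices are illustrative assumptions.

```python
def matmul(A, B):
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def norm(A):
    """Maximum absolute row sum of a matrix."""
    return max(sum(abs(x) for x in row) for row in A)

Q = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
R = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]

def Q_at(t):
    """Hypothetical Q(t) with ||Q(t) - Q|| = eps_t * ||R - Q||, eps_t = 2**-(t+1)."""
    eps = 0.5 ** (t + 1)
    return [[(1 - eps) * Q[i][j] + eps * R[i][j] for j in range(3)] for i in range(3)]

Q_inf = Q
for _ in range(200):          # Q^201: every row is (numerically) the limiting vector of Q
    Q_inf = matmul(Q_inf, Q)

def distance(s, n):
    """||Q(s, s+n) - Q_inf|| with Q(s, s+n) = Q(s) Q(s+1) ... Q(s+n-1)."""
    prod = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for t in range(s, s + n):
        prod = matmul(prod, Q_at(t))
    return norm([[prod[i][j] - Q_inf[i][j] for j in range(3)] for i in range(3)])

errors = {s: [distance(s, n) for n in (5, 10, 20, 40)] for s in (0, 3, 7)}
for s, errs in errors.items():
    print(s, ["%.1e" % e for e in errs])
```

For every starting time $s$ the printed distances drop by orders of magnitude as the horizon grows, which is the geometric, uniform-in-$s$ behavior the theorem describes.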

Laws of Large Numbers for a NHMS
In the present section we study the Law of Large Numbers for an NHMS. We start with mean square convergence and then proceed to prove almost sure convergence. Let $X_t$ be the random variable representing the state of a membership at time $t$. Define $u_j(t)$ by (3.1); also let $y_j(t)$ be the random variable representing the number of times the membership is in state $j$ up to time $t$, i.e., the number of $s$ with $X_s = j$, $1 \le s \le t$, and let $\nu_j(t)$ be the random variable representing the fraction of time the membership is in state $j$ up to time $t$, so that $\nu_j(t) = y_j(t)/t$. Denote by … and … . We now state and prove the following Law of Large Numbers for an NHMS.
Theorem 11 Let $(\Omega, \mathcal{F}, P)$ be a probability space and an NHMS as defined in Section 2. Assume that a) $\lim_{t\to\infty}\|Q(t) - Q\| = 0$ with $Q$ a regular stochastic matrix; b) …
Proof It is equivalent to show … . Since the dimension of the vectors is finite, it is equivalent to show (3.10). Similarly we get … . It is easy to see that … . Finally, it remains to find … . Define $n \wedge l = \max\{n, l\}$ and $n \vee l = \min\{n, l\}$; then we have … . Hence, from Eqs. 3.13 and 3.14, we get … (by Theorems 1, 2 and 3). Hence, from Eqs. 3.6, 3.7, ..., 3.15, we get Eq. 3.4, which completes the proof.
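The content of Theorem 11 can be explored by simulation. The Python sketch below assumes, for illustration, that a single membership moves according to the imbedded chain $Q(t)$ (when its holder leaves, the membership passes to a new member allocated by $p_0(t)$). It uses a hypothetical $Q(t) = (1-\varepsilon_t)Q + \varepsilon_t R$ with $\varepsilon_t = 1/(t+2) \to 0$, and takes the limiting vector of the regular matrix $Q$ as the limit $q$ of the relative population structure, which we assume holds in this setting. It estimates $E\big[\sum_j (\nu_j(t) - q_j)^2\big]$ by Monte Carlo at two horizons; the estimate should shrink as $t$ grows, in line with mean square convergence.

```python
import random

Q = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
R = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]

def Q_at(t):
    """Hypothetical imbedded matrix Q(t) -> Q (not geometrically fast)."""
    eps = 1.0 / (t + 2)
    return [[(1 - eps) * Q[i][j] + eps * R[i][j] for j in range(3)] for i in range(3)]

def limiting_vector(P, iters=500):
    pi = [1.0 / 3] * 3
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
    return pi

def fractions(t, rng, x0=0):
    """nu_j(t): fraction of the times 1..t at which the membership is in state j."""
    counts = [0, 0, 0]
    x = x0
    for s in range(t):
        x = rng.choices(range(3), weights=Q_at(s)[x])[0]   # X_{s+1} given X_s
        counts[x] += 1
    return [c / t for c in counts]

def mean_square_error(t, runs, rng, q):
    """Monte Carlo estimate of E[sum_j (nu_j(t) - q_j)^2]."""
    total = 0.0
    for _ in range(runs):
        nu = fractions(t, rng)
        total += sum((a - b) ** 2 for a, b in zip(nu, q))
    return total / runs

q = limiting_vector(Q)
rng = random.Random(11)
mse_short = mean_square_error(200, 60, rng, q)
mse_long = mean_square_error(3200, 60, rng, q)
print("q =", [round(x, 3) for x in q])
print("MSE at t=200: %.2e, at t=3200: %.2e" % (mse_short, mse_long))
```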
Hence we have proved that, under certain conditions, the fraction of time that a membership of an NHMS spends in a state over a large number of steps converges in mean square to the limit of the relative population structure in that state. This result constitutes the Weak Law of Large Numbers for an NHMS. We now proceed to prove under which conditions the convergence is almost sure.
Theorem 12 Let $(\Omega, \mathcal{F}, P)$ be a probability space and an NHMS as defined in Section 2. Assume that a) $\lim_{t\to\infty}\|Q(t) - Q\| = 0$ geometrically fast and $Q$ is a regular stochastic matrix; …
Proof In Theorem 11 we have actually proved (3.17). From Theorem 5 (a) and Eq. 3.17 we get (3.18), and from Theorem 5 (b) and Eq. 3.18 we get (3.19). By Theorem 6 there exists a non-random increasing sequence of integers $t_1, t_2, \dots$ such that (3.20) holds. We now prove that such a choice of subsequence is $t_i = i^2$ for $i = 1, 2, \dots$. To do so, by Theorem 7 it is sufficient to show (3.21). By Theorem 8, that is, Chebyshev's inequality, we get (3.22). Therefore, we should prove (3.23). From Eqs. 3.10, 3.11, 3.12 and 3.15 we get (3.24). From Theorem 10 there exist constants $c > 0$ and $0 < b < 1$ such that (3.25) holds, and from Theorem 9 there exist constants $c_1 > 0$ and $0 < b_1 < 1$ such that (3.26) holds. From Eqs. 3.24, 3.25 and 3.26 we get … . Hence, we have proved (3.27). We then have …, from which we get the result.
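The role of the subsequence $t_i = i^2$ can be seen in the following worked sketch. It assumes, for illustration, a mean square bound of the form $E[(\nu_j(t) - q_j)^2] \le C/t$ for some constant $C > 0$ (the exact form of (3.27) is not reproduced here). It combines this bound with Chebyshev's inequality and the first Borel-Cantelli lemma (which we take to be the content of Theorem 7), and then interpolates between consecutive squares using only $0 \le y_j(t) \le t$.

```latex
% Step 1: Chebyshev (Theorem 8) along t_i = i^2, under the assumed bound C/t
P\big(|\nu_j(i^2) - q_j| > \varepsilon\big)
  \le \frac{E\big[(\nu_j(i^2) - q_j)^2\big]}{\varepsilon^2}
  \le \frac{C}{\varepsilon^2 i^2},
\qquad
\sum_{i=1}^{\infty} \frac{C}{\varepsilon^2 i^2} = \frac{C\pi^2}{6\varepsilon^2} < \infty .

% Step 2: summable tail probabilities (Borel-Cantelli) give \nu_j(i^2) \to q_j almost surely.

% Step 3: for i^2 \le t < (i+1)^2 write y_j(t) = y_j(i^2) + \delta, 0 \le \delta \le t - i^2 \le 2i.
% Since 0 \le y_j(i^2) \le i^2 \le t,
|\nu_j(t) - \nu_j(i^2)|
  = \Big|\frac{\delta}{t} - \frac{y_j(i^2)\,(t - i^2)}{t\, i^2}\Big|
  \le \frac{2(t - i^2)}{t}
  \le \frac{4i}{i^2} = \frac{4}{i} \longrightarrow 0 ,
% hence \nu_j(t) \to q_j almost surely along the whole sequence.
```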

Convergence in the Cesaro Sense for Cyclic NHMS
In the present section we study the convergence in the Cesaro sense of the relative population structure of an NHMS which exhibits cyclic behavior. This is a founding step for the study of Laws of Large Numbers for a Cycl-NHMS in the next section. The importance of cyclic behavior was first stressed in Bartholomew (1982), p. 71, motivated by Gani's (1963) study of student enrolment at Michigan State University. A general theorem for the limiting behavior of the expected population structure of a Cycl-NHMS was given in Vassiliou (1984). The asymptotic variability of non-homogeneous Markov systems under cyclic behavior was studied in Vassiliou (1986), and Georgiou and Tsantas (1996) studied the asymptotic attainability of non-stationary cyclic Markov systems as a natural extension of the Cycl-NHMS. We now provide the definition of a Cycl-NHMS.
Definition 4 Let $(\Omega, \mathcal{F}, P)$ be a probability space and an NHMS as defined in Section 2. We say that the NHMS exhibits cyclic behavior with period $d$ if and only if, for all $m = 1, 2, \dots$ and $s = 0, 1, \dots, d-1$,
$$P(md+s) = P(s), \quad p_{k+1}(md+s) = p_{k+1}(s), \quad p_0(md+s) = p_0(s), \qquad (4.1)$$
where $p_{k+1}$ is the wastage vector, denoted $p_\omega$ in Section 2. It is apparent that for a Cycl-NHMS with period $d$ we have, for all $m = 1, 2, \dots$ and $s = 0, 1, \dots, d-1$,
$$Q(md+s) = Q(s). \qquad (4.2)$$
We now define the following stochastic matrices: … . It is well known that if $Q_0$ is a regular stochastic matrix then $\lim_{t\to\infty} Q_0^t = Q_0^{\infty}$, a stable matrix, or equivalently … . We now state the following proposition.
Proposition 1 Let $(\Omega, \mathcal{F}, P)$ be a probability space and a Cycl-NHMS. If $Q_0$ is a regular stochastic matrix then …
Proof Since $Q_0^{\infty}$ is a stable stochastic matrix we have …; therefore …; hence, for every $\varepsilon > 0$ there is a $t_0$ such that, for $t \ge t_0$, … .
From Vassiliou and Georgiou (1990), p. 541, we get the following lemma.

Lemma 1 Let $(\Omega, \mathcal{F}, P)$ be a probability space and an NHMS as defined in Section 2. Suppose that the sequence $\{\Delta T(t)/T(t)\}_{t=0}^{\infty}$ converges to zero geometrically fast, with $T(t) \ge T(t-1)$. Then $\{T(t)\}_{t=0}^{\infty}$ converges geometrically fast.

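Remark 1 The assumption $\lim_{t\to\infty}\Delta T(t)/T(t) = 0$ allows for $\lim_{t\to\infty} T(t) = \infty$. For example, $T(t) = t + 1$ gives $\Delta T(t)/T(t) = 1/(t+1) \to 0$ while $T(t) \to \infty$; this convergence is not geometric, consistent with Lemma 1.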
We will now prove the following theorem

… is a regular stochastic matrix; $\lim_{t\to\infty}\Delta T(t)/T(t) = 0$ geometrically fast, then the sequence $E[q(0,t)]$ splits into $d$ subsequences with limits
Proof Without loss of generality assume that $t = md + s$. Since we have a Cycl-NHMS, we get … . Since $Q_0$ is a regular stochastic matrix, it is easy to see, using Eq. 4.6, that … . Let $r \le s - 1$; then it is not difficult to see that … . On the other hand, when $r > s - 1$, (4.14) holds. The expression in Eq. 4.10 can be written as …, which is bounded by the series … . We now have (4.20). From Proposition 1 and Eqs. 4.16, 4.18 and 4.20 we get … . In a similar way we arrive at (4.23). Now from Eqs. 2.6 and 4.23 we get (4.24) as $t \to \infty$, for $s = 0, 1, \dots, d-1$.
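For a concrete picture of the $d$ subsequence limits, the Python sketch below takes a hypothetical cycle of $d = 3$ imbedded matrices. It assumes $Q_0 = Q(0)Q(1)Q(2)$ and, more generally, $Q_s = Q(s)\cdots Q(d-1)Q(0)\cdots Q(s-1)$ (our reading of the matrices defined in this section), computes the limiting vector $\pi_0$ of $Q_0$ and propagates it by $\pi_{s+1} = \pi_s Q(s)$. Each $\pi_s$ is then invariant for $Q_s$, since $\pi_s Q(s)\,Q_{s+1} = \pi_s Q_s\,Q(s) = \pi_s Q(s)$. When $\Delta T(t) = 0$, so that $E[q(0,t)] = q(0)Q(0)\cdots Q(t-1)$ (an assumption on the form of Eq. 2.6), these vectors are the limits of the subsequences $E[q(0, md+s)]$.

```python
def matmul(A, B):
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def vec_mat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def limiting_vector(P, iters=500):
    """Limiting vector of a regular stochastic matrix by repeated multiplication."""
    v = [1.0 / len(P)] * len(P)
    for _ in range(iters):
        v = vec_mat(v, P)
    return v

# Hypothetical cycle of period d = 3; each matrix pushes memberships towards a different state.
cycle = [
    [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.2, 0.7, 0.1]],   # Q(0)
    [[0.1, 0.1, 0.8], [0.1, 0.2, 0.7], [0.1, 0.1, 0.8]],   # Q(1)
    [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.8, 0.1, 0.1]],   # Q(2)
]
d = len(cycle)

def Q_s(s):
    """Q_s = Q(s) ... Q(d-1) Q(0) ... Q(s-1)."""
    prod = cycle[s]
    for r in range(1, d):
        prod = matmul(prod, cycle[(s + r) % d])
    return prod

pis = [limiting_vector(Q_s(0))]
for s in range(d - 1):
    pis.append(vec_mat(pis[-1], cycle[s]))          # pi_{s+1} = pi_s Q(s)

cesaro_limit = [sum(p[j] for p in pis) / d for j in range(3)]
for s, p in enumerate(pis):
    print("pi_%d =" % s, [round(x, 4) for x in p])
print("average over the cycle =", [round(x, 4) for x in cesaro_limit])
```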
We now introduce the concept of Cesaro strong ergodicity for a Cycl-NHMS: … . We call $q(\infty)$ the cyclic strong run distribution of the NHMS.
We will now provide a basic theorem on the Cesaro convergence for a Cycl-NHMS.

Theorem 14 Consider a Cycl-NHMS and assume that: …
then the Cycl-NHMS is Cesaro strongly ergodic in the sense that … . If … $\lim_{t\to\infty}\Delta T(t)/T(t) = 0$ geometrically fast, then the Cycl-NHMS is Cesaro strongly ergodic in the sense that … .
Proof We start with the first part, that is, when (a) and (b) hold. Since the Cycl-NHMS is of finite size, it is sufficient to show that … . Let $[a/b]$ denote the integer part of $a/b$; then we have … . From (a), (b), Theorem 13 and the fact that the series is an arithmetic mean, we get … . Also, $\lim_{t\to\infty}[t/d]/t = 1/d$; therefore, from Eqs. 4.28 and 4.29 we get … . Now it is easy to see that … . The second part of the theorem is proved in a similar way.
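The following Python sketch illustrates the distinction behind Theorem 14 for a hypothetical Cycl-NHMS of period 3 with constant total size ($\Delta T(t) = 0$), in which case we take $E[q(0,t)] = q(0)\,Q(0)Q(1)\cdots Q(t-1)$ (an assumption on the form of Eq. 2.6). The sequence itself keeps oscillating with the cycle, while its arithmetic means $\frac{1}{t}\sum_{\tau=0}^{t-1} E[q(0,\tau)]$ settle down to the average of the $d$ subsequence limits.

```python
def vec_mat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

# Hypothetical period-3 cycle of imbedded matrices Q(0), Q(1), Q(2).
cycle = [
    [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.2, 0.7, 0.1]],
    [[0.1, 0.1, 0.8], [0.1, 0.2, 0.7], [0.1, 0.1, 0.8]],
    [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.8, 0.1, 0.1]],
]
d, horizon = len(cycle), 3000

q = [0.5, 0.25, 0.25]              # hypothetical q(0)
path = [q]
for t in range(horizon - 1):
    q = vec_mat(q, cycle[t % d])   # Q(t) = Q(t mod d) for a Cycl-NHMS
    path.append(q)

cesaro = [sum(p[j] for p in path) / horizon for j in range(3)]
last_cycle = path[-d:]             # approximately the d subsequence limits
cycle_average = [sum(p[j] for p in last_cycle) / d for j in range(3)]
oscillation = max(abs(a - b) for a, b in zip(path[-1], path[-2]))

print("Cesaro mean       :", [round(x, 4) for x in cesaro])
print("average of limits :", [round(x, 4) for x in cycle_average])
print("one-step oscillation of E[q(0, t)]: %.3f" % oscillation)
```

The large one-step oscillation shows that $E[q(0,t)]$ has no ordinary limit, while the Cesaro mean and the cycle average agree closely.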

Laws of Large Numbers for a Cycl-NHMS
We are now in a position to study the first Law of Large Numbers for a Cycl-NHMS, starting with mean square convergence. Let $X_t$, $u_j(t)$, $y_j(t)$ and $\nu_j(t)$ be defined as in Section 3. We now state and prove the following Law of Large Numbers for a Cycl-NHMS. … We have (5.7). Starting with relation (5.5), we obtain (5.8). Similarly we get … . It is easy to see (5.10). The term (5.4) can be written as … . Now from Eqs. 5.11 and 3.14 we get … = … (by Theorems 14 and 13) (5.12). From Eqs. 5.3, 5.4, ..., 5.12 we get Eq. 5.1, which completes the proof.
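Theorem 15 can be illustrated by simulation. The Python sketch below uses a hypothetical period-3 cycle of imbedded matrices and assumes that a single membership moves according to the cyclic imbedded chain, with $X_{t+1}$ drawn from row $X_t$ of $Q(t \bmod d)$. It compares the fraction of time $\nu(t)$ along one long path with the average over the cycle of the subsequence limits, which we take as the limit in the strong Cesaro sense.

```python
import random

def vec_mat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

cycle = [   # hypothetical Q(0), Q(1), Q(2) of a period-3 Cycl-NHMS
    [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.2, 0.7, 0.1]],
    [[0.1, 0.1, 0.8], [0.1, 0.2, 0.7], [0.1, 0.1, 0.8]],
    [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.8, 0.1, 0.1]],
]
d = len(cycle)

# Subsequence limits pi_s (s = 0, 1, 2): iterate whole cycles, then record one more cycle.
v = [1.0 / 3] * 3
for t in range(300 * d):
    v = vec_mat(v, cycle[t % d])
limits = []
for s in range(d):
    limits.append(v)
    v = vec_mat(v, cycle[s])
target = [sum(p[j] for p in limits) / d for j in range(3)]

# One sample path of the membership and its fraction of time in each state.
rng, steps = random.Random(5), 60_000
counts, x = [0, 0, 0], 0
for t in range(steps):
    x = rng.choices(range(3), weights=cycle[t % d][x])[0]
    counts[x] += 1
nu = [c / steps for c in counts]

print("Cesaro limit:", [round(a, 3) for a in target])
print("nu(t)       :", [round(a, 3) for a in nu])
```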
We now establish under what conditions the $L^2$ convergence in the Law of Large Numbers proved in Theorem 15 also holds almost surely. To do so, we need the following founding theorem, which provides the conditions under which the Cesaro convergence in Theorem 14 takes place at a geometric rate.
Proof From the fact that $Q_0$ is a regular stochastic matrix we know that $\lim_{t\to\infty} Q_0^t = Q_0^{\infty}$, and there exist constants $c_0 > 0$ and $0 < b_0 < 1$ such that $\|Q_0^t - Q_0^{\infty}\| \le c_0 b_0^t$ for all $t$. From the end of the proof of Proposition 1 we get …; hence the convergence is geometric. Now we have … . From Vassiliou and Georgiou (1990) we know that, since $\lim_{t\to\infty}\Delta T(t)/T(t) = 0$ geometrically fast, the sequence $\{T(t)\}_{t=0}^{\infty}$ converges geometrically fast to a positive scalar $T$, and so there exist $c_1 > 0$ and $0 < b_1 < 1$ such that … . We start with the first part of the right-hand side of Eq. 2.6 for the case of a Cycl-NHMS: … $\le c_2 b_2^{t}$ (from (5.15) and (5.16)), with $c_2 > 0$ and $0 < b_2 < 1$.
(5.17), from which we get that $I_2$ goes to zero geometrically fast. Therefore, relation (5.18) converges to zero geometrically fast. In a similar way one can prove that the convergence in (4.22) is geometrically fast. Hence we get … . Now, following the steps of the proof of Theorem 14, it is easy to show that … at a geometric rate.
Having proved this basic result, we are now in a position, following the steps of the proof of Theorem 12 with the role of Theorems 9 and 10 now played by Theorem 16, to arrive at the following theorem.

Geriatric and Stroke Patients
In the present section we present two types of applications. The first one, in the present subsection, is a general Coxian phase-type model, special forms of which have been used as stochastic models for geriatric and stroke patients by McClean and her co-authors: McClean et al. (1998a, b), Taylor et al. (2000), Marshall et al. (2002), McClean (2003, 2004), Garg et al. (2010) and McClean et al. (2014a). In the basic model of these applications we distinguish three states, which are called hospital pathways. In the case of geriatric patients the states are "Acute Care", "Rehabilitative" and "Long Stay". From each state there are movements outside the hospital due to discharge or death. Geriatric patients may also be thought of as progressing through stages of acute care, rehabilitation and long-stay care, where most patients are eventually rehabilitated and discharged. Geriatric medical services are an important asset in the care of the elderly, while at the same time they can be easy victims of political pressure for savings in health care expenditure. Note that the number of pathways could be increased, the criterion being what best fits the data; owing to space restrictions, we do not consider a larger number of states here. For the best management of hospital resources, and certainly to the benefit of geriatric patients, it is important to know the tendencies of the system in the long run, that is, what proportion of the total population is going to be in each state. In the case of stroke patients there are more types of transitions, due to the nature of stroke, which allows for relapses and hence for more transitions among the hospital pathways. The model illustrated in what follows can easily be adjusted for both cases. Consider a hospital which starts with $T(0) = 400$ patients and within a very short time reaches its full capacity of 435 patients, that is, $T(1) = 420$, $T(2) = 430$, $T(3) = 435$.
Assume three hospital pathways and let the initial relative population structure be $q(0) = (0.5\ 0.25\ 0.25)$.
The vast majority of new patients enter the system in hospital pathway 1, either by taking an empty place or as a virtual replacement of a discharged patient, that is, $q_{11}(t) = p_{11}(t) + p_{14}(t)\,p_{01}(t)$, where state 4 represents the external environment. The entrance probabilities are $p_0(0) = (0.6\ 0.3\ 0.1)$, $p_0(1) = (0.5\ 0.3\ 0.2)$, $p_0(2) = (0.75\ 0.25\ 0.1)$ and $p_0(t) = p_0 = (0.7\ 0.2\ 0.1)$ for $t = 3, 4, \dots$. The form of the transition probability matrices according to the stochastic model for movements in the hospital is the following …; the imbedded non-homogeneous Markov chain then evolves with the sequence of stochastic matrices … . We obtain the following typical set of $Q(t)$'s, which are easily estimated from the data by maximum likelihood: …, where the convergence is geometrically fast; by $t = 5$ the sequence has already converged. Now, by simulating Theorem 11, we find the following values:
[Table: simulation results for Theorem 11; surviving entries 0.01380 and, at t = 6, 0.0007]
From the above table it is apparent that … . Analogous results hold for the remaining hospital pathways, that is, $\nu(t) \xrightarrow{L^2} (0.41\ 0.37\ 0.22)$. Now, since, as we have seen, $\Delta T(t)/T(t) \to 0$ geometrically fast (the total size is constant from $t = 3$ onwards), and since $Q(t)$ converges to its limit $Q$ geometrically fast (by $t = 5$), Theorem 12 gives $\nu(t) \xrightarrow{a.s.} (0.41\ 0.37\ 0.22)$.
One use of the above result for hospital planning is that a patient membership remains in hospital pathway 1, in the long run, almost surely 0.41 of the time that the hospital is in operation. Another useful interpretation for hospital planning is that the relative population structure of the memberships in the various hospital pathways tends asymptotically, almost surely, to (0.41 0.37 0.22).
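The condition on the total size used in this example can be checked directly from the capacities given for the hospital. The Python sketch below assumes $\Delta T(t) = T(t) - T(t-1)$ and that the hospital stays at its full capacity of 435 after $t = 3$; under these assumptions $\Delta T(t)/T(t)$ is exactly zero from $t = 4$ onwards, which trivially satisfies geometric convergence to zero. Exact fractions are used so that the zeros are exact.

```python
from fractions import Fraction

# Totals from the example: T(0)=400, T(1)=420, T(2)=430, T(3)=435, then full capacity (assumed constant).
T = [400, 420, 430, 435] + [435] * 6

ratios = [Fraction(T[t] - T[t - 1], T[t]) for t in range(1, len(T))]   # Delta T(t) / T(t), t = 1, 2, ...
for t, r in enumerate(ratios, start=1):
    print("t = %d: Delta T/T = %s (%.4f)" % (t, r, float(r)))
```

The first three ratios are $1/21$, $1/43$ and $1/87$; all later ones are zero.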

A University System
In this subsection we illustrate an application of a cyclic non-homogeneous Markov system to a University system. The importance of cyclic behavior was first stressed in Bartholomew (1982), p. 71, where an interesting application of this concept, arising in Gani's (1963) study of student enrolment at Michigan State University, was also provided. We consider the university system in Vassiliou and Tsantas (1984) with 3 years of study, where students who fail their year repeat it in the following year. The estimates of the transition probability matrices are taken from Vassiliou and Tsantas (1984). We observe that $E[q(0,3t)]$ converges. Since $\Delta T(t) = 0$ for $t = 3, 4, \dots$ and $Q_0$ is a regular stochastic matrix, the conditions of Theorem 13b are satisfied, and hence $q_0(\infty)$, as given by Theorem 13b, should coincide with the limit of $E[q(0,3t)]$. This was found to be true. The same was found for $E[q(0,3t+1)] \to (0.340\ 0.295\ 0.365)$ and for $E[q(0,3t+2)]$, whose limits were found to be equal to $q_1(\infty)$ and $q_2(\infty)$, respectively, as given by Theorem 13b.
One use of the above result for University planning is that a student place (membership) remains in the first year of study, in the long run, almost surely 0.278 of the time that the University is working. Another useful interpretation for University planning is that the relative population structure of the memberships in the various years of study tends asymptotically, almost surely, to (0.278 0.331 0.391).
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.