Application: Ranking the most relevant pages in web search

Topics: Finite Discrete Time Markov Chains, SLLN

Background:

  • probability space (B.1.1);

  • conditional probability (B.1.5);

  • discrete random variable (B.2.1);

  • expectation and conditional expectation for discrete RVs (B.2.2), (B.3.5).

1.1 Model

The World Wide Web is a collection of linked web pages (Fig. 1.1). These pages and their links form a graph. The nodes of the graph are pages \({\mathcal {X}}\) and there is an arc (a directed edge) from i to j if page i has a link to j.

Fig. 1.1

Pages point to one another in the web. Here, P(A, B) = 1∕2 and P(D, E) = 1∕3

Intuitively, a page has a high rank if other pages with a high rank point to it. (The actual ordering of search engine results also depends on the presence of the search keywords in the pages and on many other factors, in addition to the rank measure that we discuss here.) Thus, the rank π(i) of page i is a positive number and

$$\displaystyle \begin{aligned} \pi(i) = \sum_{j \in {\mathcal{X}}} \pi(j) P(j, i), i \in {\mathcal{X}}, \end{aligned}$$

where P(j, i) is the fraction of links in j that point to i and is zero if there is no such link. In our example, P(A, B) = 1∕2, P(D, E) = 1∕3, P(B, A) = 0, etc. (The basic idea of the algorithm is due to Larry Page (Fig. 1.2), hence the name PageRank. Since it ranks pages, the name is doubly appropriate.)

Fig. 1.2

Larry Page

We can write these equations in matrix notation as

$$\displaystyle \begin{aligned} \pi = \pi P, \end{aligned} $$
(1.1)

where we treat π as a row vector with components π(i) and P as a square matrix with entries P(i, j) (Figs. 1.3, 1.4 and 1.5).

Fig. 1.3

Balance equations?

Fig. 1.4

Copy of Fig. 1.2. Recall that P(A, B) = 1∕2 and P(D, E) = 1∕3, etc.

Fig. 1.5

Andrey Markov. 1856–1922

Equations (1.1) are called the balance equations. Note that if π solves these equations, then any multiple of π also solves the equations. For convenience, we normalize the solution so that the ranks of the pages add up to one, i.e.,

$$\displaystyle \begin{aligned} \sum_{i \in {\mathcal{X}}} \pi(i) = 1. \end{aligned} $$
(1.2)

For the example of Fig. 1.4, the balance equations are

$$\displaystyle \begin{aligned} & \pi(A) =\pi(C) + \pi(D)(1/3)\\ & \pi(B) = \pi(A)(1/2) + \pi(D)(1/3) + \pi(E)(1/2)\\ & \pi(C) = \pi(B) + \pi(E)(1/2) \\ & \pi(D) = \pi(A)(1/2)\\ & \pi(E) = \pi(D)(1/3).\end{aligned} $$

Solving these equations with the condition that the numbers add up to one yields

$$\displaystyle \begin{aligned} \pi = [\pi(A), \pi(B), \pi(C), \pi(D), \pi(E)] = \frac{1}{39} [12, 9, 10, 6, 2].\end{aligned} $$

Thus, page A has the highest rank and page E has the smallest. A search engine that uses this method would combine these ranks with other factors to order the pages. Search engines also use variations on this measure of rank.
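
To make this computation concrete, here is a minimal Python sketch (Python is the language used in the problems of this chapter) that encodes the link structure of Fig. 1.4 as a stochastic matrix and solves the balance equations (1.1) together with the normalization (1.2). The matrix entries are read off from the figure; least squares is just one convenient way to solve the resulting (consistent) linear system.

```python
import numpy as np

# Link structure of the example of Fig. 1.4; states in the order A, B, C, D, E.
# Row i lists P(i, j), the probability of following each outgoing link of page i.
P = np.array([
    [0.0, 1/2, 0.0, 1/2, 0.0],   # A -> B, D
    [0.0, 0.0, 1.0, 0.0, 0.0],   # B -> C
    [1.0, 0.0, 0.0, 0.0, 0.0],   # C -> A
    [1/3, 1/3, 0.0, 0.0, 1/3],   # D -> A, B, E
    [0.0, 1/2, 1/2, 0.0, 0.0],   # E -> B, C
])

# Solve pi = pi P together with sum(pi) = 1: stack the balance equations
# (P^T - I) pi^T = 0 with a final normalization row of ones.
M = np.vstack([P.T - np.eye(5), np.ones(5)])
b = np.zeros(6)
b[-1] = 1.0
pi, *_ = np.linalg.lstsq(M, b, rcond=None)
print(np.round(39 * pi))   # [12.  9. 10.  6.  2.], i.e. pi = [12, 9, 10, 6, 2] / 39
```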

1.2 Markov Chain

Imagine that you are browsing the web. After viewing a page i, say for one unit of time, you go to another page by clicking one of the links on page i, chosen at random. In this process, you go from page i to page j with probability P(i, j), where P(i, j) is the same as we defined earlier. The resulting sequence of pages that you visit is called a Markov chain, a model due to Andrey Markov (Fig. 1.5).

1.2.1 General Definition

More generally, consider a finite graph with nodes \({\mathcal {X}} = \{1, 2, \ldots , N\}\) and directed edges. In this graph, some edges can go from a node to itself. To each edge (i, j) one assigns a positive number P(i, j) in such a way that the sum of the numbers on the edges out of each node is equal to one. By convention, P(i, j) = 0 if there is no edge from i to j.

The corresponding matrix P = [P(i, j)] with nonnegative entries and rows that add up to one is called a stochastic matrix. The sequence {X(n), n ≥ 0} that goes from node i to node j with probability P(i, j), independently of the nodes it visited before, is then called a Markov chain. The nodes are called the states of the Markov chain and the P(i, j) are called the transition probabilities. We say that X(n) is the state of the Markov chain at time n, for n ≥ 0. Also, X(0) is called the initial state. The graph is the state transition diagram of the Markov chain.

Figure 1.6 shows the state transition diagrams of three Markov chains.

Fig. 1.6

Three Markov chains with three states {1, 2, 3} and different transition probabilities. (a) is irreducible, periodic; (b) is irreducible, aperiodic; (c) is not irreducible

Thus, our description corresponds to the following property:

$$\displaystyle \begin{aligned} P[X(n+1) = j | X(n) = i, X(m), m < n] = P(i, j), \forall i, j \in {\mathcal{X}}, n \geq 0. \end{aligned} $$
(1.3)

The probability of moving from i to j does not depend on the previous states. This “amnesia” is called the Markov property. It formalizes the fact that X(n) is indeed a “state” in that it contains all the information relevant for predicting the future of the process.
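
As an illustration, and as a building block for Problem 1.8, here is a minimal Python sketch of such a simulation (the function name and signature are ours): the next state is drawn from the row of P indexed by the current state, which is exactly property (1.3).

```python
import numpy as np

def simulate(P, x0, n_steps, seed=0):
    """Simulate n_steps transitions of a Markov chain with transition matrix P,
    starting from state x0. Returns the path [X(0), X(1), ..., X(n_steps)]."""
    rng = np.random.default_rng(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        # The next state depends on the past only through the current state x,
        # via row P[x]; this is property (1.3).
        x = rng.choice(len(P), p=P[x])
        path.append(x)
    return path
```

With the matrix P of the PageRank example, simulate(P, 0, 10) produces one random browsing path starting from page A.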

1.2.2 Distribution After n Steps and Invariant Distribution

If the Markov chain is in state j with probability π n(j) at step n for some n ≥ 0, it is in state i at step n + 1 with probability π n+1(i) where

$$\displaystyle \begin{aligned} \pi_{n+1}(i) = \sum_{j \in {\mathcal{X}}} \pi_n(j) P(j, i), i \in {\mathcal{X}}. \end{aligned} $$
(1.4)

Indeed, the event that the Markov chain is in state i at step n + 1 is the union over all j of the disjoint events that it is in state j at step n and in state i at step n + 1. The probability of a disjoint union of events is the sum of the probabilities of the individual events. Also, the probability that the Markov chain is in state j at step n and in state i at step n + 1 is π n(j)P(j, i).

Thus, in matrix notation,

$$\displaystyle \begin{aligned} \pi_{n+1} = \pi_n P, \end{aligned}$$

so that

$$\displaystyle \begin{aligned} \pi_n = \pi_0 P^n, n \geq 0. \end{aligned} $$
(1.5)

Observe that π n(i) = π 0(i) for all n ≥ 0 and all \(i \in {\mathcal {X}}\) if and only if π 0 solves the balance equations (1.1). In that case, we say that π 0 is an invariant distribution. Thus, an invariant distribution is a nonnegative solution π of (1.1) whose components sum to one.
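
Equation (1.5) is straightforward to evaluate numerically; the following sketch (assuming numpy, with names of our choosing) computes the distribution of X(n) and can be used to watch it settle down to an invariant distribution.

```python
import numpy as np

def distribution_after_n_steps(P, pi0, n):
    """Return pi_n = pi_0 P^n, the distribution of X(n); see (1.5)."""
    return np.asarray(pi0) @ np.linalg.matrix_power(P, n)

# For the five-page example, starting from page A with certainty:
#   distribution_after_n_steps(P, [1, 0, 0, 0, 0], 50)
# is already very close to [12, 9, 10, 6, 2] / 39, the invariant distribution.
```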

1.3 Analysis

Natural questions are

  • Does there exist an invariant distribution?

  • Is it unique?

  • Does π n approach an invariant distribution?

The next sections answer those questions.

1.3.1 Irreducibility and Aperiodicity

We need the following definitions.

Definition 1.1 (Irreducible, Aperiodic, Period)

  1. (a)

    A Markov chain is irreducible if it can go from any state to any other state, possibly after many steps.

  2. (b)

    Assume the Markov chain is irreducible and let

    $$\displaystyle \begin{aligned} d(i) := g.c.d.\{n \geq 1 \mid P^n(i, i) > 0\}. \end{aligned} $$
    (1.6)

    (If S is a set of positive integers, g.c.d.(S) is the greatest common divisor of these integers.)

Then d(i) has the same value d for all i, as shown in Lemma 2.2. The Markov chain is aperiodic if d = 1. Otherwise, it is periodic with period d. ◇

The Markov chains (a) and (b) in Fig. 1.6 are irreducible and (c) is not. Also, (a) is periodic and (b) is aperiodic.

1.3.2 Big Theorem

Simple examples show that the answers to the second and third questions can be negative. For instance, every distribution is invariant for a Markov chain that does not move. Also, a Markov chain that alternates between the states 0 and 1 with π 0(0) = 1 is such that π n(0) = 1 when n is even and π n(0) = 0 when n is odd, so that π n does not converge.

However, we have the following key result.

Theorem 1.1 (Big Theorem for Finite Markov Chains)

  1. (a)

    If the Markov chain is finite and irreducible, it has a unique invariant distribution π and π(i) is the long-term fraction of time that X(n) is equal to i.

  2. (b)

    If the Markov chain is also aperiodic, then the distribution π n of X(n) converges to π.

\({\blacksquare }\)

In this theorem, the long-term fraction of time that X(n) is equal to i is defined as the limit

$$\displaystyle \begin{aligned} \lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=0}^{N-1} 1\{X(n) = i\}. \end{aligned}$$

In this expression, 1{X(n) = i} takes the value 1 if X(n) = i and the value 0 otherwise. Thus, in the expression above, the sum is the total time that the Markov chain is in state i during the first N steps. Dividing by N gives the fraction of time. Taking the limit yields the long-term fraction of time.

The theorem says that, if the Markov chain is irreducible, this limit exists and is equal to π(i). In particular, this limit does not depend on the particular realization of the random variables. This means that every simulation yields the same limit, as you will verify in Problem 1.8.
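
Here is a short Python sketch of this quantity (the function name is ours), which computes the running fraction of time a simulated path spends in a given state; it is a companion to the simulator sketched earlier and to Problem 1.8.

```python
import numpy as np

def fraction_of_time(path, state):
    """Running fraction of time a sample path spends in `state`: the m-th entry
    is (1/m) * (number of steps n < m with X(n) = state)."""
    hits = (np.asarray(path) == state).astype(float)
    return np.cumsum(hits) / np.arange(1, len(hits) + 1)

# For an irreducible chain, this running fraction converges to pi(state) on
# every simulated path.  With the simulator and the matrix P sketched above:
#   path = simulate(P, 0, 100_000)
#   fraction_of_time(path, 0)[-1]   # close to 12/39 ≈ 0.31
```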

1.3.3 Long-Term Fraction of Time

Why should the fraction of time that a Markov chain spends in one state converge? In our browsing example, if we count the time that we spend on page A over n time units and we divide that time by n, it turns out that the ratio converges to π(A).

This result is similar to the fact that, when we flip a fair coin repeatedly, the fraction of “heads” converges to 50%. Thus, even though the coin has no memory, it makes sure that the fraction of heads approaches 50%. How does it do it?

These convergence results are examples of the Law of Large Numbers. This law is at the core of our intuitive understanding of probability and it captures our notion of statistical regularity. Even though outcomes are uncertain, one can make predictions. Here is a statement of the result. We discuss it in Chap. 2.

Theorem 1.2 (Strong Law of Large Numbers)

Let {X(n), n ≥ 1} be a sequence of i.i.d. random variables with mean μ. Then

$$\displaystyle \begin{aligned} \frac{X(1) + \cdots + X(n)}{n} \rightarrow \mu \mathit{\mbox{ as }} n \rightarrow \infty, \mathit{\mbox{ with probability }} 1.\end{aligned} $$

\({\blacksquare }\)

Thus, the sample mean values Y(n) := (X(1) + ⋯ + X(n))∕n converge to the expected value, with probability 1. (See Fig. 1.7.) Note that the sample mean values Y(n) are random variables: for each n, the value of Y(n) depends on the particular realization of the random variables X(m); if you repeat the experiment, the values will probably be different. However, the limit is always μ, with probability 1. We say that the convergence is almost sure.

Fig. 1.7

When rolling a balanced die, the sample mean converges to 3.5
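
The experiment of Fig. 1.7 is easy to reproduce; a minimal sketch, assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)   # i.i.d. rolls of a balanced die
sample_means = np.cumsum(rolls) / np.arange(1, rolls.size + 1)
print(sample_means[-1])                    # close to 3.5, as in Fig. 1.7
```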

1.4 Illustrations

We illustrate Theorem 1.1 for the Markov chains in Fig. 1.6. The three situations are different and quite representative. We explore them one by one.

Figures 1.8, 1.9 and 1.10 correspond to each of the three Markov chains in Fig. 1.6, as shown on top of each figure. The top graph of each figure shows the successive values of X n for n = 0, 1, …, 100. The middle graph of the figure shows, for n = 0, …, 100, the fraction of time that X m is equal to the different states during {0, 1, …, n}. The bottom graph of the figure shows, for n = 0, …, 100, the probability that X n is equal to each of the states.

Fig. 1.8

Markov chain (a) in Fig. 1.6

Fig. 1.9

Markov chain (b) in Fig. 1.6

Fig. 1.10

Markov chain (c) in Fig. 1.6

In Fig. 1.8, the fraction of time that the Markov chain is equal to each of the states {1, 2, 3} converges to positive values. This is the case because the Markov chain is irreducible. (See Theorem 1.1(a).) However, the probability of being in a given state does not converge. This is because the Markov chain is periodic. (See Theorem 1.1(b).)

For the Markov chain in Fig. 1.9, the probabilities converge, because the Markov chain is aperiodic. (See again Theorem 1.1.)

Finally, for the Markov chain in Fig. 1.10, eventually X n = 3; the fraction of time in state 3 converges to one and so does the probability of being in state 3. What happens in this case is that state 3 is absorbing: once the Markov chain gets there, it cannot leave.

1.5 Hitting Time

Say that you start on page A in Fig. 1.2 and that, at every step, you follow each outgoing link of the page where you are with equal probabilities. How many steps does it take to reach page E? This time is called the hitting time, or first passage time, of page E and we designate it by T E. As we can see from the figure, T E can be as small as 2, but it has a good chance of being much larger than 2 (Fig. 1.11).

Fig. 1.11

This is not what we mean by hitting time!

1.5.1 Mean Hitting Time

Our goal is to calculate the average value of T E starting from X 0 = A. That is, we want to calculate

$$\displaystyle \begin{aligned} \beta (A) := E[T_E \mid X_0 = A]. \end{aligned}$$

The key idea behind this calculation is, in fact, to calculate the mean hitting time for all possible initial pages. That is, we will calculate β(i) for i = A, B, C, D, E, where

$$\displaystyle \begin{aligned} \beta(i) := E[T_E \mid X_0 = i]. \end{aligned}$$

The reason for considering these different values is that the mean time to hit E starting from A is clearly related to the mean hitting time starting from B and from D. These in turn are related to the mean hitting time starting from C. We claim that

$$\displaystyle \begin{aligned} \beta(A) = 1 + \frac{1}{2} \beta(B) + \frac{1}{2} \beta(D). \end{aligned} $$
(1.7)

To see this, note that, starting from A, after one step, the Markov chain is in state B with probability 1∕2 and it is in state D with probability 1∕2. Thus, after one step, the average time to hit E is the average time starting from B, with probability 1∕2, and it is the average time starting from D, with probability 1∕2.

This situation is similar to the following one. You flip a fair coin. If the outcome is heads you get a random amount of money equal to X and if it is tails you get a random amount Y . On average, you get

$$\displaystyle \begin{aligned} \frac{1}{2} E(X) + \frac{1}{2} E(Y). \end{aligned}$$

Similarly, we can see that

$$\displaystyle \begin{aligned} & \beta(B) = 1 + \beta(C) \\ & \beta(C) = 1 + \beta(A) \\ & \beta(D) = 1 + \frac{1}{3} \beta(A) + \frac{1}{3} \beta(B) + \frac{1}{3} \beta(E) \\ & \beta(E) = 0. \end{aligned} $$

These equations, together with (1.7), are called the first step equations (FSE). Solving them, we find

$$\displaystyle \begin{aligned} \beta(A) = 17, \beta(B) = 19, \beta(C) = 18, \beta(D) = 13 \mbox{ and } \beta(E) = 0. \end{aligned}$$
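
Since the first step equations are a linear system, this solution is easy to check numerically; a minimal sketch, with the equations of this subsection written in matrix form:

```python
import numpy as np

# First step equations for beta(i) = E[T_E | X_0 = i], states ordered A, B, C, D, E,
# written as M beta = c:
#   beta(A) - beta(B)/2 - beta(D)/2 = 1
#   beta(B) - beta(C)               = 1
#   beta(C) - beta(A)               = 1
#   beta(D) - (beta(A) + beta(B) + beta(E))/3 = 1
#   beta(E)                         = 0
M = np.array([
    [1.0, -1/2, 0.0, -1/2, 0.0],
    [0.0,  1.0, -1.0, 0.0, 0.0],
    [-1.0, 0.0,  1.0, 0.0, 0.0],
    [-1/3, -1/3, 0.0, 1.0, -1/3],
    [0.0,  0.0,  0.0, 0.0, 1.0],
])
c = np.array([1.0, 1.0, 1.0, 1.0, 0.0])
print(np.linalg.solve(M, c))   # [17. 19. 18. 13.  0.]
```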

1.5.2 Probability of Hitting a State Before Another

Consider once again the same situation but say that we are interested in the probability that starting from A we visit state C before E. We write this probability as

$$\displaystyle \begin{aligned} \alpha(A) = P[ T_C < T_E \mid X_0 = A]. \end{aligned}$$

As in the previous case, it turns out that we need to calculate α(i) for i = A, B, C, D, E. We claim that

$$\displaystyle \begin{aligned} \alpha(A) = \frac{1}{2} \alpha(B) + \frac{1}{2} \alpha(D). \end{aligned} $$
(1.8)

To see this, note that, starting from A, after one step you are in state B with probability 1∕2 and you will then visit C before E with probability α(B). Also, with probability 1∕2, you will be in state D after one step and you will then visit C before E with probability α(D). Thus, the event that you visit C before E starting from A is the union of two disjoint events: either you do that by first going to B or by first going to D. Adding the probabilities of these two events, we get (1.8).

Similarly, one finds that

$$\displaystyle \begin{aligned} & \alpha(B) = \alpha(C) \\ & \alpha(C) = 1 \\ & \alpha(D) = \frac{1}{3} \alpha(A) + \frac{1}{3} \alpha (B) + \frac{1}{3} \alpha(E) \\ & \alpha(E) = 0. \end{aligned} $$

These equations, together with (1.8), are also called the first step equations. Solving them, we find

$$\displaystyle \begin{aligned} \alpha(A) = \frac{4}{5}, \alpha(B) = 1, \alpha(C) = 1, \alpha(D) = \frac{3}{5}, \alpha(E) = 0. \end{aligned}$$
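
Again, these first step equations form a linear system, and the solution above can be checked numerically:

```python
import numpy as np

# First step equations for alpha(i) = P[T_C < T_E | X_0 = i], states A, B, C, D, E:
#   alpha(A) - alpha(B)/2 - alpha(D)/2 = 0
#   alpha(B) - alpha(C)                = 0
#   alpha(C)                           = 1
#   alpha(D) - (alpha(A) + alpha(B) + alpha(E))/3 = 0
#   alpha(E)                           = 0
M = np.array([
    [1.0, -1/2, 0.0, -1/2, 0.0],
    [0.0,  1.0, -1.0, 0.0, 0.0],
    [0.0,  0.0,  1.0, 0.0, 0.0],
    [-1/3, -1/3, 0.0, 1.0, -1/3],
    [0.0,  0.0,  0.0, 0.0, 1.0],
])
c = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
print(np.linalg.solve(M, c))   # [0.8 1.  1.  0.6 0. ]
```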

1.5.3 FSE for Markov Chain

Let us generalize this example to the case of a Markov chain on \({\mathcal {X}} = \{1, 2, \ldots , N\}\) with transition probability matrix P. Let T i be the hitting time of state i. For a set \(A \subset {\mathcal {X}}\) of states, let \(T_A = \min \{n \geq 0 \mid X(n) \in A\}\) be the hitting time of the set A.

First, we consider the mean value of T A. Let

$$\displaystyle \begin{aligned} \beta(i) = E[T_A \mid X_0 = i], i \in {\mathcal{X}}. \end{aligned}$$

The FSE are

$$\displaystyle \begin{aligned} \beta(i) = \left\{ \begin{array}{l l} 1 + \sum_j P(i, j) \beta(j), & \mbox{ if } i \notin A \\ 0, & \mbox{ if } i \in A. \end{array} \right. \end{aligned}$$

Second, we study the probability of hitting a set A before a set B, where \(A, B \subset {\mathcal {X}}\) and A ∩ B = ∅. Let

$$\displaystyle \begin{aligned} \alpha(i) = P[T_A < T_B \mid X_0 = i], i \in {\mathcal{X}}. \end{aligned}$$

The FSE are

$$\displaystyle \begin{aligned} \alpha(i) = \left\{ \begin{array}{l l} \sum_j P(i, j) \alpha(j), & \mbox{ if } i \notin A \cup B \\ 1, & \mbox{ if } i \in A \\ 0, & \mbox{ if } i \in B. \end{array} \right. \end{aligned}$$

Third, we explore the value of

$$\displaystyle \begin{aligned} Y = \sum_{n=0}^{T_A} h(X(n)).\end{aligned} $$

That is, you collect an amount h(i) every time you visit state i, until you enter set A. Let

$$\displaystyle \begin{aligned} \gamma(i) = E[Y \mid X_0 = i], i \in {\mathcal{X}}.\end{aligned} $$

The FSE are

$$\displaystyle \begin{aligned} \gamma(i) = \left\{ \begin{array}{l l} h(i) + \sum_j P(i, j) \gamma(j), & \mbox{ if } i \notin A \\ h(i), & \mbox{ if } i \in A. \end{array} \right.\end{aligned} $$
(1.9)

Fourth, we consider the value of

$$\displaystyle \begin{aligned} Z = \sum_{n=0}^{T_A} \beta^n h(X(n)),\end{aligned} $$

where β can be thought of as a discount factor. Let

$$\displaystyle \begin{aligned} \delta(i) = E[Z \mid X_0 = i].\end{aligned} $$

The FSE are

$$\displaystyle \begin{aligned} \delta(i) = \left\{ \begin{array}{l l} h(i) + \beta \sum_j P(i, j) \delta(j), & \mbox{ if } i \notin A \\ h(i), & \mbox{ if } i \in A. \end{array} \right.\end{aligned} $$
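
All of these first step equations are linear systems that can be solved mechanically. Here is a sketch of the first two cases (the function names are ours; we also assume the target sets are reachable from every state, so that the systems are nonsingular):

```python
import numpy as np

def mean_hitting_times(P, A):
    """Solve the FSE for beta(i) = E[T_A | X_0 = i].
    P is an N x N stochastic matrix (numpy array); A is a set of target states."""
    N = len(P)
    M, c = np.eye(N), np.zeros(N)
    for i in range(N):
        if i not in A:
            M[i] -= P[i]     # beta(i) - sum_j P(i, j) beta(j) = 1
            c[i] = 1.0
        # if i is in A, the row stays beta(i) = 0
    return np.linalg.solve(M, c)

def hitting_probabilities(P, A, B):
    """Solve the FSE for alpha(i) = P[T_A < T_B | X_0 = i], with A and B disjoint."""
    N = len(P)
    M, c = np.eye(N), np.zeros(N)
    for i in range(N):
        if i in A:
            c[i] = 1.0       # alpha(i) = 1 on A
        elif i not in B:
            M[i] -= P[i]     # alpha(i) - sum_j P(i, j) alpha(j) = 0
        # on B the row stays alpha(i) = 0
    return np.linalg.solve(M, c)
```

For the five-page example, with states ordered A, …, E, mean_hitting_times(P, {4}) should reproduce β = (17, 19, 18, 13, 0), and hitting_probabilities(P, {2}, {4}) should reproduce α = (4∕5, 1, 1, 3∕5, 0).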

Hopefully these examples give you a sense of the variety of questions that can be answered for finite Markov chains. This is very fortunate, because Markov chains can be used to model a broad range of engineering and natural systems.

1.6 Summary

  • Markov Chains: states, transition probabilities, irreducible, aperiodic, invariant distribution, hitting times;

  • Strong Law of Large Numbers;

  • Big Theorem: irreducible implies unique invariant distribution equal to the long-term fraction of time in the states; convergence to invariant distribution if irreducible and aperiodic;

  • Hitting Times: first step equations.

1.6.1 Key Equations and Formulas

Table 1.1

1.7 References

There are many excellent books on Markov chains. Some of my favorites are Grimmett and Stirzaker (2001) and Bertsekas and Tsitsiklis (2008). The original patent on PageRank is Page (2001). The online book Easley and Kleinberg (2012) is an inspiring discussion of social networks. Chapter 14 of that reference discusses PageRank.

1.8 Problems

Problem 1.1

Construct a Markov chain that is not irreducible but whose distribution converges to its unique invariant distribution.

Problem 1.2

Show a Markov chain whose distribution converges to a limit that depends on the initial distribution.

Problem 1.3

Can you find a finite irreducible aperiodic Markov chain whose distribution does not converge?

Problem 1.4

Show a finite irreducible aperiodic Markov chain that converges very slowly to its invariant distribution.

Problem 1.5

Show that a function Y (n) = g(X(n)) of a Markov chain X(n) may not be a Markov chain.

Problem 1.6

Construct a Markov chain that is a sequence of i.i.d. random variables. Is it irreducible and aperiodic?

Problem 1.7

Consider the Markov chain X(n) with the state diagram shown in Fig. 1.12 where a, b ∈ (0, 1).

Fig. 1.12

Markov chain for Problem 1.7

  1. (a)

    Show that this Markov chain is aperiodic;

  2. (b)

    Calculate P[X(1) = 1, X(2) = 0, X(3) = 0, X(4) = 1∣X(0) = 0];

  3. (c)

    Calculate the invariant distribution;

  4. (d)

    Let \(T_i = \min \{n \geq 0 \mid X(n) = i\}\). Calculate E[T 2 ∣ X(0) = 1].

Problem 1.8

Use Python to write a simulator for a Markov chain {X(n), n ≥ 1} with K states, initial distribution π, and transition probability matrix P. The program should be able to do the following:

  1. 1.

    Plot {X(n), n = 1, …, N};

  2. 2.

    Plot the fraction of time that X(n) is in some chosen states during {1, 2, …, m} as a function of m, for m = 1, …, N;

  3. 3.

    Plot the probability that X(n) is equal to some chosen states, for n = 1, …, N;

  4. 4.

    Use this program to simulate a periodic Markov chain with five states;

  5. 5.

    Use the program to simulate an aperiodic Markov chain with five states.

Problem 1.9

Use your simulator to simulate the Markov chains of Figs. 1.2 and 1.6.

Problem 1.10

Find the invariant distribution for the Markov chains of Fig. 1.6.

Problem 1.11

Calculate d(1), d(2), and d(3), defined in (1.6), for the Markov chains of Fig. 1.6.

Problem 1.12

Calculate d(A), defined in (1.6), for the Markov chain of Fig. 1.2.

Problem 1.13

Let {X n, n ≥ 0} be a finite Markov chain. Assume that it has a unique invariant distribution π and that π n converges to π for every initial distribution π 0. Then (choose the correct answers, if any)

  • X n is irreducible;

  • X n is periodic;

  • X n is aperiodic;

  • X n might not be irreducible.

Problem 1.14

Consider the Markov chain {X n, n ≥ 0} on {0, 1} with P(0, 1) = 0.1 and P(1, 0) = 0.3. Then (choose the correct answers, if any)

  • The invariant distribution of the Markov chain is [0.75, 0.25];

  • Let \(T_1 = \min \{n \geq 0 | X_n = 1\}\). Then E[T 1|X 0 = 0] = 1.2;

  • E[X 1 + X 2|X 0 = 0] = 0.8.

Problem 1.15

Consider the MC with the state transition diagram shown in Fig. 1.13.

Fig. 1.13

MC for Problem 1.15

  1. (a)

    What is the period of this MC? Explain.

  2. (b)

    Find all the invariant distributions for this MC.

  3. (c)

    Does π n, the distribution of X n, converge as n → ∞? Explain.

  4. (d)

    Do the fractions of time the MC spends in the states converge? If so, what is the limit?

Problem 1.16

Consider the MC with the state transition diagram shown in Fig. 1.14.

Fig. 1.14

MC for Problem 1.16

  1. (a)

    Find all the invariant distributions of this MC.

  2. (b)

    Assume π 0(3) = 1. Find the limit of π n as n → ∞.

Problem 1.17

Consider the MC with the state transition diagram shown in Fig. 1.15.

Fig. 1.15

MC for Problem 1.17

  1. (a)

    Find all the invariant distributions of this MC.

  2. (b)

    Does π n converge as n → ∞? If it does, prove it.

  3. (c)

    Do the fractions of time the MC spends in the states converge? Prove it.

Problem 1.18

Consider the MC shown in Fig. 1.16.

Fig. 1.16

MC for Problem 1.18

  1. (a)

    Find the invariant distribution π of this Markov chain.

  2. (b)

    Calculate the expected time from 0 to 2.

  3. (c)

    Use Python to plot the probability that, starting from 0, the MC has not reached 2 after n steps.

  4. (d)

    Use Python to simulate the MC and plot the fraction of time that it spends in the different states after n steps.

  5. (e)

    Use Python to plot π n.

Problem 1.19

For the Markov chain {X n, n ≥ 0} with transition diagram shown in Fig. 1.17, assume that X 0 = 0. Find the probability that X n hits 2 before it hits 1 twice.

Fig. 1.17

MC for Problem 1.19

Problem 1.20

Draw an irreducible aperiodic MC with six states and choose the transition probabilities. Simulate the MC in Python. Plot the fraction of time in the six states. Assume you start in state 1. Plot the probability of being in each of the six states.

Problem 1.21

Repeat Problem 1.20, but with a periodic MC.

Problem 1.22

How would you trick the PageRank algorithm into believing that your home page should be given a high rank?

Hint

Try adding another page with suitable links.

Problem 1.23

Show that the holding time of a state is geometrically distributed.

Problem 1.24

You roll a die until the sum of the last two rolls is exactly 10. How many times do you have to roll, on average?

Problem 1.25

You roll a die until the sum of the last three rolls is at least 15. How many times do you have to roll, on average?

Problem 1.26

A doubly stochastic matrix is a nonnegative matrix whose rows and columns add up to one. Show that the invariant distribution is uniform for such a transition matrix.

Problem 1.27

Assume that the Markov chain (c) of Fig. 1.6 starts in state 1. Calculate the average number of times it visits state 1 before being absorbed in state 3.

Problem 1.28

A man tries to go up a ladder that has N rungs. Every step he makes, he has a probability p of dropping back to the ground and he goes up one rung otherwise. Use the first step equations to calculate analytically the average time he takes to reach the top, for N = 1, …, 20 and p = 0.05, 0.1, and 0.2. Use Python to plot the corresponding graphs.

Problem 1.29

Let {X n, n ≥ 0} be a finite irreducible Markov chain with transition probability matrix P and invariant distribution π. Show that, for all i, j,

$$\displaystyle \begin{aligned} \frac{1}{N} \sum_{n = 0}^{N-1} 1\{X_n = i, X_{n+1} = j\} \rightarrow \pi(i) P(i, j), \mbox{ w.p. } 1 \mbox{ as } N \to \infty. \end{aligned}$$

Problem 1.30

Show that a Markov chain {X n, n ≥ 0} can be written as

$$\displaystyle \begin{aligned} X_{n+1} = f(X_n, V_n), n \geq 0, \end{aligned}$$

where the V n are i.i.d. random variables independent of X 0.

Problem 1.31

Let P and \(\tilde P\) be two stochastic matrices and π a pmf on the finite set \({\mathcal {X}}\). Assume that

$$\displaystyle \begin{aligned} \pi(i)P(i, j) = \pi(j)\tilde P(j, i), \forall i, j \in {\mathcal{X}}. \end{aligned}$$

Show that π is invariant for P.

Problem 1.32

Let X n be a Markov chain on a finite set \({\mathcal {X}}\). Assume that the transition diagram of the Markov chain is a tree, as shown in Fig. 1.18. Show that if π is invariant and if P is the transition matrix, then it satisfies the following detailed balance equations:

$$\displaystyle \begin{aligned} \pi(i)P(i, j) = \pi(j)P(j, i), \forall i, j. \end{aligned}$$
Fig. 1.18

A transition diagram that is a tree

Problem 1.33

Let X n be a Markov chain such that X 0 has the invariant distribution π and the detailed balance equations are satisfied. Show that

$$\displaystyle \begin{aligned} P(X_0 {=} x_0, X_1 {=} x_1, \ldots, X_n {=} x_n) {=} P(X_N = x_0, X_{N - 1} = x_1, \ldots, X_{N- n} = x_n) \end{aligned}$$

for all n, all N ≥ n, and all x 0, …, x n. Thus, the evolution of the Markov chain in reverse time (N, N − 1, N − 2, …, N − n) cannot be distinguished from its evolution in forward time (0, 1, …, n). One says that the Markov chain is time-reversible.

Problem 1.34

Let {X n, n ≥ 0} be a Markov chain on {−1, 1} with P(−1, 1) = P(1, −1) = a for a given a ∈ (0, 1). Define

$$\displaystyle \begin{aligned} Y_n = X_0 + \cdots + X_n, n \geq 0. \end{aligned}$$
  1. (a)

    Is {Y n, n ≥ 0} a Markov chain? Prove or disprove.

  2. (b)

    How would you calculate

    $$\displaystyle \begin{aligned} E[ \tau | Y_0 = 1] \mbox{ where } \tau = \min \{n > 0 \mid Y_n = -5 \mbox{ or } Y_n = 30\}? \end{aligned}$$

Problem 1.35

You flip a fair coin repeatedly, forever. Show that the probability that the number of heads is always ahead of the number of tails is zero.