# Probability Theory

Kun Il Park

## Abstract

This chapter defines the central concepts and terms used in probability theory including the random experiment, space, event, elementary event, combined experiment, Cartesian product, etc. This chapter presents the axiomatic formulation of probability theory based on three axioms and shows how set operations are used in probability theory. This chapter also discusses the conditional probability, the total probability theorem, the Bayes’ theorem, and the independence of random events. Examples of a reliability problem and a communications signal detection problem are discussed.


## 3.1 Random Experiments

A random experiment consists of executing a certain action or procedure under controlled conditions and taking an observation or a measurement on the outcome produced by the action. The experiment is called random because the outcome is unpredictable.

### 3.1.1 Space Ω

In probability theory, we are concerned with random experiments in which all possible outcomes are known in advance. The set of all possible outcomes of a random experiment is called the sample space, or simply the space, and is denoted by
$$\Omega =\left\{{s}_1,{s}_2,\dots, {s}_i,\dots, {s}_n\right\}$$
where the $s_i$'s are the experimental outcomes. Sometimes, the space Ω is used to denote the experiment, as in "experiment Ω1," "experiment Ω2," etc. An example of a random experiment is a die-throwing experiment in which a die is thrown and the number of dots that the top side shows is observed. The space of this experiment is Ω = {1, 2, 3, 4, 5, 6}, where the six numbers denote all possible numbers of dots that may show after a die is thrown.

Each execution of a random experiment produces one outcome. A single execution of a random experiment that produces an outcome is called a trial. For example, in a die-throwing experiment, a trial produces exactly one of the six possible outcomes in Ω.
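As an illustrative aside (not part of the original text), a trial of the die-throwing experiment can be sketched in Python; the names `OMEGA` and `trial` are ours:

```python
import random

# Space of the die-throwing experiment: the set of all possible outcomes.
OMEGA = {1, 2, 3, 4, 5, 6}

def trial(rng=random):
    """A single trial: executing the experiment produces exactly one outcome."""
    return rng.choice(sorted(OMEGA))

assert trial() in OMEGA  # every trial yields a member of the space
```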

### 3.1.2 Event

In the dictionary, the word event is defined as an outcome. We have already encountered the word outcome while discussing a random experiment. Therefore, we need a clear understanding of the difference between the two words, event and outcome, before we proceed.

An event is a result of an experiment that is of interest or concern. To take an example, suppose that, in a die-throwing game, you would win $10 if your die throw shows an outcome less than four. Here, the word "outcome" is a specific showing of the die face. The event of your interest is "winning $10." Your die throw, a trial, would produce one outcome. If that outcome is 1, 2, or 3, you would win the prize: the event of "winning $10" would occur if the outcome of the trial is 1, 2, or 3. Among all possible outcomes of the die-throwing experiment, that is, Ω = {1, 2, 3, 4, 5, 6}, there are three specific outcomes, 1, 2, and 3, that would make the event happen. These three numbers are the members of a subset of Ω, A = {1, 2, 3}. The event "winning $10" is represented by the subset A.

So, "event" is defined as follows: an event is a subset of the space Ω consisting of the elements that make the event happen. To form the subset defining an event, consider each element of Ω and determine whether it would make the event happen; if so, include it in the subset.

The event consisting of all possible elements of an experiment, that is, Ω itself, is called a certain event, and the event that has no element, that is, {∅}, an impossible event.

An event consisting of a single element is called an elementary event; for example, in a die-throwing experiment, {1}, {2}, {3}, {4}, {5}, and {6}, and in a coin-tossing experiment, {heads} and {tails}. A key distinction to make here is that an element written by itself, as "1," "2," etc., is an outcome, whereas a single outcome shown in braces, as in {1}, is an elementary event.

### 3.1.3 Combined Experiments

Whether an event occurs or not is determined by the single outcome of a trial of an experiment. If an event under consideration involves the outcomes of multiple trials of a single experiment or a single or multiple trials of multiple experiments, a new experiment may be defined by combining the original experiments. This new experiment may be called a combined experiment in which a new space is defined as the set of all possible combinations of the outcomes of the individual trials of the original experiments. With this definition, the single outcome produced by a trial of this combined experiment is a unique sequence of the individual outcomes of the original experiments, and the event of the combined experiment is determined by this single outcome of a trial of the combined experiment.

For example, suppose that the event under consideration is determined by the sequence of the outcomes of n trials of a single experiment, e.g., throwing a die n times. A combined experiment may be defined by defining a single trial of the experiment as a sequence of n trials of the original experiment. The space of the combined experiment consists of all possible ordered sequences, that is, n-tuples, of the elements of the original space.
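The n-tuple construction above can be sketched with Python's `itertools.product` (an illustrative example; the text itself gives no code):

```python
from itertools import product

omega = [1, 2, 3, 4, 5, 6]                       # space of a single die throw
n = 2                                            # combine n trials into one
combined_space = list(product(omega, repeat=n))  # all ordered n-tuples

# Throwing a die twice yields 6**2 = 36 possible outcome sequences,
# each a single outcome of the combined experiment.
assert len(combined_space) == 6 ** n
assert (3, 5) in combined_space
```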

### 3.1.4 Probabilities and Statistics

In probability analysis, one begins by assigning probabilities to the elementary events or, if the event under consideration is built from events whose probabilities have already been determined, by using those known probabilities. For example, in a die-throwing experiment, first, the probabilities of the elementary events of the six sides, that is, 1/6 for each side, are assigned. Without these initial assignments, one cannot proceed to address more complex problems associated with the outcomes of a die-throwing experiment. The only other option is to try the experiment many times and count the frequencies of the outcomes of interest. For example, in a die-throwing experiment, to determine the probability of an odd number, a die must be thrown many times, and the frequencies of odd numbers must be counted. Even then, a question remains as to how many times the die must be thrown before the probability can be determined.

This dilemma can be avoided by relying on one's a priori judgment about the probabilities of the elementary events. The axiomatic approach discussed in the next section allows a probability analysis to begin with one's a priori assignment of the probabilities of the elementary events.

Statistics deals with analyzing the frequencies of outcomes. Therefore, statistics can provide the basis for making a priori judgments, for example, about the probabilities of elementary events.
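The frequency-counting approach described above can be simulated (a hypothetical Monte Carlo sketch, assuming a fair die generated by `random.randint`):

```python
import random

random.seed(0)                     # reproducible runs
n_trials = 100_000
odd_count = sum(1 for _ in range(n_trials) if random.randint(1, 6) % 2 == 1)
estimate = odd_count / n_trials    # relative frequency of an odd number

# The estimate hovers near the a priori value 3/6 = 0.5 but never settles
# exactly, illustrating the question of how many trials would be "enough".
assert abs(estimate - 0.5) < 0.02
```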

## 3.2 Axiomatic Formulation of Probability Theory

The axiomatic formulation of probability theory was introduced by the Russian mathematician Kolmogorov in 1933. In this approach, all possible outcomes of an experiment form a space Ω. Events are defined by the subsets of the space Ω. Probabilities are determined for the events. Probabilities are "assigned" to the elementary events in Ω as the starting point of the probability analysis. The events and the probabilities must obey a set of axioms presented below.

Given two events A and B in Ω, the probabilities of A and B are denoted by P(A) and P(B). P(A) and P(B) are real numbers, referred to as probability measures, and must obey the following rules:

### Axiom I

$$P(A)\ge 0$$
(3.1)

### Axiom II

$$P\left(\Omega \right)=1$$
(3.2)

### Axiom III

$$\mathrm{If}\ A\cap B=\left\{\varnothing \right\},\mathrm{then}\ P\left(A\cup B\right)=P(A)+P(B)$$
(3.3)

Axiom I states that the probability measure assigned to an event is nonnegative. Axiom II states that the probability measure assigned to a certain event is 1. Finally, Axiom III states that, if two events A and B are mutually exclusive with the probabilities P(A) and P(B), respectively, the probability that either A or B or both would occur, that is, the probability of the event A ∪ B, is the sum of the two probabilities P(A) and P(B).

### Example 3.2.1

To illustrate Axiom III , consider that A represents the event that Tom will attend a conference in Philadelphia tomorrow at 9 AM and B the event that Tom will travel to Boston tomorrow at 9 AM. Assume that P(A) = 0.1 and P(B) = 0.2. Clearly, A and B are mutually exclusive because Tom cannot be at two places at the same time. Then, the probability that Tom will either attend the conference in Philadelphia or travel to Boston is the sum of the two probabilities, 0.3.

While Axioms I and II give the rules for assigning probability measures, Axiom III gives the rule for deriving the probability measure for a complex event A ∪ B from the probabilities of A and B. In the axiomatic approach, these three axioms are all one needs to formulate a probability problem. The above three axioms together with the employment of set theory are sufficient for developing probability theory.
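The three axioms can be checked mechanically for a finite space (an illustrative sketch assuming a fair die; the measure `P` sums the assigned elementary probabilities):

```python
from fractions import Fraction

# Assigned probabilities of the elementary events (fair-die assumption).
p_elem = {s: Fraction(1, 6) for s in range(1, 7)}

def P(event):
    """Probability measure: sum of elementary probabilities over the event."""
    return sum(p_elem[s] for s in event)

omega = set(p_elem)
A, B = {2, 4, 6}, {1, 3}            # two mutually exclusive events

assert P(A) >= 0                    # Axiom I: nonnegative measure
assert P(omega) == 1                # Axiom II: the certain event has measure 1
assert A & B == set()               # A and B are mutually exclusive
assert P(A | B) == P(A) + P(B)      # Axiom III: additivity for disjoint events
```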

A generalization of Axiom III is given by the theorem below.

### Theorem 3.2.1

For a finite number of mutually exclusive sets, $A_i \cap A_j = \{\varnothing\}$ for all $i, j = 1, 2, \dots, n$, $i \neq j$,
$$P\left({A}_1\cup {A}_2\cup \dots \cup {A}_n\right)=P\left({A}_1\right)+P\left({A}_2\right)+\dots +P\left({A}_n\right)$$
(3.4)

### Proof

First, consider the case of three mutually exclusive sets, A, B and C, that is,
$$A\cap B=\left\{\varnothing \right\}\kern1.25em A\cap C=\left\{\varnothing \right\}\kern1.25em B\cap C=\left\{\varnothing \right\}$$
We have the following relationship:
$$\left(A\cup B\right)\cap C=\left(A\cap C\right)\cup \left(B\cap C\right)=\left\{\varnothing \right\}\cup \left\{\varnothing \right\}=\left\{\varnothing \right\}$$
That is, (A ∪ B) and C are two mutually exclusive sets. These two mutually exclusive sets satisfy Axiom III as follows:
$$P\left[\left(A\cup B\right)\cup C\right]=P\left(A\cup B\right)+P(C)$$
Applying Axiom III to the first term on the right-hand side, that is, P(A ∪ B) = P(A) + P(B), we obtain the following equation:
$$P\left[\left(A\cup B\right)\cup C\right]=P(A)+P(B)+P(C)$$

Continuing this process for n mutually exclusive sets $A_1, A_2, \dots, A_n$, we prove the theorem by mathematical induction.

Q.E.D.

### Example 3.2.2

In this example, we illustrate the significance of the three axioms in formulating a probability problem using a die-throwing experiment. Suppose that you would win $10 if the number of dots shown after throwing the die is less than four and would win a trip to Philadelphia if it shows more than four dots. What is the probability that you would win $10, a trip to Philadelphia, or both? We will formulate and solve this problem using the three axioms.

For this problem, the space is Ω = {1, 2, 3, 4, 5, 6}. In the axiomatic approach, the formulation of a probability problem starts with the assignment of the probability measures for the basic events, whether they are elementary events or events for which a priori probabilistic information is known. For this problem, we assign 1/6 to each of the six possible outcomes: P({i}) = 1/6, i = 1, 2, …, 6. In the absence of any a priori information about the six elementary events, 1/6 is a reasonable assignment and satisfies Axiom I. If a priori information, e.g., past experimental data, is available about the die used in the experiment, different probabilities may be assigned. In any event, the key point here is that the formulation starts with the assignment of the probabilities.

There are two events of interest, "winning $10" and "winning a trip to Philadelphia." An event is defined by a subset of Ω consisting of the outcomes that would make the event happen. What outcomes would make the event "winning $10" happen? Outcomes 1, 2, or 3 would make you win $10. So,
$$A=\text{event of winning \$10}=\left\{1,2,3\right\}$$
You would win a trip to Philadelphia if 5 or 6 shows. So,
$$B=\text{event of winning a trip to Philadelphia}=\left\{5,6\right\}$$
The event that you will win $10 or a trip to Philadelphia or both is then represented by the union of A and B, A ∪ B, and we need to determine P(A ∪ B).

We see that A and B are mutually exclusive , that is, A ∩ B = {∅}. Therefore, by Axiom III , we have
$$P\left(A\cup B\right)=P(A)+P(B)$$
Now, set A can be expressed as a union of three elementary events as follows:
$$A=\left\{1,2,3\right\}=\left\{1\right\}\cup \left\{2\right\}\cup \left\{3\right\}$$
where the elementary events are mutually exclusive , that is, {i} ∩ {j} = {∅}, for i ≠ j.
So, by Theorem 3.2.1, we have
$$\begin{array}{c}P(A)=P\left[\left\{1,2,3\right\}\right]=P\left[\left\{1\right\}\cup \left\{2\right\}\cup \left\{3\right\}\right]\\ =P\left[\left\{1\right\}\right]+P\left[\left\{2\right\}\right]+P\left[\left\{3\right\}\right]=\frac{3}{6}\end{array}$$
Similarly,
$$P(B)=P\left[\left\{5,6\right\}\right]=P\left[\left\{5\right\}\cup \left\{6\right\}\right]=P\left[\left\{5\right\}\right]+P\left[\left\{6\right\}\right]=\frac{2}{6}$$
Therefore,
$$P\left(A\cup B\right)=P(A)+P(B)=\frac{3}{6}+\frac{2}{6}=\frac{5}{6}$$

The event represented by the union of two events A and B would occur if A or B or both would occur. For mutually exclusive A and B, the probability that “both” A and B would occur is zero. For the current problem, the probability of winning both would be zero because A and B are mutually exclusive.
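Example 3.2.2 can be verified numerically (an illustrative sketch with the uniform fair-die assignment):

```python
from fractions import Fraction

def P(event):
    """Uniform measure: each elementary event is assigned probability 1/6."""
    return Fraction(len(event), 6)

A = {1, 2, 3}                       # "winning $10"
B = {5, 6}                          # "winning a trip to Philadelphia"

assert A & B == set()               # mutually exclusive, so Axiom III applies
assert P(A) == Fraction(3, 6) and P(B) == Fraction(2, 6)
assert P(A | B) == P(A) + P(B) == Fraction(5, 6)
```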

In addition to Theorem 3.2.1, it is convenient to establish several key theorems up front that follow from the three probability axioms. These theorems are discussed below.

### Theorem 3.2.2

$$P\left[\left\{\varnothing \right\}\right]=0$$
(3.5)

Equation (3.5) states that the probability of an impossible event is 0.

Two observations are made regarding this theorem. First, unlike the probability measure 1 of a certain event, which is "assigned" by Axiom II, the probability measure 0 of an impossible event is "derived" from the axioms. Second, one might wonder why this statement is not included as an axiom. Since it can be derived from the axioms, as the proof below shows, including it as an axiom would be superfluous.

### Proof

For an arbitrary set A, the following equations hold true:
$$A=A\cup \left\{\varnothing \right\}\kern4.5em A\cap \left\{\varnothing \right\}=\left\{\varnothing \right\}$$
Hence, from Axiom III , it follows that
$$P(A)=P\left[A\cup \left\{\varnothing \right\}\right]=P(A)+P\left[\left\{\varnothing \right\}\right]$$
Rearranging the terms of the above equation, we obtain the following equation:
$$P\left(\left\{\varnothing \right\}\right)=P(A)-P(A)=0$$

Q.E.D.

### Theorem 3.2.3

For any two events A and B, that is, A ⊂ Ω and B ⊂ Ω,
$$P\left(A\cup B\right)=P\left(A\ \right)+P\left(B\ \right)-P\left(A\cap B\right)\le \kern0.5em P\left(A\ \right)+P\left(B\ \right)$$
(3.6)

### Proof

Referring to Fig. 3.1d, it can be shown that
$$A\cup B=A\cup \left(\overline{A}\cap B\right)$$

Fig. 3.1 (a) Subsets A and B, (b) A ∪ B, (c) Subset A and its complement $\overline{A}$, (d) $\overline{A}\cap B$
$$P\left(A\cup B\right)=P\left[A\cup \left(\overline{A}\cap B\right)\right]$$
Since A and Ā ∩ B are mutually exclusive , that is, A ∩ (Ā ∩ B) = {∅}, using Axiom III , we can rewrite the above equation as follows:
$$P\left(A\cup B\right)=P\left(A\ \right)+P\left(\overline{A}\cap B\right)$$
(3.7)
On the other hand, referring to Fig. 3.1d, we can express B as the union of two intersections as follows:
$$\begin{array}{c}B=\left(A\cap B\right)\cup \left(\overline{A}\cap B\right)\\ P(B)=P\left[\left(A\cap B\right)\cup \left(\overline{A}\cap B\right)\right]\end{array}$$
Since (A ∩ B) and (Ā ∩ B) are mutually exclusive , that is, (A ∩ B) ∩ (Ā ∩ B) = {∅}, using Axiom III , we can rewrite the above equation as follows:
$$P(B)=P\left[\left(A\cap B\right)\ \right]+P\left[\ \left(\overline{A\ }\cap B\right)\right]$$
Rearranging the terms of the above equation, we obtain the following equation:
$$P\left[\ \left(\overline{A}\cap B\right)\right]=P(B)-P\left[\left(A\cap B\right)\ \right]$$
(3.8)
Substituting (3.8) into (3.7), we obtain
$$P\left(A\cup B\right)=P\left(A\ \right)+P(B)-P\left[\left(A\cap B\right)\ \right]$$

Q.E.D.

Furthermore, by Axiom I , P(A ∩ B) ≥ 0, and, thus, the last equation yields the following relationship:
$$P\left(A\ \right)+P(B)-P\left[\left(A\cap B\right)\ \right]\le P\left(A\ \right)+P(B)$$

Q.E.D.

### Theorem 3.2.4

$$P(A)\le 1$$
(3.9)

### Proof

For an arbitrary subset A of Ω, that is, A ⊂ Ω,
$$A\cup \overline{A}=\Omega$$
Hence, using Axiom II, we obtain the following equation:
$$P\left(A\cup \overline{A}\right)=P\left(\Omega \right)=1$$
Since A and Ā are mutually exclusive , that is, A ∩ Ā = {∅}, from Axiom III , we can rewrite the above equation as follows:
$$P\left(A\cup \overline{A}\right)=P(A)+P\left(\overline{A}\right)=1$$
or
$$P(A)=1-P\left(\overline{A}\right)$$

Q.E.D.

Furthermore, since, from Axiom I, P(Ā) ≥ 0, we have
$$P(A)\le 1$$

Q.E.D.

This theorem shows that, if the three axioms are followed, the probability measure derived for any arbitrary event cannot be greater than 1. Once again, including this statement as an axiom would be superfluous.

### Theorem 3.2.5

If B ⊂ A,
$$P(B)\le P(A)$$
(3.10)

### Proof

If B is a subset of A, referring to Fig. 3.2c, we can express A as the union of two mutually exclusive sets as follows:
$$A=B\cup \left(\overline{B}\cap A\right)$$

Fig. 3.2 (a) B ⊂ A, (b) Subset B and its complement $\overline{B}$, (c) $A=B\cup \left(\overline{B}\cap A\right)$
Hence, we have
$$P(A)=P\left[B\cup \left(\overline{B}\cap A\right)\right]$$
As can be seen in Fig. 3.2c, since B and ($\overline{B}$ ∩ A) are mutually exclusive, that is, B ∩ ($\overline{B}$ ∩ A) = {∅}, we can rewrite the above equation as follows:
$$P(A)=P(B)+P\left(\overline{B}\cap A\right)$$
or
$$P(B)=P(A)-P\left(\overline{B}\cap A\right)$$
Since, by Axiom I, P($\overline{B}$ ∩ A) ≥ 0, the above equation yields the following relationship:
$$P(B)\le P(A)$$

Q.E.D.

In sum, the following steps are taken to formulate a probability problem based on the axiomatic approach:
• Define the experiment and the probability space Ω.

• Assign the probabilities of the elementary events.

• Define the event.

• Determine the probability of the event.

The following example is a simple probability problem that can be solved without elaborate formulation. However, we will deliberately go through the above steps to illustrate the axiomatic approach to probability formulation.

### Example 3.2.3

In a die-throwing game, a number less than 5 wins. Find the probability of winning the game.

### Solution

Determine the space:
$$\Omega =\left\{1,2,3,4,5,6\right\}$$
Assign the probabilities to the elementary events:
$$P\left[\left\{1\right\}\right]=P\left[\left\{2\right\}\right]=P\left[\left\{3\right\}\right]=P\left[\left\{4\right\}\right]=P\left[\left\{5\right\}\right]=P\left[\left\{6\right\}\right]=\frac{1}{6}$$
Define the event by selecting the elements of Ω that would make the event happen:
$$A=\left\{1,2,3,4\right\}$$

Find P(A).

The elementary events {1}, {2}, {3}, and {4} are mutually exclusive. Using Theorem 3.2.1, the repeated application of Axiom III, we obtain the following result:
$$\begin{array}{c}P(A)=P\left[\left\{1,2,3,4\right\}\right]=P\left[\left\{1\right\}\cup \left\{2\right\}\cup \left\{3\right\}\cup \left\{4\right\}\right]\\ =P\left[\left\{1\right\}\right]+P\left[\left\{2\right\}\right]+P\left[\left\{3\right\}\right]+P\left[\left\{4\right\}\right]=\frac{4}{6}\end{array}$$
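The four formulation steps can be traced in code (an illustrative sketch; the variable names are ours):

```python
from fractions import Fraction

# Step 1: define the experiment and the probability space.
omega = {1, 2, 3, 4, 5, 6}
# Step 2: assign the probabilities of the elementary events.
p = {s: Fraction(1, 6) for s in omega}
# Step 3: define the event by the outcomes that make it happen.
A = {s for s in omega if s < 5}
# Step 4: determine the probability of the event.
P_A = sum(p[s] for s in A)

assert A == {1, 2, 3, 4}
assert P_A == Fraction(4, 6)
```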

## 3.3 Conditional Probability

Consider two events A and B defined in the space Ω and the event given by the intersection A ∩ B. The probability measures of these events in Ω are denoted by P(A), P(B), and P(A ∩ B), respectively. Now, let us consider the ratio of P(A ∩ B) to P(B) and the ratio of P(A ∩ B) to P(A), assuming that P(A) and P(B) are not zero:
$$\frac{P\left(A\cap B\right)}{P(B)}\kern2.75em \mathrm{and}\kern3em \frac{P\left(A\cap B\right)}{P(A)}\kern0.75em \mathrm{where}\ P(A)\ne 0,P(B)\ne 0$$
(3.11)

In the first ratio, consider that, given B, that is, for a fixed B, A is varied, that is, the ratio is a function of A. Similarly, in the second ratio, the ratio is a function of B for a fixed A. For the time being, let us denote these two quantities by R[A given B] and R[B given A], respectively. We now show that these two quantities are also probability measures in Ω satisfying Axioms I, II, and III. We show this using the first ratio, R[A given B], as A varies with B fixed.

First, the ratio R[A given B] satisfies Axiom I given by (3.1) as follows:

By Axiom I , P(A ∩ B) ≥ 0 and P(B) ≥ 0. Therefore, if P(B) ≠ 0,
$$\frac{P\left(A\cap B\right)}{P(B)}\kern1.25em \ge 0$$
(3.12)

Q.E.D.

Next, the ratio R[A given B] satisfies Axiom II given by (3.2) as follows:

Consider the case A = Ω so that the numerator of the ratio becomes P(Ω ∩ B). We know that, since B ⊂ Ω, P(Ω ∩ B) = P(B). Hence,
$$\frac{P\left(\Omega \kern0.5em \cap B\right)}{P(B)}\kern0.5em =\frac{P(B)}{P(B)}\kern0.5em =1$$
(3.13)

Q.E.D.

Finally, the ratio R[A given B] satisfies Axiom III given by (3.3) as follows:

Consider that A is equal to the union of two mutually exclusive sets C and D in Ω as follows:
$$A=C\cup D,\kern0.5em \mathrm{where}\ C\cap D=\left\{\varnothing \right\}$$
and consider the following expression:
$$\frac{P\left[\left(C\cup D\right)\cap B\right]}{P(B)}$$
(3.14)
We have the following set identity:
$$\left(C\cup D\right)\cap B=\left(C\cap B\right)\cup \left(D\cap B\right)$$
Referring to Fig. 3.3, we see that, since C and D are mutually exclusive, (C ∩ B) and (D ∩ B) are mutually exclusive, that is, (C ∩ B) ∩ (D ∩ B) = {∅}.

Fig. 3.3 (C ∩ B) ∩ (D ∩ B) = {∅}
Therefore, by Axiom III , we obtain the following equation:
$$\begin{array}{c}P\left[\left(C\cup D\right)\cap B\right]=P\left[\left(C\cap B\right)\cup \left(D\cap B\right)\right]\\ =P\left[\left(C\cap B\right)\right]+P\left[\left(D\cap B\right)\right]\end{array}$$
Hence,
$$\begin{array}{c}\frac{P\left[\left(C\cup D\right)\cap B\right]}{P(B)}=\frac{P\left[\left(C\cap B\right)\right]+P\left[\left(D\cap B\right)\right]}{P(B)}\\ =\frac{P\left[\left(C\cap B\right)\right]}{P(B)}+\frac{P\left[\left(D\cap B\right)\right]}{P(B)}\end{array}$$
(3.15)

Q.E.D.

### 3.3.1 Definition of the Conditional Probability

We have shown that the ratio of the probability of the intersection of two arbitrary events A and B in Ω, that is, P(A ∩ B), to the probability of either A or B is also a probability. In fact, these ratios, R[B given A] and R[A given B], are given a special name, "conditional probability," and are denoted by the following notations:
$$P\left(A|B\right)\triangleq \frac{P\left(A\cap B\right)}{P(B)}\kern0.75em \mathrm{where}\ P(B)\ne 0$$
(3.16)
$$P\left(B|A\right)\triangleq \frac{P\left(A\cap B\right)}{P(A)}\kern0.75em \mathrm{where}\ P(A)\ne 0$$
(3.17)

The first conditional probability as defined above can be interpreted as the probability of event A given that event B has occurred, and, similarly, the second conditional probability, as the probability of event B given that event A has occurred.

To understand this interpretation and how the name "conditional probability" arises, consider the following: the probability of event A in the sample space Ω, P(A), is the probability measure given to A relative to the total measure 1 of Ω, that is, P(A)/P(Ω) = P(A)/1. The probability measure that event A would occur due to the outcomes in the overlapping portion between A and B is P(A ∩ B). Now, consider the probability measure of the event A if the space for A is restricted to B. This last probability measure is the measure P(A ∩ B) relative to the probability measure of B, P(B), that is, the ratio defined as the conditional probability P(A|B) (Fig. 3.4).

Fig. 3.4 (A ∩ B)
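The restriction-of-the-space interpretation can be illustrated numerically (a sketch assuming the fair-die measure; the events chosen are ours):

```python
from fractions import Fraction

def P(event):
    """Uniform fair-die measure on Omega = {1, ..., 6}."""
    return Fraction(len(event), 6)

A = {2, 4, 6}                            # "even number"
B = {4, 5, 6}                            # "more than three"

# Conditional probability: the measure of A with the space restricted to B.
P_A_given_B = P(A & B) / P(B)
assert P_A_given_B == Fraction(2, 3)     # two of B's three outcomes are even
assert P_A_given_B != P(A)               # conditioning changed the probability
```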

### Theorem 3.3.1

Let $B_1, B_2, \dots, B_i, \dots, B_n$ be n subsets of the space Ω that form a partition of Ω. Consider an arbitrary event A in Ω with a nonzero probability, P(A) > 0. Then, P(A) can be expressed in terms of the conditional probabilities given $B_i$, i = 1, 2, …, n, as follows:
$$\begin{array}{c}P(A)=P\left(A|{B}_1\right)P\left({B}_1\right)+P\left(A|{B}_2\right)P\left({B}_2\right)+\cdots \\ +P\left(A|{B}_i\right)P\left({B}_i\right)+\cdots +P\left(A|{B}_n\right)P\left({B}_n\right)\end{array}$$
(3.18)

The last equation is referred to as the total probability theorem .

### Proof

Referring to Fig. 3.5, we can express event A as the union of n intersections as follows:
$$A=\left(A\cap {B}_1\right)\cup \left(A\cap {B}_2\right)\cup \dots \cup \left(A\cap {B}_i\right)\cup \dots \cup \left(A\cap {B}_n\right)$$

Fig. 3.5 A = (A ∩ B₁) ∪ (A ∩ B₂) ∪ … ∪ (A ∩ Bᵢ) ∪ … ∪ (A ∩ Bₙ)
These n intersections are mutually exclusive , that is, (A ∩ B i ) ∩ (A ∩ B j ) = {∅}, i ≠ j. By Axiom III , we obtain the following equation:
$$\begin{array}{l}P(A)=P\left[\left(A\cap {B}_1\right)\cup \left(A\cap {B}_2\right)\cup \cdots \cup \left(A\cap {B}_i\right)\cup \cdots \cup \left(A\cap {B}_n\right)\right]\\ \kern2em =P\left[\left(A\cap {B}_1\right)\right]+P\left[\left(A\cap {B}_2\right)\right]+\cdots +P\left[\left(A\cap {B}_i\right)\right]+\cdots +P\left[\left(A\cap {B}_n\right)\right]\end{array}$$
The right-hand side of the above equation can be expressed in the form of the total probability theorem as follows. First, by the definition of the conditional probability , we have the following equation
$$P\left(A|{B}_i\right)\triangleq \frac{P\left(A\cap {B}_i\right)}{P\left({B}_i\right)}\kern0.75em \mathrm{where}\ P\left({B}_i\right)\ne 0$$
Next, rearranging the terms of the above equation, we obtain the following expression:
$$P\left(A\cap {B}_i\right)=P\left(\left.A\right|{B}_i\right)P\left({\mathrm{B}}_i\right)\kern0.5em \mathrm{for}\kern0.5em i=1,\dots, n$$
Finally, substituting the above expression in the right-hand side of the equation for P(A), we obtain the following equation
$$\begin{array}{c}P(A)=P\left(A|{B}_1\right)P\left({B}_1\right)+P\left(A|{B}_2\right)P\left({B}_2\right)\\ +\cdots +P\left(A|{B}_i\right)P\left({B}_i\right)+\cdots +P\left(A|{B}_n\right)P\left({B}_n\right)\end{array}$$

Q.E.D.
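The total probability theorem can be checked on a small partition (an illustrative sketch with the fair-die measure):

```python
from fractions import Fraction

def P(event):
    return Fraction(len(event), 6)       # fair-die measure (illustrative)

partition = [{1, 2}, {3, 4}, {5, 6}]     # a partition of Omega = {1, ..., 6}
A = {2, 3, 5}

# Sum of P(A | B_i) P(B_i) over the partition.
total = sum((P(A & B_i) / P(B_i)) * P(B_i) for B_i in partition)
assert total == P(A) == Fraction(1, 2)   # agrees with the direct computation
```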

### Theorem 3.3.2

Let $B_1, B_2, \dots, B_i, \dots, B_n$ be n subsets of the space Ω that form a partition of Ω. Consider an arbitrary event A with a nonzero probability, P(A) > 0. The conditional probability of $B_i$ given A is given by the following equation:
$$P\left({B}_i|A\right)=\frac{P\left({B}_i\right)P\left(A|{B}_i\right)}{P\left({B}_1\right)P\left(A|{B}_1\right)+P\left({B}_2\right)P\left(A|{B}_2\right)+\cdots +P\left({B}_n\right)P\left(A|{B}_n\right)}$$
(3.19)

This theorem is referred to as Bayes' theorem and is used to determine the probability that a given event A is attributable to the subset $B_i$ of the partition. For example, given that a product is found to be defective, denoted by event A, the theorem can be used to calculate the probability that the defective product is from supplier $B_i$ when the defect data for each supplier, P(A|B_i), are available.

### Proof

The left-hand side of the above equation is the conditional probability of B i given A, which is given by the following equation by definition:
$$P\left({B}_i|A\right)=\frac{P\left({B}_i\cap A\right)}{P(A)}$$
(3.20)
On the other hand, the conditional probability of A given B i is given by
$$P\left(A|{B}_i\right)=\frac{P\left({B}_i\cap A\right)}{P\left({B}_i\right)}$$
(3.21)
Rearranging the terms of the last equation, we obtain the following equation:
$$P\ \left({B}_i\cap A\right)=P\left({B}_i\right)P\left(A|{B}_i\right)$$
(3.22)
Substituting (3.22) into (3.20), we have
$$P\left({B}_i|A\right)=\frac{P\left({B}_i\right)P\left(A|{B}_i\right)}{P(A)}$$
(3.23)
Substituting (3.18) into (3.23) yields (3.19).

Q.E.D.

### Example 3.3.1

A reliability problem. A component is randomly selected from a batch of 10,000 pieces supplied by five different factories. The following table shows each factory's failure statistics for the component and the number of pieces supplied by each factory. Suppose that the randomly selected component has just failed. What is the probability that the failed component is from Factory A?

| Factory | # Supplied | Probability of failure |
|---------|-----------|------------------------|
| A | 1000 | $P(\mathrm{fail} \mid A) = 1.3 \times 10^{-6}$ |
| B | 3000 | $P(\mathrm{fail} \mid B) = 1.2 \times 10^{-6}$ |
| C | 3000 | $P(\mathrm{fail} \mid C) = 1.1 \times 10^{-6}$ |
| D | 2000 | $P(\mathrm{fail} \mid D) = 1.4 \times 10^{-6}$ |
| E | 1000 | $P(\mathrm{fail} \mid E) = 1.5 \times 10^{-6}$ |

From the number of components supplied by each factory given above, we have

$$P(A)=\frac{1000}{10{,}000}=0.1\kern1em P(B)=\frac{3000}{10{,}000}=0.3\kern1em P(C)=\frac{3000}{10{,}000}=0.3\kern1em P(D)=\frac{2000}{10{,}000}=0.2\kern1em P(E)=\frac{1000}{10{,}000}=0.1$$
Using Bayes' theorem and substituting the above probabilities and the failure statistics of each factory given in the table, we obtain the following solution:
$$\begin{array}{lcl}P\left(A|\mathrm{fail}\right)& = &\frac{P(A)P\left(\mathrm{fail}|A\right)}{P\left(\mathrm{fail}|A\right)P(A)+P\left(\mathrm{fail}|B\right)P(B)+P\left(\mathrm{fail}|C\right)P(C)+P\left(\mathrm{fail}|D\right)P(D)+P\left(\mathrm{fail}|E\right)P(E)}\\ & = &\frac{1.3\times {10}^{-6}\times 0.1}{1.3\times {10}^{-6}\times 0.1+1.2\times {10}^{-6}\times 0.3+1.1\times {10}^{-6}\times 0.3+1.4\times {10}^{-6}\times 0.2+1.5\times {10}^{-6}\times 0.1}\\ & = &0.104\end{array}$$
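The same computation in code (a sketch; the dictionary names are ours, the values are taken from the table above):

```python
# Priors P(factory) and failure statistics P(fail | factory) from the table.
priors = {"A": 0.1, "B": 0.3, "C": 0.3, "D": 0.2, "E": 0.1}
p_fail = {"A": 1.3e-6, "B": 1.2e-6, "C": 1.1e-6, "D": 1.4e-6, "E": 1.5e-6}

# Denominator: the total probability of failure over the factory partition.
p_total = sum(p_fail[f] * priors[f] for f in priors)
posterior_A = p_fail["A"] * priors["A"] / p_total

assert abs(p_total - 1.25e-6) < 1e-12
assert abs(posterior_A - 0.104) < 1e-6
```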

### Example 3.3.2

A communications signal detection problem. A total of 4000 characters have been received from four different sources as follows. The probabilities of character “a” from the four sources are given. Out of the total 4000 characters received, a randomly selected character is found to be “a.” What is the probability that this character came from Source A?

| Source | # Characters sent | Probability of "a" |
|--------|-------------------|--------------------|
| A | 500 | $P(a \mid A) = 0.1$ |
| B | 1000 | $P(a \mid B) = 0.2$ |
| C | 2000 | $P(a \mid C) = 0.3$ |
| D | 500 | $P(a \mid D) = 0.4$ |
Based on the number of characters sent by each source, we have
$$P(A)=\frac{500}{4000}=\frac{1}{8}\kern1.em P(B)=\frac{1000}{4000}=\frac{2}{8}\kern1.em P(C)=\frac{2000}{4000}=\frac{4}{8}\kern1.em P(D)=\frac{500}{4000}=\frac{1}{8}$$
By Bayes' theorem, we obtain the following solution:
$$P\left(A|a\right)=\frac{0.1\times \frac{1}{8}}{0.1\times \frac{1}{8}+0.2\times \frac{2}{8}+0.3\times \frac{4}{8}+0.4\times \frac{1}{8}}=\frac{0.0125}{0.2625}\approx 0.0476$$
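Evaluating the ratio numerically (a sketch using the table's values):

```python
priors = {"A": 1 / 8, "B": 2 / 8, "C": 4 / 8, "D": 1 / 8}
p_a    = {"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4}

p_total = sum(p_a[s] * priors[s] for s in priors)   # P("a") = 0.2625
posterior_A = p_a["A"] * priors["A"] / p_total

assert abs(p_total - 0.2625) < 1e-12
assert abs(posterior_A - 1 / 21) < 1e-9             # about 0.0476
```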

### 3.3.4 Independence of Events

In our everyday language, we say that two events A and B are independent if the occurrence of A has no effect on the occurrence of B and vice versa. This lexical definition of the word independence can be expressed in terms of the conditional probability as follows:
$$P\left(A|B\right)=\kern0.5em P(A)$$
(3.24)
$$P\left(B|A\right)=\kern0.5em P(B)$$
(3.25)
By the definition of the conditional probability, these two equations read as follows: the probability of A's occurrence, P(A), stays unchanged regardless of B's occurrence, that is, P(A|B) = P(A), and the probability of B's occurrence, P(B), stays unchanged regardless of A's occurrence. Combining the definition of the conditional probability given by (3.16) and (3.17) with the above statements of independence, we obtain the following equations:
$$P\left(A|B\right)=\frac{P\left(A\cap B\right)}{P(B)}=P(A)$$
(3.26)
$$P\left(B|A\right)=\frac{P\left(A\cap B\right)}{P(A)}=P(B)$$
(3.27)
From these, we have the following relationship for two independent events A and B:
$$P\left(A\cap B\right)=P(A)P(B)$$
(3.28)

### Definition of Independence

Two events A and B are said to be independent if
$$P\left(A\cap B\right)=P(A)P(B)$$
(3.29)

This definition of independence is consistent with the definition of conditional probability.
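As a quick illustration of the product rule (3.29), the following sketch checks it for two hypothetical events on the die-throwing space of Sect. 3.1; the particular events and the equally likely outcomes are assumptions chosen for illustration:

```python
from fractions import Fraction

# Die-throwing space with equally likely outcomes.
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    # Probability of an event as (favorable outcomes) / (total outcomes).
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}      # hypothetical event: "even number of dots", P(A) = 1/2
B = {1, 2, 3, 4}   # hypothetical event: "at most four dots",  P(B) = 2/3

# Check the product rule P(A ∩ B) = P(A) P(B): 1/3 == 1/2 * 2/3.
assert prob(A & B) == prob(A) * prob(B)
```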

## 3.4 Cartesian Product

This section defines a special type of set called the Cartesian product. To illustrate the Cartesian product, consider the following example:

### Example 3.4.1

You are to throw a coin and then throw a die. If the coin shows heads and the die shows fewer than four dots, you win a prize. What is the probability of winning the prize? This problem can be formulated using the axiomatic probability approach as follows: first, formulate the game as a combined experiment of two separate experiments, coin throwing and die throwing, with spaces Ω1 and Ω2, respectively:
$${\Omega}_1=\left\{\mathrm{heads},\mathrm{tails}\right\}\kern5.25em {\Omega}_2=\left\{1,2,3,4,5,6\right\}$$
where the numbers in Ω2 represent the number of dots on the die.
The space Ω of all possible outcomes of the game is the Cartesian product of Ω1 and Ω2 as follows (Fig. 3.6):
$$\begin{array}{ll}\Omega & ={\Omega}_1\times {\Omega}_2\\ {}& =\left\{\left(\mathrm{heads},1\right),\left(\mathrm{heads},2\right),\left(\mathrm{heads},3\right),\left(\mathrm{heads},4\right),\left(\mathrm{heads},5\right),\left(\mathrm{heads},6\right),\left(\mathrm{tails},1\right),\left(\mathrm{tails},2\right),\left(\mathrm{tails},3\right),\left(\mathrm{tails},4\right),\left(\mathrm{tails},5\right),\left(\mathrm{tails},6\right)\right\}\end{array}$$
The event A “winning the prize” consists of the following elements of Ω:
$$A=\left\{\left(\mathrm{heads},1\right),\left(\mathrm{heads},2\right),\left(\mathrm{heads},3\right)\right\}$$

We will return to this example to calculate the probability of the event A after discussing the combined experiment further.
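The combined space and the event A can be enumerated directly; a minimal sketch using Python’s `itertools.product` (variable names are illustrative):

```python
from itertools import product

omega1 = ["heads", "tails"]        # coin-throwing space
omega2 = [1, 2, 3, 4, 5, 6]        # die-throwing space

# The combined space is the Cartesian product Omega1 x Omega2.
omega = list(product(omega1, omega2))
assert len(omega) == 12

# Event A: "winning the prize" -- heads and fewer than four dots.
A = [(c, d) for (c, d) in omega if c == "heads" and d < 4]
assert A == [("heads", 1), ("heads", 2), ("heads", 3)]
```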

Figure 3.7 illustrates that, if E and F are subsets of X and Y, respectively, their Cartesian product, E × F, is a subset of X × Y.
$$E\subset X\ \mathrm{and}\ F\subset Y\kern0.75em \Rightarrow \kern1.75em E\times F\kern0.5em \subset \kern0.5em X\times Y$$
Consider two spaces, Ω1 and Ω2
$${\Omega}_1=\left\{{x}_1,{x}_2,\dots, {x}_i,\dots, {x}_m\right\}\kern2.5em {\Omega}_2=\left\{{y}_1,{y}_2,\dots, {y}_j,\dots, {y}_n\right\}$$
and the events represented by the following two Cartesian products:
$$E\times {\Omega}_2,\mathrm{where}\ E\subset {\Omega}_1\kern5.75em {\Omega}_1\times F,\mathrm{where}\ F\subset {\Omega}_2$$
The event E × Ω2 occurs if an element of E and an element of Ω2 occur together to form a pair, which is a member of the Cartesian product E × Ω2. Since Ω2 is the space, any element of Ω2 paired with an element of E makes the event E × Ω2 happen; that is, E × Ω2 occurs exactly when E occurs. By this reasoning, we establish
$$P\left(E\times {\Omega}_2\right)=P\left(E\ \right)$$
(3.30)
$$P\left({\Omega}_1\times F\right)=P\left(F\ \right)$$
(3.31)
where P(E) and P(F) are probabilities of the events E and F defined in Ω1 and Ω2, respectively.
Figure 3.8 illustrates that the Cartesian product E × F can be expressed as the intersection of the two Cartesian products E × Ω2 and Ω1 × F as follows:
$$E\times F=\left(E\times {\Omega}_2\right)\cap \left({\Omega}_1\times F\right)$$
(3.32)

Fig. 3.8 (a) E × Ω2, (b) Ω1 × F, (c) E × F = (E × Ω2) ∩ (Ω1 × F)
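Identity (3.32) can be verified by direct enumeration on small illustrative spaces (the element names below are assumptions made for the example):

```python
from itertools import product

omega1 = {"x1", "x2", "x3"}   # illustrative space Omega1
omega2 = {"y1", "y2"}         # illustrative space Omega2
E = {"x1", "x2"}              # E, a subset of Omega1
F = {"y1"}                    # F, a subset of Omega2

# (3.32): E x F equals the intersection of E x Omega2 and Omega1 x F.
lhs = set(product(E, F))
rhs = set(product(E, omega2)) & set(product(omega1, F))
assert lhs == rhs
```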

Assume that the two individual experiments with spaces Ω1 and Ω2, respectively, are independent, that is, an outcome from Ω1 has no effect on the outcome from Ω2. Under this condition, the two events E × Ω2 and Ω1 × F are independent.

From (3.32) and the independence of these two events, we obtain the following equation:
$$\begin{array}{c}P\left(E\times F\right)=P\left[\left(E\times {\Omega}_2\right)\cap \left({\Omega}_1\times F\right)\right]\\ {}=P\left(E\times {\Omega}_2\right)P\left({\Omega}_1\times F\right)\end{array}$$
(3.33)
Applying (3.30) and (3.31) to the above equation, we obtain the following equation:
$$P\left(E\times F\right)=P\left(E\ \right)P(F)$$
(3.34)
Consider the case where E and F are elementary events in Ω1 and Ω2, respectively, as follows:
$$E=\left\{{x}_i\right\}\kern4.5em F=\left\{{y}_j\right\}$$
Substituting the above into (3.34), we obtain the following equation:
$$P\left(E\times F\right)=P\left[\left\{{x}_i\right\}\times \left\{{y}_j\right\}\right]=P\left[\left\{{x}_i\right\}\ \right]P\left[\left\{{y}_j\right\}\right]$$
(3.35)
To illustrate the concepts of (3.34) and (3.35), return to Example 3.4.1 and find the probability of the event A, where
$$\begin{array}{c}A=\left\{\left(\mathrm{heads},1\right),\left(\mathrm{heads},2\right),\left(\mathrm{heads},3\right)\right\}\\ {}P(A)=P\left[\left\{\left(\mathrm{heads},1\right),\left(\mathrm{heads},2\right),\left(\mathrm{heads},3\right)\right\}\right]\\ {}=P\left[\left\{\left(\mathrm{heads},1\right)\right\}\right]+P\left[\left\{\left(\mathrm{heads},2\right)\right\}\right]+P\left[\left\{\left(\mathrm{heads},3\right)\right\}\right]\end{array}$$
(3.36)

Note that {(heads, 1)}, {(heads, 2)}, and {(heads, 3)} are elementary events in the combined experiment space Ω.

Using (3.35), we can express the probabilities of the elementary events of the set A as the products of the probabilities of the elementary events of Ω1 and Ω2, respectively, as follows:
$$P\left[\left\{\left(\mathrm{heads},1\right)\right\}\right]=P\left[\left\{\mathrm{heads}\right\}\times \left\{1\right\}\right]=P\left[\left\{\mathrm{heads}\right\}\ \right]P\left[\left\{1\right\}\right]$$
(3.37)
$$P\left[\left\{\left(\mathrm{heads},2\right)\right\}\right]=P\left[\left\{\mathrm{heads}\right\}\times \left\{2\right\}\right]=P\left[\left\{\mathrm{heads}\right\}\ \right]P\left[\left\{2\right\}\right]$$
(3.38)
$$P\left[\left\{\left(\mathrm{heads},3\right)\right\}\right]=P\left[\left\{\mathrm{heads}\right\}\times \left\{3\right\}\right]=P\left[\left\{\mathrm{heads}\right\}\ \right]P\left[\left\{3\right\}\right]$$
(3.39)
where {heads} is an elementary event in Ω1 and {1}, {2}, and {3} are elementary events in Ω2.
Assume the following probabilities for these elementary events in the two separate spaces:
$$\begin{array}{c}P\left[\left\{\mathrm{heads}\right\}\right]=\frac{1}{2}\\ {}P\left[\left\{1\right\}\right]=P\left[\left\{2\right\}\right]=P\left[\left\{3\right\}\right]=\frac{1}{6}\end{array}$$
Substituting these probabilities into (3.37) through (3.39), and the results into (3.36), we have
$$P(A)=\frac{1}{2}\left(\frac{1}{6}+\frac{1}{6}+\frac{1}{6}\right)=\frac{1}{4}$$
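The same answer can be obtained by enumerating the combined space and applying the product probabilities of (3.35); a minimal sketch assuming the same fair coin and fair die:

```python
from fractions import Fraction
from itertools import product

# Elementary-event probabilities in the two separate spaces.
p_coin = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}
p_die = {d: Fraction(1, 6) for d in range(1, 7)}

# By (3.35), each elementary event of the combined space has the product
# probability; sum those belonging to A = {(heads,1), (heads,2), (heads,3)}.
p_A = sum(p_coin[c] * p_die[d]
          for c, d in product(p_coin, p_die)
          if c == "heads" and d < 4)
assert p_A == Fraction(1, 4)
```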
To illustrate (3.30), in Example 3.4.1, change the event as follows: you win the prize if coin-throwing shows heads regardless of the outcome of die-throwing. This event, denoted by B, is
$$\begin{array}{c}B=\left\{\left(\mathrm{heads},1\right),\left(\mathrm{heads},2\right),\left(\mathrm{heads},3\right),\left(\mathrm{heads},4\right),\left(\mathrm{heads},5\right),\left(\mathrm{heads},6\right)\right\}\\ {}=\left\{\mathrm{heads}\right\}\times {\Omega}_2\end{array}$$
Since event B occurs if coin-throwing shows heads regardless of the outcome of die-throwing, P(B) should be equal to P[{heads}]. This is confirmed by applying (3.30) to the above as follows:
$$P(B)=P\left[\left\{\mathrm{heads}\right\}\times {\Omega}_2\right]=P\left[\left\{\mathrm{heads}\right\}\right]P\left[{\Omega}_2\right]=P\left[\left\{\mathrm{heads}\right\}\right]=\frac{1}{2}$$
Similarly, to illustrate (3.31), suppose that you win a prize if die-throwing shows six regardless of the outcome of coin-throwing. This event, denoted by C, is
$$C=\left\{\left(\mathrm{heads},6\right),\left(\mathrm{tails},6\right)\right\}={\Omega}_1\times \left\{6\right\}$$
Since event C occurs if die-throwing shows six regardless of the outcome of coin-throwing, P(C) should be equal to P[{6}]. This is confirmed by applying (3.34) to the above as follows:
$$P(C)=P\left[{\Omega}_1\times \left\{6\right\}\right]=P\left[{\Omega}_1\right]P\left[\left\{6\right\}\right]=1\times \frac{1}{6}=\frac{1}{6}$$
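Both marginal events can be checked by enumeration; a short sketch assuming the same fair coin and fair die as in Example 3.4.1:

```python
from fractions import Fraction
from itertools import product

# Elementary-event probabilities in the two separate spaces.
p_coin = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}
p_die = {d: Fraction(1, 6) for d in range(1, 7)}
omega = list(product(p_coin, p_die))  # combined space

def prob(event):
    # Sum the product probabilities of the elementary events, per (3.35).
    return sum(p_coin[c] * p_die[d] for c, d in event)

B = [(c, d) for c, d in omega if c == "heads"]  # {heads} x Omega2
C = [(c, d) for c, d in omega if d == 6]        # Omega1 x {6}
assert prob(B) == Fraction(1, 2)  # confirms (3.30)
assert prob(C) == Fraction(1, 6)  # confirms (3.31)
```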