# Useful probability considerations in genetics: the goat problem with tigers and other applications of Bayes’ theorem

DOI: 10.1007/s00431-005-0039-2

- Cite this article as:
- Oexle, K. Eur J Pediatr (2006) 165: 299. doi:10.1007/s00431-005-0039-2

## Abstract

Probabilities or risks may change when new information is available. Common sense frequently fails in assessing this change. In such cases, Bayes’ theorem may be applied. It is easy to derive and has abundant applications in biology and medicine. Some examples of the application of Bayes' theorem are presented here, such as carrier risk estimation in X-chromosomal disorders, maximal manifestation probability of a dominant trait with unknown penetrance, combination of genetic and non-genetic information, and linkage analysis. The presentation addresses the non-specialist who asks for valid and consistent explanations. The conclusion to be drawn is that Bayes’ theorem is an accessible and helpful tool for probability calculations in genetics.

### Keywords

Bayes’ theorem, Carrier, Penetrance, Error rate, Linkage analysis, Monty Hall problem

## Introduction

Yes, she should, but she fatally sticks to her initial choice instead. This is not surprising since such mistakes are made frequently [7, 18]. However, if Anne were aware of Bayes’ theorem, she would be able to calculate her chances correctly. Besides, her day-to-day work in genetics would also improve. Therefore, this theorem and some useful applications are reviewed here.

The derivation of Bayes’ theorem is simple, as shown in the next section. Given the prior probability of an event (or a state of affairs such as “tiger behind door 2”), Bayes’ theorem enables the calculation of the posterior probability which prevails after some new information has been provided by another event. Thus, the posterior probability is a *conditional* probability (e.g., the probability of a tiger being behind door 2 *if* Barbara indicates a tiger behind door 1). In order to calculate this probability, Bayes’ theorem uses inverse conditional probabilities which are frequently quite obvious (e.g., the probability that Barbara indicates a tiger behind door 1 *if* door 3 leads to freedom is obviously 50% since Barbara may indicate either the tiger behind door 1 or the tiger behind door 2).

## Bayes’ theorem

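The probability p(A and B) that two events A and B both occur is the product of the probability p(A) that A occurs and the *conditional* probability p(B/A) that B occurs if A has occurred,

\[ p(A\ \text{and}\ B) = p(A)\,p(B/A) \qquad (1) \]

Since the same joint probability can also be written as p(B) p(A/B), it follows that

\[ p(A/B)\,p(B) = p(B/A)\,p(A) \qquad (2) \]

Consider *n* alternative events A_{1}, A_{2}, ... A_{n}. One – but only one – of these events takes place. Thus, they are jointly exhaustive and mutually exclusive. Their probabilities add up to 1, that is, 100%,

\[ \sum_{i=1}^{n} p(A_{i}) = 1 \qquad (3) \]

The simplest case is A_{1}=A and A_{2}=non-A with \(p(\text{A or non-A}) = p(\text{A}) + p(\text{non-A}) = 1\). Under the condition that some other event B has taken place, still one and only one of the events A_{1}, A_{2}, ... A_{n} occurs. Hence, the conditional probabilities p(A_{i}/B) also add up to 1,

\[ \sum_{i=1}^{n} p(A_{i}/B) = 1 \qquad (4) \]

Multiplication by p(B) gives Σ_{i} p(A_{i}/B) p(B)=p(B) which, according to Eq. 2, is equivalent to

\[ p(B) = \sum_{i=1}^{n} p(B/A_{i})\,p(A_{i}) \qquad (5) \]

Introducing Eq. 5 in Eq. 2 – that is, in p(A_{k}/B) p(B)=p(B/A_{k}) p(A_{k}) – yields Bayes’ theorem,

\[ p(A_{k}/B) = \frac{p(B/A_{k})\,p(A_{k})}{\sum_{i=1}^{n} p(B/A_{i})\,p(A_{i})} \qquad (6) \]

with k ∈ {1, ..., *n*}. In the special case of A_{k}=A_{1}=A and A_{2}=non-A,

\[ p(A/B) = \frac{p(B/A)\,p(A)}{p(B/A)\,p(A) + p(B/\text{non-A})\,p(\text{non-A})} \qquad (7) \]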
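The general form of Bayes’ theorem (Eq. 6) can be written as a few lines of Python. The following is a minimal sketch, not part of the original article: the function name `bayes_posterior` and the example numbers are illustrative assumptions, and exact fractions are used only to keep the arithmetic transparent.

```python
from fractions import Fraction


def bayes_posterior(priors, likelihoods):
    """Return the posteriors p(A_k/B) for all alternatives A_k (Eq. 6).

    priors      -- p(A_i) for jointly exhaustive, mutually exclusive events
    likelihoods -- p(B/A_i), in the same order as the priors
    """
    if abs(sum(priors) - 1) > 1e-9:
        raise ValueError("prior probabilities must add up to 1")
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # p(B), the denominator of Eq. 6
    return [j / evidence for j in joint]


# Hypothetical two-alternative case (A and non-A), i.e. the special case of Eq. 7.
posterior = bayes_posterior(
    priors=[Fraction(1, 4), Fraction(3, 4)],
    likelihoods=[Fraction(4, 5), Fraction(1, 5)],
)
print(posterior)
```

Because the denominator is simply the sum of all numerators, the returned posteriors always add up to 1, as required for conditional probabilities of exhaustive alternatives.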

## Carrier risk estimation

To calculate the posterior risk p(C/H_{2}) – which is the conditional probability that the woman is a carrier if her second son is healthy (“H_{2}”) – Bayes’ theorem is applied according to Eq. 7 using the prior probability \(p(\text{C}) = 0.66 = 1 - p(\text{non-C})\), the conditional probability p(H_{2}/C)=0.5 that a son is unaffected if the mother is a carrier, and the conditional probability p(H_{2}/non-C)≈1 that a son is unaffected if the mother is not a carrier,

\[ p(\text{C}/\text{H}_{2}) = \frac{p(\text{H}_{2}/\text{C})\,p(\text{C})}{p(\text{H}_{2}/\text{C})\,p(\text{C}) + p(\text{H}_{2}/\text{non-C})\,p(\text{non-C})} = \frac{0.5 \times 0.66}{0.5 \times 0.66 + 1 \times 0.33} \approx 0.5 \]

Suppose the woman then has a further healthy son, H_{3} (Fig. 2b). Thereby, her probability of being a carrier further declines as can be calculated by a repeated application of Bayes’ theorem. The posterior risk p(C/H_{2})=0.5 calculated before now is the prior probability, while the probabilities p(H_{3}/C) and p(H_{3}/non-C) are 0.5 and 1, respectively, as before,

\[ p(\text{C}/\text{H}_{2}\ \text{and}\ \text{H}_{3}) = \frac{0.5 \times 0.5}{0.5 \times 0.5 + 1 \times 0.5} \approx 0.33 \]

In this manner, pedigree information may be introduced stepwise into a Bayesian risk calculation. Eq. (8) gives an idea how such a multistep calculation can be handled efficiently.
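The stepwise procedure can be sketched in Python as a loop in which each posterior becomes the prior of the next step. This is an illustrative sketch only: the function name `update_carrier_risk` is an assumption, and the prior is taken as exactly 2/3 (rounded to 0.66 in the text) so that the fractions stay exact.

```python
from fractions import Fraction


def update_carrier_risk(prior, p_obs_if_carrier, p_obs_if_noncarrier):
    """One application of Eq. 7: posterior carrier risk after one observation."""
    numerator = p_obs_if_carrier * prior
    return numerator / (numerator + p_obs_if_noncarrier * (1 - prior))


risk = Fraction(2, 3)  # prior carrier risk (assumed exact value of 0.66)
history = [risk]
for _ in range(2):     # two healthy sons, H2 and H3
    # p(healthy son / carrier) = 1/2, p(healthy son / non-carrier) = 1
    risk = update_carrier_risk(risk, Fraction(1, 2), Fraction(1))
    history.append(risk)

print(history)  # carrier risk before H2, after H2, and after H3
```

Each further healthy son lowers the risk again, but never to zero, since a carrier may have any number of unaffected sons.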

It is necessary, however, to inform the reader about confounding factors such as germinal mosaicism [3], reduced fitness of carriers, and unequal mutation rates in oogenesis versus spermatogenesis [4, 17]. For example, in the case of the X-chromosomal urea cycle disorder ornithine transcarbamylase deficiency (OTCD), 23% of the heterozygous female patients and 80% (instead of 66%) of the male patients have de novo mutations [9]. These proportions result from the reduced reproductive fitness of heterozygotes and the predominance of mutations in spermatogenesis [15].

The woman’s risk of being a carrier is considerably reduced. However, there is a remaining risk of 1/20 = 5% due to the fact that the sensitivity of the test is not 100%. The consequences of submaximal sensitivity of the allopurinol test as applied to possible OTCD carriers have been analyzed by Oexle et al. [12]. Carrier risk assessment in Duchenne muscular dystrophy using creatine kinase activity data quantitatively (instead of merely the alternative of positive or negative result) and incorporating repeated measurements has been established by Percy et al. [13].

## Dominant trait with unknown penetrance

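What is the probability p(D_{III}) of III:1 to be affected by the disorder “D” of the grandmother? II:1 may or may not carry the mutated gene. If II:1 is a carrier, III:1 may inherit the gene with a probability of 0.5, but even then III:1 may be unaffected since the penetrance of the disorder is incomplete. Assume that *f* indicates the degree of penetrance, i.e., the fraction of carriers that are affected (0≤*f*≤1). Then, p(D_{III}) is given as,

\[ p(\text{D}_{III}) = p(\text{C}_{II}/\text{non-D}_{II}) \times 0.5 \times f \qquad (12) \]

where p(C_{II}/non-D_{II}) is the probability of II:1 to be a carrier despite her normal phenotype. This probability can be calculated in Bayesian manner using Eq. 7,

\[ p(\text{C}_{II}/\text{non-D}_{II}) = \frac{p(\text{non-D}_{II}/\text{C}_{II})\,p(\text{C}_{II})}{p(\text{non-D}_{II}/\text{C}_{II})\,p(\text{C}_{II}) + p(\text{non-D}_{II}/\text{non-C}_{II})\,p(\text{non-C}_{II})} \qquad (13) \]

The prior probabilities are p(C_{II})=0.5=p(non-C_{II}). Non-carriers are not affected, that is, p(non-D_{II}/non-C_{II})=1. Carriers may be unaffected due to incomplete penetrance, p(non-D_{II}/C_{II}) = 1−*f*. Hence, Eqs. 12 and 13 yield

\[ p(\text{D}_{III}) = \frac{0.5\,f\,(1-f)}{2-f} \qquad (14) \]

The maximum of p(D_{III}) as a function of *f* in the interval of 0≤*f*≤1 is located at the zero of its derivative. With

\[ \frac{d\,p(\text{D}_{III})}{df} = \frac{0.5\,(f^{2} - 4f + 2)}{(2-f)^{2}} = 0 \]

the maximum of p(D_{III}) is found at *f*=(2−2^{½})=0.59. Thus, with Eq. 14, p(D_{III})_{max}=0.086=8.6%, which is surprisingly low. The same maximal risk results from the tabular analysis presented by Young [20].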
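The location of this maximum can be checked numerically. The sketch below is an illustration under the assumptions stated in the text (prior carrier probability of 0.5 for II:1, transmission probability of 0.5, penetrance *f*); the function name `risk_grandchild` and the grid resolution are arbitrary choices.

```python
import math


def risk_grandchild(f):
    """Probability that III:1 is affected, for penetrance f (0 <= f <= 1)."""
    # Bayes' theorem (Eq. 7): II:1 is a carrier although she is unaffected.
    carrier_given_unaffected = (1 - f) * 0.5 / ((1 - f) * 0.5 + 1 * 0.5)
    # III:1 must inherit the gene (0.5) and the gene must manifest (f).
    return carrier_given_unaffected * 0.5 * f


# Simple grid search over the penetrance instead of solving the derivative.
grid = [i / 10000 for i in range(10001)]
f_max = max(grid, key=risk_grandchild)

print(round(f_max, 3), round(risk_grandchild(f_max), 3))
print(round(2 - math.sqrt(2), 3))  # analytical position of the maximum
```

The grid search avoids differentiating by hand and confirms that the risk vanishes at both ends of the interval: at *f*=0 nobody is affected, and at *f*=1 an unaffected II:1 cannot be a carrier.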

## Linkage analysis

Linkage analysis, which has been of tremendous success in recent medical research, is also a method that relates to Bayes’ theorem. The following section will supply the interested reader with a satisfying understanding of two-point linkage analysis.

Two loci of the genome (e.g., a genetic marker and an unknown gene) may or may not be linked. If they are linked, they have a tendency to cosegregate. Since the haploid human genome is distributed on 23 different chromosomes, the prior probability of linkage p(L) was assumed to be 5% initially; i.e., roughly equal to the probability that two randomly selected loci reside on the same chromosome [10]. However, due to intervening recombinations, the linkage of two loci on the same chromosome is lost if they are far apart. Calculations based on genome length and the distance between loci over which one could detect linkage, determined a prior probability of p(L)=2% (=0.02) for linkage between a given locus of interest and some randomly chosen genome location [2].

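The evidence for linkage provided by an observed pedigree constellation K is commonly expressed as the *l*ogarithm of *od*ds (“LOD”, [1, 10]),

\[ \text{LOD} = \log_{10}\frac{p(K/L)}{p(K/\text{non-L})} \qquad (17) \]

Without linkage, in *n* meioses, constellation K may have occurred fortuitously with probability

\[ p(K/\text{non-L}) = (0.5)^{n} \qquad (18) \]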

Therefore, with Eq. 17, at least \(n = 3/\log_{10}(2) \approx 10\) meioses have to be examined if a LOD of 3 and a significance level of 0.05 are to be reached in the linkage analysis. If recombinations between genetic marker and gene of the disorder are possible, then p(K/L)<1, LOD<log_{10}(1/(0.5)^{n}), and more than *n*=10 meioses are required.
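The relation between the LOD threshold of 3, the prior probability of linkage of 2%, and the significance level of about 0.05 can be retraced with a short calculation. The sketch below assumes that the LOD is the decadic logarithm of p(K/L)/p(K/non-L) and that the error probability p(non-L/K) follows from Eq. 7; the function names are illustrative.

```python
import math


def lod_fully_informative(n):
    """LOD for n meioses without recombination: p(K/L)=1, p(K/non-L)=0.5**n."""
    return math.log10(1 / 0.5 ** n)


def error_probability(lod, prior_linkage=0.02):
    """p(non-L/K) according to Eq. 7, written in terms of the odds 10**lod."""
    odds = 10 ** lod  # p(K/L) / p(K/non-L)
    return (1 - prior_linkage) / (odds * prior_linkage + (1 - prior_linkage))


# Smallest whole number of meioses that reaches a LOD of 3.
n_min = math.ceil(3 / math.log10(2))

print(n_min, round(lod_fully_informative(n_min), 2))
print(round(error_probability(3), 3))  # error probability at LOD = 3
```

With a prior of 2%, a LOD of 3 leaves an error probability slightly below 0.05, which is why this threshold corresponds to the usual significance level.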

When the phase is not known, a linked mutated gene may be located either on the chromosome of marker allele *A1* (i.e., being in phase with *A1*) or on the chromosome of allele *A2* (i.e., being in phase with *A2*). Thus, there are three mutually exclusive states, state L_{A1} (i.e., mutated gene linked in phase with *A1*), state L_{A2} (i.e., mutated gene linked in phase with *A2*), and state non-L (i.e., no linkage), with the prior probabilities p(L_{A1})=p(L_{A2})=p(L)/2, and p(non-L)=1−p(L), respectively. In this case of three jointly exhaustive and mutually exclusive alternatives, the general version of Bayes’ theorem (Eq. 6) has to be applied to calculate the error probability p(non-L/K), i.e., the significance level,

\[ p(\text{non-L}/K) = \frac{p(K/\text{non-L})\,p(\text{non-L})}{p(K/L_{A1})\,p(L_{A1}) + p(K/L_{A2})\,p(L_{A2}) + p(K/\text{non-L})\,p(\text{non-L})} \qquad (20) \]

With p(L_{A1})=p(L_{A2})=p(L)/2 and applying the logarithms, Eq. 20 yields the LOD

\[ \text{LOD} = \log_{10}\frac{p(K/L_{A1}) + p(K/L_{A2})}{2\,p(K/\text{non-L})} \]

Among *n* meioses, there may be *r* recombinant and (*n*−*r*) non-recombinant meioses with respect to marker allele *A1*. If the mutated gene was in phase with allele *A1* (state L_{A1}), the specific pedigree constellation K has occurred with probability

\[ p(K/L_{A1}) = \theta^{r}\,(1-\theta)^{n-r} \]

where θ denotes the recombination fraction, as the joint outcome of *n* independent meiotic events. In the opposite state L_{A2}, (*n*−*r*) meioses were recombinant and *r* were non-recombinant,

\[ p(K/L_{A2}) = \theta^{n-r}\,(1-\theta)^{r} \]

The probability that constellation K has occurred in the absence of linkage is p(K/non-L)=(0.5)^{n} as shown above (compare Eq. 18).

One then seeks the recombination fraction θ_{LODmax} at which the LOD is maximal. Since p(K/non-L) does not depend on θ, it follows from the condition \(d\left[p(K/L_{A1}) + p(K/L_{A2})\right]/d\theta = 0\).

Linkage information may be collected from two pedigrees if they have the same disorder. For that purpose, the posterior probability of linkage in the first pedigree is used as prior probability of linkage in the second pedigree. Consequently, as derived easily from Eq. 8, the odds ratios may be multiplied; that is, the LODs of two pedigrees may be added.
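A phase-unknown two-point LOD and the addition of LODs across pedigrees can be sketched as follows. This is a minimal illustration, not the author's procedure: it assumes that p(K/L_{A1}) and p(K/L_{A2}) are products of θ and (1−θ) over *r* recombinant and (*n*−*r*) non-recombinant meioses, that p(K/non-L)=(0.5)^{n}, and that both pedigrees share the same recombination fraction. The pedigree counts, function names, and grid search are hypothetical choices.

```python
import math


def lod_phase_unknown(theta, n, r):
    """Two-point LOD for n meioses, r of them recombinant with respect to A1."""
    p_a1 = theta ** r * (1 - theta) ** (n - r)  # gene in phase with A1
    p_a2 = theta ** (n - r) * (1 - theta) ** r  # gene in phase with A2
    p_unlinked = 0.5 ** n                       # no linkage
    return math.log10((p_a1 + p_a2) / (2 * p_unlinked))


def combined_lod(theta, pedigrees):
    """LODs of independent pedigrees add up (their odds ratios multiply)."""
    return sum(lod_phase_unknown(theta, n, r) for n, r in pedigrees)


def best_theta(pedigrees):
    """Grid search for the recombination fraction with maximal combined LOD."""
    thetas = [i / 1000 for i in range(1, 501)]  # 0.001 ... 0.5, avoids log10(0)
    return max(thetas, key=lambda t: combined_lod(t, pedigrees))


# Hypothetical data: (number of meioses, number of recombinants) per pedigree.
pedigrees = [(10, 1), (8, 1)]
theta_max = best_theta(pedigrees)
print(theta_max, round(combined_lod(theta_max, pedigrees), 2))
```

At θ=0.5 the LOD is zero by construction, since linked and unlinked loci then predict the constellation equally well; the grid search is used here only because it needs no further derivation.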

Linkage analysis is an elaborate field of genetic research. Of course, not all its aspects are presented here. For further reading see, for example, Freimer and Sabatti [2] or Strauch et al. [14].

## Discussion

Bayes’ theorem as well as the problems that it can solve reside just above common sense intuition [7, 18]. For example, the theorem is helpful when: (1) reconsidering the (“prior”) probability of a possible carrier in view of additional (i.e., “conditional”) information – e.g., after the birth of an unaffected child; (2) combining genetic or non-genetic data from pedigrees or tests; (3) requiring the inverse of a conditional probability – e.g., the probability that a woman is a carrier if she gives birth to an unaffected son instead of the probability that she will have an unaffected son if she is a carrier. When there is doubt with respect to the application of Bayes' theorem, a review of the examples of risk calculation and linkage analysis presented in this paper may provide the necessary elucidation.

In Anne’s problem, the prior probabilities that door 1, door 2, or door 3 leads to freedom are p(F_{1})=p(F_{2})=p(F_{3})=1/3. The probability that Barbara shows a tiger behind the first door if this door leads to freedom is zero, p(T_{1}/F_{1})=0. The probability that Barbara shows a tiger behind the first door if the second door leads to freedom is one, p(T_{1}/F_{2})=1, since she is not allowed to show the tiger behind door 3 that Anne has chosen. The probability that Barbara shows a tiger behind the first door if the third door leads to freedom is one half, p(T_{1}/F_{3})=1/2, since she may choose either the first or the second door. Bayes’ theorem as given in Eq. 6 indicates that the probability that door 2 leads to freedom if Barbara shows a tiger behind door 1 is

\[ p(F_{2}/T_{1}) = \frac{p(T_{1}/F_{2})\,p(F_{2})}{\sum_{i=1}^{3} p(T_{1}/F_{i})\,p(F_{i})} = \frac{1 \times \tfrac{1}{3}}{0 \times \tfrac{1}{3} + 1 \times \tfrac{1}{3} + \tfrac{1}{2} \times \tfrac{1}{3}} = \frac{2}{3} \]

Anne doubles her chance to reach freedom if she switches from the third to the second door after Barbara has shown her the tiger behind the first door. This can be made intuitively obvious if the three possible distributions of the two tigers are considered. In two cases Anne is safe if she turns away from the door that she has chosen at first. However, this revealing consideration is not to suggest that Bayesian dissection of conditional probabilities can be left aside. Next time – if there is a next time – Anne might run into the following version of a problem provided by Lewis Carroll [8]. A bag contains a pearl, known to be either black and precious or white and faked. A white faked pearl is put in the bag, the bag is shaken, and a pearl is drawn out which proves to be white. What is now the chance of having a precious black pearl in the bag?
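For readers who wish to retrace the tiger problem, the three-door calculation with Eq. 6 can be written out with exact fractions. The sketch assumes the setting described in this section: Anne has chosen door 3, Barbara shows a tiger behind door 1, and each door leads to freedom with prior probability 1/3. The variable names are illustrative.

```python
from fractions import Fraction

# Prior probabilities p(F_k) that door k leads to freedom.
priors = {door: Fraction(1, 3) for door in (1, 2, 3)}

# p(T1/F_k): probability that Barbara shows the tiger behind door 1
# if door k leads to freedom (Anne has chosen door 3).
likelihood_t1 = {1: Fraction(0), 2: Fraction(1), 3: Fraction(1, 2)}

# Denominator of Eq. 6: total probability that Barbara shows door 1.
evidence = sum(likelihood_t1[d] * priors[d] for d in priors)

# Posterior probabilities p(F_k/T1) for each door.
posterior = {d: likelihood_t1[d] * priors[d] / evidence for d in priors}

for door, p in posterior.items():
    print(f"door {door}: {p}")
```

The posterior for door 2 is twice that for door 3, which is the doubling of Anne's chance described above.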