European Journal of Pediatrics, Volume 165, Issue 5, pp 299–305

Useful probability considerations in genetics: the goat problem with tigers and other applications of Bayes’ theorem


    • Institut für Klinische Genetik, Medizinische Fakultät Carl Gustav Carus, Technische Universität Dresden
Original Paper

DOI: 10.1007/s00431-005-0039-2

Cite this article as:
Oexle, K. Eur J Pediatr (2006) 165: 299. doi:10.1007/s00431-005-0039-2


Probabilities or risks may change when new information is available. Common sense frequently fails in assessing this change. In such cases, Bayes’ theorem may be applied. It is easy to derive and has abundant applications in biology and medicine. Some examples of the application of Bayes' theorem are presented here, such as carrier risk estimation in X-chromosomal disorders, maximal manifestation probability of a dominant trait with unknown penetrance, combination of genetic and non-genetic information, and linkage analysis. The presentation addresses the non-specialist who asks for valid and consistent explanations. The conclusion to be drawn is that Bayes’ theorem is an accessible and helpful tool for probability calculations in genetics.


Bayes’ theorem, Carrier, Penetrance, Error rate, Linkage analysis, Monty Hall problem


Common sense is not infallible. Consider the following version of the infamous “goat” [18] or “Monty Hall” [7] problem: Anne and Barbara, two young geneticists who happily accepted a grant from Kim Jong-il’s biotechnology fund, visit their donor to report on their ambitious research. Proudly they present their results. Kim is not satisfied, however, since the application of the results to his bioweapon program is not obvious to him. Enraged, he commands Anne and Barbara into “the room with the three doors”. One door leads to a safe return home, while a hungry tiger lies in wait behind each of the other doors. Barbara is told where the tigers are, but Anne is the one to choose a door. As a macabre twist to the grave situation, Barbara has to indicate a door that hides a tiger, but not before Anne has made an initial choice, and never the door that Anne has chosen first. The sequence of events is shown in Fig. 1. Anne opts for door 3, and Barbara shows her a tiger behind one of the other doors, e.g., door 1. Should Anne revise her decision now and opt for the remaining door?
Fig. 1

The room with the three doors. Only one door is safe, while two doors lead into tiger cages. Anne makes a choice (a). Before Anne opens that door, Barbara is allowed to show her a tiger behind one of the other doors. Now, Anne should revise her decision but she stubbornly adheres to her initial choice (b)

Yes, she should, but she fatally sticks to her initial choice instead. This is not surprising since such mistakes are made frequently [7, 18]. However, if Anne were aware of Bayes’ theorem, she would be able to calculate her chances correctly. Besides, her day-to-day work in genetics would also improve. Therefore, this theorem and some useful applications are reviewed here.

The derivation of Bayes’ theorem is simple, as shown in the next section. Given the prior probability of an event (or a state of affairs such as “tiger behind door 2”) Bayes’ theorem enables the calculation of the posterior probability which prevails after some new information has been provided by another event. Thus, the posterior probability is a conditional probability (e.g., the probability of a tiger being behind door 2 if Barbara indicates a tiger behind door 1). In order to calculate this probability Bayes’ theorem uses inverse conditional probabilities which are frequently quite obvious (e.g., the probability that Barbara indicates a tiger behind door 1 if door 3 leads to freedom is obviously 50% since Barbara may indicate either the tiger behind door 1 or the tiger behind door 2).

Bayes’ theorem

Bayes’ theorem is derived from the following rather simple principles. The first principle is that the joint probability p(A & B) of two events A and B is the probability that both events occur and equals the probability of event A times the conditional probability p(B/A) that B occurs if A has occurred,
$$ \begin{array}{*{20}l} {{{\text{p}}{\left( {{\text{A}}\& {\text{B}}} \right)} = {\text{p}}{\left( {{\text{B}} \mathord{\left/ {\vphantom {{\text{B}} {\text{A}}}} \right. \kern-\nulldelimiterspace} {\text{A}}} \right)}{\text{p}}{\left( {\text{A}} \right)}} \hfill} \\ {{{\text{Since}}\;{\text{p}}{\left( {{\text{A}}\& {\text{B}}} \right)} = {\text{p}}{\left( {{\text{B}}\& {\text{A}}} \right)},} \hfill} \\ \end{array} $$
$$\text{p}\left( \text{A}/\text{B} \right)\text{p}\left( \text{B} \right) = \text{p}\left( \text{B}/\text{A} \right)\text{p}\left( \text{A} \right)$$
The second principle involves the assumption that there are n alternative events A1, A2,...An. One – but only one – of these events takes place. Thus, they are jointly exhaustive and mutually exclusive. Their probabilities add up to 1, that is, 100%,
$$\text{p}\left( \text{A}_1\;\text{or}\;\text{A}_2\;\text{or}\;\ldots\;\text{or}\;\text{A}_\text{n} \right) = \text{p}\left( \text{A}_1 \right) + \text{p}\left( \text{A}_2 \right) + \ldots + \text{p}\left( \text{A}_\text{n} \right) = 1$$
As a simple example, consider A1=A and A2=non-A with \(\text{p}\left( \text{A}\;\text{or}\;\text{non - A} \right) = \text{p}\left( \text{A} \right) + \text{p}\left( \text{non - A} \right) = 1\). Under the condition that some independent event B has taken place, still one and only one of the events A1, A2,...An occurs. Hence, the conditional probabilities p(Ai/B) also add up to 1,
$$\text{p}\left( \text{A}_1/\text{B} \right) + \ldots + \text{p}\left( \text{A}_\text{n}/\text{B} \right) = \sum_\text{i} \text{p}\left( \text{A}_\text{i}/\text{B} \right) = 1$$
with \(\text{p}\left( \text{A}/\text{B} \right) + \text{p}\left( \text{non - A}/\text{B} \right) = 1\) as the most simple example. Multiplying each side of this identity with p(B) results in \(\sum_\text{i} \text{p}\left( \text{A}_\text{i}/\text{B} \right)\text{p}\left( \text{B} \right) = \text{p}\left( \text{B} \right)\) which, according to Eq. 2, is equivalent to
$$\text{p}\left( \text{B} \right) = \sum_\text{i} \text{p}\left( \text{B}/\text{A}_\text{i} \right)\text{p}\left( \text{A}_\text{i} \right)$$
Equation 5 expresses p(B) as the weighted mean of the conditional probabilities p(B/Ai). Introducing Eq. 5 in Eq. 2 – that is, in p(Ak/B) p(B)=p(B/Ak) p(Ak) – yields Bayes’ theorem,
$$\text{p}\left( {{{\text{A}_\text{k} } \mathord{\left/ {\vphantom {{\text{A}_\text{k} } \text{B}}} \right. \kern-\nulldelimiterspace} \text{B}}} \right)\text{ = }\frac{{\text{p}\left( {{\text{B} \mathord{\left/ {\vphantom {\text{B} {\text{A}_\text{k} }}} \right. \kern-\nulldelimiterspace} {\text{A}_\text{k} }}} \right)\text{p}\left( {\text{A}_\text{k} } \right)}}{{\sum\nolimits_\text{i} {\text{p}\left( {{\text{B} \mathord{\left/ {\vphantom {\text{B} {\text{A}_\text{i} }}} \right. \kern-\nulldelimiterspace} {\text{A}_\text{i} }}} \right)\text{p}\left( {\text{A}_\text{i} } \right)} }}$$
where k ∈ {1, ..., i, ..., n}. In the special case of Ak=A1=A and A2=non-A,
$$\text{p}\left( {{\text{A} \mathord{\left/ {\vphantom {\text{A} \text{B}}} \right. \kern-\nulldelimiterspace} \text{B}}} \right)\text{ = }\frac{{\text{p}\left( {{\text{B} \mathord{\left/ {\vphantom {\text{B} \text{A}}} \right. \kern-\nulldelimiterspace} \text{A}}} \right)\text{p}\left( \text{A} \right)}}{{\text{p}\left( {{\text{B} \mathord{\left/ {\vphantom {\text{B} \text{A}}} \right. \kern-\nulldelimiterspace} \text{A}}} \right)\text{p}\left( \text{A} \right)\text{ + p}\left( {{\text{B} \mathord{\left/ {\vphantom {\text{B} {\text{non - A}}}} \right. \kern-\nulldelimiterspace} {\text{non - A}}}} \right)\text{p}\left( {\text{non - A}} \right)}}$$
Dividing Eq. 7 by its version for p(non-A/B) yields
$$\frac{{\text{p}\left( {{\text{A} \mathord{\left/ {\vphantom {\text{A} \text{B}}} \right. \kern-\nulldelimiterspace} \text{B}}} \right)}}{{\text{1 - p}\left( {{\text{A} \mathord{\left/ {\vphantom {\text{A} \text{B}}} \right. \kern-\nulldelimiterspace} \text{B}}} \right)}}\text{ = }\frac{{\text{1 - p}\left( {\text{non - }{\text{A} \mathord{\left/ {\vphantom {\text{A} \text{B}}} \right. \kern-\nulldelimiterspace} \text{B}}} \right)}}{{\text{p}\left( {\text{non - }{\text{A} \mathord{\left/ {\vphantom {\text{A} \text{B}}} \right. \kern-\nulldelimiterspace} \text{B}}} \right)}}\text{ = }\frac{{\text{p}\left( {{\text{B} \mathord{\left/ {\vphantom {\text{B} \text{A}}} \right. \kern-\nulldelimiterspace} \text{A}}} \right)}}{{\text{p}\left( {{\text{B} \mathord{\left/ {\vphantom {\text{B} {\text{non - A}}}} \right. \kern-\nulldelimiterspace} {\text{non - A}}}} \right)}}\text{ }\frac{{\text{p}\left( \text{A} \right)}}{{\text{1 - p}\left( \text{A} \right)}}$$
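Eqs. 5–8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `posterior` and `posterior_odds` and the numbers in the example are hypothetical and do not come from the paper. Exact fractions are used so that the probability form (Eq. 6) and the odds form (Eq. 8) can be compared without rounding.

```python
from fractions import Fraction


def posterior(priors, likelihoods, k):
    """Bayes' theorem (Eq. 6): p(A_k/B) from priors p(A_i) and likelihoods p(B/A_i)."""
    if abs(sum(priors) - 1) > 1e-9:
        raise ValueError("priors of exhaustive, exclusive events must add up to 1")
    # Eq. 5: p(B) is the prior-weighted mean of the conditional probabilities.
    p_b = sum(lik * pri for lik, pri in zip(likelihoods, priors))
    return likelihoods[k] * priors[k] / p_b


def posterior_odds(prior, lik_a, lik_non_a):
    """Odds form (Eq. 8): posterior odds = likelihood ratio * prior odds."""
    return (lik_a / lik_non_a) * (prior / (1 - prior))


# Hypothetical numbers: p(A) = 1/4, p(B/A) = 1/2, p(B/non-A) = 1/10.
p = posterior([Fraction(1, 4), Fraction(3, 4)],
              [Fraction(1, 2), Fraction(1, 10)], 0)
odds = posterior_odds(Fraction(1, 4), Fraction(1, 2), Fraction(1, 10))
print(p, odds)  # 5/8 5/3
```

In this example the posterior probability 5/8 corresponds to odds of (5/8)/(3/8) = 5/3, which is the value the odds form returns directly.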

Carrier risk estimation

Assume you are consulted by a woman who had a male child that died of a lethal X-chromosomal recessive disorder. She asks for an assessment of the risk that her next son will also be affected. If mutation rates are equal in male and female X-chromosomes, about two thirds of the patients with such a disorder have heterozygous carriers as mothers, while one third have de novo mutations since non-reproduction of affected males means that one third of all mutated genes are lost and have to be replaced in each generation [5]. The woman’s risk p(C) of being a carrier (“C”) may thus be given as 66%. Now, she reveals that she already has a healthy son (Fig. 2a). Does that change her risk of being a carrier? It does. For calculation of the probability p(C/H2) – which is the conditional probability that the woman is a carrier if her second son is healthy (“H2”) – Bayes’ theorem is applied according to Eq. 7 using the prior probability \(\text{p}\left( \text{C} \right) = 0.66 = 1 - \text{p}\left( \text{non - C} \right)\), the conditional probability p(H2/C)=0.5 that a son is unaffected if the mother is a carrier, and the conditional probability p(H2/non-C)≈1 that a son is unaffected if the mother is not a carrier.
$$ {\text{p}}{\left( {{\text{C}} \mathord{\left/ {\vphantom {{\text{C}} {{\text{H}}_{2} }}} \right. \kern-\nulldelimiterspace} {{\text{H}}_{2} }} \right)}{\text{ }} = {\text{ }}\frac{{{\text{p}}{\left( {{{\text{H}}_{2} } \mathord{\left/ {\vphantom {{{\text{H}}_{2} } {\text{C}}}} \right. \kern-\nulldelimiterspace} {\text{C}}} \right)}{\text{p}}{\left( {\text{C}} \right)}}} {{{\text{p}}{\left( {{{\text{H}}_{2} } \mathord{\left/ {\vphantom {{{\text{H}}_{2} } {\text{C}}}} \right. \kern-\nulldelimiterspace} {\text{C}}} \right)}{\text{p}}{\left( {\text{C}} \right)} + {\text{p}}{\left( {{{\text{H}}_{2} } \mathord{\left/ {\vphantom {{{\text{H}}_{2} } {{\text{non - C}}}}} \right. \kern-\nulldelimiterspace} {{\text{non - C}}}} \right)}{\text{p}}{\left( {{\text{non - C}}} \right)}}} = {\text{ }}\frac{{0.5\;0.66}} {{0.5\cdot 0.66 + 1\,0.33}} = {\text{ }}50\%$$
Fig. 2

A woman may be a carrier if her son died of an X-chromosomal recessive disorder. If she gives birth to unaffected sons (a, b) and has a negative test result (c), information becomes available that reduces her risk of being a carrier. Bayes’ theorem integrates this information into the carrier risk estimation

Hence, the woman has a risk of ½×50%=25% that her next son will be affected. However, she gives birth to another healthy son H3 (Fig. 2b). Thereby, her probability of being a carrier further declines as can be calculated by a repeated application of Bayes’ theorem. The posterior risk p(C/H2)=0.5 calculated before now is the prior probability, while the probabilities p(H3/C) and p(H3/non-C) are 0.5 and 1, respectively, as before.
$$\text{p}\left( {{\text{C} \mathord{\left/ {\vphantom {\text{C} {\text{H}_\text{3} }}} \right. \kern-\nulldelimiterspace} {\text{H}_\text{3} }}} \right)\text{ = }\frac{{0.5\,0.5}}{{0.5\,0.5 + 1\,0.5}}\text{ = 33\% }$$

In this manner, pedigree information may be introduced stepwise into a Bayesian risk calculation. Eq. 8 gives an idea of how such a multistep calculation can be handled efficiently.
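The stepwise updating described above can be sketched as a small loop in which each posterior becomes the next prior. The helper name `update_carrier_risk` is hypothetical, and two simplifications are assumed: the prior of 66% is taken as exactly 2/3, and p(H/non-C) is taken as exactly 1.

```python
from fractions import Fraction


def update_carrier_risk(prior, p_obs_if_carrier, p_obs_if_non_carrier):
    """One Bayesian step (Eq. 7): carrier risk after a single observation."""
    carrier = p_obs_if_carrier * prior
    non_carrier = p_obs_if_non_carrier * (1 - prior)
    return carrier / (carrier + non_carrier)


# Assumption: the prior of "66%" is taken as exactly 2/3.
risk = Fraction(2, 3)
history = [risk]
for _ in range(2):  # two healthy sons, H2 and H3
    # p(H/C) = 1/2; p(H/non-C) is taken as exactly 1
    risk = update_carrier_risk(risk, Fraction(1, 2), Fraction(1))
    history.append(risk)

print([str(r) for r in history])  # ['2/3', '1/2', '1/3']
```

In the odds form of Eq. 8 the same calculation is a simple multiplication: prior odds of 2:1 are multiplied by the likelihood ratio 0.5/1 for each healthy son, giving 1:1 and then 1:2, i.e., the probabilities 1/2 and 1/3 found above.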

It is necessary, however, to inform the reader about confounding factors such as germinal mosaicism [3], reduced fitness of carriers, and unequal mutation rates in oogenesis versus spermatogenesis [4, 17]. For example, in the case of the X-chromosomal urea cycle disorder ornithine transcarbamylase deficiency (OTCD), 23% of the heterozygous female patients and 80% (instead of 66%) of the male patients have de novo mutations [9]. These proportions result from the reduced reproductive fitness of heterozygotes and the predominance of mutations in spermatogenesis [15].

Information derived from pedigree analysis may be combined with genetic or non-genetic test results via Bayes’ theorem: each such test has limited sensitivity and specificity. The sensitivity of direct genetic testing, for instance, may not be higher than 80% [16] since not all pathogenic mutations are recognized. Biochemical tests are helpful in these cases in identifying potential carriers; for example, the creatine kinase test in carriers of Duchenne muscular dystrophy [11, 19] or the allopurinol test in OTCD carriers [6]. Let it be assumed that the woman described in Fig. 2c has a negative result in a test for carriership with sensitivity – i.e., probability of a positive result “T” if she is a carrier, p(T/C)=90% – and specificity – i.e., probability that the test is negative if she is not a carrier, p(non-T/non-C)=95%. How does that change her risk? With the appropriate substitutions (i.e., “C” for “A” and “non-T” for “B”) Eq. 7 may be applied again. The prior probability p(C) is 33%, as derived in Eq. 10. Then, with p(non-T/C)=1–p(T/C)=10%,
$$\text{p}\left( {{\text{C} \mathord{\left/ {\vphantom {\text{C} {\text{non - T}}}} \right. \kern-\nulldelimiterspace} {\text{non - T}}}} \right)\text{ = }\frac{{\text{p}\left( {{{\text{non - T}} \mathord{\left/ {\vphantom {{\text{non - T}} \text{C}}} \right. \kern-\nulldelimiterspace} \text{C}}} \right)\text{p}\left( \text{C} \right)}}{{\text{p}\left( {{{\text{non - T}} \mathord{\left/ {\vphantom {{\text{non - T}} \text{C}}} \right. \kern-\nulldelimiterspace} \text{C}}} \right)\text{p}\left( \text{C} \right)\text{ + p}\left( {{{\text{non - T}} \mathord{\left/ {\vphantom {{\text{non - T}} {\text{non - C}}}} \right. \kern-\nulldelimiterspace} {\text{non - C}}}} \right)\text{p}\left( {\text{non - C}} \right)}}\text{ = }\frac{\text{1}}{{\text{20}}}$$

The woman’s risk of being a carrier is considerably reduced. However, there is a remaining risk of 1/20=5% due to the fact that the sensitivity of the test is not 100%. The consequences of submaximal sensitivity of the allopurinol test as applied to possible OTCD carriers have been analyzed by Oexle et al. [12]. Carrier risk assessment in Duchenne muscular dystrophy using creatine kinase activity data quantitatively (instead of merely the alternative of positive or negative result) and incorporating repeated measurements has been established by Percy et al. [13].
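The arithmetic behind the remaining risk of 1/20 can be checked with a short sketch. The variable names are illustrative, and the prior of 33% is assumed to be exactly 1/3.

```python
from fractions import Fraction

prior = Fraction(1, 3)            # p(C) after two healthy sons (Eq. 10), taken as exactly 1/3
sensitivity = Fraction(90, 100)   # p(T/C)
specificity = Fraction(95, 100)   # p(non-T/non-C)

p_neg_if_carrier = 1 - sensitivity   # p(non-T/C) = 10%
p_neg_if_non_carrier = specificity   # p(non-T/non-C) = 95%

# Eq. 7 with "C" for "A" and "non-T" for "B"
numerator = p_neg_if_carrier * prior
risk_after_test = numerator / (numerator + p_neg_if_non_carrier * (1 - prior))
print(risk_after_test)  # 1/20
```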

Dominant trait with unknown penetrance

Consider a pedigree as shown in Fig. 3 with an autosomal dominant trait of unknown penetrance. What is the maximal risk p(DIII) of III:1 to be affected by the disorder “D” of the grandmother? II:1 may or may not carry the mutated gene. If II:1 is a carrier, III:1 may inherit the gene with a probability of 0.5, but even then III:1 may be un-affected since the penetrance of the disorder is incomplete. Assume that f indicates the degree of penetrance, i.e., the fraction of carriers that are affected (0≤f≤1). Then, p(DIII) is given as,
$$\text{p}\left( \text{D}_{\text{III}} \right) = f\;0.5\;\text{p}\left( \text{C}_{\text{II}}/\text{non - D}_{\text{II}} \right)$$
where p(CII/non-DII) is the probability of II:1 to be a carrier despite her normal phenotype. This probability can be calculated in Bayesian manner using Eq. 7,
$$\text{p}\left( {{{\text{C}_{\text{II}} } \mathord{\left/ {\vphantom {{\text{C}_{\text{II}} } {\text{non - D}_{\text{II}} }}} \right. \kern-\nulldelimiterspace} {\text{non - D}_{\text{II}} }}} \right)\text{ = }\frac{{\text{p}\left( {{{\text{non - D}_{\text{II}} } \mathord{\left/ {\vphantom {{\text{non - D}_{\text{II}} } {\text{C}_{\text{II}} }}} \right. \kern-\nulldelimiterspace} {\text{C}_{\text{II}} }}} \right)\text{p}\left( {\text{C}_{\text{II}} } \right)}}{{\text{p}\left( {{{\text{non - D}_{\text{II}} } \mathord{\left/ {\vphantom {{\text{non - D}_{\text{II}} } {\text{C}_{\text{II}} }}} \right. \kern-\nulldelimiterspace} {\text{C}_{\text{II}} }}} \right)\text{p(C}_{\text{II}} \text{) + p}\left( {{{\text{non - D}_{\text{II}} } \mathord{\left/ {\vphantom {{\text{non - D}_{\text{II}} } {\text{non - C}_{\text{II}} }}} \right. \kern-\nulldelimiterspace} {\text{non - C}_{\text{II}} }}} \right)\text{p}\left( {\text{non - C}_{\text{II}} } \right)}}$$
Fig. 3

An autosomal-dominant disorder with unknown penetrance f may not be present in each generation. If I:1 is affected, what is the maximal risk of III:1 to be affected if II:1 is not affected? The answer is found by calculating the carrier risk of II:1 as a function of f

The prior probability (prior to any information on her phenotype) of II:1 to be a carrier is 50%, i.e., p(CII)=0.5=p(non-CII). Non-carriers are not affected, that is, p(non-DII/non-CII)=1. Carriers may be unaffected due to incomplete penetrance, p(non-DII/CII) = 1−f. Hence, Eqs. 12 and 13 yield
$$\text{p}\left( {\text{D}_{\text{III}} } \right)\text{ = 0}\text{.5 }f\text{ }\frac{{0.5\left( {1 - f} \right)}}{{0.5\left( {1 - f} \right) + 0.5}}\text{ = }\frac{{\left( {f - f^2 } \right)}}{{\left( {4 - 2f} \right)}}$$
The maximum of p(DIII) as a function of f in the interval of 0≤f≤1 is located at the zero of its derivative. With
$$\frac{{\text{dp}\left( {\text{D}_{III} } \right)}}{{\text{d}f}}\text{ = }\frac{1}{2} - \frac{1}{{\left( {2 - f} \right)^2 }}\text{ = 0}$$
the maximum of p(DIII) is found at \(f = 2 - 2^{1/2} \approx 0.59\). Thus, with Eq. 14, p(DIII)max=0.086=8.6%, which is surprisingly low. The same maximal risk results from the tabular analysis presented by Young [20].
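The location of this maximum can also be checked numerically. The sketch below assumes nothing beyond Eq. 14; the function name `risk_iii` and the grid step of 0.0001 are arbitrary choices made for illustration.

```python
import math


def risk_iii(f):
    """Eq. 14: risk of III:1 being affected, as a function of penetrance f."""
    return (f - f * f) / (4 - 2 * f)


# Coarse numerical search over 0 <= f <= 1 (step 0.0001).
grid = [i / 10000 for i in range(10001)]
f_best = max(grid, key=risk_iii)

f_exact = 2 - math.sqrt(2)  # zero of the derivative in Eq. 15
print(round(f_best, 3), round(risk_iii(f_exact), 4))  # 0.586 0.0858
```

The grid search and the analytical solution agree: the risk peaks near f = 0.59 and falls to zero at both f = 0 (nobody is affected) and f = 1 (an unaffected II:1 cannot be a carrier).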

Linkage analysis

Linkage analysis, which has been tremendously successful in recent medical research, is also a method that relates to Bayes’ theorem. The following section is intended to give the interested reader a working understanding of two-point linkage analysis.

Two loci of the genome (e.g., a genetic marker and an unknown gene) may or may not be linked. If they are linked, they have a tendency to cosegregate. Since the haploid human genome is distributed on 23 different chromosomes, the prior probability of linkage p(L) was assumed to be 5% initially; i.e., roughly equal to the probability that two randomly selected loci reside on the same chromosome [10]. However, due to intervening recombinations, the linkage of two loci on the same chromosome is lost if they are far apart. Calculations based on genome length and the distance between loci over which one could detect linkage, determined a prior probability of p(L)=2% (=0.02) for linkage between a given locus of interest and some randomly chosen genome location [2].

Analysis of a pedigree constellation (“K”) with respect to the segregation of a gene’s phenotype and of a genetic marker may indicate linkage (“L”) between the gene and the marker. The claim of linkage should have a standard significance of 0.05 – i.e., the probability p(non-L/K) that the claim is false-positive should be 5% at most. This probability may be treated in Bayesian manner according to Eq. 8.
$$\frac{{1 - \text{p}\left( {\text{non} - {\text{L} \mathord{\left/ {\vphantom {\text{L} \text{K}}} \right. \kern-\nulldelimiterspace} \text{K}}} \right)}}{{\text{p}\left( {\text{non} - {\text{L} \mathord{\left/ {\vphantom {\text{L} \text{K}}} \right. \kern-\nulldelimiterspace} \text{K}}} \right)}}\text{ = }\frac{{0.95}}{{0.05}}\text{ = }\frac{{\text{p}\left( {{\text{K} \mathord{\left/ {\vphantom {\text{K} \text{L}}} \right. \kern-\nulldelimiterspace} \text{L}}} \right)}}{{\text{p}\left( {{\text{K} \mathord{\left/ {\vphantom {\text{K} {\text{non} - \text{L}}}} \right. \kern-\nulldelimiterspace} {\text{non} - \text{L}}}} \right)}}\text{ }\frac{{\text{p}\left( \text{L} \right)}}{{1 - \text{p}\left( \text{L} \right)}}$$
The term p(K/L)/p(K/non-L) is the “odds-ratio”. With p(L)=0.02 and the use of logarithms [remember that log(a · b)=log(a)+log(b)], Eq. 16 shows that the logarithm of odds (“LOD”, [1, 10]),
$$\text{log}_{\text{10}} \left( {\frac{{\text{p}\left( {{\text{K} \mathord{\left/ {\vphantom {\text{K} \text{L}}} \right. \kern-\nulldelimiterspace} \text{L}}} \right)}}{{p\left( {{\text{K} \mathord{\left/ {\vphantom {\text{K} {\text{non} - \text{L}}}} \right. \kern-\nulldelimiterspace} {\text{non} - \text{L}}}} \right)}}} \right)\text{ = log}_{\text{10}} \left( {\frac{{0.95}}{{0.05}}} \right)\text{ - log}_{\text{10}} \left( {\frac{{0.02}}{{0.98}}} \right) \approx \text{3}$$
should be 3 in order to achieve a significance level of 0.05 in two-point linkage analysis.
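How strongly the required LOD depends on the prior probability of linkage can be seen from a short sketch of Eq. 17. The function name `required_lod` is hypothetical; the second call uses the earlier prior of 5% mentioned above, purely for comparison.

```python
import math


def required_lod(p_linkage_prior, max_false_positive):
    """Eq. 17: LOD needed so that p(non-L/K) stays at max_false_positive."""
    posterior_odds = (1 - max_false_positive) / max_false_positive
    prior_odds = p_linkage_prior / (1 - p_linkage_prior)
    return math.log10(posterior_odds) - math.log10(prior_odds)


print(round(required_lod(0.02, 0.05), 2))  # 2.97
print(round(required_lod(0.05, 0.05), 2))  # 2.56, with the earlier prior of 5%
```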
Now assume that the analysis addresses a pedigree with a fully penetrant autosomal dominant disorder. If the gene of the disorder and the genetic marker used in the analysis are identical, there is no alternative to the observed constellation K, and p(K/L)=1. In the absence of linkage, on the other hand, K may occur fortuitously, being one of several possible constellations which could have resulted randomly from the germ cell formations (meioses) that generated the pedigree. Each of these meioses had a probability of 50% to be compatible with K. If the pedigree contains n meioses, constellation K may have occurred fortuitously with probability
$$\text{p}\left( \text{K}/\text{non - L} \right) = \left( 0.5 \right)^{n}$$
Hence, the LOD is
$$\text{log}_{\text{10}} \left( {\frac{{\text{p}\left( {{\text{K} \mathord{\left/ {\vphantom {\text{K} \text{L}}} \right. \kern-\nulldelimiterspace} \text{L}}} \right)}}{{\text{p}\left( {{\text{K} \mathord{\left/ {\vphantom {\text{K} {\text{non} - \text{L}}}} \right. \kern-\nulldelimiterspace} {\text{non} - \text{L}}}} \right)}}} \right)\text{ = log}_{\text{10}} \left( {\frac{1}{{0.5^n }}} \right)\text{ = }n\text{ log}_{\text{10}} \left( \text{2} \right)$$

Therefore, with Eq. 17, at least \(n = 3/\log_{10}\left( 2 \right) \approx 10\) meioses have to be examined if a LOD of 3 and a significance level of 0.05 are to be reached in the linkage analysis. If recombinations between genetic marker and gene of the disorder are possible, then p(K/L)<1, \(\text{LOD} < \log_{10}\left( 1/\left( 0.5 \right)^{n} \right)\), and more than n=10 meioses are required.
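The minimum number of meioses follows directly from Eq. 19. A minimal sketch, with the hypothetical helper `lod_no_recombination` covering only the case p(K/L)=1:

```python
import math


def lod_no_recombination(n):
    """Eq. 19: LOD of n meioses when p(K/L) = 1."""
    return n * math.log10(2)


# Smallest whole number of meioses that reaches a LOD of 3.
n_min = math.ceil(3 / math.log10(2))
print(n_min, round(lod_no_recombination(9), 2), round(lod_no_recombination(10), 2))
# 10 2.71 3.01
```

Each additional meiosis adds about 0.3 to the LOD, so nine meioses fall just short of the threshold and ten just exceed it.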

Additional information also is necessary to determine the “phase” relation between the mutated gene and the genetic marker. Consider the family depicted in Fig. 4. The mutated gene causing the autosomal dominant disorder in the mother may be either on the same chromosome as the genetic marker’s allele A1 (i.e., being in phase with A1) or on the chromosome of allele A2 (i.e., being in phase with A2). Thus, there are three mutually exclusive states, state LA1 (i.e., mutated gene linked in phase with A1), state LA2 (i.e., mutated gene linked in phase with A2), and state non-L (i.e., no linkage), with the prior probabilities p(LA1)=p(LA2)=p(L)/2, and p(non-L)=1–p(L), respectively. In this case of three jointly exhaustive and mutually exclusive alternatives, the general version of Bayes’ theorem (Eq. 6) has to be applied to calculate the error probability p(non-L/K), i.e., the significance level,
$$\text{p}\left( {\text{non - }{\text{L} \mathord{\left/ {\vphantom {\text{L} \text{K}}} \right. \kern-\nulldelimiterspace} \text{K}}} \right)\text{ = }\frac{{\text{p}\left( {{\text{K} \mathord{\left/ {\vphantom {\text{K} {\text{non} - \text{L}}}} \right. \kern-\nulldelimiterspace} {\text{non} - \text{L}}}} \right)\text{p}\left( {\text{non} - \text{L}} \right)}}{{\text{p}\left( {{\text{K} \mathord{\left/ {\vphantom {\text{K} {\text{non} - \text{L}}}} \right. \kern-\nulldelimiterspace} {\text{non} - \text{L}}}} \right)\text{p}\left( {\text{non} - \text{L}} \right) + p\left( {{K \mathord{\left/ {\vphantom {K {\text{L}_{\text{A}1} }}} \right. \kern-\nulldelimiterspace} {\text{L}_{\text{A}1} }}} \right)\text{p}\left( {\text{L}_{\text{A}1} } \right) + \text{p}\left( {{\text{K} \mathord{\left/ {\vphantom {\text{K} {\text{L}_{\text{A}2} }}} \right. \kern-\nulldelimiterspace} {\text{L}_{\text{A}2} }}} \right)\text{p}\left( {\text{L}_{\text{A}2} } \right)}}$$
Fig. 4

Linkage analysis of a fully penetrant autosomal-dominant trait in a pedigree of n=6 meioses. If the trait’s gene is linked to the polymorphic genetic marker A, either the youngest daughter or the five older siblings are recombinant depending on whether the gene is in phase with the marker’s allele A1 or with A2, respectively. The logarithm of the odds ratio (LOD) in favor of linkage is shown as function of the recombination fraction θ. The latter is a measure of the unknown distance between gene and marker. The LOD (a) is maximal at θ=0.168. If for some reason the gene was known to be in phase with A1, the LOD (b) would be higher and have its maximum exactly at θ=r/n=1/6. If there was no recombinant (r=0), the LOD would be maximal at θ=0 with values of 1.5 (c, unknown phase) and 1.8 (d, known phase)

Using p(LA1)=p(LA2)=p(L)/2 and applying the logarithms, Eq. 20 yields the LOD
$$\text{log}_{\text{10}} \left( {\frac{{\raise.5ex\hbox{$\scriptstyle 1$}\kern-.1em/ \kern-.15em\lower.25ex\hbox{$\scriptstyle 2$} \text{p}\left( {{\text{K} \mathord{\left/ {\vphantom {\text{K} {\text{L}_{A1} }}} \right. \kern-\nulldelimiterspace} {\text{L}_{A1} }}} \right)}}{{\text{p}\left( {{\text{K} \mathord{\left/ {\vphantom {\text{K} {\text{non} - \text{L}}}} \right. \kern-\nulldelimiterspace} {\text{non} - \text{L}}}} \right)}}\text{ + }\frac{{\raise.5ex\hbox{$\scriptstyle 1$}\kern-.1em/ \kern-.15em\lower.25ex\hbox{$\scriptstyle 2$} \text{p}\left( {{\text{K} \mathord{\left/ {\vphantom {\text{K} {\text{L}_{A2} }}} \right. \kern-\nulldelimiterspace} {\text{L}_{A2} }}} \right)}}{{\text{p}\left( {{\text{K} \mathord{\left/ {\vphantom {\text{K} {\text{non} - \text{L}}}} \right. \kern-\nulldelimiterspace} {\text{non} - \text{L}}}} \right)}}} \right)\text{ = log}_{\text{10}} \left( {\frac{{\text{p}\left( {{\text{L} \mathord{\left/ {\vphantom {\text{L} \text{K}}} \right. \kern-\nulldelimiterspace} \text{K}}} \right)}}{{1 - \text{p}\left( {{\text{L} \mathord{\left/ {\vphantom {\text{L} \text{K}}} \right. \kern-\nulldelimiterspace} \text{K}}} \right)}}} \right)\text{ - log}_{\text{10}} \left( {\frac{{\text{p}\left( \text{L} \right)}}{{1 - \text{p}\left( \text{L} \right)}}} \right)$$
for unknown phase. Again, as in Eq. 17, LOD≈3, if p(L)=0.02 and p(L/K)=0.95. Meiotic recombinations may occur between the gene and the genetic marker (Fig. 4). This is indicated quantitatively by the recombination fraction θ, which is the probability that the mutated gene is disconnected from the marker allele. Generally, θ increases with the chromosomal distance between the gene and its marker up to the level of 0.5 where gene and marker segregate randomly as if they were located on different chromosomes. In a pedigree comprising n meioses, there may be r recombinant and (n-r) non-recombinant meioses with respect to marker allele A1. If the mutated gene was in phase with allele A1 (state LA1), the specific pedigree constellation K has occurred with probability
$$\text{p}\left( \text{K}/\text{L}_{\text{A1}} \right) = \left( 1 - \theta \right)^{\left( n - r \right)} \theta^{r}$$
which is the product of the probabilities of the n independent meiotic events. In the opposite state LA2, (n-r) meioses were recombinant and r were non-recombinant,
$$\text{p}\left( \text{K}/\text{L}_{\text{A2}} \right) = \theta^{\left( n - r \right)} \left( 1 - \theta \right)^{r}$$

The probability that constellation K has occurred in the absence of linkage is \(\text{p}\left( \text{K}/\text{non - L} \right) = \left( 0.5 \right)^{n}\) as shown above (compare Eq. 18).

Zeroing the derivative of the LOD as a function of θ (compare Eqs. 21–23) – that is, dLOD/dθ=0 – yields the recombination fraction θLODmax where LOD is maximal. Via \(d\left( p\left( K/L_{A1} \right) + p\left( K/L_{A2} \right) \right)/d\theta = 0\), one calculates
$$\theta_{\text{LODmax}} \approx r/n$$
which is precise if the phase is known (Fig. 4).
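The LOD curves of Fig. 4 can be reproduced with a short numerical sketch of Eqs. 18 and 21–23. The function names `lod` and `theta_at_max`, the `phase_known` switch, and the grid search are illustrative assumptions; the paper itself locates the maximum by differentiation.

```python
import math


def lod(theta, n, r, phase_known=False):
    """LOD for n meioses with r recombinants relative to allele A1 (Eqs. 21-23)."""
    p_a1 = (1 - theta) ** (n - r) * theta ** r   # Eq. 22, gene in phase with A1
    p_a2 = theta ** (n - r) * (1 - theta) ** r   # Eq. 23, gene in phase with A2
    p_k_linked = p_a1 if phase_known else 0.5 * (p_a1 + p_a2)
    return math.log10(p_k_linked / 0.5 ** n)     # Eq. 18 in the denominator


def theta_at_max(n, r, phase_known=False, steps=5000):
    """Grid search for the recombination fraction in [0, 0.5] that maximizes the LOD."""
    # Skip theta = 0 when r > 0: the likelihood can be 0 there, and log10(0) is undefined.
    start = 1 if r > 0 else 0
    grid = [0.5 * i / steps for i in range(start, steps + 1)]
    return max(grid, key=lambda t: lod(t, n, r, phase_known))


# Pedigree of Fig. 4: n = 6 meioses, r = 1 recombinant relative to A1.
print(round(theta_at_max(6, 1), 3))                    # 0.168 (unknown phase)
print(round(theta_at_max(6, 1, phase_known=True), 3))  # 0.167 (r/n = 1/6)
# No recombinant (r = 0): LOD at theta = 0 for unknown and known phase.
print(round(lod(0.0, 6, 0), 1), round(lod(0.0, 6, 0, phase_known=True), 1))  # 1.5 1.8
```

With unknown phase the maximum lies slightly above r/n because the alternative phase contributes a small additional term to the likelihood.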

Linkage information may be collected from two pedigrees if they have the same disorder. For that purpose, the posterior probability of linkage in the first pedigree is used as prior probability of linkage in the second pedigree. Consequently, as derived easily from Eq. 8, the odds ratios may be multiplied; that is, the LODs of two pedigrees may be added.
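The step from multiplied odds ratios to added LODs can be spelled out as a short sketch. The labels K1 and K2 for the constellations of the two pedigrees are introduced here for illustration and are not part of the original notation; the two pedigrees are assumed to be independent given the linkage state and to be evaluated at the same recombination fraction. Applying Eq. 8 to the first pedigree and then again to the second, with the first posterior odds as the new prior odds, gives

$$\frac{\text{p}\left( \text{L}/\text{K}_1 \& \text{K}_2 \right)}{1 - \text{p}\left( \text{L}/\text{K}_1 \& \text{K}_2 \right)} = \frac{\text{p}\left( \text{K}_2/\text{L} \right)}{\text{p}\left( \text{K}_2/\text{non - L} \right)}\;\frac{\text{p}\left( \text{K}_1/\text{L} \right)}{\text{p}\left( \text{K}_1/\text{non - L} \right)}\;\frac{\text{p}\left( \text{L} \right)}{1 - \text{p}\left( \text{L} \right)}$$

so that, after taking logarithms, the combined LOD is the sum of the individual LODs,

$$\log_{10}\left( \frac{\text{p}\left( \text{K}_1/\text{L} \right)}{\text{p}\left( \text{K}_1/\text{non - L} \right)}\;\frac{\text{p}\left( \text{K}_2/\text{L} \right)}{\text{p}\left( \text{K}_2/\text{non - L} \right)} \right) = \text{LOD}_1 + \text{LOD}_2$$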

Linkage analysis is an elaborate field of genetic research. Of course, not all its aspects are presented here. For further reading see, for example, Freimer and Sabatti [2] or Strauch et al. [14].


Bayes’ theorem as well as the problems that it can solve reside just above common sense intuition [7, 18]. For example, the theorem is helpful when: (1) reconsidering the (“prior”) probability of a possible carrier in view of additional (i.e., “conditional”) information – e.g., after the birth of an unaffected child; (2) combining genetic or non-genetic data from pedigrees or tests; (3) requiring the inverse of a conditional probability – e.g., the probability that a woman is a carrier if she gives birth to an unaffected son instead of the probability that she will have an unaffected son if she is a carrier. When there is doubt with respect to the application of Bayes' theorem, a review of the examples of risk calculation and linkage analysis presented in this paper may provide the necessary elucidation.

“Risk calculation” brings the subject back to the adventure of the two young geneticists presented in the Introduction. Why should Anne revise her decision? Again, a Bayesian analysis provides the answer. Each door’s prior probability of leading to freedom is one third, p(F1)=p(F2)=p(F3)=1/3. The probability that Barbara shows a tiger behind the first door if this door leads to freedom is zero, p(T1/F1)=0. The probability that Barbara shows a tiger behind the first door if the second door leads to freedom is one, p(T1/F2)=1, since she is not allowed to show the tiger behind door 3 that Anne has chosen. The probability that Barbara shows a tiger behind the first door if the third door leads to freedom is one half, p(T1/F3)=1/2, since she may choose either the first or the second door. Bayes’ theorem as given in Eq. 6 indicates that the probability that door 2 leads to freedom if Barbara shows a tiger behind door 1 is
$$\text{p}\left( \text{F}_2/\text{T}_1 \right) = \frac{\text{p}\left( \text{T}_1/\text{F}_2 \right)\text{p}\left( \text{F}_2 \right)}{\text{p}\left( \text{T}_1/\text{F}_1 \right)\text{p}\left( \text{F}_1 \right) + \text{p}\left( \text{T}_1/\text{F}_2 \right)\text{p}\left( \text{F}_2 \right) + \text{p}\left( \text{T}_1/\text{F}_3 \right)\text{p}\left( \text{F}_3 \right)} = \frac{1/3}{0 + 1/3 + 1/6} = 2/3$$
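
The same calculation can be reproduced with exact fractions. The following Python sketch simply tabulates the priors and conditional probabilities stated above and applies Bayes’ theorem; the variable names are illustrative.

```python
from fractions import Fraction

# Prior probability that door i leads to freedom: p(F1), p(F2), p(F3)
prior = {door: Fraction(1, 3) for door in (1, 2, 3)}

# p(T1/Fi): Barbara shows a tiger behind door 1 after Anne has chosen door 3
likelihood = {1: Fraction(0), 2: Fraction(1), 3: Fraction(1, 2)}

# Denominator of Bayes' theorem: total probability of being shown T1
evidence = sum(likelihood[d] * prior[d] for d in prior)

# Posterior probability p(Fi/T1) for each door
posterior = {d: likelihood[d] * prior[d] / evidence for d in prior}

print(posterior[2], posterior[3])  # prints "2/3 1/3"
```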

Anne doubles her chance to reach freedom if she switches from the third to the second door after Barbara has shown her the tiger behind the first door. This can be made intuitively obvious if the three possible distributions of the two tigers are considered. In two of the three cases Anne is safe if she turns away from the door that she has chosen at first. However, this intuitive shortcut does not mean that the Bayesian dissection of conditional probabilities can be dispensed with. Next time – if there is a next time – Anne might run into the following version of a problem provided by Lewis Carroll [8]. A bag contains a pearl, known to be either black and precious or white and fake. A white fake pearl is put in the bag, the bag is shaken, and a pearl is drawn out which proves to be white. What is now the chance of having a precious black pearl in the bag?

Copyright information

© Springer-Verlag 2006