Estimation of copy number in polyploid plants: the good, the bad, and the ugly

  • Original Paper
Theoretical and Applied Genetics

Abstract

Genetic studies in polyploid plants rely heavily on the collection of data from dominant marker loci. A dominant marker locus is a locus for which only the presence or absence of an observable (dominant) allele is recorded. Before these marker loci can be used for genetic exploration, the number of copies of a dominant allele carried by a parent (copy number) must be determined for each marker locus. Copy number in polyploids is estimated using a hypothesis testing procedure. The performance of this estimation procedure has never been evaluated. In this paper, I quantify whether the highly sought-after single-copy markers can be accurately identified, whether the performance of the estimation procedure improves with increasing sample size, and whether the estimation procedure is capable of accurately estimating the copy number of high-copy markers. I found that the probability of incorrectly estimating copy number is quite low and that more data can actually reduce the accuracy of the estimation procedure when the testing assumptions are violated. Fortunately, when a significant result is obtained, it is almost always correct. The challenge often is in obtaining a significant result.

References

  • Aitken K, Jackson P, McIntyre L (2005) A combination of AFLP and SSR markers provides extensive map coverage and identification of homo(eo)logous linkage groups in a sugarcane cultivar. Theor Appl Genet 110:789–801

  • Dasilva J, Honeycutt RJ, Burnquist W, Aljanabi SM, Sorrells ME, Tanksley SD, Sobral BWS (1995) Saccharum-Spontaneum L SES-208 genetic-linkage map combining RFLP-based and PCR-based markers. Mol Breeding 1:165–179

  • Doyle GG (1963) Preferential pairing in structural heterozygotes of Zea mays. Genetics 48:1011–1027

  • George AW, Thompson EA (2003) Discovering disease genes: Multipoint linkage analysis via a new Markov chain Monte Carlo approach. Stat Sci 18:515–531

  • Grivet L, DHont A, Roques D, Feldmann P, Lanaud C, Glaszmann JC (1996) RFLP mapping in cultivated sugarcane (Saccharum spp.): genome organization in a highly polyploid and aneuploid interspecific hybrid. Genetics 142:987–1000

  • Leitch IJ, Bennett MD (1997) Polyploidy in angiosperms. Trends Plant Sci 2:470–476

  • Missaoui AM, Paterson AH, Bouton JH (2005) Investigation of genomic organization in switchgrass (Panicum virgatum L.) using DNA markers. Theor Appl Genet 110:1372–1383

  • Pfosser M, Amon A, Lelley T, Heberlebors E (1995) Evaluation of sensitivity of flow-cytometry in detecting aneuploidy in wheat using disomic and ditelosomic wheat-rye addition lines. Cytometry 21:387–393

  • Rhoades MM (1952) Preferential segregation in maize. In: Cowen JW (ed) Heterosis. Iowa State College Press, Ames, pp 66–80

  • Ripol MI, Churchill GA, da Silva JAG, Sorrells M (1999) Statistical aspects of genetic mapping in autopolyploids. Gene 235:31–41

  • Sybenga J (1992) Cytogenetics in plant breeding. Springer, Berlin

  • Sybenga J (1996) Chromosome pairing affinity and quadrivalent formation in polyploids: do segmental allopolyploids exist? Genome 39:1176–1184

  • Wendel JF (2000) Genome evolution in polyploids. Plant Mol Biol 42:225–249

  • Wu KK, Burnquist W, Sorrells ME, Tew TL, Moore PH, Tanksley SD (1992) The detection and estimation of linkage in polyploids using single-dose restriction fragments. Theor Appl Genet 83:294–300

  • Wu RL, Ma CX, Casella G (2002) A bivalent polyploid model for linkage analysis in outcrossing tetraploids. Theor Popul Biol 62:129–151

Author information

Corresponding author

Correspondence to Andrew W. George.

Additional information

Communicated by B. Friebe.

Appendices

Appendix 1: Distribution of the segregation ratio for a dominant marker

Here, I derive the probability distribution for the segregation ratio (proportion of progeny carrying a dominant allele) in polyploids. I begin by constructing the joint probability of the observed marker data conditional on the amount of preferential pairing. I then simplify this joint distribution and use it to derive the probability distribution of the segregation ratio.

Suppose data on a dominant marker are collected from a full-sib family with n progeny. The marker phenotype is a dichotomous trait for which only the presence or absence of a dominant allele is observed. I denote a family’s marker data by \( {\mathbf{Y}} = (y_{m} ,y_{p} ,y_{1} ,y_{2} , \ldots ,y_{n} ) \) where \( y_{m} \) and \( y_{p} \) are the marker phenotypes for the maternal and paternal parent, respectively, \( y_{j} \) is the marker phenotype for the jth progeny, and the marker phenotype is either 1 (presence) or 0 (absence). Dominant alleles are assumed to segregate from only one parent and I have, without loss of generality, chosen the paternal parent. The paternal parent carries ω copies of a dominant allele.

I begin my derivation of the probability distribution of the segregation ratio s by deriving the joint probability distribution of the observed data Y. I can formulate the joint probability of Y in terms of the genetic information (i.e. the unobserved marker genotypes) that is passed from parent to progeny. This is possible because the data observed on a family are determined by the family’s latent marker genotypes. Note that an individual’s latent marker genotype is also equivalent to the founder genes or identical by descent (ibd) genes that an individual inherits from its parents (George and Thompson 2003).

Suppose G is a vector of possible genotypes for a marker locus. Each element in G corresponds to an individual’s marker genotype. There will be many possible G that will be consistent with the observed marker data. Expressing the joint probability of the observed data in terms of the possible latent marker genotypes, we have \( {{\Pr}_{\ell ,\omega }} ({\mathbf{Y}}|{\mathbf{p}}) = \sum\nolimits_{{\mathbf{G}}} {{\Pr}_{\ell ,\omega } } ({\mathbf{Y}},{\mathbf{G}}|{\mathbf{p}}) \) where \( {\Pr}_{\ell ,\omega } ({\mathbf{Y}}|{\mathbf{p}}) \) is the joint probability distribution of the observed data given preferential pairing probabilities p and assuming ploidy level \( \ell \) and copy number ω, and \( \sum\nolimits_{{\mathbf{G}}} {} \) is the sum over all possible genotypic configurations. For notational convenience, I will no longer subscript a probability distribution by its ploidy level \( \ell \). All subsequent probabilities are constructed under an assumed ploidy level. Using a property of conditional probabilities, we can write

$$ {\Pr}_{\omega } ({\mathbf{Y}}|{\mathbf{p}}) = \sum\limits_{{\mathbf{G}}} {{\Pr} ({\mathbf{Y}}|{\mathbf{G}}){\Pr}_{\omega } ({\mathbf{G}}|{\mathbf{p}})} $$
(3)

where \( {\Pr} ({\mathbf{Y}}|{\mathbf{G}}) \) is the joint probability of the observed marker data conditional on the unobserved marker genotypes, and \( {\Pr}_{\omega } ({\mathbf{G}}|{\mathbf{p}}) \) is the joint probability of the latent marker genotypes conditional on the preferential pairing probabilities p and copy number ω.

Equation 3 is still not in a manageable form, since the joint probabilities on the right hand side are difficult to compute. In preparation for constructing a more tractable form of the joint probability of Y, I note the following. First, as discussed previously, only those marker loci whose dominant alleles originate from a single parent (and I have arbitrarily chosen the paternal parent) need be considered for analysis. Consequently, the distribution of Y is independent of maternally inherited genetic information.

Second, a family member’s marker phenotype is independent of the other phenotypic data given the family member’s marker genotype. The joint conditional probability of Y given G can therefore be factored into the product of n + 1 marginal distributions

$$ {\Pr} ({\mathbf{Y}}|{\mathbf{G}}) = {\Pr} (y_{p} = 1|G_{p} )\prod\limits_{j = 1}^{n} {{\Pr} (y_{j} |G_{j} )} $$
(4)

where \( G_{p} \) is the paternal parent’s latent marker genotype, and \( G_{j} \) is the paternally inherited marker genotype for the jth progeny.

Third, I assume that the marker data do not contain genotyping errors. Hence, the marginal probabilities in Eq. 4 are either zero or one such that:

$$ {\Pr} ({\mathbf{Y}}|{\mathbf{G}}) = \left\{ {\begin{array}{*{20}c} 1 \hfill & {{\text{if }}y = 1{\text{ and }}G \in \{ G^{d} \} } \hfill \\ 1 \hfill & {{\text{if }}y = 0{\text{ and }}G \notin \{ G^{d} \} } \hfill \\ 0 \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right. $$

where, for notational convenience, I have ignored the subscripts on Y and G, and \( \{ G^{d} \} \) is the set of possible genotypes of a progeny that contains at least one dominant allele.

Fourth, a progeny’s genotype is independent of its siblings’ genotypes if the parental genotypes are known. The joint probability distribution of G in Eq. 3 can therefore be factorised as

$$ {\Pr}_{\omega } ({\mathbf{G}}|{\mathbf{p}}) = {\Pr}_{\omega } (G_{p} |{\mathbf{p}})\prod\limits_{j = 1}^{n} {{\Pr}_{\omega } (G_{j} |G_{p} ,{\mathbf{p}})} $$
(5)

By substituting the factorised forms of Eqs. 4 and 5 into Eq. 3 and remembering that the data are observed without error, we can write

$$ {\Pr}_{\omega } ({\mathbf{Y}}|{\mathbf{p}}) = \sum\limits_{{G_{p}^{d} }} {{\Pr}_{\omega } (G_{p}^{d} |{\mathbf{p}})\left[ {{\Pr}_{\omega } (G_{p} \, = \,G_{p}^{d} |{\mathbf{p}})} \right]^{x} \left[ {{\Pr}_{\omega } (G_{p} \, \ne G_{p}^{d} \,|{\mathbf{p}})} \right]^{{(n - x)}} } $$
(6)

where \( \sum\nolimits_{{G_{p}^{d} }} {} \) is the summation over the set of paternal marker genotypes that carry at least one copy of a dominant allele, and \( x = \sum\nolimits_{j = 1}^{n} {y_{j} } \) is the number of progeny that exhibit a dominant allele. The three terms on the right hand side of Eq. 6 are univariate probabilities and easy to calculate. The first term is the probability of the paternal parent having marker genotype \( G_{p}^{d} \) given copy number ω and preferential pairing probabilities p. The second term is the probability that a progeny carries at least one copy of a dominant allele conditional on the paternal parent having at least one dominant allele, the copy number, and the preferential pairing probabilities. The third term is the probability that a progeny does not inherit any dominant alleles given that the paternal parent carries at least one copy of a dominant allele, the copy number, and the preferential pairing probabilities.

Before constructing the probability distribution of a segregation ratio, I note that segregation ratios are summary measures. For a marker locus with observed data Y, the associated segregation ratio is x/n and there are many different data sets Y that can result in the same segregation ratio. In fact, for a family with n progeny, there are \( \left( {\begin{array}{*{20}c} n \\ {ns} \\ \end{array} } \right) \) combinations of ns progeny exhibiting a dominant allele.

The probability distribution of the segregation ratio for a dominant marker in polyploids is then

$$ {\Pr}_{\omega } (s|{\mathbf{p}}) = \left( {\begin{array}{*{20}c} n \\ {ns} \\ \end{array} } \right){\Pr}_{\omega } ({\mathbf{Y}}|{\mathbf{p}}) = \sum\limits_{j = 1}^{m} {w_{j} {\text{Bin}}(ns;n,\pi_{j} )} $$
(7)

where m is the number of mixture components, j is the index over the mixture components, \( w_{j} \propto {\Pr}_{\omega } (G_{p}^{d} |{\mathbf{p}}) \) is the jth mixture weight, Bin(·) is the binomial probability distribution of observing ns progeny exhibiting a dominant allele in a family with n progeny, and \( \pi_{j} = {\Pr} (G_{p} = G^{d} |G_{p}^{d} ,\omega ,{\mathbf{p}}) \) is the jth probability of a progeny exhibiting a dominant allele given that the paternal parent carries a dominant allele, the copy number, and the preferential pairing probabilities. For an example of how to form Eq. 7 for a double-copy allele in hexaploids, see “Appendix 2”.
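The mixture form of Eq. 7 can be sketched in a few lines of Python. This is only an illustration, not the paper's implementation: the function names are hypothetical, and the weights and binomial probabilities used in the example are placeholder values rather than values taken from the paper's tables.

```python
from math import comb


def binomial_pmf(x, n, pi):
    """Probability that x of n progeny exhibit the dominant allele."""
    return comb(n, x) * pi**x * (1.0 - pi) ** (n - x)


def segregation_ratio_pmf(x, n, weights, pis):
    """Binomial mixture of Eq. 7, evaluated at the segregation ratio s = x / n."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to one")
    return sum(w * binomial_pmf(x, n, pi) for w, pi in zip(weights, pis))


# Placeholder inputs (assumptions for illustration, not values from the paper).
n = 50
weights = [0.2, 0.8]
pis = [0.75, 0.8]

# Distribution of s over its support {0/n, 1/n, ..., n/n}.
pmf = [segregation_ratio_pmf(x, n, weights, pis) for x in range(n + 1)]
total = sum(pmf)  # should be 1 up to floating-point error
```

Because s takes values only on the grid x/n, the distribution is indexed by the count x = ns, which also avoids rounding problems when converting a ratio back to a count.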

Appendix 2: An example of constructing the distribution of the segregation ratio

In this example, the construction of the probability distribution for the segregation ratio of a double-copy allele in hexaploids is presented. I begin by assigning preferential pairing probabilities to the set of unique chromosome pairing configurations. Based on these preferential pairing probabilities, a probability is calculated for each possible parental gamete. I then construct the distribution of those gametes that result in a progeny exhibiting a dominant allele given the copy number. Finally, this distribution is used to calculate the mixture distribution parameters \( w_{j} \) and \( \pi_{j} \). I assume that each chromosome has a single homologous partner and that the chromosomes are labelled \( 1_{A} \), \( 2_{A} \), \( 3_{B} \), \( 4_{B} \), \( 5_{C} \), and \( 6_{C} \), where chromosomes with the same subscript are homologous. I also assume that chromosomes pair during meiosis as bivalents and, without loss of generality, that dominant alleles originate from the paternal parent.

To begin, I construct the distribution of the unique chromosome pairing configurations (Table 3). There are 15 unique pairings and each configuration is assigned a probability that is based upon the number of associated homologous bivalents. I then construct a probability distribution for the possible gametes that may be inherited from a hexaploid father (Table 4). Each gamete may have originated from one of several pairing configurations. To assign a probability to a gamete, I take the sum of probabilities of the configurations from which the gamete may have originated. For example, from Table 3, the paternal gamete \( 1_{A} \), \( 2_{A} \), \( 3_{B} \) may have originated from pairing configurations 8, 9, 11, 12, 14, and 15. Therefore, the gametic probability is the sum of the associated configuration probabilities such that the probability of \( 1_{A} \), \( 2_{A} \), \( 3_{B} \) is \( p_{0} + 2p_{1} \).
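The counts quoted above can be checked by brute-force enumeration. The sketch below is an illustration under stated assumptions: chromosome labels are written as plain strings such as "1A", the helper names are hypothetical, and the order in which configurations are generated is arbitrary, so it will not generally match the configuration numbering of Table 3.

```python
from collections import Counter

CHROMOSOMES = ["1A", "2A", "3B", "4B", "5C", "6C"]


def pairings(items):
    """Yield every way of splitting items into unordered pairs (bivalents)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def homologous_bivalents(config):
    """Count bivalents whose two chromosomes share a letter (homologues)."""
    return sum(1 for a, b in config if a[-1] == b[-1])


def can_produce(config, gamete):
    """A gamete receives exactly one chromosome from each bivalent."""
    return all(len(set(pair) & set(gamete)) == 1 for pair in config)


configs = list(pairings(CHROMOSOMES))

# Number of configurations grouped by their count of homologous bivalents.
by_homologous = Counter(homologous_bivalents(c) for c in configs)

# Configurations from which the gamete (1A, 2A, 3B) could have originated.
compatible = [c for c in configs if can_produce(c, ("1A", "2A", "3B"))]
```

The enumeration yields 15 configurations, six of which are compatible with the gamete, in agreement with the six configurations listed in the text. How a probability is attached to each configuration depends on Table 3 and is not modelled here.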

Table 3 Unique chromosome pairing configurations associated with a hexaploid with chromosomes labelled \( 1_{A} \), \( 2_{A} \), \( 3_{B} \), \( 4_{B} \), \( 5_{C} \), and \( 6_{C} \), where chromosomes with the same subscripts are homologous
Table 4 Tabular form for the calculation of the mixing weights \( w_{j} \) and binomial probabilities \( \pi_{j} \) in Eq. 2

There are several ways in which I could present the calculation of the mixture parameters w and π that appear in Eq. 2. I found a tabular form to be the easiest way to demonstrate their calculation. In Table 4, each row corresponds to a different paternal gamete and each column corresponds to a pair of chromosomes carrying a dominant allele. A cell in the table contains a one if a progeny inheriting that gamete would exhibit a dominant allele, given the pair of paternal chromosomes carrying the dominant allele, and a zero otherwise. The binomial probabilities \( \pi_{j} \) are then easily calculated as the ratio of the sum of gamete probabilities with non-zero cell entries to the normalizing constant 12a + 8b. The weights \( w_{j} \) are obtained from the proportion of columns that result in the same \( \pi_{j} \). For example, there are three out of 15 columns in Table 4 that result in \( \pi_{1} \) so \( w_{1} = \tfrac{3}{15} \) and there are 12 out of 15 columns that result in \( \pi_{2} \) so \( w_{2} = \tfrac{12}{15} \). Hence, the probability distribution of s for a double-copy allele in hexaploids is

$$ {\Pr}_{\omega = 2} (s|{\mathbf{p}}) = \frac{3}{15}{\text{Bin}}(ns;n,\pi_{1} ) + \frac{12}{15}{\text{Bin}}(ns;n,\pi_{2} ) $$

where \( \pi_{1} = \tfrac{8a + 8b}{12a + 8b} \), \( \pi_{2} = \tfrac{10a + 6b}{12a + 8b} \), and a and b are defined in Table 4.
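The tabular calculation can be mimicked by enumerating the 20 possible gametes (rows) and the 15 possible pairs of chromosomes carrying the dominant allele (columns). The sketch below rests on an assumption that is inferred from the normalizing constant 12a + 8b rather than read from Table 4: a is taken to be the weight of each of the 12 gametes that contain both members of a homologous pair, and b the weight of each of the 8 gametes that contain one chromosome from every homologous pair. The function names and the example values of a and b are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

CHROMOSOMES = ["1A", "2A", "3B", "4B", "5C", "6C"]


def gamete_weight(gamete, a, b):
    """Return a if the gamete holds both members of a homologous pair, else b."""
    letters = {c[-1] for c in gamete}
    return a if len(letters) < 3 else b


def mixture_parameters(a, b):
    """Return {pi_j: w_j} for a double-copy allele in a hexaploid.

    a and b must be integers or Fractions so the arithmetic stays exact.
    """
    gametes = list(combinations(CHROMOSOMES, 3))          # 20 rows
    norm = sum(gamete_weight(g, a, b) for g in gametes)   # 12a + 8b
    carrier_pairs = list(combinations(CHROMOSOMES, 2))    # 15 columns
    counts = {}
    for pair in carrier_pairs:
        # Total weight of gametes transmitting at least one dominant allele.
        hit = sum(gamete_weight(g, a, b) for g in gametes if set(g) & set(pair))
        pi = Fraction(hit, norm)
        counts[pi] = counts.get(pi, 0) + 1
    return {pi: Fraction(k, len(carrier_pairs)) for pi, k in counts.items()}


params = mixture_parameters(a=1, b=1)  # equal gamete weights
skewed = mixture_parameters(a=1, b=3)  # hypothetical unequal weights
```

With equal gamete weights both components coincide at 4/5, so the mixture collapses to a single binomial. With unequal weights, the three columns for homologous carrier pairs give (8a + 8b)/(12a + 8b) and the remaining twelve give (10a + 6b)/(12a + 8b), reproducing the weights 3/15 and 12/15.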

About this article

Cite this article

George, A.W. Estimation of copy number in polyploid plants: the good, the bad, and the ugly. Theor Appl Genet 119, 483–496 (2009). https://doi.org/10.1007/s00122-009-1054-x

  • DOI: https://doi.org/10.1007/s00122-009-1054-x
