Abstract
In sugarcane or other autopolyploids, after generating the data, the first step in constructing molecular marker maps is to determine marker dosage. Improved methods for correctly allocating marker dosage will result in more accurate maps and increased efficiency of QTL linkage detection. When employing dominant markers like AFLPs, single-dose markers represent alleles present as one copy in one parent and null in the other parent, double-dose markers are those present as two copies in one parent and null in the other parent and so on. Observed segregation ratios in the offspring are employed to infer marker dosage in the parent from which the marker was inherited. Commonly, for each marker, a χ² test is used to assign dosage. Such an approach does not address important practical considerations such as multiple testing and departures from theoretical assumptions. In particular, extra-binomial variation or overdispersion has been observed in sugarcane studies and standard methods may result in fewer correct dosage allocations than the data warrant. To address these shortcomings, a Bayesian mixture model is proposed where all markers are considered simultaneously. Since analytic solutions are not available, Markov chain Monte Carlo methods are employed. Marker dosage allocation for each individual marker employs the estimated posterior probability of each dosage. For a sugarcane study these methods resulted in more markers being allocated a dosage than by standard approaches. Simulation studies demonstrated that, in general, not only are more markers classified but that more markers are also correctly classified, particularly if overdispersion is present.
References
Aitken K, Jackson P, McIntyre C (2005) A combination of AFLP and SSR markers provides extensive map coverage and identification of homo(eo)logous linkage groups in a sugarcane cultivar. Theor Appl Genet 110:789–801
Aitken KS, Jackson PA, McIntyre CL (2007) Construction of a genetic linkage map for Saccharum officinarum incorporating both simplex and duplex markers to increase genome coverage. Genome 50:742–756
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Automat Control AC-19:713–716
Al-Janabi SM, Honeycutt RJ, McClelland M, Sobral BW (1993) A genetic linkage map of Saccharum spontaneum L. ‘SES 208’. Genetics 134:1249–1260
Altman DG, Bland JM (1983) Measurement in medicine: the analysis of method comparison studies. Statistician 32:307–317
Besag J, Green P, Higdon PJ, Mengersen K (1995) Bayesian computation and stochastic systems. Stat Sci 10:3–66 (with discussion)
Best N, Cowles MK, Vines K (1995) CODA convergence diagnosis and output software for Gibbs sampling output Version 0.30. MRC Biostat Unit, Cambridge
Bland J, Altman D (1995) Comparing methods of measurement: why plotting difference against standard method is misleading. Lancet 346:1085–1087
Burner DM (1997) Chromosome transmission and meiotic behaviour in various sugarcane crosses. J Am Soc Sugar Cane Tech 17:38–50
Celeux G, Forbes F, Robert C, Titterington D (2006) Deviance information criteria for missing data models. Bayesian Anal 4:651–674
da Silva JAG (1993) A methodology for genome mapping of autopolyploids and its application to sugarcane Saccharum spontaneum. PhD thesis, Cornell University, Ithaca, NY
da Silva JAG, Sorrells ME, Burnquist WL, Tanksley SD (1993) RFLP linkage map and genome analysis of Saccharum spontaneum. Genome 36:782–791
da Silva J, Honeycutt RJ, Burnquist W, Al-Janabi SM, Sorrells ME, Tanksley SD, Sobral BWS (1995) Saccharum spontaneum L. SES 208 genetic linkage map combining RFLP- and PCR-based markers. Mol Breed 1:165–179
De Winton D, Haldane JBS (1931) Linkage in the tetraploid. J Genet 24:121–144
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J R Stat Soc Ser B 39:1–38
Gelman A, Carlin JB, Stern HS, Rubin DB (2004) Bayesian data analysis, 2nd edn. Chapman Hall, London
Geweke J (1992) Evaluating the accuracy of sampling based approaches to calculating posterior moments. In: Bernardo JM, Berger JO, Dawid AP, Smith AFM (eds) Bayesian statistics 4. Oxford University Press, Oxford, pp 169–194
Gilks W, Richardson S, Spiegelhalter D (1996) Markov chain Monte Carlo in practice. Chapman Hall, London
Grivet L, Arruda P (2002) Sugarcane genomics: depicting the complex genome of an important tropical crop. Curr Opin Plant Biol 5:122–127
Hackett CA (2001) A comment on Xie and Xu: ‘mapping quantitative trait loci in tetraploid species’. Genet Res 78:187–189
Hackett CA, Luo ZW (2003) TetraploidMap: construction of a linkage map in autotetraploid species. J Hered 94:358–359
Haldane JBS (1930) Theoretical genetics of autopolyploids. J Genet 22:359–372
Haley C, Knott S (1992) A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315–324
Heidelberger P, Welch P (1983) Simulation run length control in the presence of an initial transient. Oper Res 31:1109–1144
Janoo N, Grivet L, David J, D’Hont A, Glaszmann JC (2004) Differential chromosome pairing affinities at meiosis in polyploid sugarcane revealed by molecular markers. Heredity 93:460–467
Lander ES, Botstein D (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185–199
Luo ZW, Hackett CA, Bradshaw JE, McNicol JW, Milbourne D (2001) Construction of a genetic linkage map in tetraploid species using molecular markers. Genetics 157:1369–1385
Luo ZW, Zhang RM, Kearsey MJ (2004) Theoretical basis for genetic linkage analysis in autotetraploid species. PNAS 101:7040–7045
Mao CX (2007) Estimating population sizes for capture–recapture sampling with binomial mixtures. Comput Stat Data Anal 51:5211–5219
Mather K (1951) The measurement of linkage in heredity. Methuen, London
Mengersen KL, Robert CP (1996) Testing for mixtures: a Bayesian entropic approach. In: Bernardo JM, Berger JO, Dawid AP, Smith AFM (eds) Bayesian statistics 5. Oxford University Press, Oxford, pp 225–276
Meyer R, Milbourne D, Hackett C, Bradshaw J, McNichol J, Waugh R (1998) Linkage analysis in tetraploid potato and association of markers with quantitative resistance to late blight (Phytophthora infestans). Mol Gen Genet 259:150–160
Ming R, Liu S, Lin Y, Braga D, da Silva J, van Deynze, Wenslaff A, Wu K, Moore P, Burnquist W, Sorrells M, Irvine J, Paterson A (1998) Alignment of Sorghum and Saccharum chromosomes: comparative organization of closely-related diploid and polyploid genomes. Genetics 150:1663–1882
Mood AM, Graybill FA, Boes DC (1974) Introduction to the theory of statistics, 3rd edn. McGraw–Hill, New York
Plummer M (2005) JAGS Version 0.90 manual. International Agency for Research on Cancer, Lyon
Qu L, Hancock JF (2001) Detecting and mapping repulsion-phase linkage in polyploids with polysomic inheritance. Theor Appl Genet 103:136–143
Qu L, Hancock J (2002) Pitfalls of genetic analysis using a doubled-haploid backcrossed to its parent. Theor Appl Genet 105:392–396
Raftery AE, Lewis S (1992) How many iterations in the Gibbs sampler? In: Bernardo JM, Berger JO, Dawid AP, Smith AFM (eds) Bayesian statistics 4. Oxford University Press, Oxford, pp 763–774
Ripol MI, Churchill A, da Silva JA, Sorrells M (1999) Statistical aspects of genetic mapping in autopolyploids. Gene 235:31–41
Robert C (1996) Mixtures of distributions: inference and estimation. In: Gilks W, Richardson S, Spiegelhalter D (eds) Markov chain Monte Carlo in practice. Chapman Hall, London
Rufo MJ, Pérez CJ, Martín J (2007) Bayesian analysis of finite mixtures of multinomial and negative-multinomial distributions. Comput Stat Data Anal 51:5452–5466
Skellam JG (1948) A probability distribution derived from the binomial distribution by regarding the probability of success as variable between the sets of trials. J R Stat Soc Ser B 10:257–261
Smith AFM, Roberts GO (1993) Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. J R Stat Soc Ser B 55:3–23
Soltis PS, Soltis DE (2000) The role of genetic and genomic attributes in the success of polyploids. PNAS 97:7051–7057
Spiegelhalter D, Thomas A, Best N, Gilks W (1995) BUGS. Bayesian inference Using Gibbs Sampling, Version 0.50. MRC Biostatistics Unit, Cambridge
Spiegelhalter DJ, Best N, Carlin B, van der Linde A (2002) Bayesian measures of model complexity and fit. J R Stat Soc Ser B 64:583–639
Stebbins GL (1950) Variation and evolution in plants. Columbia University Press, New York
Stephens M (2000) Bayesian analysis of mixtures with an unknown number of components—an alternative to reversible jump methods. Ann Stat 28:40–74
Sybenga J (1994) Preferential pairing estimates from multivalent frequencies in tetraploids. Genome 37:1045–1055
Sybenga J (1995) Meiotic pairing in autohexaploid Lathyrus: a mathematical model. Heredity 75:343–350
Sybenga J (1996) Chromosome pairing affinity and quadrivalent formation in polyploids: do segmental allopolyploids exist? Genome 39:1176–1184
Tanner MA (1993) Tools for statistical inference, 2nd edn. Springer, New York
Tanner MA, Wong WH (1987) The calculation of posterior distributions by data augmentation: with discussion. J Am Stat Assoc 82:528–550
R Development Core Team (2007) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0
Tweedie RL, Mengersen K (1996) Rates of convergence of the Hastings and Metropolis algorithms. Ann Stat 24:101–121
Ukoskit K, Thompson PG (1997) Autopolyploidy versus allopolyploidy and low-density randomly amplified polymorphic DNA linkage maps of sweetpotato. J Am Soc Hortic Sci 122:822–828
Wu KK, Burnquist W, Sorrells ME, Tew TL, Moore PH (1992) The detection and estimation of linkage in polyploids using single-dose restriction fragments. Theor Appl Genet 83:294–300
Wu R, Gallo-Meagher M, Littell RC, Zeng ZB (2001a) A general polyploid model for analyzing gene segregation in outcrossing tetraploid species. Genetics 159:869–882
Wu SS, Wu R, Ma CX, Zeng ZB, Yang MC, Casella G (2001b) A multivalent pairing model of linkage analysis in autotetraploids. Genetics 159:1339–1350
Wu R, Ma CX, Casella G (2002) A bivalent model for linkage analysis in outcrossing tetraploids. Theor Popul Biol 62:129–151
Xie CG, Xu SH (2000) Mapping quantitative trait loci in tetraploid populations. Genet Res 76:105–115
Acknowledgments
We thank Jingchuan Li for assistance with AFLP marker data production and Ross Darnell and Kerrie Mengersen for useful statistical discussions. This work was partially funded by the Cooperative Research Centre for Sugarcane Industry Innovation through Biotechnology.
Additional information
Communicated by J. Bradshaw.
Appendices
Appendix 1: Informative prior specification for \( \tau_{k} \)
Conjugate prior distributions are employed for the means \( \mu_{k} \) and precisions \( \tau_{k},\ k = 1, \ldots, K \). A method to determine the hyperparameters \( \text{logit}(P_{k}) \) and \( T_{k} \) for the prior distribution of the mean \( \mu_{k} \) is outlined in “Priors”. Prior distributions for the means \( \mu_{k} \) in autooctoploids are provided in Table 4.
Employing a similar approach for the mean of \( \tau_{k} \sim \text{Gamma}(A_{k}, B_{k}) \), we can calculate the logit-transformed 2.5 and 97.5 percentiles of the theoretical binomial distribution with parameters in Eq. 1 to obtain the expected value \( \hat{\tau}_{k} \) of \( \tau_{k} \). Denoting the percentiles as \( q_{0.025} \) and \( q_{0.975} \) on the untransformed scale, then for a 95% confidence region on the logit scale
and so the expected value is \( \tilde{\tau}_{k} = 1/\tilde{s}_{k}^{2} \), where \( \tilde{s}_{k} \) is defined in Eq. 12 and so may be obtained directly from the percentiles \( q_{0.025} \) and \( q_{0.975} \) of the binomial distribution with size equal to the number of individuals \( N_{k} \) and probability \( P_{k} \).
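The calculation described above can be sketched in Python. Eq. 12 is not reproduced in this text, so the form used for \( \tilde{s}_{k} \) below (the width of the logit-scale interval divided by four) is an assumption that mirrors the four-standard-deviation convention used later in this appendix. Converting the binomial percentiles from counts to proportions before taking logits is also an assumption, and the function names are hypothetical.

```python
import math


def binom_quantile(n, p, q):
    """Smallest count k with P(X <= k) >= q for X ~ Binomial(n, p), 0 < p < 1."""
    log_p, log_1mp = math.log(p), math.log1p(-p)
    cdf = 0.0
    for k in range(n + 1):
        # Log of the binomial probability mass, computed via lgamma for stability.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_1mp)
        cdf += math.exp(log_pmf)
        if cdf >= q:
            return k
    return n


def logit(u):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(u / (1.0 - u))


def prior_precision(n_individuals, seg_prop):
    """Expected precision 1 / s^2 from the binomial 2.5 and 97.5 percentiles."""
    # ASSUMPTION: percentiles are converted from counts to proportions.
    lo = binom_quantile(n_individuals, seg_prop, 0.025) / n_individuals
    hi = binom_quantile(n_individuals, seg_prop, 0.975) / n_individuals
    # ASSUMPTION: the 95% interval on the logit scale spans four standard
    # deviations; the source's Eq. 12 may use a different constant.
    s_tilde = (logit(hi) - logit(lo)) / 4.0
    return 1.0 / s_tilde ** 2
```

For example, `prior_precision(200, 0.5)` gives the implied precision for 200 individuals and a segregation proportion of 0.5 (an illustrative value). The sketch requires both percentiles to lie strictly between 0 and the number of individuals.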
The conjugate prior distribution for the precision \( \tau_{k} \) is a \( \text{Gamma}(A_{k}, B_{k}) \), which has mean \( A_{k}/B_{k} \) and variance \( A_{k}/B_{k}^{2} \). Ideally, the mean of the prior distribution would be specified as \( \hat{\tau}_{k} \) but the variance may be specified by several methods. Those considered here involve setting an interval around either \( \tilde{\tau}_{k} \) or \( \tilde{s}_{k} \).
First, if the prior distribution is specified to have a 95% confidence region around \( \tau_{k} \) of \( \tau_{k}(1 \pm x) \), then the interval has length \( 2x\tau_{k} \), which is \( 4\,\text{SD}(\tau_{k}) \), so that \( \text{SD}(\tau_{k}) = x\tau_{k}/2 \). \( A_{k} \) and \( B_{k} \) are obtained by equating the mean \( A_{k}/B_{k} \) and variance \( A_{k}/B_{k}^{2} \) to \( \hat{\tau}_{k} \) and \( (x\hat{\tau}_{k}/2)^{2} \), respectively.
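As a minimal sketch of this first method, the moment matching can be written out directly. Taking the interval length \( 2x\tau_{k} \) as four standard deviations gives a standard deviation of \( x\hat{\tau}_{k}/2 \), and solving mean \( = A_{k}/B_{k} \) and variance \( = A_{k}/B_{k}^{2} \) yields \( B_{k} = \text{mean}/\text{variance} \) and \( A_{k} = \text{mean} \times B_{k} \). The function name and the example values are hypothetical.

```python
def gamma_hyperparameters(tau_hat, x):
    """Shape A and rate B of a Gamma prior with mean tau_hat and SD x * tau_hat / 2."""
    sd = x * tau_hat / 2.0      # interval length 2 * x * tau_hat taken as 4 SD
    variance = sd ** 2
    b = tau_hat / variance      # from mean / variance = (A / B) / (A / B^2) = B
    a = tau_hat * b             # from mean = A / B
    return a, b


# Illustrative values only: prior mean 50 with a +/- 50% interval.
a_k, b_k = gamma_hyperparameters(50.0, 0.5)
```

Under these assumptions the shape parameter simplifies to \( 4/x^{2} \), so it depends only on the relative width of the interval.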
In a similar fashion, if the 95% confidence region around \( s_{k} \) is set to \( s_{k}(1 \pm x) \) then \( \text{SD}(\tau_{k}) \) is a quarter of the interval on the \( \tau_{k} \) scale. This is simply
and once the observed and expected means and variances are equated then \( A_{k} = C\hat{\tau}_{k}^{4} \) and \( B_{k} = C\hat{\tau}_{k}^{3} \) where \( C = x^{2}/(1 + x)(1 - x) \). This produces a narrower prior distribution than specifying limits around \( \tau_{k} \).
Appendix 2: Generating markers with overdispersion
For simulation studies, markers with specified dosage may be generated from a Binomial\( (n, p) \) distribution where \( p \) is the appropriate segregation proportion in Eq. 1. When overdispersion is present, a beta-binomial distribution may be used with \( p \sim \text{Beta}(\alpha, \beta) \), where \( \alpha \) and \( \beta \) are the first and second shape parameters, respectively (Skellam 1948). If the theoretical segregation ratio \( P_{jk} \) in Eq. 1 is equated to the expected value \( E(p) = \alpha/(\alpha + \beta) \) then simply setting the first shape parameter \( \alpha \) fixes the value of \( \beta = \alpha(1 - P_{jk})/P_{jk} \). Note that larger values of \( \alpha \) correspond to smaller values of \( \text{Var}(p) = \alpha\beta/[(\alpha + \beta)^{2}(\alpha + \beta + 1)] \), which results in less overdispersion.
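The generation scheme above can be sketched with the Python standard library. The function names, the segregation proportion of 0.5 and the value of the first shape parameter are illustrative assumptions, not values taken from the simulation studies.

```python
import random


def beta_shape(alpha, seg_prop):
    """Second shape parameter such that E(p) = alpha / (alpha + beta) = seg_prop."""
    return alpha * (1.0 - seg_prop) / seg_prop


def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) random variable."""
    return alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1.0))


def simulate_marker_counts(n_markers, n_offspring, seg_prop, alpha=None, rng=None):
    """Number of offspring showing each marker.

    With alpha=None the counts are Binomial(n_offspring, seg_prop); otherwise
    each marker draws its own p from Beta(alpha, beta), giving beta-binomial
    counts with extra-binomial variation.
    """
    rng = rng or random.Random()
    counts = []
    for _ in range(n_markers):
        if alpha is None:
            p = seg_prop
        else:
            p = rng.betavariate(alpha, beta_shape(alpha, seg_prop))
        # A binomial draw as a sum of independent Bernoulli(p) trials.
        counts.append(sum(rng.random() < p for _ in range(n_offspring)))
    return counts


# Illustrative use: 100 markers scored on 200 offspring, with overdispersion.
counts = simulate_marker_counts(100, 200, 0.5, alpha=20.0, rng=random.Random(1))
```

Increasing `alpha` shrinks `beta_variance(alpha, beta_shape(alpha, seg_prop))`, so the simulated counts approach those of the plain binomial case.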
Appendix 3: Comparison of mixture model options
From Fig. 11 the model with more components performs slightly better in that, on average, the median percentage of correctly allocated markers was higher with more components and the misclassification rate was lower. In general, while results are better when more components are employed at higher ploidy levels, the range of results suggests that worse results may be obtained for particular data sets. Further investigation reveals that this only occurs for medium to severe overdispersion (see Fig. 12).
Fig. 11 Box plots of (i) the percentage of markers with dosage correctly allocated and (ii) the percentage misclassified by models with three or four components, where equal variances on the logit scale were assumed and strong prior information was incorporated. Three component models may avoid computational problems but could result in more markers being misclassified
Fig. 12 Box plots of the percentage of misclassified markers for data ranging from non-overdispersed to severely overdispersed, for models with three or four components. The model employed was that of equal variances on the logit scale with strong prior information incorporated. The range of results increases with increasing overdispersion, which could result in worse classification for some data sets
While the percentage of correctly allocated markers decreases with increasing threshold (see Fig. 13), the trend becomes more pronounced with increasing overdispersion and ploidy levels. On the other hand, misclassification rates increase with smaller thresholds and increasing ploidy or overdispersion levels. While there is no clear optimal threshold value, it would seem that a value of around 0.8 is a reasonable compromise. This corresponds in some ways to the value of 0.2 that is commonly used in false discovery rate studies and to the power of 0.8 that is commonly regarded as reasonable when designing experimental studies (Fig. 14).
Fig. 13 Box plots of the percentage of markers with dosage correctly allocated by mixture models for a range of thresholds by four levels of overdispersion (None, Slight, Medium, Severe) and ploidy (4, 6, 8, 10). Dosage is allocated when the posterior probability exceeds the threshold. The models fitted were chosen to be those with the maximum number of components, equal variances on the logit scale were assumed and strong prior information incorporated. The percentage of correctly allocated markers tails off for thresholds larger than 0.8
Fig. 14 Box plots of the percentage of misclassified markers for a range of thresholds with increasing overdispersion (None, Slight, Medium, Severe) and varying ploidy levels (4, 6, 8, 10). Dosage is allocated when the posterior probability exceeds the threshold. When there is little or no overdispersion, very few markers are misclassified. However, for moderate to severe overdispersion the misclassification rate decreases as the threshold increases but is greater for higher ploidy levels
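The allocation rule used in these comparisons (dosage is allocated when the posterior probability exceeds the threshold) can be sketched as follows. The function names are hypothetical. Mapping mixture component k to dosage k and expressing both percentages relative to all markers are assumptions made for illustration.

```python
def allocate_dosage(posterior, threshold=0.8):
    """Most probable dosage if its posterior probability exceeds the threshold.

    `posterior` lists the posterior probabilities of dosage 1, 2, ... for one
    marker (ASSUMPTION: component k corresponds to dosage k). Returns None
    when no dosage is allocated.
    """
    best = max(range(len(posterior)), key=lambda k: posterior[k])
    return best + 1 if posterior[best] > threshold else None


def allocation_summary(posteriors, true_dosages, threshold=0.8):
    """Percentages of markers correctly allocated and misclassified."""
    correct = wrong = 0
    for posterior, truth in zip(posteriors, true_dosages):
        allocated = allocate_dosage(posterior, threshold)
        if allocated is None:
            continue  # unallocated markers count as neither correct nor wrong
        if allocated == truth:
            correct += 1
        else:
            wrong += 1
    total = len(true_dosages)
    return 100.0 * correct / total, 100.0 * wrong / total
```

Raising `threshold` leaves more markers unallocated, which is the trade-off between the correctly allocated and misclassified percentages described above.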
Cite this article
Baker, P., Jackson, P. & Aitken, K. Bayesian estimation of marker dosage in sugarcane and other autopolyploids. Theor Appl Genet 120, 1653–1672 (2010). https://doi.org/10.1007/s00122-010-1283-z
Keywords
- Sugarcane
- Mixture Model
- Amplified Fragment Length Polymorphism
- Markov Chain Monte Carlo
- Segregation Ratio



