Developments in statistical analysis in quantitative genetics

Abstract

Statistical genetics has gained remarkable research impetus since the last World Conference, stimulated by breakthroughs in molecular genetics, automated data-recording devices and computer-intensive statistical methods. The latter were revolutionized by the bootstrap and by Markov chain Monte Carlo (McMC). This overview uses a number of specific areas to illustrate the enormous flexibility that McMC has provided for fitting models and exploring features of data that were previously inaccessible. The selected areas are: inference of the trajectories over time of genetic means and variances; models for the analysis of categorical and count data; the statistical genetics of a model postulating that environmental variance is partly under genetic control; and a short discussion of models that incorporate massive genetic marker information. We also provide an overview of the application of McMC to the study of model fit and, finally, discuss the development of efficient McMC updating schemes for non-standard models.

Author information

Corresponding author

Correspondence to Daniel Sorensen.

About this article

Cite this article

Sorensen, D. Developments in statistical analysis in quantitative genetics. Genetica 136, 319–332 (2009). https://doi.org/10.1007/s10709-008-9303-5

Keywords

  • Statistical genetics
  • McMC
  • Genetic models