Theoretical and Applied Genetics, Volume 126, Issue 8, pp 1991–2002

Evaluation of linkage disequilibrium in wheat with an L1-regularized sparse Markov network

  • Gota Morota
  • Daniel Gianola
Original Paper

Abstract

Linkage disequilibrium (LD) is defined as a stochastic dependence between alleles at two or more loci. Although understanding LD is important in the study of the genetics of many species, little attention has been paid to how a covariance structure between many loci distributed across the genome should be represented. Given that biological systems at the cellular level often involve gene networks, it is appealing to evaluate LD from a network perspective, i.e., as a set of associated loci involved in a complex system. We applied a Markov network (MN) to study LD using data on 1,279 markers derived from 599 wheat inbred lines. The MN attempts to account for the association between two markers conditionally on the remaining markers in the network model. In this study, the structure of an LD network was recovered with two variants of pseudo-likelihood, each subject to an L1 penalty on the MN parameters. It is shown that, while the L1-regularized Markov network preserves features of a Bayesian network (BN), the nodes in the resulting networks have fewer links. The resulting sparse network, which encodes conditional independencies, provides a clearer picture of association than marginal LD metrics, and it is easier to interpret because it includes fewer edges than a BN. Thus, an L1-regularized sparse Markov network seems appealing for representing conditional LD with high-dimensional genomic data, where variables, e.g., single nucleotide polymorphism markers, are expected to be sparsely connected.
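
The contrast between marginal and conditional LD described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it swaps in node-wise L1-penalized logistic regression (a neighborhood-selection form of pseudo-likelihood) fitted by proximal gradient descent, and it runs on simulated data rather than the wheat markers. The function names, the simulated chain 0 - 1 - 2 with an unlinked marker 3, the copy probability of 0.8, the penalty lam=0.1, and the AND rule for combining the two regressions per edge are all assumptions made for illustration.

```python
import math
import random
from collections import Counter
from itertools import combinations


def simulate_markers(n, copy_prob=0.8, seed=1):
    """Simulate a chain 0 - 1 - 2 of binary markers plus an unlinked marker 3.

    Hypothetical toy data: each marker in the chain copies its predecessor
    with probability copy_prob and is flipped otherwise.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x0 = rng.random() < 0.5
        x1 = x0 if rng.random() < copy_prob else not x0
        x2 = x1 if rng.random() < copy_prob else not x1
        x3 = rng.random() < 0.5
        rows.append(tuple(int(v) for v in (x0, x1, x2, x3)))
    return rows


def r_squared(rows, j, k):
    """Marginal LD between markers j and k as the squared correlation r^2."""
    n = len(rows)
    pj = sum(r[j] for r in rows) / n
    pk = sum(r[k] for r in rows) / n
    pjk = sum(r[j] * r[k] for r in rows) / n
    d = pjk - pj * pk  # classical disequilibrium coefficient D
    return d * d / (pj * (1 - pj) * pk * (1 - pk))


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_node(rows, j, lam, step=0.5, n_iter=500):
    """L1-penalized logistic regression of marker j on all other markers.

    Fitted by proximal gradient descent; returns {neighbor: weight}.
    """
    p = len(rows[0])
    n = len(rows)
    others = [k for k in range(p) if k != j]
    # Few distinct genotype patterns exist here, so iterate over pattern counts.
    patterns = Counter(rows)
    b = 0.0
    w = {k: 0.0 for k in others}
    for _ in range(n_iter):
        gb = 0.0
        gw = {k: 0.0 for k in others}
        for row, count in patterns.items():
            s = {k: 2 * row[k] - 1 for k in others}  # recode 0/1 as -1/+1
            eta = b + sum(w[k] * s[k] for k in others)
            resid = (1.0 / (1.0 + math.exp(-eta)) - row[j]) * count / n
            gb += resid
            for k in others:
                gw[k] += resid * s[k]
        b -= step * gb  # the intercept is not penalized
        for k in others:
            w[k] = soft_threshold(w[k] - step * gw[k], step * lam)
    return w


def learn_network(rows, lam):
    """Keep edge (j, k) only if both node-wise regressions select it (AND rule)."""
    p = len(rows[0])
    fits = [fit_node(rows, j, lam) for j in range(p)]
    return [(j, k) for j, k in combinations(range(p), 2)
            if fits[j][k] != 0.0 and fits[k][j] != 0.0]


rows = simulate_markers(2000)
edges = learn_network(rows, lam=0.1)
marginal_r2 = {(j, k): r_squared(rows, j, k)
               for j, k in combinations(range(4), 2)}
print("conditional edges:", edges)
print("marginal r^2:", {pair: round(v, 3) for pair, v in marginal_r2.items()})
```

In this toy setting, markers 0 and 2 are marginally associated because both depend on marker 1, yet they are conditionally independent given marker 1, so a sparse network should link them only through marker 1. The penalty value controls how many edges survive; in practice it would be chosen by a criterion such as cross-validation rather than fixed by hand.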

Keywords

Linkage Disequilibrium; Partition Function; Bayesian Network; Lasso; Conditional Independence
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgments

The authors thank the anonymous reviewers for their valuable comments. This work was supported by the Wisconsin Agriculture Experiment Station and by a Hatch grant from the United States Department of Agriculture.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Department of Animal Sciences, University of Wisconsin, Madison, USA
  2. Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, USA
  3. Department of Dairy Science, University of Wisconsin, Madison, USA
