# Evaluation of linkage disequilibrium in wheat with an L1-regularized sparse Markov network

## Abstract

Linkage disequilibrium (LD) is defined as a stochastic dependence between alleles at two or more loci. Although understanding LD is important in the genetics of many species, little attention has been paid to how the covariance structure among many loci distributed across the genome should be represented. Because biological systems at the cellular level often involve gene networks, it is appealing to evaluate LD from a network perspective, i.e., as a set of associated loci involved in a complex system. We applied a Markov network (MN) to study LD using data on 1,279 markers from 599 wheat inbred lines. The MN models the association between two markers conditionally on all remaining markers in the network. The structure of the LD network was recovered with two variants of pseudo-likelihood, each subject to an L1 penalty on the MN parameters. We show that, while the L1-regularized MN preserves features of a Bayesian network (BN), the nodes in the resulting networks have fewer links. The resulting sparse network, which encodes conditional independencies, gives a clearer picture of association than marginal LD metrics, and it is markedly easier to interpret because it contains fewer edges than a BN. Thus, an L1-regularized sparse MN seems appealing for representing conditional LD in high-dimensional genomic data, where variables such as single nucleotide polymorphism markers are expected to be sparsely connected.
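The abstract does not spell out the estimators, so what follows is only a minimal sketch under stated assumptions. A binary pairwise MN has joint probability proportional to `exp(sum_j theta_j x_j + sum_{j<k} theta_jk x_j x_k)`, and its normalizing constant (the partition function) sums over all 2^p marker configurations, which is intractable for p = 1,279. A pseudo-likelihood sidesteps this by working with the full conditionals, each of which is a logistic regression of one marker on all the others. The code assumes markers coded 0/1 with each row treated as a haplotype (plausible for inbred lines) and fits the node-wise L1-penalized logistic regressions separately (neighborhood selection in the style of Meinshausen and Bühlmann), combining them with an AND rule. This is one common pseudo-likelihood variant, not necessarily either of the two used in the study. The function names, the penalty `lam`, the proximal-gradient solver, and the simulated markers are all hypothetical.

```python
import numpy as np


def marginal_r2(x, y):
    """Marginal LD r^2 between two 0/1-coded markers (hypothetical helper).

    Assumes each row is one haplotype, as for fully inbred lines:
    D = p(AB) - p(A)p(B),  r^2 = D^2 / [p(A)(1 - p(A)) p(B)(1 - p(B))].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pa, pb = x.mean(), y.mean()
    d = (x * y).mean() - pa * pb
    denom = pa * (1 - pa) * pb * (1 - pb)
    return d * d / denom if denom > 0 else 0.0


def l1_logistic(X, y, lam, n_iter=2000):
    """Lasso-penalized logistic regression fitted by proximal gradient (ISTA).

    Minimizes mean logistic loss + lam * ||beta||_1 with an unpenalized
    intercept. Returns [intercept, beta_1, ..., beta_p].
    """
    n, p = X.shape
    Z = np.hstack([np.ones((n, 1)), X])
    # Step size 1/L, where L = ||Z||_2^2 / (4n) bounds the gradient's Lipschitz constant.
    step = 4.0 * n / np.linalg.norm(Z, 2) ** 2
    w = np.zeros(p + 1)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(Z @ w)))
        w = w - step * (Z.T @ (mu - y)) / n
        # Soft-thresholding (the proximal step of the L1 penalty) on slopes only.
        w[1:] = np.sign(w[1:]) * np.maximum(np.abs(w[1:]) - step * lam, 0.0)
    return w


def neighborhood_selection(M, lam, rule="and"):
    """Sparse binary pairwise MN from node-wise L1 logistic regressions.

    Under a binary pairwise MN, logit P(x_j = 1 | x_-j) is linear in x_-j,
    so a zero slope on marker k means no edge j-k (conditional independence).
    """
    n, p = M.shape
    B = np.zeros((p, p))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        B[j, others] = l1_logistic(M[:, others], M[:, j], lam)[1:]
    nz = B != 0
    # Each edge is estimated twice; "and" keeps it only if both regressions agree.
    return (nz & nz.T) if rule == "and" else (nz | nz.T)


if __name__ == "__main__":
    # Hypothetical demo data: a chain x0 -> x1 -> x2 plus an unlinked marker x3.
    rng = np.random.default_rng(0)
    n = 5000
    x0 = rng.integers(0, 2, n)
    x1 = np.where(rng.random(n) < 0.1, 1 - x0, x0)
    x2 = np.where(rng.random(n) < 0.1, 1 - x1, x1)
    x3 = rng.integers(0, 2, n)
    M = np.column_stack([x0, x1, x2, x3]).astype(float)
    print("marginal r^2(x0, x2):", round(marginal_r2(x0, x2), 3))
    print(neighborhood_selection(M, lam=0.1).astype(int))
```

In the simulated chain, x0 and x2 are in strong marginal LD (r^2 of about 0.41 in expectation), yet they are conditionally independent given x1. The sparse MN should therefore typically keep only the edges x0-x1 and x1-x2 and leave x3 isolated, which is the contrast between marginal and conditional LD drawn in the abstract. Raising `lam` removes more edges, and the AND rule is the more conservative way to combine the two regressions that estimate each edge.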

## Keywords

Linkage disequilibrium, Partition function, Bayesian network, Lasso, Conditional independence

## Notes

### Acknowledgments

The authors thank the anonymous reviewers for their valuable comments. This work was supported by the Wisconsin Agriculture Experiment Station and by a Hatch grant from the United States Department of Agriculture.
