Forests of Latent Tree Models to Decipher Genotype-Phenotype Associations

Sinoquet, Christine; Mourad, Raphaël; Leray, Philippe

doi:10.1007/978-3-642-38256-7_8

Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 357)

Abstract

Genome-wide association studies have revolutionized the search for genetic influences on common diseases such as diabetes, obesity, asthma, cardiovascular diseases and some cancers. Together with the aging of the population, rising health care costs call for further investigations and for the design of scalable and efficient tools. The high dimensionality and complexity of genetic data hinder the detection of genetic associations. To decrease the risks of missing the causal factor and of discovering spurious associations, machine learning offers an attractive alternative framework to classical statistical approaches. A novel class of probabilistic graphical models (PGMs), the forest of latent tree models (FLTM), has recently been proposed to reach a trade-off between faithful modeling of data dependences and tractability. In this chapter, we assess the potential of this model to detect genotype-phenotype associations. The FLTM-based contribution is first put into the perspective of PGM-based works meant to model the dependences in genetic data; it is then considered from the technical viewpoint of latent tree model (LTM) learning, with scalability as the key objective. We then present the systematic and comprehensive evaluation conducted to assess the ability of the FLTM to detect genetic associations through latent variables. Realistic simulations were performed under various controlled conditions. In this context, we present a procedure tailored to correct for multiple testing. We also show and discuss results obtained on real data. Besides guaranteeing data dimension reduction through latent variables, the FLTM is shown empirically to capture indirect genetic associations with the disease: strong associations are found between the disease and the ancestor nodes of the causal genetic marker node in the forest, whereas very weak associations are obtained for other latent variables. Finally, we discuss the prospects of the model for association detection at genome scale.
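The abstract describes testing latent variables of the forest for association with the disease and then correcting for multiple testing. Neither the exact test statistic nor the authors' tailored correction procedure is given here, so the following Python sketch is only an illustration under stated assumptions. It assumes that each individual's state for a latent variable has already been imputed, for instance as its most probable state given the observed genotypes. It measures association with a standard Pearson chi-square test of independence against binary case/control status, and it uses a plain Bonferroni correction as a stand-in for the authors' procedure. The functions `chi2_sf`, `chi2_independence` and `bonferroni`, and the toy data, are hypothetical.

```python
import math
from collections import Counter


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1.

    Uses the closed forms of the regularized upper incomplete gamma function
    Q(df/2, x/2), which exist for integer and half-integer shape parameters.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q = exp(-y) * sum_{i=0}^{df/2 - 1} y^i / i!
        return math.exp(-y) * sum(y**i / math.factorial(i) for i in range(df // 2))
    # Odd df: Q = erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{(df-1)/2} y^(i-1/2) / Gamma(i+1/2)
    tail = sum(y ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def chi2_independence(latent_states, status):
    """Pearson chi-square test of independence between one latent variable and status.

    Assumption (not from the paper): latent_states holds one already-imputed
    state per individual, aligned with status (0 = control, 1 = case).
    """
    if len(latent_states) != len(status):
        raise ValueError("both vectors must describe the same individuals")
    n = len(status)
    joint = Counter(zip(latent_states, status))
    row_totals = Counter(latent_states)
    col_totals = Counter(status)
    stat = 0.0
    for r, n_r in row_totals.items():
        for c, n_c in col_totals.items():
            expected = n_r * n_c / n  # > 0 because both margins are observed
            stat += (joint[(r, c)] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    if df == 0:
        # A constant latent variable (or constant status) carries no evidence.
        return stat, 1.0
    return stat, chi2_sf(stat, df)


def bonferroni(pvalues):
    """Bonferroni-adjusted p-values min(1, m * p) for m tests (a stand-in correction)."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


# Toy data (invented for illustration): 100 individuals, 0 = control, 1 = case.
status = [0] * 40 + [1] * 10 + [0] * 10 + [1] * 40
# Hypothetical latent variable whose states track the disease status.
h_assoc = [0] * 50 + [1] * 50
# Hypothetical latent variable whose states are balanced within cases and controls.
h_null = [0, 1] * 50

results = {name: chi2_independence(h, status)
           for name, h in [("H_assoc", h_assoc), ("H_null", h_null)]}
adjusted = bonferroni([p for _, p in results.values()])
for (name, (stat, p)), p_adj in zip(results.items(), adjusted):
    print(f"{name}: chi2 = {stat:.2f}, p = {p:.3g}, Bonferroni-adjusted p = {p_adj:.3g}")
```

Bonferroni makes no use of the dependence between tests. It can therefore be conservative when the tested latent variables are correlated with one another, as ancestor and descendant nodes of the same tree typically are.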

Author information

Authors and Affiliations

LINA, UMR CNRS 6241, Université de Nantes, 2 rue de la Houssinière, BP 92208, 44322, Nantes Cedex, France
Christine Sinoquet
Center for Computational Biology and Bioinformatics, Department of Molecular and Medical Genetics, Indiana University, Indianapolis, IN, 46002, U.S.A.
Raphaël Mourad
UMR CNRS 6241, Ecole Polytechnique de l’Université de Nantes, rue Christian Pauc, BP 50609, 44306, Nantes Cedex 3, France
Philippe Leray

Editor information

Editors and Affiliations

University of Porto, Portugal
Joaquim Gabriel
Institute of Information Theory and Automation of the ASCR, Pod vodárenskou věží 4, CZ-182 08, Prague 8, Czech Republic
Jan Schier
Dept. of Electrical Engineering, ESAT-SCD(SISTA), Katholieke Universiteit Leuven, Belgium
Sabine Van Huffel
University of Toulouse, France
Emmanuel Conchon
University of Coimbra, Portugal
Carlos Correia
IST - Technical University of Lisbon, Av. Rovisco Pais, 1, 1049-001, Lisbon, Portugal
Ana Fred
Institute of Telecommunication, Lisboa, Portugal
Hugo Gamboa

About this paper

Cite this paper

Sinoquet, C., Mourad, R., Leray, P. (2013). Forests of Latent Tree Models to Decipher Genotype-Phenotype Associations. In: Gabriel, J., et al. Biomedical Engineering Systems and Technologies. BIOSTEC 2012. Communications in Computer and Information Science, vol 357. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38256-7_8

DOI: https://doi.org/10.1007/978-3-642-38256-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38255-0
Online ISBN: 978-3-642-38256-7
eBook Packages: Computer Science, Computer Science (R0)
