Forests of Latent Tree Models to Decipher Genotype-Phenotype Associations

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 357)

Abstract

Genome-wide association studies have revolutionized the search for genetic influences on common diseases such as diabetes, obesity, asthma, cardiovascular diseases and some cancers. With an aging population and rising health care costs, further work is needed to design scalable and efficient tools. The high dimensionality and complexity of genetic data hinder the detection of genetic associations. To reduce the risk of missing the causal factor or of discovering spurious associations, machine learning offers an attractive alternative framework to classical statistical approaches. A novel class of probabilistic graphical models (PGMs), the forest of latent tree models (FLTMs), has recently been proposed to reach a trade-off between faithful modeling of data dependences and tractability. In this chapter, we assess the potential of this model to detect genotype-phenotype associations. The FLTM-based contribution is first put into the perspective of PGM-based works meant to model the dependences in genetic data; the contribution is then considered from the technical viewpoint of LTM learning, with scalability as the key objective. We then present the systematic and comprehensive evaluation conducted to assess the ability of the FLTM model to detect genetic associations through latent variables. Realistic simulations were performed under various controlled conditions. In this context, we present a procedure tailored to correct for multiple testing. We also show and discuss results obtained on real data. Besides reducing data dimension through latent variables, the FLTM model is shown empirically to capture indirect genetic associations with the disease: strong associations are found between the disease and the ancestor nodes of the causal genetic marker node in the forest, whereas very weak associations are obtained for the other latent variables. Finally, we discuss the prospects of the model for association detection at genome scale.
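The abstract mentions testing latent variables for association with the disease and correcting for multiple testing, but gives no details of the procedure. As a rough illustration only, the sketch below shows one generic approach, a permutation-based max-statistic correction over chi-square tests. It is not the procedure of the chapter. The function names `chi2_stat` and `max_stat_adjusted_pvalues`, the number of permutations, and the toy data are all assumptions made for this example.

```python
import random
from collections import Counter


def chi2_stat(values, labels):
    """Pearson chi-square statistic for the contingency table values x labels."""
    n = len(values)
    joint = Counter(zip(values, labels))
    row = Counter(values)
    col = Counter(labels)
    stat = 0.0
    for v in row:
        for c in col:
            expected = row[v] * col[c] / n
            observed = joint.get((v, c), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def max_stat_adjusted_pvalues(variables, phenotype, n_perm=200, seed=0):
    """Family-wise adjusted p-values obtained by permuting phenotype labels.

    For each permutation, the largest statistic over all variables is kept;
    a variable's adjusted p-value is the share of permutations whose maximum
    reaches its observed statistic (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = [chi2_stat(v, phenotype) for v in variables]
    shuffled = list(phenotype)
    exceed = [0] * len(variables)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_null = max(chi2_stat(v, shuffled) for v in variables)
        for i, obs in enumerate(observed):
            if max_null >= obs:
                exceed[i] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]


# Synthetic toy data (hypothetical): 50 controls followed by 50 cases.
phenotype = [0] * 50 + [1] * 50
# Variable 0 tracks the phenotype except for every tenth individual.
associated = [p if i % 10 else 1 - p for i, p in enumerate(phenotype)]
# Variables 1 and 2 are periodic patterns unrelated to the phenotype.
unrelated_a = [i % 2 for i in range(100)]
unrelated_b = [i % 3 for i in range(100)]

adjusted = max_stat_adjusted_pvalues(
    [associated, unrelated_a, unrelated_b], phenotype
)
print(adjusted)
```

In this toy setting the first variable receives a small adjusted p-value and the two unrelated variables receive large ones. In an FLTM setting the `variables` list would presumably hold imputed latent-variable states rather than raw genotypes, which is again an assumption of this sketch.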




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sinoquet, C., Mourad, R., Leray, P. (2013). Forests of Latent Tree Models to Decipher Genotype-Phenotype Associations. In: Gabriel, J., et al. Biomedical Engineering Systems and Technologies. BIOSTEC 2012. Communications in Computer and Information Science, vol 357. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38256-7_8

  • DOI: https://doi.org/10.1007/978-3-642-38256-7_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38255-0

  • Online ISBN: 978-3-642-38256-7

  • eBook Packages: Computer Science, Computer Science (R0)
