SNPboost: Interaction Analysis and Risk Prediction on GWA Data

Brænne, Ingrid; Erdmann, Jeanette; Mamlouk, Amir Madany

doi:10.1007/978-3-642-21738-8_15

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6792)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

Genome-wide association (GWA) studies, which typically aim to identify single nucleotide polymorphisms (SNPs) associated with a disease, yield large amounts of high-dimensional data. GWA studies have been successful in identifying single SNPs associated with complex diseases. So far, however, most of the identified associations have only a limited impact on risk prediction. Recent studies applying support vector machines (SVMs) have improved risk prediction for Type I and II diabetes; a drawback, however, is the poor interpretability of the classifier. Training the SVM on only a subset of SNPs would require a preselection, typically by p-value. Especially for complex diseases, this might not be the optimal selection strategy. In this work, we propose an extension of AdaBoost for GWA data, called SNPboost. To improve classification, SNPboost successively selects a subset of SNPs. On real GWA data (German MI family study II), SNPboost outperformed a linear SVM and further improved the performance of a non-linear SVM when used as a preselector. Finally, we argue that the selected SNPs can be put into a biological context.
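
The abstract does not spell out how SNPboost works, so the following is only a rough illustration of the general idea it names: boosting that picks one SNP per round. It is a minimal sketch of standard discrete AdaBoost with single-SNP threshold stumps, assuming genotypes coded as 0/1/2 minor-allele counts and labels in {-1, +1}. The function names (`fit_stump`, `adaboost_snps`, `predict`), the stump design, and the synthetic data are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def fit_stump(X, y, w):
    """Return (weighted error, SNP index, threshold, polarity) of the best stump."""
    best = (np.inf, 0, 0, 1)
    for j in range(X.shape[1]):
        for thr in (0, 1):                      # split on genotype > thr
            pred = np.where(X[:, j] > thr, 1, -1)
            err = w[pred != y].sum()
            # polarity -1 flips the prediction, which turns err into 1 - err
            for polarity, e in ((1, err), (-1, 1.0 - err)):
                if e < best[0]:
                    best = (e, j, thr, polarity)
    return best

def adaboost_snps(X, y, n_rounds=10):
    """Discrete AdaBoost; each round adds one single-SNP stump to the ensemble."""
    n = X.shape[0]
    w = np.full(n, 1.0 / n)                     # uniform sample weights
    model = []
    for _ in range(n_rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)    # avoid division by zero / log(0)
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this stump in the vote
        pred = pol * np.where(X[:, j] > thr, 1, -1)
        w *= np.exp(-alpha * y * pred)          # up-weight misclassified samples
        w /= w.sum()
        model.append((alpha, j, thr, pol))
    return model

def predict(model, X):
    """Sign of the alpha-weighted vote over all stumps."""
    score = sum(a * p * np.where(X[:, j] > t, 1, -1) for a, j, t, p in model)
    return np.where(score >= 0, 1, -1)

# Synthetic example (not real GWA data): the label depends on SNPs 3 and 7 only.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 50))          # 200 individuals, 50 SNPs
y = np.where(X[:, 3] + X[:, 7] >= 3, 1, -1)

model = adaboost_snps(X, y, n_rounds=5)
selected = sorted({j for _, j, _, _ in model})  # SNPs chosen across rounds
accuracy = (predict(model, X) == y).mean()      # training accuracy only
print("selected SNP indices:", selected)
print("training accuracy:", accuracy)
```

Because every weak learner looks at exactly one SNP, the set of indices stored in `model` is itself a small, readable SNP subset. In principle such a subset could be passed on to another classifier, in the spirit of the preselection use mentioned in the abstract.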

Author information

Authors and Affiliations

Institute for Neuro- and Bioinformatics, University of Lübeck, Ratzeburger Allee 160, 23562, Lübeck, Germany
Ingrid Brænne & Amir Madany Mamlouk
Medizinische Klinik II, University of Lübeck, Ratzeburger Allee 160, 23562, Lübeck, Germany
Ingrid Brænne & Jeanette Erdmann
Graduate School for Computing in Medicine and Life Sciences, University of Lübeck, Ratzeburger Allee 160, 23562, Lübeck, Germany
Ingrid Brænne, Jeanette Erdmann & Amir Madany Mamlouk

Editor information

Editors and Affiliations

Department of Information and Computer Science, Aalto University School of Science, P.O. Box 15400, 00076, Aalto, Finland
Timo Honkela & Samuel Kaski
School of Physics, Astronomy and Informatics, Department of Informatics, Nicolaus Copernicus University, ul. Grudziadzka 5, 87-100, Torun, Poland
Włodzisław Duch
Department of Statistical Science, University College London, 1-19 Torrington Place, WC1E 7HB, London, UK
Mark Girolami

About this paper

Cite this paper

Brænne, I., Erdmann, J., Mamlouk, A.M. (2011). SNPboost: Interaction Analysis and Risk Prediction on GWA Data. In: Honkela, T., Duch, W., Girolami, M., Kaski, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2011. ICANN 2011. Lecture Notes in Computer Science, vol 6792. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21738-8_15

DOI: https://doi.org/10.1007/978-3-642-21738-8_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21737-1
Online ISBN: 978-3-642-21738-8
eBook Packages: Computer Science, Computer Science (R0)
