Random Forest and Gene Networks for Association of SNPs to Alzheimer’s Disease

  • Gilderlanio S. Araújo
  • Manuela R. B. Souza
  • João Ricardo M. Oliveira
  • Ivan G. Costa
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8213)


Machine learning methods such as Random Forest (RF) have been used in Genome-Wide Association Studies (GWAS) to predict disease risk and to select sets of single nucleotide polymorphisms (SNPs) associated with a disease. In this study, we extracted information from biological networks to select candidate SNPs, which RF then uses for prediction and ranks by importance measures. Starting from an initial set of genes already related to the disease, we used the tool GeneMANIA to construct gene interaction networks and find novel genes that might be associated with Alzheimer’s Disease (AD). Restricting the analysis to SNPs in these candidate genes yields a small number of SNPs, which makes the application of RF feasible. The experiments conducted in this study focus on investigating which SNPs may influence susceptibility to AD.
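The candidate-selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce GeneMANIA's algorithm: as a stated simplification, non-seed genes are ranked by their summed edge weight to the seed genes. All gene names, SNP identifiers, edge weights, and the functions `expand_seed_genes` and `select_candidate_snps` are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def expand_seed_genes(edges, seeds, top_k):
    """Return the seed genes plus the top_k non-seed genes most strongly
    connected to them (simplified scoring: summed edge weight to seeds)."""
    seeds = set(seeds)
    score = defaultdict(float)
    for gene_a, gene_b, weight in edges:
        if gene_a in seeds and gene_b not in seeds:
            score[gene_b] += weight
        elif gene_b in seeds and gene_a not in seeds:
            score[gene_a] += weight
    # Sort by descending score; break ties by gene name for determinism.
    ranked = sorted(score, key=lambda g: (-score[g], g))
    return seeds | set(ranked[:top_k])


def select_candidate_snps(snp_to_gene, genes):
    """Keep only the SNPs that map to one of the candidate genes."""
    return sorted(snp for snp, gene in snp_to_gene.items() if gene in genes)


# Toy interaction network (hypothetical genes and weights).
edges = [
    ("SEED_1", "GENE_A", 0.9),
    ("SEED_2", "GENE_A", 0.4),
    ("SEED_2", "GENE_B", 0.5),
    ("GENE_B", "GENE_C", 0.8),  # no seed involved, so it adds no score
    ("GENE_D", "GENE_C", 0.7),
]
seeds = ["SEED_1", "SEED_2"]

# Toy SNP-to-gene annotation (hypothetical identifiers).
snp_to_gene = {
    "snp1": "SEED_1",
    "snp2": "GENE_A",
    "snp3": "GENE_B",
    "snp4": "GENE_C",
    "snp5": "GENE_D",
}

candidate_genes = expand_seed_genes(edges, seeds, top_k=1)
candidate_snps = select_candidate_snps(snp_to_gene, candidate_genes)
print(sorted(candidate_genes))
print(candidate_snps)
```

Under these assumptions, only the genotype columns for `candidate_snps` would be passed to an RF implementation, which would then rank them by an importance measure.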


Keywords: Random Forest, SNP, Alzheimer’s Disease, Genome-wide Association Study





Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Gilderlanio S. Araújo (1)
  • Manuela R. B. Souza (3, 4)
  • João Ricardo M. Oliveira (3, 4)
  • Ivan G. Costa (1, 2)

  1. Center of Informatics, Federal University of Pernambuco, Recife, Brazil
  2. Interdisciplinary Center for Clinical Research (IZKF) and Institute for Biomedical Engineering, RWTH University Hospital, Aachen, Germany
  3. Keizo Asami Laboratory (LIKA), Federal University of Pernambuco, Recife, Brazil
  4. Neuropsychiatry Department, Federal University of Pernambuco, Recife, Brazil
