Evaluation of Phenotype Classification Methods for Obesity Using Direct to Consumer Genetic Data

  • Conference paper
Intelligent Computing Theories and Application (ICIC 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10362)

Abstract

Direct-to-consumer genetic testing services are becoming increasingly common, and their customers often share genetic and clinical information with the research community to support the study of different conditions. In this paper, we build on these services to analyse the genetic data of people with different BMI levels and to determine the immediate and long-term risk factors associated with obesity. Using web scraping techniques, we created a dataset of publicly available information about 230 participants from the Personal Genome Project. We then analysed the dataset to identify genetic variants associated with high BMI, following standard quality control and association analysis protocols for genome-wide association analysis. We applied a Random Forest based feature selection algorithm (Boruta) in combination with a Support Vector Machine with a radial basis function kernel to the filtered dataset. This approach identified obesity-related genetic variants that can be used as features when predicting individual susceptibility to obesity. The results show that the subset of features obtained through the Random Forest based algorithm improves classifier performance compared with the top statistically significant genetic variants identified by logistic regression. The Support Vector Machine gave the best results, with sensitivity = 81%, specificity = 83% and area under the curve = 92%, when the model was trained on the top fifteen features selected by Boruta.
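The abstract reports classifier performance as sensitivity, specificity and area under the ROC curve. As a reading aid, the sketch below shows one common way these three metrics can be computed for a binary case/control classifier. It is not the authors' code: the function names, the 0.5 decision threshold and the toy labels and scores are all assumptions made for illustration, and none of the numbers come from the paper.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = case, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases predicted as cases
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls predicted as controls
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls flagged as cases
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 cases and 4 controls with made-up classifier scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} auc={auc:.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarises ranking quality across all thresholds, which is why the three figures are usually reported together.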

Author information

Corresponding author

Correspondence to Casimiro Aday Curbelo Montañez.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Curbelo Montañez, C.A., Fergus, P., Hussain, A., Al-Jumeily, D., Dorak, M.T., Abdullah, R. (2017). Evaluation of Phenotype Classification Methods for Obesity Using Direct to Consumer Genetic Data. In: Huang, DS., Jo, KH., Figueroa-García, J. (eds) Intelligent Computing Theories and Application. ICIC 2017. Lecture Notes in Computer Science, vol 10362. Springer, Cham. https://doi.org/10.1007/978-3-319-63312-1_31

  • DOI: https://doi.org/10.1007/978-3-319-63312-1_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63311-4

  • Online ISBN: 978-3-319-63312-1

  • eBook Packages: Computer Science (R0)
