Evaluation of Phenotype Classification Methods for Obesity Using Direct to Consumer Genetic Data

Curbelo Montañez, Casimiro Aday; Fergus, Paul; Hussain, Abir; Al-Jumeily, Dhiya; Dorak, Mehmet Tevfik; Abdullah, Rosni

doi:10.1007/978-3-319-63312-1_31


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10362)

Included in the following conference series:

International Conference on Intelligent Computing


Abstract

Direct-to-consumer genetic testing services are becoming ubiquitous. Consumers of such services share their genetic and clinical information with the research community to facilitate the extraction of knowledge about different conditions. In this paper, we build on these services to analyse the genetic data of people with different BMI levels and determine the immediate and long-term risk factors associated with obesity. Using web scraping techniques, we created a dataset containing publicly available information about 230 participants from the Personal Genome Project. We then analysed the dataset to identify genetic variants associated with high BMI levels, following standard quality control and association analysis protocols for genome-wide association analysis. We applied a Random Forest based feature selection algorithm, combined with a Support Vector Machine with a Radial Basis Function kernel, to the filtered dataset. This approach identified obesity-related genetic variants to be used as features when predicting individual obesity susceptibility. The results reveal that the subset of features obtained through the Random Forest based algorithm improves the performance of the classifier compared with the top statistically significant genetic variants identified by logistic regression. The Support Vector Machine showed the best results, with sensitivity = 81%, specificity = 83%, and area under the curve = 92%, when the model was trained with the top fifteen features selected by Boruta.
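The three figures reported in the abstract (sensitivity, specificity, and area under the curve) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.5 decision threshold, and the toy labels and scores below are assumptions made purely for illustration, and the numbers it produces are unrelated to the paper's results.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = case, 0 = control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    pos: List[float] = [s for t, s in zip(y_true, scores) if t == 1]
    neg: List[float] = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 4 cases and 4 controls with classifier scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.6, 0.3, 0.2, 0.1]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} auc={auc:.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the area under the curve summarises ranking quality across all thresholds, which is why a classifier can report an AUC noticeably higher than either of the other two values.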



Author information

Authors and Affiliations

Liverpool John Moores University, Liverpool, UK
Casimiro Aday Curbelo Montañez, Paul Fergus, Abir Hussain & Dhiya Al-Jumeily
Liverpool Hope University, Liverpool, UK
Mehmet Tevfik Dorak
Universiti Sains Malaysia, George Town, Malaysia
Rosni Abdullah


Corresponding author

Correspondence to Casimiro Aday Curbelo Montañez.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
University of Ulsan, Ulsan, Korea (Republic of)
Kang-Hyun Jo
Universidad Distrital Francisco José de Caldas, Bogotá, Colombia
Juan Carlos Figueroa-García


Cite this paper

Curbelo Montañez, C.A., Fergus, P., Hussain, A., Al-Jumeily, D., Dorak, M.T., Abdullah, R. (2017). Evaluation of Phenotype Classification Methods for Obesity Using Direct to Consumer Genetic Data. In: Huang, DS., Jo, KH., Figueroa-García, J. (eds) Intelligent Computing Theories and Application. ICIC 2017. Lecture Notes in Computer Science, vol 10362. Springer, Cham. https://doi.org/10.1007/978-3-319-63312-1_31


DOI: https://doi.org/10.1007/978-3-319-63312-1_31
Published: 20 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63311-4
Online ISBN: 978-3-319-63312-1
eBook Packages: Computer Science, Computer Science (R0)
