Abstract
Conventional epidemiological analyses in health-related research have been successful in identifying individual risk factors for adverse health outcomes, e.g., the effect of cigarette smoking on lung cancer. However, these approaches have been less successful for conditions that are multifactorial or for which multiple variables interact to affect risk. Machine learning approaches such as classifiers can improve risk prediction because they empirically detect patterns of variables that are “diagnostic” of a particular outcome, rather than examining isolated, statistically independent relationships that are specified a priori, as the conventional approach does. This chapter presents a proof of concept using several classifiers (discriminant analysis, support vector machines (SVM), and neural nets) to classify obesity from 18 dietary and physical activity variables. Random subsampling cross-validation was used to measure prediction accuracy. Classifiers outperformed logistic regression: quadratic discriminant analysis (QDA) correctly classified 59% of cases versus 55% for logistic regression using the original, unbalanced data, and radial-basis SVM correctly classified nearly 61% of cases using balanced data, versus 59% for logistic regression. Moreover, radial-basis SVM predicted both categories (obese and non-obese) above chance simultaneously, whereas some other methods achieved above-chance prediction accuracy for only one category, usually to the detriment of the other. These findings show that obesity can be classified more accurately by a combination or pattern of dietary and physical activity behaviors than by individual variables alone. Classifiers have the potential to inform more effective nutritional guidelines and treatments for obesity. More generally, machine learning methods can improve risk prediction for health outcomes over conventional epidemiological approaches.
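To make the evaluation procedure concrete, the following is a minimal sketch of random subsampling cross-validation that reports overall accuracy alongside separate accuracies for each category. It is not the chapter's analysis: the data are synthetic (two Gaussian classes with 18 features), the nearest-centroid classifier is a simple stand-in for the QDA, SVM, and neural-net models, and the function names, the 80/20 split, and the 50 repetitions are all assumptions chosen for illustration.

```python
import random
from statistics import mean


def make_synthetic(n_per_class, n_features, shift, rng):
    """Hypothetical stand-in data: label 0 = non-obese, label 1 = obese.

    Each feature of class 1 is shifted by `shift`; this is NOT survey data.
    """
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            x = [rng.gauss(shift * label, 1.0) for _ in range(n_features)]
            data.append((x, label))
    return data


def fit_centroids(train):
    """Nearest-centroid stand-in classifier: per-class mean of every feature."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def random_subsampling_cv(data, n_repeats=50, test_fraction=0.2, seed=0):
    """Repeat a random train/test split and average the test accuracies.

    Returns (overall accuracy, {label: accuracy within that class}).
    """
    rng = random.Random(seed)
    n_test = int(len(data) * test_fraction)
    overall = []
    per_class = {0: [], 1: []}
    for _ in range(n_repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        centroids = fit_centroids(train)
        hits = [(y, int(predict(centroids, x) == y)) for x, y in test]
        overall.append(sum(h for _, h in hits) / len(hits))
        for label in (0, 1):
            class_hits = [h for y, h in hits if y == label]
            if class_hits:  # skip a split whose test set lacks this class
                per_class[label].append(sum(class_hits) / len(class_hits))
    return mean(overall), {k: mean(v) for k, v in per_class.items()}


if __name__ == "__main__":
    data = make_synthetic(100, 18, 0.5, random.Random(1))
    acc, by_class = random_subsampling_cv(data)
    print(f"overall accuracy: {acc:.3f}")
    print(f"non-obese (0): {by_class[0]:.3f}, obese (1): {by_class[1]:.3f}")
```

Reporting the per-class accuracies separately is what allows a check that both categories are predicted above chance at once, rather than one category being favored at the expense of the other.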
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Selya, A.S., Anshutz, D. (2018). Machine Learning for the Classification of Obesity from Dietary and Physical Activity Patterns. In: Giabbanelli, P., Mago, V., Papageorgiou, E. (eds) Advanced Data Analytics in Health. Smart Innovation, Systems and Technologies, vol 93. Springer, Cham. https://doi.org/10.1007/978-3-319-77911-9_5
DOI: https://doi.org/10.1007/978-3-319-77911-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77910-2
Online ISBN: 978-3-319-77911-9
eBook Packages: Engineering, Engineering (R0)