Machine Learning for the Classification of Obesity from Dietary and Physical Activity Patterns

Advanced Data Analytics in Health

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 93)

Abstract

Conventional epidemiological analyses in health-related research have been successful in identifying individual risk factors for adverse health outcomes, e.g., the effect of cigarettes on lung cancer. However, for conditions that are multifactorial or for which multiple variables interact to affect risk, these approaches have been less successful. Machine learning approaches such as classifiers can improve risk prediction because they empirically detect patterns of variables that are “diagnostic” of a particular outcome, whereas the conventional approach examines isolated, statistically independent relationships that are specified a priori. This chapter presents a proof-of-concept using several classifiers (discriminant analysis, support vector machines (SVM), and neural nets) to classify obesity from 18 dietary and physical activity variables. Random subsampling cross-validation was used to measure prediction accuracy. Classifiers outperformed logistic regression: quadratic discriminant analysis (QDA) correctly classified 59% of cases versus logistic regression’s 55% using original, unbalanced data; and radial-basis SVM classified nearly 61% of cases using balanced data, versus logistic regression’s 59% prediction accuracy. Moreover, radial SVM predicted both categories (obese and non-obese) above chance simultaneously, while some other methods achieved above-chance prediction accuracy for only one category, usually to the detriment of the other. These findings show that obesity can be classified more accurately by a combination or pattern of dietary and physical activity behaviors than by individual variables alone. Classifiers have the potential to inform more effective nutritional guidelines and treatments for obesity. More generally, machine learning methods can improve risk prediction for health outcomes over conventional epidemiological approaches.
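
The evaluation idea in the abstract (random subsampling cross-validation, with accuracy reported for each category as well as overall) can be illustrated with a minimal Python sketch. This is not the authors' code or data: the function names (`per_class_accuracy`, `random_subsampling_cv`, `fit_majority`), the split settings, and the toy labels (70 non-obese, 30 obese) are all assumptions made for illustration. The baseline classifier always predicts the majority label, which shows how a method can look acceptable overall while failing one category entirely.

```python
import random


def per_class_accuracy(y_true, y_pred):
    """Return {label: fraction of that label's cases predicted correctly}."""
    totals, hits = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        hits[t] = hits.get(t, 0) + (1 if t == p else 0)
    return {label: hits[label] / totals[label] for label in totals}


def random_subsampling_cv(X, y, fit, n_splits=20, test_fraction=0.3, seed=0):
    """Repeat random train/test splits and average the accuracies.

    `fit(X_train, y_train)` must return a function mapping one case to a label.
    Returns (mean overall accuracy, {label: mean per-class accuracy}).
    """
    rng = random.Random(seed)
    indices = list(range(len(y)))
    n_test = max(1, int(len(y) * test_fraction))
    overall_sum = 0.0
    class_accs = {}
    for _ in range(n_splits):
        rng.shuffle(indices)  # a fresh random split each repetition
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_true = [y[i] for i in test_idx]
        y_pred = [predict(X[i]) for i in test_idx]
        overall_sum += sum(t == p for t, p in zip(y_true, y_pred)) / n_test
        for label, acc in per_class_accuracy(y_true, y_pred).items():
            class_accs.setdefault(label, []).append(acc)
    mean_by_class = {k: sum(v) / len(v) for k, v in class_accs.items()}
    return overall_sum / n_splits, mean_by_class


def fit_majority(X_train, y_train):
    """Baseline: ignore the features and always predict the most common label."""
    majority = max(set(y_train), key=y_train.count)
    return lambda x: majority


# Toy, made-up data: 0 = non-obese (70 cases), 1 = obese (30 cases).
# The single feature is a placeholder and carries no information.
y = [0] * 70 + [1] * 30
X = [[0.0] for _ in y]

overall, by_class = random_subsampling_cv(X, y, fit_majority)
print("overall accuracy:", round(overall, 3))
print("per-class accuracy:", by_class)
```

On this toy data the baseline scores well above 50% overall, yet it classifies every obese case incorrectly. Reporting accuracy per category, as the abstract does for radial SVM, is what exposes that kind of one-sided performance.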

Author information

Corresponding author

Correspondence to Arielle S. Selya.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Selya, A.S., Anshutz, D. (2018). Machine Learning for the Classification of Obesity from Dietary and Physical Activity Patterns. In: Giabbanelli, P., Mago, V., Papageorgiou, E. (eds) Advanced Data Analytics in Health. Smart Innovation, Systems and Technologies, vol 93. Springer, Cham. https://doi.org/10.1007/978-3-319-77911-9_5

  • DOI: https://doi.org/10.1007/978-3-319-77911-9_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77910-2

  • Online ISBN: 978-3-319-77911-9

  • eBook Packages: Engineering, Engineering (R0)
