Mobile Health pp 151-174

Challenges and Opportunities in Automated Detection of Eating Activity



Motivated by applications in nutritional epidemiology and food journaling, computing researchers have proposed numerous techniques for automating dietary monitoring over the years. Although progress has been made, a truly practical system that can automatically recognize what people eat in real-world settings remains elusive. Eating detection is a foundational element of automated dietary monitoring (ADM), since a system must recognize when a person is eating before it can identify what and how much is being consumed. Additionally, eating detection can serve as the basis for new types of dietary self-monitoring practices such as semi-automated food journaling.

This chapter discusses the problem of automated eating detection and presents a variety of practical techniques for detecting eating activities in real-world settings. These techniques center on three sensing modalities: first-person images taken with wearable cameras, ambient sounds, and on-body inertial sensors [34, 35, 36, 37]. The chapter begins with an analysis of how first-person images reflecting everyday experiences can be used to identify eating moments using two approaches: human computation and convolutional neural networks. Next, we present an analysis showing how certain sounds associated with eating can be recognized and used to infer eating activities. Finally, we introduce a method for detecting eating moments with inertial sensors worn on the wrist.


  1. Amft, O. and Tröster, G., “On-Body Sensing Solutions for Automatic Dietary Monitoring,” IEEE Pervasive Computing, vol. 8, Apr. 2009.
  2. Bäckström, T. and Magi, C., “Properties of line spectrum pair polynomials—A review,” Signal Processing, vol. 86, pp. 3286–3298, Nov. 2006.
  3. Boushey, C. J., Coulston, A. M., Rock, C. L., and Monsen, E., Nutrition in the Prevention and Treatment of Disease. Academic Press, 2001.
  4. Castro, D., Hickson, S., Bettadapura, V., Thomaz, E., Abowd, G. D., Christensen, H., and Essa, I., “Predicting daily activities from egocentric images using deep learning,” in Proceedings of the 2015 ACM International Symposium on Wearable Computers, pp. 75–82, 2015.
  5. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L., “Imagenet: A large-scale hierarchical image database,” in CVPR, pp. 248–255, IEEE, 2009.
  6. Ester, M., Kriegel, H.-P., Sander, J., and Xu, X., “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” KDD, pp. 226–231, 1996.
  7. Farb, P. and Armelagos, G., Consuming passions, the anthropology of eating. Houghton Mifflin, 1980.
  8. Fouse, A., Weibel, N., Hutchins, E., and Hollan, J. D., “ChronoViz: a system for supporting navigation of time-coded data,” CHI Extended Abstracts, pp. 299–304, 2011.
  9. Gillet, O. and Richard, G., “Automatic transcription of drum loops,” in 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. iv–269–iv–272, IEEE, 2004.
  10. Go, V. L. W., Nguyen, C. T. H., Harris, D. M., and Lee, W.-N. P., “Nutrient-gene interaction: metabolic genotype-phenotype relationship,” The Journal of Nutrition, vol. 135, pp. 3016S–3020S, Dec. 2005.
  11. Gowdy, J., Limited wants, unlimited means: A reader on hunter-gatherer economics and the environment. Island Press, 1997.
  12. Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. R., “Improving neural networks by preventing co-adaptation of feature detectors,” CoRR, 2012.
  13. Hoyle, R., Templeman, R., Armes, S., Anthony, D., Crandall, D., and Kapadia, A., “Privacy behaviors of lifeloggers using wearable cameras,” in the 2014 ACM International Joint Conference, (New York, New York, USA), pp. 571–582, ACM Press, 2014.
  14. Jacobs, D. R., “Challenges in research in nutritional epidemiology,” Nutritional Health, pp. 29–42, 2012.
  15. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., and Darrell, T., “Caffe: Convolutional architecture for fast feature embedding,” in ACM Multimedia, pp. 675–678, 2014.
  16. Kahneman, D., Krueger, A. B., Schkade, D. A., and Schwarz, N., “A Survey Method for Characterizing Daily Life Experience: The Day Reconstruction Method,” Science, 2004.
  17. Kelly, P., Marshall, S. J., Badland, H., Kerr, J., Oliver, M., Doherty, A. R., and Foster, C., “An ethical framework for automated, wearable cameras in health behavior research,” American Journal of Preventive Medicine, vol. 44, pp. 314–319, Mar. 2013.
  18. Kleitman, N., Sleep and wakefulness. Chicago: The University of Chicago Press, July 1963.
  19. Krizhevsky, A., Sutskever, I., and Hinton, G. E., “Imagenet classification with deep convolutional neural networks,” in NIPS, pp. 1097–1105, 2012.
  20. LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P., “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
  21. Liu, J., Johns, E., Atallah, L., Pettitt, C., Lo, B., Frost, G., and Yang, G.-Z., “An Intelligent Food-Intake Monitoring System Using Wearable Sensors,” in Wearable and Implantable Body Sensor Networks (BSN), 2012 Ninth International Conference on, pp. 154–160, IEEE Computer Society, 2012.
  22. Lu, H., Pan, W., Lane, N., Choudhury, T., and Campbell, A., “SoundSense: scalable sound sensing for people-centric applications on mobile phones,” Proceedings of the 7th International Conference on Mobile Systems, Applications, and Services, pp. 165–178, 2009.
  23. Makhoul, J., “Linear prediction: A tutorial review,” Proceedings of the IEEE, vol. 63, pp. 561–580, Apr. 1975.
  24. Mathieu, B., Essid, S., Fillon, T., Prado, J., and Richard, G., “YAAFE, an Easy to Use and Efficient Audio Feature Extraction Software,” in Proceedings of the 11th ISMIR Conference, Sept. 2010.
  25. Michels, K. B., “A renaissance for measurement error,” International Journal of Epidemiology, vol. 30, pp. 421–422, June 2001.
  26. Mintz, S. W. and Du Bois, C. M., “The anthropology of food and eating,” Annual Review of Anthropology, pp. 99–119, 2002.
  27. Moore, B. C. J., Glasberg, B. R., and Baer, T., “A Model for the Prediction of Thresholds, Loudness, and Partial Loudness,” Journal of the Audio Engineering Society, vol. 45, no. 4, pp. 224–240, 1997.
  28. Nguyen, D. H., Marcu, G., Hayes, G. R., Truong, K. N., Scott, J., Langheinrich, M., and Roduner, C., “Encountering SenseCam: personal recording technologies in everyday life,” pp. 165–174, 2009.
  29. Rossi, M., Feese, S., Amft, O., Braune, N., Martis, S., and Tröster, G., “AmbientSense: A real-time ambient sound recognition system for smartphones,” in Pervasive Computing and Communications Workshops (PERCOM Workshops), 2013 IEEE International Conference on, pp. 230–235, 2013.
  30. Russell, B. C., Torralba, A., Murphy, K. P., and Freeman, W. T., “LabelMe: A Database and Web-Based Tool for Image Annotation,” International Journal of Computer Vision, vol. 77, May 2008.
  31. Scheirer, E. and Slaney, M., “Construction and evaluation of a robust multifeature speech/music discriminator,” IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 1331–1334, 1997.
  32. Schussler, H., “A stability theorem for discrete systems,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 24, pp. 87–89, Feb. 1976.
  33. Sorokin, A. and Forsyth, D., “Utility data annotation with Amazon Mechanical Turk,” Audio, Transactions of the IRE Professional Group on, pp. 1–8, June 2008.
  34. Thomaz, E., Abowd, G., and Essa, I., “A Practical Approach for Recognizing Eating Moments with Wrist-Mounted Inertial Sensing,” in UbiComp ’15: Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp. 1–12, July 2015.
  35. Thomaz, E., Parnami, A., Bidwell, J., Essa, I. A., and Abowd, G. D., “Technological approaches for addressing privacy concerns when recognizing eating behaviors with wearable cameras,” UbiComp, pp. 739–748, 2013.
  36. Thomaz, E., Parnami, A., Essa, I. A., and Abowd, G. D., “Feasibility of identifying eating moments from first-person images leveraging human computation,” SenseCam, pp. 26–33, 2013.
  37. Thomaz, E., Zhang, C., Essa, I., and Abowd, G. D., “Inferring Meal Eating Activities in Real World Settings from Ambient Sounds,” in the 20th Intelligent User Interfaces Conference (IUI), (New York, New York, USA), pp. 427–431, ACM Press, 2015.
  38. von Ahn, L. and Dabbish, L., “Labeling images with a computer game,” in CHI ’04: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, Apr. 2004.
  39. von Ahn, L., Liu, R., and Blum, M., “Peekaboom: a game for locating objects in images,” in CHI ’06: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, Apr. 2006.
  40. Willett, W., Nutritional Epidemiology. Oxford University Press, Oct. 2012.
  41. Wyatt, D., Choudhury, T., and Bilmes, J., “Conversation detection and speaker segmentation in privacy-sensitive situated speech data,” Proceedings of Interspeech, pp. 586–589, 2007.
  42. Yatani, K. and Truong, K. N., “BodyScope: a wearable acoustic sensor for activity recognition,” UbiComp ’12: Proceedings of the 2012 ACM Conference on Ubiquitous Computing, pp. 341–350, 2012.
  43. Zeiler, M. D. and Fergus, R., “Visualizing and understanding convolutional networks,” in ECCV, pp. 818–833, Springer, 2014.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. The University of Texas at Austin, Austin, USA
  2. Georgia Institute of Technology, Atlanta, USA
