A Deep Network for Automatic Video-Based Food Bite Detection
Prior research has provided compelling evidence of correlations between individual eating styles and the development of (un)healthy eating patterns, obesity and other medical conditions. In this setting, an automatic, non-invasive food bite detection system can be a valuable tool for nutritionists, dietary experts and medical doctors seeking to explore real-life eating behaviors and dietary habits. Unfortunately, the automatic detection of food bites is challenging due to occlusions between hands and mouth, the use of different kitchen utensils and personalized eating habits. Manual bite detection, on the other hand, is accurate but time-consuming for the annotator, making it infeasible for large-scale experimental deployments or real-life settings. In this regard, we propose a novel deep learning methodology that relies solely on human body and face motion data extracted from videos depicting people eating meals. The purpose is to develop a system that can accurately, robustly and automatically identify food bite instances, with the long-term goal of complementing or even replacing the manual bite-annotation protocols currently in use. The experimental results on a large dataset reveal the superb classification performance of the proposed methodology on the task of bite detection and pave the way for additional research on automatic bite detection systems.
Keywords: Deep learning, Bite detection, Video analysis, Motion features
This work was supported by the European Project: PROTEIN Grant no. 817732 with the H2020 Research and Innovation Programme.