Multimedia Tools and Applications, Volume 74, Issue 14, pp 5263–5287

FoodCam: A real-time food recognition system on a smartphone

Abstract

We propose FoodCam, a mobile food recognition system that estimates the calories and nutrition of foods and records a user’s eating habits. In this paper, we propose image recognition methods suited to mobile devices that enable real-time food image recognition on a consumer smartphone. This distinguishes FoodCam from existing systems, which must send images to an image recognition server. To recognize food items, a user first draws bounding boxes by touching the screen, and the system then starts recognizing food items within the indicated bounding boxes. To recognize them more accurately, we segment each food item region with GrabCut, extract image features, and finally classify the region into one of one hundred food categories with a linear SVM. We adopt two kinds of image features: one is the combination of standard bag-of-features and color histograms with χ² kernel feature maps, and the other is a HOG patch descriptor and a color patch descriptor with the state-of-the-art Fisher Vector representation. In addition, the system estimates the direction of the food region in which a higher SVM output score is expected, and shows that direction as an arrow on the screen to prompt the user to move the smartphone camera. This recognition process is performed repeatedly and continuously. We implemented the system as a standalone mobile application for Android smartphones that uses multiple CPU cores effectively for real-time recognition. In the experiments, we achieved a 79.2% classification rate within the top 5 category candidates on a 100-category food dataset with ground-truth bounding boxes when using HOG and color patches with Fisher Vector coding as image features. In addition, a user study gave the system a positive evaluation compared with a food recording system without object recognition.
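The top-5 classification rate reported above counts a test image as correct when its true category is among the five highest-scoring candidates. The following is a minimal sketch of that metric, not the authors' evaluation code: the function name `top_k_rate`, the toy scores, and the four-category setup are assumptions made for illustration, and the per-class scores stand in for linear SVM outputs.

```python
from typing import Sequence


def top_k_rate(scores: Sequence[Sequence[float]],
               labels: Sequence[int],
               k: int = 5) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores[i][c] is the classifier score of sample i for class c
    (assumed here to be one-vs-rest linear SVM outputs).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical toy data: 3 samples scored over 4 categories.
toy_scores = [
    [0.1, 0.9, 0.3, 0.2],
    [0.8, 0.1, 0.7, 0.05],
    [0.2, 0.3, 0.1, 0.4],
]
toy_labels = [2, 3, 0]

for k in (1, 2, 3, 4):
    print(k, top_k_rate(toy_scores, toy_labels, k=k))
```

With 100 categories, `k=5` gives the top-5 rate and `k=1` gives ordinary classification accuracy.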

Keywords

Food recognition, Dietary recording, Smartphone application, Fisher vector, Mobile image recognition

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. The University of Electro-Communications, Tokyo, Japan
