A Benchmark Dataset to Study the Representation of Food Images

  • Giovanni Maria Farinella
  • Dario Allegra
  • Filippo Stanco
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8927)

Abstract

It is well known that people love food. However, an unhealthy diet can cause serious health problems. Since health is closely linked to diet, advanced computer vision tools that recognize food images (e.g., acquired with mobile or wearable cameras) and their properties (e.g., calories) can support diet monitoring by providing useful information to experts (e.g., nutritionists) who assess the food intake of patients (e.g., to combat obesity). Food recognition is a challenging task, since food is intrinsically deformable and highly variable in appearance; image representation therefore plays a fundamental role. To properly study the peculiarities of image representation in the food application context, a benchmark dataset is needed. These facts motivate the work presented in this paper. We introduce the UNICT-FD889 dataset, the first food image dataset composed of over \(800\) distinct plates of food, which can be used as a benchmark to design and compare representation models of food images. We exploit UNICT-FD889 for Near Duplicate Image Retrieval (NDIR) by comparing three standard state-of-the-art image descriptors: Bag of Textons, PRICoLBP, and SIFT. Results confirm that both textures and colors are fundamental properties in food representation. Moreover, the experiments point out that the Bag of Textons representation computed in the color domain is more accurate than the other two approaches for NDIR.
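As a rough illustration of the kind of pipeline the abstract describes, the sketch below builds a Bag of Textons signature per image (filter-bank responses computed on each color channel, clustered with k-means into a texton vocabulary) and ranks database images against a query, in the spirit of NDIR. The tiny Gaussian filter bank, vocabulary size, and chi-square-like distance are simplifying assumptions for illustration, not the paper's exact setup.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.cluster import KMeans

def filter_responses(img, sigmas=(1.0, 2.0, 4.0)):
    """Per-pixel responses of a small Gaussian filter bank, applied to each
    color channel independently (a simplified stand-in for the larger filter
    banks typically used to compute Textons)."""
    chans = [img[..., c].astype(float) for c in range(img.shape[-1])]
    resp = [gaussian_filter(ch, s) for ch in chans for s in sigmas]
    # One row per pixel, one column per (channel, scale) response.
    return np.stack(resp, axis=-1).reshape(-1, len(resp))

def build_vocabulary(train_imgs, k=8, seed=0):
    """Cluster pooled filter responses into k 'textons' with k-means."""
    feats = np.vstack([filter_responses(im) for im in train_imgs])
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit(feats)

def bag_of_textons(img, kmeans):
    """Normalized histogram of per-pixel texton assignments: the image signature."""
    labels = kmeans.predict(filter_responses(img))
    hist = np.bincount(labels, minlength=kmeans.n_clusters).astype(float)
    return hist / hist.sum()

def retrieve(query, database, kmeans):
    """Rank database images by a chi-square-like distance to the query signature."""
    q = bag_of_textons(query, kmeans)
    hists = [bag_of_textons(im, kmeans) for im in database]
    def dist(h):
        return 0.5 * np.sum((q - h) ** 2 / (q + h + 1e-12))
    return sorted(range(len(database)), key=lambda i: dist(hists[i]))
```

For NDIR, an exact or near duplicate of the query yields a nearly identical texton histogram and so ranks first; the same signatures could equally feed a nearest-neighbor classifier for food recognition.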

Keywords

Food dataset · Food recognition · Near duplicate image retrieval · Textons · PRICoLBP · SIFT



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Giovanni Maria Farinella (1)
  • Dario Allegra (1)
  • Filippo Stanco (1)

  1. Image Processing Laboratory, Department of Mathematics and Computer Science, University of Catania, Catania, Italy
