Food Recognition Using Consensus Vocabularies

  • Giovanni Maria Farinella
  • Marco Moltisanti
  • Sebastiano Battiato
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9281)


Food recognition is an interesting and challenging problem with applications in medical, social and anthropological research areas. The high variability of food images makes the recognition task difficult for current state-of-the-art methods. Exploiting multiple features that capture complementary aspects of the image content has been shown to improve the discrimination between different food items. In this paper we exploit an image representation based on the consensus among visual vocabularies built on different feature spaces. Starting from a set of visual codebooks, a consensus clustering technique builds a consensus vocabulary, which is then used to represent food pictures under the Bag-of-Visual-Words paradigm. This new representation is employed together with an SVM for recognition.
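
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which consensus clustering technique or which features are used, so the sketch assumes k-means codebooks in each feature space and a simple consensus step that clusters the concatenated one-hot codebook memberships of each patch. All function names, parameters and the random data are hypothetical and serve only as illustration.

```python
import numpy as np

def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean) for each row of X."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, assign(X, centroids)

def membership(views, codebooks):
    """Concatenate one-hot codebook assignments across feature spaces.

    views[i] holds the descriptors of the same patches in feature space i.
    """
    parts = [np.eye(len(C))[assign(X, C)] for X, C in zip(views, codebooks)]
    return np.hstack(parts)

def fit_consensus(views, k_per_view, k_consensus, seed=0):
    """Build one codebook per feature space, then a consensus vocabulary.

    Assumption: consensus is obtained by clustering the membership vectors,
    so patches that the individual codebooks tend to group together end up
    in the same consensus word.
    """
    codebooks = [kmeans(X, k_per_view, seed=seed)[0] for X in views]
    M = membership(views, codebooks)
    consensus_centroids, _ = kmeans(M, k_consensus, seed=seed)
    return codebooks, consensus_centroids

def bovw_histogram(views, codebooks, consensus_centroids):
    """Normalized Bag-of-Visual-Words histogram over consensus words."""
    words = assign(membership(views, codebooks), consensus_centroids)
    hist = np.bincount(words, minlength=len(consensus_centroids)).astype(float)
    return hist / hist.sum()

# Toy data: 200 training patches described in two hypothetical feature
# spaces (8-D and 3-D), and one image made of 30 patches.
rng = np.random.default_rng(0)
train_views = [rng.normal(size=(200, 8)), rng.normal(size=(200, 3))]
codebooks, consensus = fit_consensus(train_views, k_per_view=5, k_consensus=4)

image_views = [rng.normal(size=(30, 8)), rng.normal(size=(30, 3))]
h = bovw_histogram(image_views, codebooks, consensus)
```

In a full system, one such histogram per image would be the input to the SVM classifier mentioned in the abstract (for example scikit-learn's `sklearn.svm.SVC`), which is left out here to keep the sketch dependent on NumPy only.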


Image Representation, Color Histogram, Multiple Kernel Learn, Consensus Cluster, Visual Vocabulary
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Giovanni Maria Farinella (1)
  • Marco Moltisanti (1), email author
  • Sebastiano Battiato (1)
  1. Image Processing LAB, Department of Mathematics and Computer Science, Università degli Studi di Catania, Catania, Italy
