Comparison Study on Convolution Neural Networks (CNNs) vs. Human Visual System (HVS)

  • Conference paper
Part of the book series: Communications in Computer and Information Science (CCIS, volume 1018)

Abstract

Image recognition in computer vision has undergone a remarkable evolution, driven by the availability of large-scale datasets (e.g., ImageNet, UECFood) and by deep Convolutional Neural Networks (CNNs). A CNN learns in a data-driven fashion from a sufficiently large training set containing organized, hierarchical image features, such as annotations, labels, and distinct regions of interest (ROIs). However, acquiring such datasets with comprehensive annotations remains a challenge in many domains. Currently, there are three main techniques for employing a CNN: training the network from scratch; using an off-the-shelf pre-trained network; and performing unsupervised pre-training followed by supervised fine-tuning. Deep learning networks for image classification, regression, and feature learning include Inception-v3, ResNet-50, ResNet-101, GoogLeNet, AlexNet, VGG-16, and VGG-19.
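To make the operation underlying these networks concrete, the following is a minimal illustrative sketch (not taken from the paper) of the single convolution-plus-nonlinearity step that CNN layers stack to extract hierarchical features; the toy image and edge-detecting kernel values are our own assumptions:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Single-channel 'valid' 2D convolution: the core CNN operation."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output value is the weighted sum of one image patch.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Nonlinearity applied after each convolution."""
    return np.maximum(x, 0)

# Toy 5x5 image: dark left half, bright right half.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# A 3x3 vertical-edge kernel (illustrative weights).
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = relu(conv2d_valid(image, kernel))
print(feature_map.shape)  # (3, 3); responds strongly at the dark/bright edge
```

In a full CNN, many such kernels are learned from data rather than hand-crafted, and deeper layers convolve the feature maps of earlier ones, which is what produces the feature hierarchy the abstract refers to.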

In this paper we exploit the use of three CNNs to solve detection problems. First, the different CNN architectures are evaluated; the studied CNN models contain from 5 thousand to 160 million parameters, depending on the number of layers. Secondly, the studied CNNs are evaluated with respect to dataset size and spatial image context, and results relating performance, training time, and accuracy are analyzed. Thirdly, the accuracy of the CNNs is compared against human knowledge and human visual system (HVS) classification. From the obtained results it is possible to conclude that the HVS is more accurate when the dataset covers a wide variety of images; however, when the dataset is focused only on niche images, the CNNs show better results than the HVS.
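As a back-of-the-envelope illustration of where parameter counts in the thousands-to-millions range come from, the sketch below computes the size of two well-known layers; the helper functions are our own, while the layer dimensions follow the published AlexNet and VGG-16 architectures:

```python
def conv_params(k, c_in, c_out, bias=True):
    """Parameters in one conv layer: a k*k*c_in filter per output channel."""
    return (k * k * c_in + (1 if bias else 0)) * c_out

def dense_params(n_in, n_out, bias=True):
    """Parameters in one fully connected layer."""
    return (n_in + (1 if bias else 0)) * n_out

# First conv layer of AlexNet: 96 filters of 11x11 over 3 RGB channels.
first_conv = conv_params(11, 3, 96)
print(first_conv)  # 34944

# First fully connected layer of VGG-16: 7*7*512 = 25088 inputs -> 4096 units.
fc = dense_params(25088, 4096)
print(fc)  # 102764544 -- a single layer already holds ~103 million parameters
```

This also shows why fully connected layers, rather than convolutions, dominate the totals of VGG-style networks.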

Acknowledgements

This article is financed by national funds through FCT (Fundação para a Ciência e Tecnologia, I.P.) under the project UID/Multi/04016/2016. Furthermore, we would like to thank the Instituto Politécnico de Viseu, the University of Coimbra, the University of Lisbon, and the University of François Rabelais for their support.

Author information

Corresponding author

Correspondence to Pedro Martins.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Caldeira, M., Martins, P., Cecílio, J., Furtado, P. (2019). Comparison Study on Convolution Neural Networks (CNNs) vs. Human Visual System (HVS). In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Paving the Road to Smart Data Processing and Analysis. BDAS 2019. Communications in Computer and Information Science, vol 1018. Springer, Cham. https://doi.org/10.1007/978-3-030-19093-4_9

  • DOI: https://doi.org/10.1007/978-3-030-19093-4_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-19092-7

  • Online ISBN: 978-3-030-19093-4

  • eBook Packages: Computer Science; Computer Science (R0)
