
Convolutional Neural Network for Head Segmentation and Counting in Crowded Retail Environment Using Top-view Depth Images

  • Research Article – Computer Engineering and Computer Science
  • Published in: Arabian Journal for Science and Engineering

Abstract

Since the emergence of big data, deep learning models have grown in popularity and are now applied in a wide range of domains, including people detection and counting in congested environments. Detecting and counting people for human-behavior analysis in retail stores is a challenging research problem because of the crowded scenes. This paper proposes a deep learning approach for detecting and counting people in the presence of occlusion and illumination variation in a crowded retail environment, using deep convolutional neural networks (DCNNs) for semantic segmentation of top-view depth images; DCNNs have proved to be a powerful approach to semantic segmentation in recent years. The objective of this paper is to design a novel encoder–decoder architecture. To address the lack of sufficient training data, we applied transfer learning: ResNet50 is used as the encoder, while the decoder is built as a novel contribution. The model was trained and evaluated on the TVHeads dataset and the People Counting Dataset (PCDS), both publicly available for research purposes, which consist of depth data of people captured by a top-view RGB-D sensor. The segmentation results indicate high accuracy and demonstrate that the proposed model is robust.


Data Availability

Datasets derived from public resources are cited and made available with the article, and the augmented dataset is available upon request.


Acknowledgements

This study was supported by funding from Prince Sattam Bin Abdulaziz University, project number PSAU/2023/R/1444.

Author information


Corresponding author

Correspondence to Belhassen Akrout.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Abed, A., Akrout, B. & Amous, I. Convolutional Neural Network for Head Segmentation and Counting in Crowded Retail Environment Using Top-view Depth Images. Arab J Sci Eng 49, 3735–3749 (2024). https://doi.org/10.1007/s13369-023-08159-z

