Convolutional Neural Network for Head Segmentation and Counting in Crowded Retail Environment Using Top-view Depth Images

Abed, Almustafa; Akrout, Belhassen; Amous, Ikram

doi:10.1007/s13369-023-08159-z

Research Article-Computer Engineering and Computer Science
Published: 15 August 2023

Volume 49, pages 3735–3749, (2024)

Arabian Journal for Science and Engineering

Abstract

Since the emergence of big data, deep learning models have grown in popularity and are now used in a wide range of applications, including people detection and counting. Detecting and counting people for human behavior analysis in retail stores is a challenging research problem because the environment is congested and crowded. This paper proposes a deep learning approach for detecting and counting people in a crowded retail environment in the presence of occlusions and illuminance variation, using deep CNNs (DCNNs) for semantic segmentation of top-view depth data. DCNNs have become a common and powerful choice for semantic segmentation in recent years. The objective of this paper is to design a novel encoder–decoder architecture. To address the problem of insufficient training data, we use transfer learning: a ResNet50 network serves as the encoder, and the decoder is built as a novel contribution. The model was trained and evaluated on the TVHeads dataset and the People Counting Dataset (PCDS), both of which are available for research purposes. The data consist of depth images of people captured by a top-view RGB-D sensor. The segmentation results indicate high accuracy and suggest that the proposed model is robust.
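The abstract does not state how a head count is derived from the segmentation output. One common way, shown here only as a minimal sketch and not as the authors' method, is to count the connected foreground regions in the predicted binary head mask. The function name `count_heads`, the `min_area` noise threshold, and the toy mask are illustrative assumptions and do not come from the paper.

```python
from collections import deque


def count_heads(mask, min_area=1):
    """Count 4-connected foreground regions with at least min_area pixels.

    mask is a list of rows; any truthy value marks a pixel predicted as "head".
    min_area is a hypothetical threshold for discarding small spurious regions.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood-fill one region with breadth-first search and measure its area.
            area = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                count += 1
    return count


# Toy mask: three 2x2 blobs and one isolated pixel.
example_mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
]

print(count_heads(example_mask))              # counts every region
print(count_heads(example_mask, min_area=2))  # ignores the isolated pixel
```

A region-counting step of this kind undercounts when two heads merge into a single blob, which is one reason accurate segmentation under occlusion matters for the final count.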

Data Availability

Datasets derived from public resources are cited and made available with the article, and the augmented dataset is available upon request.

Acknowledgements

This study was supported by funding from Prince Sattam Bin Abdulaziz University, project number PSAU/2023/R/1444.

Author information

Authors and Affiliations

MIRACL-ENET’COM, University of Sfax, National School of Electronics and Telecommunications of Sfax, Road Tunis City El Ons, 3018, Sfax, Tunisia
Almustafa Abed & Ikram Amous
Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, 11942, Alkharj, Saudi Arabia
Belhassen Akrout
Multimedia Information Systems and Advanced Computing Laboratory (MIRACL), Sfax University, 3021, Sfax, Tunisia
Almustafa Abed, Belhassen Akrout & Ikram Amous

Corresponding author

Correspondence to Belhassen Akrout.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Abed, A., Akrout, B. & Amous, I. Convolutional Neural Network for Head Segmentation and Counting in Crowded Retail Environment Using Top-view Depth Images. Arab J Sci Eng 49, 3735–3749 (2024). https://doi.org/10.1007/s13369-023-08159-z

Received: 27 September 2022
Accepted: 28 March 2023
Published: 15 August 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s13369-023-08159-z
