Abstract
Since the emergence of big data, deep learning models have grown in popularity and are now used in a wide range of applications, including detecting and counting people in congested environments. Detecting and counting people for human behavior analysis in retail stores is a challenging research problem because the environment is crowded. This paper proposes a deep learning approach for detecting and counting people in a crowded retail environment in the presence of occlusions and illumination variation, using deep CNNs (DCNNs) for semantic segmentation of top-view depth data. DCNNs have been widely used for semantic segmentation in recent years because they are a powerful approach. The objective of this paper is to design a novel encoder–decoder architecture. We use transfer learning to address the problem of insufficient training data: the encoder is ResNet50, and the decoder is built as a novel contribution. Our model was trained and evaluated on the TVHeads dataset and the people counting dataset (PCDS), both of which are available for research purposes. These datasets consist of depth data of people captured by a top-view RGB-D sensor. The segmentation results indicate that the proposed model is robust and accurate.
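The abstract does not spell out how a count is obtained from a segmentation result. One common way, shown here only as a minimal sketch and not as the authors' implementation, is to count the connected foreground regions in the binary head mask. The function name `count_heads`, the `min_area` noise threshold, and the choice of 4-connectivity are assumptions made for this illustration.

```python
from collections import deque


def count_heads(mask, min_area=1):
    """Count 4-connected foreground regions in a binary mask.

    mask is a list of rows containing 0 (background) or 1 (head pixel).
    Regions with fewer than min_area pixels are treated as noise and
    ignored. Both the name and the threshold are hypothetical.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over one unvisited region.
            area = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                area += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                count += 1
    return count


# Toy mask: two 2x2 blobs plus two isolated pixels.
example_mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 0],
]
print(count_heads(example_mask))              # counts every region
print(count_heads(example_mask, min_area=2))  # ignores single-pixel regions
```

In a crowded scene, heads that touch in the mask would merge into one region under this scheme, so a real pipeline would likely need an additional separation step.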
Data Availability
Datasets derived from public resources are cited and made available with the article, and the augmented dataset is available upon request.
Acknowledgements
This study was supported by funding from Prince Sattam Bin Abdulaziz University, project number PSAU/2023/R/1444.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
About this article
Cite this article
Abed, A., Akrout, B. & Amous, I. Convolutional Neural Network for Head Segmentation and Counting in Crowded Retail Environment Using Top-view Depth Images. Arab J Sci Eng 49, 3735–3749 (2024). https://doi.org/10.1007/s13369-023-08159-z
DOI: https://doi.org/10.1007/s13369-023-08159-z