Abstract
Real-time video surveillance systems are widely deployed in various environments, including public areas, commercial buildings, and public infrastructures. Person detection is a key and crucial task in different video surveillance applications, such as person detection, segmentation, and tracking. Researchers presented different image processing and artificial intelligence-based approaches (including machine and deep learning) for person detection and tracking, but mainly comprised of frontal view camera perspective. A real-time person tracking and segmentation system is introduced in this work, using an overhead camera perspective. The system applied a deep learning-based algorithm, i.e., SiamMask, a simple, versatile, fast, and surpassing other real-time tracking algorithms. The algorithm also performs segmentation of the target person by combining a mask branch to the fully convolutional twin neural network for target or person tracking. First, the person video sequences are obtained from an overhead perspective, and then additional training is performed with the help of transfer learning. Finally, a comparison is performed with other tracking algorithms. The SiamMask algorithm delivers good results, with a tracking accuracy of 95%.
Similar content being viewed by others
References
Choi, J.W., Moon, D., Yoo, J.H.: Robust multi-person tracking for real-time intelligent video surveillance. ETRI J. 37(3), 551 (2015)
Liu, P., Li, X., Liu, H., Fu, Z.: Multidisciplinary Digital Publishing Institute: online learned Siamese network with auto-encoding constraints for robust multi-object tracking. Electronics 8(6), 595 (2019)
Potdar, K., Pai, C.D., Akolkar, S.: A convolutional neural network based live object recognition system as blind aid (2018). arXiv preprint. arXiv:1811.10399
Vera, P., Monjaraz, S., Salas, J.: Counting pedestrians with a zenithal arrangement of depth cameras. Mach. Vis. Appl. 27(2), 303 (2016)
Ertler, C., Possegger, H., Opitz, M., Bischof, H.: Pedestrian detection in RGB-D images from an elevated viewpoint. In: Kropatsch, W., Janusch, I., Artner, N. (eds.) Proceedings of the 22nd Computer Vision Winter Workshop. TU Wien, Pattern Recognition and Image Processing Group, Vienna (2017)
Ahmad, M., Ahmed, I., Ullah, K., Khan, I., Adnan, A.: Robust background subtraction based person’s counting from overhead view. In: 9th IEEE Annual Ubiquitous Computing. Electronics Mobile Communication Conference (UEMCON), pp. 746–752 (2018)
Ahmed, I., Ahmad, M., Ahmad, A., Jeon, G.: Top view multiple people tracking by detection using deep SORT and YOLOv3 with transfer learning: within 5G infrastructure. Int. J. Mach. Learn. Cybern. 1–15 (2020)
Nguyen, D.T., Li, W., Ogunbona, P.O.: Human detection from images and videos: a survey. Pattern Recognit. 51, 148 (2016)
Buongiorno, A., Trotta, G.F., Bevilacqua, V.: Computer vision and deep learning techniques for pedestrian detection and tracking: a survey. Neurocomputing 300, 17 (2018)
Zou, Z., Shi, Z., Guo, Y., Ye, J.: Object detection in 20 years: a survey (2019). arXiv preprint arXiv:1905.05055
Yao, R., Lin, G., Xia, S., Zhao, J., Zhou, Y.: Video object segmentation and tracking: a survey (2019). arXiv preprint. arXiv:1904.09172
Zhou, S., Ke, M., Qiu, J., Wang, J.: A survey of multi-object video tracking algorithms. In: Abawajy, J., Choo, K.K.R., Islam, R., Xu, Z., Atiquzzaman, M. (eds.) International Conference on Applications and Techniques. Cyber Security and Intelligence ATCI 2018, pp. 351–369. Springer International Publishing, Cham (2019)
Li, P., Wang, D., Wang, L., Lu, H.: Deep visual tracking: review and experimental comparison. Pattern Recognit. 76, 323 (2018)
Ahmed, I., Adnan, A.: A robust algorithm for detecting people in overhead views. Clust. Comput. 21(1), 633 (2018). https://doi.org/10.1007/s10586-017-0968-3
Migniot, C., Ababsa, F.: Hybrid 3D–2D human tracking in a top view. J. Real Time Image Process. 11(4), 769 (2016)
Ahmad, M., Ahmed, I., Khan, F.A., Qayum, F., Aljuaid, H.: Convolutional neural network-based person tracking using overhead views. Int. J. Distrib. Sens. Netw. 16(6), 1550147720934738 (2020)
Ahmed, I., Ahmad, M., Nawaz, M., Haseeb, K., Khan, S., Jeon, G.: Efficient topview person detector using point based transformation and lookup table. Comput. Commun. 147, 188 (2019)
Ahmed, I., Din, S., Jeon, G., Piccialli, F.: Exploring deep learning models for overhead view multiple object detection. IEEE Internet Things J. 7(7), 5737 (2020)
Kristoffersen, M., Dueholm, J., Gade, R., Moeslund, T.: Pedestrian counting with occlusion handling using stereo thermal cameras. Sensors 16(1), 62 (2016)
Burbano, A., Bouaziz, S., Vasiliu, M.: 3D-sensing distributed embedded system for people tracking and counting. In: 2015 International Conference on Computational Science and Computational Intelligence (CSCI), pp. 470–475 (2015)
Tseng, T., Liu, A., Hsiao, P., Huang, C., Fu, L.: Real-time people detection and tracking for indoor surveillance using multiple top-view depth cameras. In: 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 4077–4082 (2014)
García, J., Gardel, A., Bravo, I., Lázaro, J.L., Martínez, M., Rodríguez, D.: Directional people counter based on head tracking. IEEE Trans. Ind. Electron. 60(9), 3991 (2013)
Ahmed, I., Ahmad, A., Piccialli, F., Sangaiah, A.K., Jeon, G.: A robust features-based person tracker for overhead views in industrial environment. IEEE Internet Things J. 5(3), 1598 (2018)
Rauter, M.: Reliable human detection and tracking in top-view depth images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 529–534 (2013)
Ryan, D., Denman, S., Sridharan, S., Fookes, C.: An evaluation of crowd counting methods, features and regression models. Comput. Vis. Image Underst. 130, 1 (2015)
Lin, Q., Zhou, C., Wang, S., Xu, X.: Human behavior understanding via top-view vision. AASRI Procedia 3, 184 (2012)
Hsu, T.-W., Yang, Y.-H., Yeh, T.-H., Liu, A.-S., Fu, L.-C., Zeng, Y.-C.: Privacy free indoor action detection system using top-view depth camera based on key-poses. In: 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 004058–004063 (2016)
Nakatani, R., Kouno, D., Shimada, K., Endo, T.: A person identification method using a top-view head image from an overhead camera. JACIII 16(6), 696 (2012)
Ahmad, M., Ahmed, I., Ullah, K., Khan, I., Khattak, A., Adnan, A.: Energy efficient camera solution for video surveillance. Int. J. Adv. Comput. Sci. Appl. 10(3) (2019). http://dx.doi.org/10.14569/IJACSA.2019.0100367
Wang, Q., Zhang, L., Bertinetto, L., Hu, W., Torr, P.H.: Fast online object tracking and segmentation: a unifying approach. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1328–1338 (2019)
Iguernaissi, R., Merad, D., Drap, P.: People counting based on kinect depth data. In: Proceedings of the 7th International Conference on Pattern Recognition Applications and Methods—Volume 1: ICPRAM. INSTICC (SciTePress, 2018), pp. 364–370. https://doi.org/10.5220/0006585703640370
Ozturk, O., Yamasaki, T., Aizawa, K.: Tracking of humans and estimation of body/head orientation from top-view single camera for visual focus of attention analysis. In: 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, pp. 1020–1027 (2009)
Snidaro, L., Micheloni, C., Chiavedale, C.: Video security for ambient intelligence. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 35(1), 133 (2005)
Ahmad, M., Ahmed, I., Ullah, K., Khan, I., Khattak, A., Adnan A.: Int. J. Adv. Comput. Sci. Appl. 10(4) (2019). https://doi.org/10.14569/IJACSA.2019.0100470
Gao, C., Liu, J., Feng, Q., Lv, J.: Person detection from overhead view: a survey. Multimedia Tools Appl. 75(15), 9315 (2016). https://doi.org/10.1007/s11042-016-3344-z
Velipasalar, S., Tian, Y., Hampapur A.: Automatic counting of interacting people by using a single uncalibrated camera. In: 2006 IEEE International Conference on Multimedia and Expo, pp. 1265–1268 (2006)
Bagaa, M., Taleb, T., Ksentini, A.: Efficient tracking area management framework for 5G networks. IEEE Trans. Wirel. Commun. 15(6), 4117 (2016)
Yu, S., Chen, X., Sun, W., Xie D.: A robust method for detecting and counting people. In: 2008 International Conference on Audio, Language and Image Processing, pp. 1545–1549 (2008)
Wateosot, C., Suvonvorn, N., et al.: Top-view based people counting using mixture of depth and color information. In: The Second Asian Conference on Information Systems, ACIS (Citeseer, 2013)
Perng, J., Wang, T., Hsu, Y., Wu B.: The design and implementation of a vision-based people counting system in buses. In: 2016 International Conference on System Science and Engineering (ICSSE), pp. 1–3 (2016)
Yahiaoui, T., Meurie, C., Khoudour, L.: A people counting system based on dense and close stereovision. In: Cabestaing, F., Elmoataz, A., Lezoray, O., Nouboud, F., Mammass, D. (eds.) Image and Signal Processing, pp. 59–66. Springer, Berlin (2008)
Cao, J., Sun, L., Odoom, M.G., Luan, F., Song X.: Counting people by using a single camera without calibration. In: 2016 Chinese Control and Decision Conference (CCDC), pp. 2048–2051 (2016)
Mukherjee, S., Saha, B., Jamal, I., Leclerc, R., Ray N.: Anovel framework for automatic passenger counting. In: 2011 18th IEEE International Conference on Image Processing, pp. 2969–2972 (2011)
Pang, Y., Yuan, Y., Li, X., Pan, J.: Efficient HOG human detection. Signal Process. 91(4), 773 (2011)
Wu, C.J., Houben, S., Marquardt, N.: EagleSense: tracking people and devices in interactive spaces using real-time top-view depth-sensing. In: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (Association for Computing Machinery, New York, NY, USA, 2017). CHI ’17, pp. 3929–3942. https://doi.org/10.1145/3025453.3025562
Wetzel, J., Laubenheimer, A., Heizmann, M.: Joint probabilistic people detection in overlapping depth images. IEEE Access 8, 28349 (2020)
Ahmed, I., Carter, J.N.: A robust person detector for overhead views. In: Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), pp. 1483–1486 (2012)
Ahmed, I., Ahmad, M., Adnan, A., Ahmad, A., Khan, M.: Person detector for different overhead views using machine learning. Int. J. Mach. Learn. Cybern. 10(10), 2657 (2019). https://doi.org/10.1007/s13042-019-00950-5
Ullah, K., Ahmed, I., Ahmad, M., Khan, I.: Comparison of person tracking algorithms using overhead view implemented in OpenCV. In: 2019 9th Annual Information Technology, Electromechanical Engineering and Microelectronics Conference (IEMECON) (IEEE, 2019), pp. 284–289
Ullah, K., Ahmed, I., Ahmad, M., Rahman, A.U., Nawaz, M., Adnan, A.: Rotation invariant person tracker using top view. J. Ambient Intell. Humaniz. Comput. 1–17 (2019)
Du, D., Qi, Y., Yu, H., Yang, Y., Duan, K., Li, G., Zhang, W., Huang, Q., Tian, Q.: The unmanned aerial vehicle benchmark: object detection and tracking. In: Proceedings of the European Conference on Computer Vision (ECCV) (2018)
Zhu, P., Wen, L., Du, D., Bian, X., Ling, H., Hu, Q., Wu, H., Nie, Q., Cheng, H., Liu, C., et al.: VisDrone-VDT2018: the vision meets drone video detection and tracking challenge results. In: Proceedings of the European Conference on Computer Vision (ECCV) (2018)
Qi, Y., Zhang, S., Zhang, W., Su, L., Huang, Q., Yang, M.H.: Learning attribute-specific representations for visual tracking. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 8835–8842 (2019)
Ahmad, M., Ahmed, I., Adnan, A.: Overhead view person detection using YOLO. In: 2019 IEEE 10th Annual Ubiquitous Computing. Electronics Mobile Communication Conference (UEMCON), pp. 0627–0633 (2019)
Ahmad, M., Ahmed, I., Ullah, K., Ahmad, M.: A deep neural network approach for top view people detection and counting. In: IEEE 10th Annual Ubiquitous Computing. Electronics Mobile Communication Conference (UEMCON), pp. 1082–1088 (2019)
Ahmed, I., Ahmad, M., Khan, F.A., Asif, M.: Comparison of deep-learning-based segmentation models: using top view person images. IEEE Access 8, 136361 (2020)
Ahmed, I., Din, S., Jeon, G., Piccialli, F., Fortino, G.: Towards collaborative robotics in top view surveillance: a framework for multiple object tracking by detection using deep learning. IEEE/CAA J. Autom. Sin. 1–18 (2020). https://doi.org/10.1109/JAS.2020.1003453
Bertinetto, L., Henriques, J.F., Valmadre, J., Torr, P.H., Vedaldi, A.: Learning feed-forward one-shot learners (2016). arXiv preprint arXiv:1606.05233
Li, B., Yan, J., Wu, W., Zhu, Z., Hu, X.: High performance visual tracking with siamese region proposal network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 8971–8980(2018)
Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.: Fully-convolutional Siamese networks for object tracking. In: European Conference on Computer Vision, pp. 850–865. Springer, Berlin (2016)
Acknowledgements
This work was supported by Incheon National University Research Concentration Professors Grant in 2019.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Ahmed, I., Jeon, G. A real-time person tracking system based on SiamMask network for intelligent video surveillance. J Real-Time Image Proc 18, 1803–1814 (2021). https://doi.org/10.1007/s11554-021-01144-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11554-021-01144-5