Abstract
The need for automated real-time visual systems in applications such as smart camera surveillance, smart environments, and drones necessitates the improvement of methods for visual active monitoring and control. Traditionally, the active monitoring task has been handled through a pipeline of modules such as detection, filtering, and control. However, the parameters of such methods are difficult to jointly optimize and tune for real-time processing on resource-constrained systems. In this paper, a deep Convolutional Camera Controller Neural Network is proposed that maps visual information directly to camera movement, providing an efficient solution to the active vision problem. It is trained end-to-end, without bounding box annotations, to control a camera and follow multiple targets from raw pixel values. Evaluation through both a simulation framework and a real experimental setup indicates that the proposed solution is robust to varying conditions and achieves better monitoring performance than traditional approaches, both in the number of targets monitored and in effective monitoring time. A further advantage of the proposed approach is that it is computationally less demanding and runs at over 10 FPS (\(\sim 4\times \) speedup) on an embedded smart camera, providing a practical and affordable solution to real-time active monitoring.
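To make the end-to-end idea concrete, the sketch below shows, purely illustratively, how a small convolutional network can map a raw frame directly to a discrete camera command (no detection, filtering, or control modules in between). The layer sizes, the five-command action set, and all parameter names here are assumptions for illustration only, not the actual C\(^{3}\)Net architecture or its trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_relu(x, w, stride=2):
    # Naive valid convolution with ReLU: x is (H, W, Cin), w is (k, k, Cin, Cout).
    k = w.shape[0]
    H = (x.shape[0] - k) // stride + 1
    W = (x.shape[1] - k) // stride + 1
    out = np.empty((H, W, w.shape[3]))
    for i in range(H):
        for j in range(W):
            patch = x[i * stride:i * stride + k, j * stride:j * stride + k, :]
            out[i, j] = np.tensordot(patch, w, axes=3)
    return np.maximum(out, 0.0)

COMMANDS = ["stay", "left", "right", "up", "down"]  # illustrative action set

def controller(frame, params):
    # Two conv stages extract features; global average pooling and a linear
    # head score the discrete pan/tilt commands; argmax picks the movement.
    h = conv2d_relu(frame, params["w1"])
    h = conv2d_relu(h, params["w2"])
    feat = h.mean(axis=(0, 1))            # global average pooling
    logits = feat @ params["w3"]
    return int(np.argmax(logits))

params = {
    "w1": rng.normal(0.0, 0.1, (5, 5, 3, 8)),
    "w2": rng.normal(0.0, 0.1, (3, 3, 8, 16)),
    "w3": rng.normal(0.0, 0.1, (16, 5)),
}
frame = rng.random((60, 80, 3))           # a small RGB frame, values in [0, 1]
command = controller(frame, params)       # index into COMMANDS
```

In a trained system the weights would of course come from end-to-end supervision on camera-movement labels rather than random initialization; the point of the sketch is only the single pixels-to-command forward pass.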
Funding
This work has been supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No 739551 (KIOS CoE) and by the Government of the Republic of Cyprus through the Directorate General for European Programmes, Coordination and Development.
Supplementary information
Below are the links to the electronic supplementary material.
Supplementary file2 (MP4 459 KB)
Supplementary file3 (AVI 571 KB)
Supplementary file4 (AVI 1420 KB)
Supplementary file5 (AVI 1147 KB)
Supplementary file6 (AVI 340 KB)
Supplementary file7 (AVI 2892 KB)
Supplementary file8 (AVI 11225 KB)
Supplementary file9 (AVI 1647 KB)
Supplementary file10 (AVI 3247 KB)
Supplementary file11 (AVI 3246 KB)
Supplementary file12 (AVI 1675 KB)
Supplementary file13 (AVI 1459 KB)
Cite this article
Kyrkou, C. \(\text{C}^{3}\text{Net}\): end-to-end deep learning for efficient real-time visual active camera control. J Real-Time Image Proc 18, 1421–1433 (2021). https://doi.org/10.1007/s11554-021-01077-z