Journal of Signal Processing Systems, Volume 90, Issue 6, pp 891–900

Watch Out: Embedded Video Tracking with BST for Unmanned Aerial Vehicles

  • Francesco Battistone
  • Alfredo Petrosino
  • Vincenzo Santopietro


This paper presents Watch Out, a real-time tracking system that runs efficiently on an Nvidia Jetson board mounted on an Unmanned Aerial Vehicle (UAV). Its approach to long-term video tracking is named Best Structured Tracker (BST): a set of local trackers independently tracks patches of the original target in an online learning manner, an outlier detection procedure filters out the less meaningful trackers, and a resampling procedure correctly reinitialises the trackers that have been filtered out. The tracking algorithm was evaluated both on the VOT2016 challenge datasets and in real situations using an Nvidia Jetson board mounted on a drone. Results show that the proposed system can track almost any target in real time.
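As a rough illustration of the filter-and-resample loop described above, the following minimal sketch processes one frame for a set of local patch trackers. It is not the BST implementation: the median-absolute-deviation test, the uniform resampling inside the target box, and all function and parameter names (`filter_outliers`, `step`, `k`) are assumptions made for illustration, and the per-patch displacements are taken as given rather than produced by online-learned trackers.

```python
import random
from statistics import median


def filter_outliers(displacements, k=3.0):
    """Return (inlier indices, consensus displacement).

    Assumption: outliers are detected with a median-absolute-deviation
    (MAD) test on the per-patch displacements. The abstract does not
    specify the actual outlier detection procedure.
    """
    mx = median(d[0] for d in displacements)
    my = median(d[1] for d in displacements)
    dist = [abs(dx - mx) + abs(dy - my) for dx, dy in displacements]
    mad = median(dist) or 1e-9  # avoid a zero threshold when all trackers agree
    inliers = [i for i, d in enumerate(dist) if d <= k * mad]
    return inliers, (mx, my)


def step(patches, displacements, box, rng=random):
    """Process one frame: move the box by the consensus, resample rejected patches.

    patches: list of (x, y) patch positions.
    displacements: list of (dx, dy), one estimate per local tracker.
    box: (x, y, w, h) of the target in the previous frame.
    """
    inliers, (mx, my) = filter_outliers(displacements)
    x, y, w, h = box
    new_box = (x + mx, y + my, w, h)
    keep = set(inliers)
    new_patches = []
    for i, ((px, py), (dx, dy)) in enumerate(zip(patches, displacements)):
        if i in keep:
            # Inlier: follow the tracker's own estimate.
            new_patches.append((px + dx, py + dy))
        else:
            # Outlier: reinitialise at a random position inside the new box
            # (hypothetical resampling rule).
            new_patches.append((new_box[0] + rng.random() * w,
                                new_box[1] + rng.random() * h))
    return new_patches, new_box, inliers


if __name__ == "__main__":
    patches = [(10, 10), (20, 10), (10, 20), (20, 20), (15, 15)]
    displacements = [(2, 1), (2, 1), (2, 1), (2, 1), (30, -25)]  # last one drifts
    print(step(patches, displacements, (5, 5, 20, 20), rng=random.Random(0)))
```

In this toy run the fifth tracker disagrees with the consensus motion, so it is rejected and re-placed inside the updated box while the other four keep following their own estimates.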


Keywords: Tracking; Online learning; Outlier detection; Drone; Jetson; GPU



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  • Francesco Battistone (1)
  • Alfredo Petrosino (1), corresponding author
  • Vincenzo Santopietro (1)

  1. Department of Science and Technology, University of Naples Parthenope, Naples, Italy
