Neural Processing Letters, Volume 47, Issue 3, pp 859–876

Deep Learning and Preference Learning for Object Tracking: A Combined Approach

  • Shuchao Pang
  • Juan José del Coz
  • Zhezhou Yu
  • Oscar Luaces
  • Jorge Díez


Object tracking is one of the most important processes for object recognition in the field of computer vision. Its aim is to accurately locate a target object in every frame of a video sequence. In this paper we propose a technique that combines two approaches well known among machine learning practitioners. First, we propose a deep learning approach to automatically extract the features used to represent the original images; deep learning has been successfully applied in a range of computer vision applications. Second, object tracking can be seen as a ranking problem, since the regions of an image can be ranked according to their degree of overlap with the target object (the ground truth in each video frame). During tracking, the target's position and size can change, so the algorithm has to propose several candidate regions in which the target may be found. We propose a preference learning approach to build a ranking function that is used to select the highest-ranked bounding box, i.e., the one most likely to enclose the target object. The experimental results obtained by our method, called \( DPL^{2} \) (Deep and Preference Learning), are competitive with those of other algorithms.
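The ranking formulation described above can be illustrated with a small sketch. It is not the \( DPL^{2} \) implementation: the only part taken from the abstract is the idea that a candidate region overlapping the ground truth more should be preferred to one overlapping it less. Everything else is an assumption for illustration: boxes are (x, y, width, height) tuples, overlap is measured as intersection-over-union, each candidate already carries a feature vector (toy numbers standing in for the deep features), and the ranking function is a linear model trained on pairwise preferences with a RankSVM-style hinge loss. The function names `iou`, `preference_pairs`, `train_ranker` and `select_box`, and the hyperparameters `lam`, `lr` and `epochs`, are hypothetical.

```python
"""Sketch: rank candidate boxes with a pairwise (RankSVM-style) linear model.

Assumptions, not taken from the paper: boxes are (x, y, width, height) tuples,
overlap is intersection-over-union, each candidate already has a feature
vector standing in for deep features, and the ranker is trained by plain
subgradient descent on a regularised pairwise hinge loss.
"""
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)
Vector = List[float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def preference_pairs(feats: Sequence[Vector], boxes: Sequence[Box], target: Box) -> List[Vector]:
    """Difference vectors x_i - x_j for every pair where box i overlaps the
    target strictly more than box j, i.e. box i is preferred to box j."""
    overlaps = [iou(b, target) for b in boxes]
    pairs = []
    for i, oi in enumerate(overlaps):
        for j, oj in enumerate(overlaps):
            if oi > oj:
                pairs.append([fi - fj for fi, fj in zip(feats[i], feats[j])])
    return pairs


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(u, v))


def train_ranker(pairs: Sequence[Vector], dim: int, lam: float = 0.01,
                 lr: float = 0.1, epochs: int = 200) -> Vector:
    """Minimise lam/2 * ||w||^2 + mean over pairs of max(0, 1 - w . d)."""
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [lam * wk for wk in w]  # gradient of the regulariser
        for d in pairs:
            if dot(w, d) < 1.0:  # hinge active: preference not met with margin
                for k in range(dim):
                    grad[k] -= d[k] / len(pairs)
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def select_box(w: Vector, feats: Sequence[Vector], boxes: Sequence[Box]) -> Box:
    """Return the candidate box with the highest score w . x."""
    scores = [dot(w, x) for x in feats]
    return boxes[max(range(len(boxes)), key=scores.__getitem__)]


if __name__ == "__main__":
    # Training frame: hypothetical 2-D features in place of deep features.
    target = (10, 10, 20, 20)
    boxes = [(10, 10, 20, 20), (15, 15, 20, 20), (40, 40, 20, 20)]
    feats = [[1.0, 0.2], [0.5, 0.3], [0.0, 0.9]]
    w = train_ranker(preference_pairs(feats, boxes, target), dim=2)
    # Next frame: rank new candidates without access to the ground truth.
    print(select_box(w, [[0.9, 0.25], [0.1, 0.8]],
                     [(12, 11, 20, 20), (50, 50, 20, 20)]))  # prints (12, 11, 20, 20)
```

Training only on pairwise orderings means the model never has to predict overlap values themselves; at tracking time the ground truth is unknown, so the learned scores are the only signal used to choose one box per frame.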


Keywords: Deep learning; Preference learning; Object tracking



This work was funded by Ministerio de Economía y Competitividad de España (Grant TIN2015-65069-C2-2-R), Specialized Research Fund for the Doctoral Program of Higher Education of China (Grant 20120061110045) and the Project of Science and Technology Development Plan of Jilin Province, China (Grant 20150204007GX). The paper was written while Shuchao Pang was visiting the University of Oviedo at Gijón.


Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. College of Computer Science and Technology, Jilin University, Changchun, China
  2. Artificial Intelligence Center, University of Oviedo at Gijón, Gijón, Spain
