Multimedia Tools and Applications, Volume 78, Issue 6, pp 7543–7562

Robust visual tracking based on convolutional neural network with extreme learning machine

  • Rui Sun
  • Xu Wang
  • Xiaoxing Yan


Recently, deep learning has attracted substantial attention as a promising solution to many problems in computer vision. Among the various deep learning architectures, the convolutional neural network (CNN) has demonstrated superior performance as a feature learning method. In this paper, we present a novel hybrid model of CNN and extreme learning machine (ELM) for object tracking. Training a conventional CNN requires a substantial amount of computation and a large dataset. An ELM, by contrast, randomly generates the parameters of its hidden layers and computes the weights between the hidden and output layers with the regularized least-squares method, which dramatically reduces the learning time while still producing accurate results from minimal training data. We therefore integrate the ELM auto-encoder architecture into the CNN model. In addition, an effective updating scheme is designed for model training to overcome the tracking drift problem. The joint CNN-ELM tracker is robust to object variations such as illumination, occlusion, and rotation in a video sequence. Numerous experiments on various challenging videos demonstrate that the proposed tracker performs favourably compared to several state-of-the-art methods.
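The ELM training step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `elm_fit` and `elm_predict`, the tanh activation, the hidden-layer size, the regularization constant `C`, and the synthetic data are all assumptions chosen for illustration. The sketch draws the hidden-layer parameters at random, leaves them untrained, and solves for the output weights in closed form by regularized least squares. Using the input as its own target gives the auto-encoder variant.

```python
import numpy as np


def elm_fit(X, T, n_hidden=64, C=1e3, seed=0):
    """Fit a single-hidden-layer ELM (illustrative sketch, not the paper's code)."""
    rng = np.random.default_rng(seed)
    # Hidden-layer parameters are drawn at random and never trained.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    # Hidden activations, shape (n_samples, n_hidden); tanh is an assumed choice.
    H = np.tanh(X @ W + b)
    # Output weights by regularized least squares:
    #   beta = (H^T H + I / C)^{-1} H^T T
    A = H.T @ H + np.eye(n_hidden) / C
    beta = np.linalg.solve(A, H.T @ T)
    return W, b, beta


def elm_predict(X, W, b, beta):
    """Map inputs through the fixed random hidden layer and the learned output weights."""
    return np.tanh(X @ W + b) @ beta


# Auto-encoder use: the training target is the input itself (synthetic data).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
W, b, beta = elm_fit(X, X, n_hidden=64, C=1e3)
X_hat = elm_predict(X, W, b, beta)

reconstruction_mse = float(np.mean((X_hat - X) ** 2))
baseline_mse = float(np.mean(X ** 2))  # error of predicting all zeros
print(f"reconstruction MSE: {reconstruction_mse:.4f} (zero baseline: {baseline_mse:.4f})")
```

Because the only learned quantity is `beta`, training reduces to one linear solve, which is the source of the short learning time the abstract refers to. How the paper combines this step with convolutional feature extraction and its model-update scheme is not specified here and is not reproduced in the sketch.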


Convolutional neural network; Extreme learning machine; Visual tracking; Deep learning



This work was supported by the National Natural Science Foundation of China (61471154) and Anhui Province science and technology project (1704d0802181).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Computer and Information, Hefei University of Technology, Hefei, People’s Republic of China
  2. Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei, People’s Republic of China
