Multimedia Tools and Applications, Volume 78, Issue 5, pp 5881–5918

Brain programming as a new strategy to create visual routines for object tracking

Towards automation of video tracking design
  • Gustavo Olague
  • Daniel E. Hernández
  • Paul Llamas
  • Eddie Clemente
  • José L. Briseño

Abstract

This work describes the use of brain programming to automate the design of video tracking systems. The challenge is to create visual programs that learn to detect a toy dinosaur from a database and are then tested in a visual-tracking scenario. Designing an object tracking system involves two sub-tasks: detecting moving objects in each frame and correctly associating those detections with the same object over time. Visual attention is a skill performed by the brain whose function is to perceive salient visual features. The automatic design of visual attention programs through an optimization paradigm is applied here to detection-based tracking of objects in video from a moving camera. A system based on the acquisition and integration steps of the natural dorsal stream was engineered to emulate its selectivity and goal-driven behavior, both useful for tracking objects. This is a challenging problem, since difficulties can arise from abrupt object motion, changing appearance patterns of both the object and the scene, non-rigid structures, object-to-object and object-to-scene occlusions, and camera motion, models, and parameters. Tracking relies on the quality of the detection process, so automatically designing that stage could significantly improve tracking methods. Experimental results confirm the validity of our approach using three different kinds of robotic systems. Moreover, a comparison with the method of regions with convolutional neural networks (R-CNN) is provided to illustrate the benefit of the approach.
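
To make the two sub-tasks concrete, here is a minimal sketch of a detection-based tracking loop with greedy nearest-neighbor association, written in Python. It illustrates the detect-then-associate framing only and is not the paper's implementation: the detect callback, the Track record, and the max_dist gating threshold are hypothetical placeholders, whereas in this work the detector itself would be the evolved visual attention program.

    from dataclasses import dataclass, field

    @dataclass
    class Track:
        """One tracked object: an identity plus its detection history."""
        track_id: int
        positions: list = field(default_factory=list)  # (x, y) centroids

        @property
        def last(self):
            return self.positions[-1]

    def dist2(a, b):
        """Squared Euclidean distance between two (x, y) points."""
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    def associate(tracks, detections, max_dist=30.0):
        """Sub-task 2: greedily match each track to its nearest detection.
        Returns the detections left unmatched (candidate new objects)."""
        unmatched = list(detections)
        for track in tracks:
            if not unmatched:
                break
            best = min(unmatched, key=lambda d: dist2(track.last, d))
            if dist2(track.last, best) <= max_dist ** 2:  # gating threshold
                track.positions.append(best)
                unmatched.remove(best)
        return unmatched

    def run_tracker(frames, detect):
        """Sub-task 1 then 2: detect per frame, then link over time."""
        tracks, next_id = [], 0
        for frame in frames:
            detections = detect(frame)               # per-frame detection
            for d in associate(tracks, detections):  # temporal association
                tracks.append(Track(next_id, [d]))   # new track per leftover
                next_id += 1
        return tracks

Under this framing, brain programming supplies the detect stage: the quality of the evolved visual routine determines how reliably the association step can keep detections tied to the same object across frames.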

Keywords

Artificial dorsal stream · Deep genetic programming · Evolutionary computer vision · Visual tracking · Focus of attention · Deep learning

Acknowledgements

This research was funded by CICESE through Project 634-128, "Programación cerebral aplicada al estudio del pensamiento y la visión" (Brain programming applied to the study of thought and vision). In addition, the authors acknowledge the valuable comments of the anonymous reviewers, the Editor of Multimedia Tools and Applications, and the International Editorial Board, whose enthusiasm is gladly appreciated.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. CICESE, Applied Physics Division, Ensenada, México
  2. TecNM - Instituto Tecnológico de Tijuana, Tijuana, México
  3. TecNM - Instituto Tecnológico de Ensenada, Ensenada, México
