What Do I See? Modeling Human Visual Perception for Multi-person Tracking

  • Xu Yan
  • Ioannis A. Kakadiaris
  • Shishir K. Shah
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8690)


This paper presents a novel approach to multi-person tracking that uses a model motivated by the human visual system. The model predicts human motion by modeling perceived information. An attention map is designed to mimic human reasoning by integrating both spatial and temporal information. The spatial component addresses human attention allocation to different areas in a scene and is represented using a retinal mapping based on the log-polar transformation, while the temporal component captures human attention allocation to subjects with different motion velocities and is modeled as a static-dynamic attention map. With the static-dynamic attention map and retinal mapping, the attention-driven motion of the tracked target is estimated with a center-surround search mechanism. This perception-based motion model is integrated into a data association tracking framework with appearance and motion features. The proposed algorithm tracks a large number of subjects in complex scenes, and the evaluation on public datasets shows promising improvements over state-of-the-art methods.
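The retinal mapping mentioned above builds on the log-polar transformation. The sketch below shows only the generic textbook form of that transform, not the paper's exact formulation: the function name `log_polar`, the `fixation` argument, and the `r_min` clamp are all hypothetical choices made for illustration.

```python
import math

def log_polar(point, fixation, r_min=1.0):
    """Map a Cartesian point to (log-radius, angle) around a fixation point.

    Generic log-polar transform; r_min is an assumed clamp that avoids log(0)
    when the point coincides with the fixation point.
    """
    dx = point[0] - fixation[0]
    dy = point[1] - fixation[1]
    r = max(math.hypot(dx, dy), r_min)   # eccentricity (distance from fixation)
    rho = math.log(r)                    # compressed radial coordinate
    theta = math.atan2(dy, dx)           # angular coordinate in (-pi, pi]
    return rho, theta

# Three sample points viewed from a fixation point at the origin.
near = log_polar((10.0, 0.0), (0.0, 0.0))
far = log_polar((100.0, 0.0), (0.0, 0.0))
up = log_polar((0.0, 10.0), (0.0, 0.0))
```

Because the radial coordinate is logarithmic, a tenfold increase in distance from the fixation point adds only a constant to `rho`. Regions near the fixation point are therefore sampled more finely than the periphery, which is the property that makes the transform a common stand-in for retinal resolution.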




Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Xu Yan (1)
  • Ioannis A. Kakadiaris (1)
  • Shishir K. Shah (1)
  1. Department of Computer Science, University of Houston, Houston, USA
