Tracking Using Motion Patterns for Very Crowded Scenes

  • Xuemei Zhao
  • Dian Gong
  • Gérard Medioni
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7573)


This paper proposes the Motion Structure Tracker (MST) to solve the problem of tracking in very crowded structured scenes. It combines visual tracking, motion pattern learning, and multi-target tracking. Tracking in crowded scenes is very challenging because of the hundreds of similar objects, cluttered backgrounds, small object sizes, and occlusions. However, structured crowded scenes exhibit clear motion patterns, which provide rich prior information. In MST, tracking and detection are performed jointly, and motion pattern information is integrated into both steps to enforce scene structure constraints. MST is first used to track a single target and is then extended to solve a simplified version of the multi-target tracking problem. Experiments on challenging real-world sequences show promising results: our method significantly outperforms several state-of-the-art methods in terms of both track ratio and accuracy.


motion pattern, tracking, very crowded scenes



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Xuemei Zhao (1)
  • Dian Gong (1)
  • Gérard Medioni (1)
  1. Institute for Robotics and Intelligent Systems, University of Southern California, Los Angeles, USA
