Appearances Can Be Deceiving: Learning Visual Tracking from Few Trajectory Annotations

  • Santiago Manen
  • Junseok Kwon
  • Matthieu Guillaumin
  • Luc Van Gool
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8693)


Visual tracking is the task of estimating the trajectory of an object in a video given its initial location. This is usually done by combining, at each step, an appearance model and a motion model. In this work, we learn from a small set of training trajectory annotations how the objects in the scene typically move. We also learn the relative weight between the appearance and motion models, which we call visual deceptiveness. At test time, we transfer the deceptiveness and the displacement from the closest trajectory annotation to infer the next location of the object. Further, we condition this transfer on an event model. In experiments on a set of 161 manually annotated test trajectories, we show that learning from just 10 trajectory annotations halves the center location error and improves the success rate by about 10%.
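
The transfer step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the closest annotation is found or how the two models are combined, so the Euclidean nearest-neighbour lookup, the Gaussian motion score, the linear blend, and all names (`nearest_annotation`, `next_location`, `sigma`, the dictionary keys) are assumptions made for illustration. The event model is left out.

```python
import math


def nearest_annotation(position, annotations):
    """Return the annotated point closest to `position` (Euclidean distance).

    Each annotation is a hypothetical dict with keys:
      'position'      -- (x, y) of an annotated trajectory point
      'displacement'  -- (dx, dy) observed at that point in training
      'deceptiveness' -- weight in [0, 1]; higher means trust motion more
    """
    return min(annotations, key=lambda a: math.dist(a["position"], position))


def next_location(position, candidates, appearance_scores, annotations, sigma=5.0):
    """Pick the candidate location with the best blended score.

    The blend (1 - w) * appearance + w * motion is an assumed form of the
    appearance/motion trade-off, with w the transferred deceptiveness.
    """
    nearest = nearest_annotation(position, annotations)
    dx, dy = nearest["displacement"]
    predicted = (position[0] + dx, position[1] + dy)  # transferred displacement
    w = nearest["deceptiveness"]                      # transferred weight

    best, best_score = None, -math.inf
    for candidate, appearance in zip(candidates, appearance_scores):
        # Assumed Gaussian motion score centred on the predicted location.
        sq_dist = math.dist(candidate, predicted) ** 2
        motion = math.exp(-sq_dist / (2 * sigma ** 2))
        score = (1 - w) * appearance + w * motion
        if score > best_score:
            best, best_score = candidate, score
    return best


annotations = [
    {"position": (0, 0), "displacement": (5, 0), "deceptiveness": 0.9},
    {"position": (100, 100), "displacement": (0, 5), "deceptiveness": 0.1},
]
# Near a highly deceptive region, the motion prediction dominates.
deceptive_pick = next_location((1, 1), [(6, 1), (1, 6)], [0.2, 0.9], annotations)
# Near a region where appearance is reliable, the appearance score dominates.
reliable_pick = next_location((99, 99), [(99, 104), (104, 99)], [0.2, 0.9], annotations)
```

In the first call the tracker follows the transferred displacement even though the other candidate looks better; in the second it follows the appearance score. That is the behaviour the deceptiveness weight is meant to control.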


Keywords: Visual tracking, Motion learning, Event modelling



Supplementary material

978-3-319-10602-1_11_MOESM1_ESM.mp4: Electronic Supplementary Material (MP4, 14,876 KB)
978-3-319-10602-1_11_MOESM2_ESM.pdf: Electronic Supplementary Material (PDF, 205 KB)



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Santiago Manen (1)
  • Junseok Kwon (1)
  • Matthieu Guillaumin (1)
  • Luc Van Gool (1, 2)
  1. Computer Vision Laboratory, ETH Zurich, Switzerland
  2. ESAT - PSI / IBBT, K.U. Leuven, Belgium
