The Visual Computer, Volume 24, Issue 7–9, pp 595–603

Contextual motion field-based distance for video analysis

  • Yadong Mu
  • Shuicheng Yan
  • Thomas Huang
  • Bingfeng Zhou
Original Article


In this work, we propose a general method for computing the distance between video frames or sequences. Unlike conventional appearance-based methods, we first extract motion fields from the original videos. To avoid the huge memory requirements of previous approaches, we adopt a “bag of motion vectors” model and use a Gaussian mixture model (GMM) as a compact representation. Estimating the distance between two frames thus reduces to computing the distance between their corresponding GMMs, which we solve with the earth mover's distance (EMD). On the basis of this inter-frame distance, we further develop distance measures for full video sequences.
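
The GMM-plus-EMD frame distance described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each frame's motion vectors are assumed to be already summarized by a diagonal-covariance GMM given as a tuple (weights, means, variances); the ground distance between two Gaussian components is taken to be the closed-form 2-Wasserstein distance (an assumption, since the abstract does not say which ground distance is used); and the EMD transportation problem of Rubner et al. is solved with SciPy's `linprog`. The helper names `gaussian_ground_distance`, `emd`, and `gmm_frame_distance` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def gaussian_ground_distance(mu1, var1, mu2, var2):
    """2-Wasserstein distance between two Gaussians with diagonal covariances.

    Assumption: an illustrative ground distance, not necessarily the paper's.
    """
    mu1, var1 = np.asarray(mu1, float), np.asarray(var1, float)
    mu2, var2 = np.asarray(mu2, float), np.asarray(var2, float)
    return float(np.sqrt(np.sum((mu1 - mu2) ** 2)
                         + np.sum((np.sqrt(var1) - np.sqrt(var2)) ** 2)))


def emd(w1, w2, cost):
    """Earth mover's distance between two weighted signatures.

    w1: (m,) weights, w2: (n,) weights, cost: (m, n) ground distances.
    Only min(sum(w1), sum(w2)) mass is moved, so unequal total masses
    yield a partial match.
    """
    w1, w2 = np.asarray(w1, float), np.asarray(w2, float)
    cost = np.asarray(cost, float)
    m, n = cost.shape
    # Flow f_ij is stored at index i * n + j of the LP variable vector.
    a_rows = np.zeros((m, m * n))
    for i in range(m):
        a_rows[i, i * n:(i + 1) * n] = 1.0  # sum_j f_ij <= w1[i]
    a_cols = np.zeros((n, m * n))
    for j in range(n):
        a_cols[j, j::n] = 1.0  # sum_i f_ij <= w2[j]
    a_ub = np.vstack([a_rows, a_cols])
    b_ub = np.concatenate([w1, w2])
    # The total flow must equal the smaller of the two total masses.
    total = min(w1.sum(), w2.sum())
    a_eq = np.ones((1, m * n))
    res = linprog(cost.ravel(), A_ub=a_ub, b_ub=b_ub, A_eq=a_eq,
                  b_eq=[total], bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    # Normalize the optimal work by the total flow.
    return float(res.fun / total)


def gmm_frame_distance(gmm_a, gmm_b):
    """Distance between two frames, each summarized by a diagonal GMM.

    Each argument is a hypothetical (weights, means, variances) tuple
    fitted to that frame's motion vectors.
    """
    wa, ma, va = gmm_a
    wb, mb, vb = gmm_b
    cost = np.array([[gaussian_ground_distance(ma[i], va[i], mb[j], vb[j])
                      for j in range(len(wb))]
                     for i in range(len(wa))])
    return emd(wa, wb, cost)


if __name__ == "__main__":
    # Two toy frames whose motion vectors are modeled by 2D GMMs.
    frame_a = ([0.6, 0.4], [[1.0, 0.0], [0.0, 1.0]], [[0.2, 0.2], [0.1, 0.1]])
    frame_b = ([0.5, 0.5], [[0.8, 0.1], [0.0, 1.2]], [[0.2, 0.2], [0.1, 0.1]])
    print(gmm_frame_distance(frame_a, frame_b))
```

Representing each frame by a handful of mixture components rather than by every motion vector keeps the EMD problem small: its size grows with the number of components, not with the number of pixels.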

Our main contributions are fourfold. First, we operate on the tangent vector field of the spatio-temporal 2D surface manifold generated by video motion, rather than on the intensity gradient space; we argue that the former space is more fundamental. Second, the correlations between frames are explicitly exploited using a probabilistic graphical model known as dynamic conditional random fields (DCRF). Under this framework, motion fields are estimated by Markov volumetric regression, which is more robust and may avoid the rank deficiency problem. Third, our definition of video distance accords with human intuition and strikes a better tradeoff between frame dissimilarity and chronological ordering. Finally, our definition of frame distance allows for partial distance.
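
The abstract does not spell out how the sequence distance balances frame dissimilarity against chronological ordering. One common way to realize such a tradeoff, shown here purely as an illustrative assumption rather than the paper's definition, is a dynamic-time-warping (DTW) alignment over a precomputed matrix of inter-frame distances, in which a hypothetical `order_penalty` parameter charges for every step where one sequence advances while the other stays on the same frame. The function name `sequence_distance` is likewise hypothetical.

```python
def sequence_distance(frame_dist, order_penalty=1.0):
    """Alignment cost between two frame sequences (illustrative DTW variant).

    frame_dist[i][j] is the distance between frame i of sequence A and
    frame j of sequence B (e.g. an EMD between per-frame GMMs).
    Diagonal steps advance both sequences together; horizontal or vertical
    steps, where one sequence stalls, pay an extra order_penalty.
    The alignment path is monotonic, so chronological order is preserved.
    """
    m, n = len(frame_dist), len(frame_dist[0])
    inf = float("inf")
    # cost[i][j]: best cost of aligning the first i frames of A with the first j of B.
    cost = [[inf] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = frame_dist[i - 1][j - 1]
            cost[i][j] = d + min(cost[i - 1][j - 1],                  # both advance
                                 cost[i - 1][j] + order_penalty,      # B stalls
                                 cost[i][j - 1] + order_penalty)      # A stalls
    return cost[m][n]
```

With `order_penalty=0` the recurrence is plain DTW; larger values discourage stalling and push the alignment toward a lockstep, frame-by-frame correspondence.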


Keywords: Video analysis · Motion field · Activity classification

Copyright information

© Springer-Verlag 2008

Authors and Affiliations

  1. Institute of Computer Science and Technology, Peking University, Beijing, P.R. China
  2. ECE Department, National University of Singapore, Singapore, Singapore
  3. ECE Department, University of Illinois at Urbana-Champaign, Urbana, USA