Video Clip Retrieval by Graph Matching

  • Manal Al Ghamdi
  • Yoshihiko Gotoh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8416)


This paper presents a new approach to video clip retrieval using the Earth Mover’s Distance (EMD). The approach builds on many-to-many matching between two graph-based representations. Measuring the similarity between two clips is formulated as a graph matching task in two stages. First, a bipartite graph with a spatio-temporal neighbourhood is constructed to explore the relations between data points and estimate the relevance between a pair of video clips. Second, using the EMD, matching a clip pair is converted into computing the minimum transportation cost within the spatio-temporal graph. Experimental results on the UCF YouTube Action dataset show that the presented approach achieves a significant improvement in retrieval capability over conventional techniques.
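The second stage treats clip matching as an EMD problem: the distance is the minimum total cost of moving the weighted features of one clip onto those of another, divided by the total mass moved (Rubner et al.). The sketch below is illustrative only and rests on several assumptions. Each clip is taken to be already summarised as a hypothetical "signature" of (feature vector, weight) pairs, the ground distance is plain Euclidean distance, and every node of one clip is connected to every node of the other (a complete bipartite graph). The paper's spatio-temporal neighbourhood construction and its feature extraction are not reproduced. The EMD is solved with a generic successive-shortest-path min-cost flow, a swapped-in solver rather than necessarily the authors' own. The names `emd`, `rank_clips` and `Signature` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical clip representation: (feature_vector, weight) pairs, as in Rubner et al.
Signature = List[Tuple[Sequence[float], float]]

EPS = 1e-12


def emd(sig_a: Signature, sig_b: Signature) -> float:
    """EMD between two signatures, solved as min-cost flow on a bipartite graph.

    Assumption: Euclidean ground distance and a complete bipartite graph;
    the paper's spatio-temporal neighbourhood is not modelled here.
    """
    n, m = len(sig_a), len(sig_b)
    source, sink = n + m, n + m + 1
    # Each edge is [target node, residual capacity, cost, index of reverse edge].
    graph: List[List[list]] = [[] for _ in range(n + m + 2)]

    def add_edge(u: int, v: int, cap: float, cost: float) -> None:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0.0, -cost, len(graph[u]) - 1])

    for i, (_, w) in enumerate(sig_a):  # source supplies clip A's weights
        add_edge(source, i, w, 0.0)
    for j, (_, w) in enumerate(sig_b):  # sink absorbs clip B's weights
        add_edge(n + j, sink, w, 0.0)
    for i, (fa, _) in enumerate(sig_a):  # every A node may ship mass to every B node
        for j, (fb, _) in enumerate(sig_b):
            add_edge(i, n + j, math.inf, math.dist(fa, fb))

    # Partial matching: move as much mass as the lighter signature carries.
    target = min(sum(w for _, w in sig_a), sum(w for _, w in sig_b))
    flow, cost = 0.0, 0.0
    while target - flow > EPS:
        # Bellman-Ford shortest path in the residual graph (reverse edges have negative cost).
        dist = [math.inf] * len(graph)
        prev: List = [None] * len(graph)  # (node, edge index) used to reach each node
        dist[source] = 0.0
        for _ in range(len(graph) - 1):
            updated = False
            for u in range(len(graph)):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, c, _) in enumerate(graph[u]):
                    if cap > EPS and dist[u] + c < dist[v] - EPS:
                        dist[v] = dist[u] + c
                        prev[v] = (u, k)
                        updated = True
            if not updated:
                break
        if dist[sink] == math.inf:
            break
        # Bottleneck capacity along the path, capped by the remaining mass.
        push = target - flow
        v = sink
        while v != source:
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        # Augment: reduce forward residuals, increase reverse residuals.
        v = sink
        while v != source:
            u, k = prev[v]
            edge = graph[u][k]
            edge[1] -= push
            graph[v][edge[3]][1] += push
            v = u
        flow += push
        cost += push * dist[sink]
    if flow <= EPS:
        raise ValueError("signatures must carry positive weight")
    return cost / flow  # normalise by total mass moved


def rank_clips(query: Signature, database: Dict[str, Signature]) -> List[str]:
    """Order database clips from most to least similar (lowest EMD first)."""
    return sorted(database, key=lambda name: emd(query, database[name]))


if __name__ == "__main__":
    clip_a = [((0.0, 0.0), 0.5), ((1.0, 0.0), 0.5)]
    clip_b = [((0.0, 1.0), 0.5), ((1.0, 1.0), 0.5)]
    print(emd(clip_a, clip_b))  # 1.0
```

In this toy example each half of clip A's mass moves a distance of 1 to the matching node of clip B, so the EMD is 1.0. Because signatures with unequal total weight are handled by moving only the lighter total, the sketch also covers the many-to-many, partial matches the abstract refers to; retrieval then amounts to ranking candidate clips by ascending EMD, as `rank_clips` does.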


Keywords: graph matching, Earth Mover’s Distance, video retrieval





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Manal Al Ghamdi (1)
  • Yoshihiko Gotoh (1)
  1. Department of Computer Science, University of Sheffield, United Kingdom
