Video Google: Efficient Visual Search of Videos

  • Josef Sivic
  • Andrew Zisserman
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4170)

Abstract

We describe an approach to object retrieval which searches for and localizes all the occurrences of an object in a video, given a query image of the object. The object is represented by a set of viewpoint-invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint and illumination, and despite partial occlusion. The temporal continuity of the video within a shot is used to track the regions in order to reject those that are unstable.
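As a rough illustration of rejecting unstable regions, the sketch below keeps only regions whose track spans a minimum number of frames within a shot. The input format, the function name `stable_regions`, the threshold `min_track_length=3`, and the averaging of descriptors along each track are all assumptions made for this example; the abstract does not specify them.

```python
from collections import defaultdict


def stable_regions(detections, min_track_length=3):
    """Keep regions that were tracked through enough frames.

    detections: iterable of (frame_index, track_id, descriptor) tuples,
                where descriptor is a list of floats (hypothetical format).
    Returns {track_id: mean descriptor} for every track observed in at
    least min_track_length distinct frames (threshold is an assumption).
    """
    tracks = defaultdict(list)
    for frame_index, track_id, descriptor in detections:
        tracks[track_id].append((frame_index, descriptor))

    stable = {}
    for track_id, observations in tracks.items():
        distinct_frames = {frame for frame, _ in observations}
        if len(distinct_frames) < min_track_length:
            continue  # short-lived track: treat the region as unstable
        count = len(observations)
        dim = len(observations[0][1])
        # Represent the track by the mean of its descriptors.
        stable[track_id] = [
            sum(desc[i] for _, desc in observations) / count for i in range(dim)
        ]
    return stable
```

With this filter, a region seen in a single frame is discarded, while a region followed across several frames contributes one averaged descriptor.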

Efficient retrieval is achieved by employing methods from statistical text retrieval, including inverted file systems, and text and document frequency weightings. This requires a visual analogue of a word, which is provided here by vector quantizing the region descriptors. The final ranking also depends on the spatial layout of the regions. The result is that retrieval is immediate, returning a ranked list of shots in the manner of Google.
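The text-retrieval analogy can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a visual vocabulary (cluster centres) is already available, treats each frame as a "document", uses one common tf-idf variant with cosine scoring, and omits the spatial-layout re-ranking entirely. All function names and the data layout are hypothetical.

```python
import math
from collections import Counter, defaultdict


def nearest_word(descriptor, vocabulary):
    """Vector quantization: index of the closest cluster centre."""
    return min(
        range(len(vocabulary)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(descriptor, vocabulary[k])),
    )


def build_index(frames, vocabulary):
    """frames: {frame_id: [descriptor, ...]} -> (inverted file, idf, norms)."""
    counts = {
        fid: Counter(nearest_word(d, vocabulary) for d in descs)
        for fid, descs in frames.items()
    }
    doc_freq = Counter(word for c in counts.values() for word in c)
    idf = {w: math.log(len(frames) / df) for w, df in doc_freq.items()}

    inverted = defaultdict(dict)  # visual word -> {frame_id: tf-idf weight}
    norms = {}
    for fid, c in counts.items():
        total = sum(c.values())
        weights = {w: (n / total) * idf[w] for w, n in c.items()}
        norms[fid] = math.sqrt(sum(v * v for v in weights.values())) or 1.0
        for w, v in weights.items():
            if v > 0:
                inverted[w][fid] = v
    return inverted, idf, norms


def search(query_descriptors, vocabulary, inverted, idf, norms):
    """Return [(cosine score, frame_id), ...], best match first."""
    c = Counter(nearest_word(d, vocabulary) for d in query_descriptors)
    if not c:
        return []
    total = sum(c.values())
    query = {w: (n / total) * idf.get(w, 0.0) for w, n in c.items()}
    q_norm = math.sqrt(sum(v * v for v in query.values())) or 1.0

    scores = defaultdict(float)
    for w, qv in query.items():
        # Only frames that contain this visual word are touched.
        for fid, dv in inverted.get(w, {}).items():
            scores[fid] += qv * dv
    return sorted(
        ((s / (q_norm * norms[fid]), fid) for fid, s in scores.items()),
        reverse=True,
    )
```

The inverted file is what keeps a query fast in this sketch: scoring visits only the frames that share at least one visual word with the query, rather than every frame in the video.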

We report results for object retrieval on the full-length feature films ‘Groundhog Day’ and ‘Casablanca’.

Keywords

Ground Truth; Visual Word; Query Image; Covariant Region; Text Retrieval
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Josef Sivic (1)
  • Andrew Zisserman (1)
  1. Department of Engineering Science, University of Oxford, Oxford
