Video Google: Efficient Visual Search of Videos

Sivic, Josef; Zisserman, Andrew

doi:10.1007/11957959_7

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4170)

Abstract

We describe an approach to object retrieval which searches for and localizes all the occurrences of an object in a video, given a query image of the object. The object is represented by a set of viewpoint invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion. The temporal continuity of the video within a shot is used to track the regions in order to reject those that are unstable.
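
The track-based rejection of unstable regions can be illustrated with a minimal sketch. It assumes that regions are linked greedily between consecutive frames by image position and descriptor distance, and that a region is kept only if its track spans at least `min_length` frames. The `Region` and `Track` types and the thresholds `max_move`, `max_desc_dist` and `min_length` are hypothetical choices for illustration, not values or names taken from the chapter, whose own tracker may differ.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Region:
    # Hypothetical region record: image position plus a fixed-length descriptor.
    x: float
    y: float
    descriptor: tuple


@dataclass
class Track:
    regions: list = field(default_factory=list)


def link_tracks(frames, max_move=10.0, max_desc_dist=0.5):
    """Greedily link regions in consecutive frames into tracks (assumed scheme)."""
    tracks = []
    active = []  # tracks that were extended in the previous frame
    for regions in frames:
        unmatched = list(regions)
        next_active = []
        for track in active:
            last = track.regions[-1]
            best_i, best_d = None, max_desc_dist
            for i, r in enumerate(unmatched):
                # Reject candidates that moved too far between frames.
                if math.dist((last.x, last.y), (r.x, r.y)) > max_move:
                    continue
                d = math.dist(last.descriptor, r.descriptor)
                if d <= best_d:
                    best_i, best_d = i, d
            if best_i is not None:
                track.regions.append(unmatched.pop(best_i))
                next_active.append(track)
        # Regions not matched to any existing track start new tracks.
        for r in unmatched:
            track = Track([r])
            tracks.append(track)
            next_active.append(track)
        active = next_active
    return tracks


def stable_tracks(frames, min_length=3, **kwargs):
    """Keep only tracks that survive for at least min_length frames (hypothetical threshold)."""
    return [t for t in link_tracks(frames, **kwargs) if len(t.regions) >= min_length]
```

Greedy nearest-neighbour linking is simply the shortest way to show the idea; a tracker with a motion model and appearance correlation would be more robust to clutter and fast motion.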

Efficient retrieval is achieved by employing methods from statistical text retrieval, including inverted file systems and term and document frequency weightings. This requires a visual analogue of a word, which is provided here by vector quantizing the region descriptors. The final ranking also depends on the spatial layout of the regions. The result is that retrieval is immediate, returning a ranked list of shots in the manner of Google.
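
The text-retrieval machinery can be sketched in the same spirit. In this minimal illustration, each descriptor is assigned to its nearest centroid (its visual word), each shot is stored in an inverted file keyed by word, and shots are ranked by the cosine of the angle between tf-idf vectors, using the standard weighting t_i = (n_id / n_d) log(N / N_i), where n_id is the count of word i in shot d, n_d the number of words in d, N the number of shots and N_i the number of shots containing word i. The centroids are assumed to be given (for example from k-means clustering of training descriptors), and the class `VisualWordIndex` and its methods are hypothetical; stop lists and the spatial-layout re-ranking are not shown.

```python
import math
from collections import Counter, defaultdict


def quantize(descriptor, centroids):
    """Map a descriptor to the index of its nearest centroid: its 'visual word'."""
    return min(range(len(centroids)), key=lambda k: math.dist(descriptor, centroids[k]))


class VisualWordIndex:
    """Hypothetical inverted file over visual words with tf-idf cosine ranking."""

    def __init__(self, centroids):
        self.centroids = centroids          # assumed precomputed vocabulary
        self.postings = defaultdict(dict)   # word -> {shot_id: count}
        self.shot_words = {}                # shot_id -> Counter of words

    def add_shot(self, shot_id, descriptors):
        counts = Counter(quantize(d, self.centroids) for d in descriptors)
        self.shot_words[shot_id] = counts
        for word, n in counts.items():
            self.postings[word][shot_id] = n

    def idf(self, word):
        # log(N / N_i): N shots in total, N_i of them contain this word.
        n_i = len(self.postings.get(word, {}))
        return math.log(len(self.shot_words) / n_i) if n_i else 0.0

    def tfidf(self, counts):
        # t_i = (n_id / n_d) * log(N / N_i) for each word i present.
        total = sum(counts.values())
        return {w: (n / total) * self.idf(w) for w, n in counts.items()}

    def search(self, query_descriptors, top_k=10):
        q = self.tfidf(Counter(quantize(d, self.centroids) for d in query_descriptors))
        q_norm = math.sqrt(sum(v * v for v in q.values()))
        if q_norm == 0:
            return []
        # Inverted file: only shots sharing at least one word with the query are visited.
        candidates = set()
        for word in q:
            candidates.update(self.postings.get(word, {}))
        scored = []
        for shot_id in candidates:
            d = self.tfidf(self.shot_words[shot_id])
            d_norm = math.sqrt(sum(v * v for v in d.values()))
            if d_norm == 0:
                continue
            dot = sum(w * d.get(word, 0.0) for word, w in q.items())
            scored.append((shot_id, dot / (q_norm * d_norm)))
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_k]
```

Because only the inverted lists of the query's words are read, shots that share no visual word with the query are never scored, which is what keeps query time low as the collection grows.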

We report results for object retrieval on the full-length feature films ‘Groundhog Day’ and ‘Casablanca’.

Author information

Authors and Affiliations

Department of Engineering Science, University of Oxford, Oxford, OX1 3PJ
Josef Sivic & Andrew Zisserman

Editor information

Editors and Affiliations

Département d’Informatique, Ecole Normale Supérieure, P.O. Box, Paris, France
Jean Ponce
Carnegie Mellon University, Pittsburgh, USA
Martial Hebert
GRAVIR-INRIA, 655 avenue de l’Europe, P.O. Box, 38330, Montbonnot, France
Cordelia Schmid
Department of Engineering Science, University of Oxford, Parks Road, OX1 3PJ, Oxford, UK
Andrew Zisserman

About this chapter

Cite this chapter

Sivic, J., Zisserman, A. (2006). Video Google: Efficient Visual Search of Videos. In: Ponce, J., Hebert, M., Schmid, C., Zisserman, A. (eds) Toward Category-Level Object Recognition. Lecture Notes in Computer Science, vol 4170. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11957959_7

DOI: https://doi.org/10.1007/11957959_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68794-8
Online ISBN: 978-3-540-68795-5
eBook Packages: Computer Science, Computer Science (R0)