In this paper, we consider the problem of similarity between video sequences. Three basic questions are raised and (partially) answered. Firstly, at what temporal duration can video sequences be compared? The frame, shot, scene and video levels are identified. Secondly, given some image or video feature, what are the requirements on its distance measure, and how can that measure be “easily” transformed into the visual similarity desired by the inquirer? Thirdly, how can video sequences be compared at different levels? A general approach based on either a set or sequence representation with variable degrees of aggregation is proposed and applied recursively over the different levels of temporal resolution. It allows the inquirer to fully control the importance of temporal ordering and duration. The general approach is illustrated by introducing and discussing some of the many possible image and video features. Promising experimental results are presented.
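The contrast between the set and sequence representations can be made concrete with a small sketch. This is not the authors' implementation: the per-frame features (tiny numeric vectors standing in for, say, color histograms), the L1 frame distance, the modified-Hausdorff-style set aggregation, and the dynamic-time-warping sequence aggregation are all assumptions chosen for illustration. The point is only that the set variant ignores temporal ordering while the sequence variant penalizes it.

```python
# Illustrative sketch (assumed details, not the paper's exact method):
# a "shot" is a list of per-frame feature vectors; we compare two shots
# with either an order-free (set) or order-preserving (sequence) aggregation.

def l1(a, b):
    """L1 distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def set_distance(shot_a, shot_b, d=l1):
    """Order-free comparison: average best-match distance in both
    directions (a symmetric, modified-Hausdorff-style aggregation)."""
    fwd = sum(min(d(fa, fb) for fb in shot_b) for fa in shot_a) / len(shot_a)
    bwd = sum(min(d(fb, fa) for fa in shot_a) for fb in shot_b) / len(shot_b)
    return (fwd + bwd) / 2.0

def sequence_distance(shot_a, shot_b, d=l1):
    """Order-preserving comparison via dynamic time warping: temporal
    ordering matters, but durations may stretch or compress."""
    n, m = len(shot_a), len(shot_b)
    INF = float("inf")
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = d(shot_a[i - 1], shot_b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],      # skip a frame of a
                                  dp[i][j - 1],      # skip a frame of b
                                  dp[i - 1][j - 1])  # match the frames
    return dp[n][m]

if __name__ == "__main__":
    a = [[1, 0], [0, 1], [1, 1]]
    b = [[1, 1], [0, 1], [1, 0]]   # the same frames in reversed order
    print(set_distance(a, b))       # 0.0: the set view ignores ordering
    print(sequence_distance(a, b))  # 2.0: the sequence view penalizes reversal
```

Applied recursively, the same scheme compares shots from frame distances, scenes from shot distances, and so on, with the choice of set versus sequence aggregation at each level controlling how much temporal ordering and duration matter.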
Lienhart, R., Effelsberg, W. & Jain, R. VisualGREP: A Systematic Method to Compare and Retrieve Video Sequences. Multimedia Tools and Applications 10, 47–72 (2000). https://doi.org/10.1023/A:1009663921899
Keywords:
- video indexing
- video retrieval
- video content analysis
- video similarity