Beyond Key-Frames: The Physical Setting as a Video Mining Primitive

Aner-Wolf, Aya; Kender, John R.

doi:10.1007/978-1-4757-6928-9_2

Part of the book series: The Springer International Series in Video Computing (VICO, volume 6)

Abstract

We present an automatic tool for the compact representation, cross-referencing, and exploration of long video sequences, based on a novel visual abstraction of semantic content. Our approach builds a highly compact hierarchical representation for long sequences by non-temporal clustering of scene segments into a new conceptual form grounded in the recognition of real-world backgrounds. We represent shots and scenes using mosaics derived from representative shots, and employ a novel method for comparing scenes based on these representative mosaics. We then cluster scenes into a more useful, higher level of abstraction: the physical setting. We demonstrate our work using situation comedies, where each half-hour (40,000-frame) episode is well-structured by rules governing background use. Consequently, browsing, indexing, and comparison across videos by physical setting are very fast. Further, we show that analyzing how frequently these physical settings are used leads directly to high-level contextual identification of the main plots in each video. We demonstrate these contributions with a browsing tool that allows both temporal and non-temporal browsing of episodes from situation comedies.
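
The pipeline summarized in the abstract (compare scenes, group them into physical settings regardless of temporal order, then rank settings by how often they are used) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the dissimilarity matrix `dist` is invented rather than computed from mosaics, the threshold is arbitrary, and the single-linkage grouping in `cluster_scenes` is a simple stand-in, not the clustering method used in the chapter.

```python
from collections import Counter


def cluster_scenes(dist, threshold):
    """Group scenes into settings, ignoring their order in time.

    Hypothetical stand-in: two scenes are linked when their pairwise
    dissimilarity is below `threshold`, and linked scenes are merged
    transitively (single linkage) using union-find.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] < threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    # Relabel roots as consecutive setting ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


def rank_settings(labels):
    """Return (setting_id, scene_count) pairs, most frequent first."""
    return Counter(labels).most_common()


# Invented dissimilarities for five scenes of one episode (0 = identical).
dist = [
    [0.00, 0.90, 0.20, 0.80, 0.10],
    [0.90, 0.00, 0.85, 0.15, 0.90],
    [0.20, 0.85, 0.00, 0.90, 0.25],
    [0.80, 0.15, 0.90, 0.00, 0.95],
    [0.10, 0.90, 0.25, 0.95, 0.00],
]

labels = cluster_scenes(dist, threshold=0.3)
print(labels)                 # [0, 1, 0, 1, 0]
print(rank_settings(labels))  # [(0, 3), (1, 2)]
```

In this toy episode, scenes 0, 2, and 4 fall into one setting and scenes 1 and 3 into another, even though the two settings alternate in time. The frequency ranking is only a rough proxy for the kind of setting-usage analysis the abstract mentions.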

Author information

Authors and Affiliations

Department of Computer Science and Applied Math, Weizmann Institute of Science, Israel
Aya Aner-Wolf
Department of Computer Science, Columbia University, USA
John R. Kender

Editor information

Editors and Affiliations

University of Maryland, College Park, MD, USA
Azriel Rosenfeld, David Doermann & Daniel DeMenthon

About this chapter

Cite this chapter

Aner-Wolf, A., Kender, J.R. (2003). Beyond Key-Frames: The Physical Setting as a Video Mining Primitive. In: Rosenfeld, A., Doermann, D., DeMenthon, D. (eds) Video Mining. The Springer International Series in Video Computing, vol 6. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-6928-9_2

DOI: https://doi.org/10.1007/978-1-4757-6928-9_2
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-5383-4
Online ISBN: 978-1-4757-6928-9
eBook Packages: Springer Book Archive
