
Beyond Key-Frames: The Physical Setting as a Video Mining Primitive

  • Chapter
Video Mining

Part of the book series: The Springer International Series in Video Computing ((VICO,volume 6))

Abstract

We present an automatic tool for the compact representation, cross-referencing, and exploration of long video sequences, built on a novel visual abstraction of semantic content. Our approach constructs a highly compact hierarchical representation of a long sequence by clustering scene segments non-temporally into a new conceptual form grounded in the recognition of real-world backgrounds. We represent shots and scenes using mosaics derived from representative shots, and employ a novel method for comparing scenes based on these representative mosaics. We then cluster scenes into a more useful, higher level of abstraction: the physical setting. We demonstrate our work on situation comedies, where each half-hour (40,000-frame) episode is well structured by rules governing background use. Consequently, browsing, indexing, and comparison across videos by physical setting are very fast. Further, we show that analyzing how frequently these physical settings are used leads directly to high-level contextual identification of the main plots in each video. We demonstrate these contributions with a browsing tool that allows both temporal and non-temporal browsing of episodes from situation comedies.
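The grouping and frequency-analysis steps described in the abstract can be illustrated with a minimal sketch. It is not the chapter's method: the abstract does not say which clustering algorithm or distance measure is used, so average-linkage agglomerative clustering with a stopping threshold is assumed here, and the dissimilarity matrix `dist` is made-up data standing in for the output of a mosaic-based scene comparison. The function names `cluster_scenes` and `setting_frequencies` are hypothetical.

```python
from collections import Counter


def cluster_scenes(dist, threshold):
    """Group scene indices by average-linkage agglomerative clustering.

    dist is a symmetric matrix of scene-to-scene dissimilarities
    (assumed to come from some mosaic comparison; not specified here).
    Merging stops once the closest pair of clusters exceeds threshold.
    Scene order in time is ignored, so the grouping is non-temporal.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Mean pairwise dissimilarity between two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters (ties broken by index order).
        d, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


def setting_frequencies(clusters):
    """Count how many scenes fall into each cluster (candidate setting)."""
    return Counter({k: len(c) for k, c in enumerate(clusters)})


# Made-up dissimilarities for five scenes: scenes 0, 2, 4 share one
# background, scenes 1 and 3 share another.
dist = [
    [0.0, 0.9, 0.1, 0.8, 0.2],
    [0.9, 0.0, 0.8, 0.1, 0.9],
    [0.1, 0.8, 0.0, 0.9, 0.1],
    [0.8, 0.1, 0.9, 0.0, 0.8],
    [0.2, 0.9, 0.1, 0.8, 0.0],
]
settings = cluster_scenes(dist, threshold=0.5)
frequencies = setting_frequencies(settings)
```

In this toy input the most frequently used cluster is the one containing scenes 0, 2, and 4; under the abstract's reasoning, heavily reused settings of this kind are the ones that would point to a main plot line.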




© 2003 Springer Science+Business Media New York

About this chapter

Cite this chapter

Aner-Wolf, A., Kender, J.R. (2003). Beyond Key-Frames: The Physical Setting as a Video Mining Primitive. In: Rosenfeld, A., Doermann, D., DeMenthon, D. (eds) Video Mining. The Springer International Series in Video Computing, vol 6. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-6928-9_2

  • DOI: https://doi.org/10.1007/978-1-4757-6928-9_2

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-5383-4

  • Online ISBN: 978-1-4757-6928-9

  • eBook Packages: Springer Book Archive
