
Scene Determination Based on Video and Audio Features

Published in: Multimedia Tools and Applications

Abstract

Automatically determining what constitutes a scene in a video is a challenging task, particularly since there is no precise definition of the term "scene". It is left to the individual to choose the attributes that consecutive shots must share in order to be grouped into scenes. Certain basic attributes such as dialogs, settings and continuing sounds are consistent indicators. We have therefore developed a scheme for identifying scenes which clusters shots according to detected dialogs, settings and similar audio. Results from experiments show automatic identification of these types of scenes to be reliable.
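The clustering idea described above — merging consecutive shots that share a detected attribute such as a dialog, a setting, or continuing audio — can be sketched as follows. This is a hypothetical illustration of that grouping step, not the authors' implementation; the attribute labels (`dialog`, `setting`, `audio`) and their detection are assumed to come from upstream analysis.

```python
# Hedged sketch of scene grouping by shared shot attributes.
# Each shot carries sets of attribute labels produced by (assumed)
# upstream detectors: dialog participants, setting cluster ids,
# and audio-similarity cluster ids.

def cluster_shots_into_scenes(shots):
    """shots: list of dicts with 'dialog', 'setting', 'audio' label sets.
    Returns a list of scenes, each a list of consecutive shot indices."""
    if not shots:
        return []
    scenes = []
    current = [0]
    for i in range(1, len(shots)):
        prev, cur = shots[i - 1], shots[i]
        # Two consecutive shots belong to the same scene if they share
        # at least one label in any attribute category.
        shared = any(
            prev[key] & cur[key]  # non-empty label intersection
            for key in ("dialog", "setting", "audio")
        )
        if shared:
            current.append(i)
        else:
            scenes.append(current)
            current = [i]
    scenes.append(current)
    return scenes

# Example: three shots sharing a dialog/setting, then a cut to a new
# setting with different audio starts a new scene.
shots = [
    {"dialog": {"d1"}, "setting": {"s1"}, "audio": {"a1"}},
    {"dialog": {"d1"}, "setting": {"s1"}, "audio": {"a1"}},
    {"dialog": set(), "setting": {"s1"}, "audio": {"a1"}},
    {"dialog": set(), "setting": {"s2"}, "audio": {"a2"}},
]
print(cluster_shots_into_scenes(shots))  # → [[0, 1, 2], [3]]
```

In this sketch a single shared label suffices to continue a scene; a real system would likely weight the attribute types and tolerate short interruptions, as discussed in the full paper.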



Cite this article

Pfeiffer, S., Lienhart, R. & Effelsberg, W. Scene Determination Based on Video and Audio Features. Multimedia Tools and Applications 15, 59–81 (2001). https://doi.org/10.1023/A:1011315803415
