Movie Content Analysis, Indexing and Skimming Via Multimodal Information

  • Chapter
Video Mining

Part of the book series: The Springer International Series in Video Computing ((VICO,volume 6))

Abstract

A content-based movie analysis, indexing and skimming system is developed in this research. It comprises three major modules: 1) an event detection module, which extracts three types of movie events from the content, namely two-speaker dialogs, multiple-speaker dialogs, and hybrid events, by integrating multiple media cues such as audio, speech, visual and face information; 2) a speaker identification module, which uses an adaptive speaker identification scheme to recognize target movie cast members for content indexing, analyzing the audio source to recognize speakers with a likelihood-based approach and examining the visual source to locate talking faces with face detection/recognition and mouth tracking techniques; 3) a movie skimming module, which uses an event-based skimming scheme to abstract movie content into a short video clip for content browsing. Extensive experiments on integrating multiple media cues for movie content analysis, indexing and skimming have yielded encouraging results.
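
The abstract does not spell out the likelihood-based approach used on the audio source, so the following is only a minimal sketch of the general idea, not the chapter's method. It assumes each cast member is modeled by a single diagonal Gaussian over audio feature frames, and it picks the speaker whose model gives the highest average log-likelihood for a speech segment. The names `Model`, `frame_log_likelihood` and `identify_speaker`, the two-dimensional features and the example models are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical speaker model: one diagonal Gaussian, stored as (means, variances).
# The abstract only says "likelihood-based"; this simplification is an assumption.
Model = Tuple[Sequence[float], Sequence[float]]


def frame_log_likelihood(frame: Sequence[float], model: Model) -> float:
    """Log-density of one feature frame under a diagonal Gaussian."""
    means, variances = model
    total = 0.0
    for x, mu, var in zip(frame, means, variances):
        total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total


def identify_speaker(
    frames: List[Sequence[float]], models: Dict[str, Model]
) -> Tuple[str, Dict[str, float]]:
    """Return the best-scoring speaker and the average log-likelihood per speaker."""
    if not frames:
        raise ValueError("need at least one feature frame")
    scores = {
        name: sum(frame_log_likelihood(f, m) for f in frames) / len(frames)
        for name, m in models.items()
    }
    best = max(scores, key=lambda name: scores[name])
    return best, scores


if __name__ == "__main__":
    # Illustrative models and features only; not data from the chapter.
    models = {
        "speaker_a": ([0.0, 0.0], [1.0, 1.0]),
        "speaker_b": ([3.0, 3.0], [1.0, 1.0]),
    }
    segment = [[2.8, 3.1], [3.2, 2.9], [2.9, 3.3]]
    best, scores = identify_speaker(segment, models)
    print(best, scores)
```

Averaging over frames keeps scores comparable across segments of different length. An adaptive scheme of the kind the abstract mentions would also update the speaker models as new segments are identified, which this sketch leaves out.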

Copyright information

© 2003 Springer Science+Business Media New York

About this chapter

Cite this chapter

Li, Y., Narayanan, S., Kuo, CC.J. (2003). Movie Content Analysis, Indexing and Skimming Via Multimodal Information. In: Rosenfeld, A., Doermann, D., DeMenthon, D. (eds) Video Mining. The Springer International Series in Video Computing, vol 6. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-6928-9_5

  • DOI: https://doi.org/10.1007/978-1-4757-6928-9_5

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-5383-4

  • Online ISBN: 978-1-4757-6928-9

  • eBook Packages: Springer Book Archive
