Abstract
A content-based movie analysis, indexing, and skimming system is developed in this research. It comprises three major modules: 1) an event detection module, which extracts three types of movie events from the content, namely two-speaker dialogs, multiple-speaker dialogs, and hybrid events, by integrating multiple media cues such as audio, speech, visual, and face information; 2) a speaker identification module, where an adaptive speaker identification scheme is proposed to recognize target movie cast members for content indexing. Both audio and visual sources are exploited in the identification process: the audio source is analyzed to recognize speakers with a likelihood-based approach, while the visual source is examined to locate talking faces using face detection/recognition and mouth-tracking techniques; 3) a movie skimming module, where an event-based skimming system abstracts the movie content into a short video clip for content browsing. Extensive experiments on integrating multiple media cues for movie content analysis, indexing, and skimming have yielded encouraging results.
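The likelihood-based speaker identification idea mentioned above can be illustrated with a minimal sketch: enroll each cast member with a simple statistical model of their acoustic features, then label an unknown utterance with the speaker whose model assigns it the highest likelihood. The sketch below is an assumption-laden simplification, not the chapter's implementation: it uses a single diagonal-covariance Gaussian per speaker (real systems typically use Gaussian mixture models) and random vectors standing in for MFCC features; all function and variable names are hypothetical.

```python
import numpy as np

def fit_speaker_model(features):
    # Single diagonal-covariance Gaussian per speaker -- a simplification
    # of the mixture models typically used in likelihood-based speaker ID.
    mean = features.mean(axis=0)
    var = features.var(axis=0) + 1e-6  # floor the variance for stability
    return mean, var

def avg_log_likelihood(features, model):
    mean, var = model
    # Per-dimension Gaussian log densities, summed over dimensions
    # and averaged over frames.
    ll = -0.5 * (np.log(2 * np.pi * var) + (features - mean) ** 2 / var)
    return ll.sum(axis=1).mean()

def identify(features, models):
    # Pick the enrolled speaker whose model maximizes the likelihood.
    scores = {name: avg_log_likelihood(features, m) for name, m in models.items()}
    return max(scores, key=scores.get)

# Synthetic stand-ins for two speakers' 13-dimensional MFCC frames.
rng = np.random.default_rng(0)
train_a = rng.normal(0.0, 1.0, size=(200, 13))
train_b = rng.normal(3.0, 1.0, size=(200, 13))
models = {"A": fit_speaker_model(train_a), "B": fit_speaker_model(train_b)}

test_clip = rng.normal(3.0, 1.0, size=(50, 13))  # unlabeled utterance, truly "B"
print(identify(test_clip, models))
```

An adaptive scheme such as the one proposed in the chapter would additionally update the winning speaker's model with newly identified speech, so the models track the changing acoustic conditions over the course of the movie.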
© 2003 Springer Science+Business Media New York
Cite this chapter
Li, Y., Narayanan, S., Kuo, CC.J. (2003). Movie Content Analysis, Indexing and Skimming Via Multimodal Information. In: Rosenfeld, A., Doermann, D., DeMenthon, D. (eds) Video Mining. The Springer International Series in Video Computing, vol 6. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-6928-9_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-5383-4
Online ISBN: 978-1-4757-6928-9