Audio-visual processing for scene change detection
Abstract
The organization of video databases according to the semantic content of the data is a key issue in multimedia technologies, since it allows indexing and retrieval algorithms to work more efficiently.
In an attempt to extract semantic information, efforts have been devoted to segmenting the video into shots and to extracting, for each shot, information such as a representative video frame. Since a video sequence is a 2-D projection of a 3-D scene, processing the video information alone has shown its limitations, especially for problems such as object identification and object tracking. Furthermore, not all the information is contained in the video signal, and more can be achieved by analyzing the audio signal as well. Information can be obtained from the audio signal either to confirm the results produced by a video processing unit or to acquire information that cannot be extracted from the video, such as the presence of music.
This paper presents a technique which combines video and audio information for classification and indexing purposes.
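As a rough illustration of the kind of audio-assisted confirmation outlined above, the following minimal Python sketch flags a scene change only when a video cue (a histogram difference between consecutive frames) is confirmed by an audio cue (a jump in short-time energy). The function names, thresholds, and the simple histogram/energy measures are illustrative assumptions and do not reproduce the technique presented in the paper, which relies on a dedicated audio classification stage.

```python
import numpy as np


def video_shot_score(prev_frame: np.ndarray, frame: np.ndarray, bins: int = 64) -> float:
    """Total-variation distance between grey-level histograms of two frames (0..1)."""
    h1, _ = np.histogram(prev_frame, bins=bins, range=(0, 255))
    h2, _ = np.histogram(frame, bins=bins, range=(0, 255))
    h1 = h1 / max(h1.sum(), 1)
    h2 = h2 / max(h2.sum(), 1)
    return 0.5 * float(np.abs(h1 - h2).sum())


def audio_change_score(prev_audio: np.ndarray, audio: np.ndarray) -> float:
    """Absolute log-ratio of the short-time energies of the audio aligned to two frames."""
    e_prev = float(np.mean(np.asarray(prev_audio, dtype=float) ** 2)) + 1e-12
    e_curr = float(np.mean(np.asarray(audio, dtype=float) ** 2)) + 1e-12
    return abs(np.log10(e_curr / e_prev))


def detect_scene_changes(frames, audio_frames, v_thresh=0.3, a_thresh=0.5):
    """Report frame indices where the audio cue confirms the video cue."""
    changes = []
    for i in range(1, len(frames)):
        v = video_shot_score(frames[i - 1], frames[i])
        a = audio_change_score(audio_frames[i - 1], audio_frames[i])
        if v > v_thresh and a > a_thresh:  # both modalities agree on a change
            changes.append(i)
    return changes


# Synthetic usage example: a dark/quiet segment followed by a bright/loud one.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 60, (120, 160)) for _ in range(5)] + \
         [rng.integers(180, 255, (120, 160)) for _ in range(5)]
audio = [0.01 * rng.standard_normal(1600) for _ in range(5)] + \
        [0.5 * rng.standard_normal(1600) for _ in range(5)]
print(detect_scene_changes(frames, audio))  # expected: [5]
```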
Keywords
Audio Signal, Finite State Machine, Scene Change, Video Information, Audio Frame
- 1.L. Rabiner & B. H. Juang, Fundamentals of Speech Recognition, ed. Prentice Hall, 1994Google Scholar
- 2.P. De Souza, “A Statistical Approach to the Design of an Adaptive SelfNormalizing Silence Detector”, IEEE trans. Acoust., Speech, Signal Processing, ASSP-31(3):678–684, Jun. 1983.Google Scholar
- 3.H. Kobataker, “Optimization of Voiced/Unvoiced Decision in Nonstationary Noise Environments”, IEEE Transaction on Acoustic, Speech & Signal Proc., ASSP-35(1):9–18, Jan. 1987.Google Scholar
- 4.I. K. Sethi & N. Patel, “A Statistical Approach to Scene Change Detection”, Storage and Retrieval for Image and Video Databases III, SPIE-2420:329–338, Feb. 1995.Google Scholar
- 5.A. Hampapur, R. Jain and T Weymouth, “Digital Video Segmentation”, Proc. of Multimedia 94 Conf., San Francisco, pp. 357–363, 1994.Google Scholar
- 6.H. Zhang, C. Y. Low and S. W. Smoliar, “Video Parsing and Browsing Using Compressed Data”, Multimedia Tools and Applications, Kluwer Academic Publishers, Boston, Vol. 1, pp. 89–111, 1995.Google Scholar
- 7.J. Meng, Y. Juan & Shih-Fu Chang, “Scene Change Detection in a MPEG Compressed Video Sequence”, SPIE-2419:14–25, 1995.Google Scholar
- 8.J.W. Pitton, K. Wang and B.H. Juang, “Time-Frequency Analysis and Auditory Modeling for Automatic Recognition of 0Speech”, Proceedings of the IEEE, 84(9):1199–1215, Sep. 1996.CrossRefGoogle Scholar
- 9.G.R. Doddington, “Speaker Recognition Identifying People by their Voices”, Proceedings of the IEEE, 73(11):1651–1664, Nov. 1985.Google Scholar
- 10.J. Saunders, “Real-Time Discrimination of Broadcast Speech/Music” Proc. of the 1996 ICASSP Conf., 993–996, 1996.Google Scholar
- 11.M.M Yeung and B.L. Yeo, “Video content characterization and compaction for digital library application”, Storage and Retrieval for Image and Video Databases V,SPIE-3022, pp. 45–58, Feb. 1997.Google Scholar