Abstract
Organizing video databases according to the semantic content of the data is a key issue in multimedia technologies, as it allows algorithms such as indexing and retrieval to work more efficiently.
As an attempt to extract semantic information, efforts have been devoted to segmenting video into shots and, for each shot, extracting information such as a representative video frame. Since a video sequence is constructed from a 2-D projection of a 3-D scene, processing video information alone has shown its limitations, especially for problems such as object identification or object tracking. Furthermore, not all information is contained in the video signal, and more can be achieved by analyzing the audio signal as well. Information obtained from the audio signal can either confirm the results produced by a video processing unit or supply information that cannot be extracted from video (such as the presence of music).
This paper presents a technique that combines video and audio information for classification and indexing purposes.
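The combination outlined above can be sketched as two simple analyzers whose outputs would be merged by a classifier: a histogram-difference shot-change detector on the video frames, and a short-time-energy labeler on the audio samples. This is a minimal illustration, not the paper's actual method; the thresholds, bin counts, and frame lengths below are illustrative assumptions.

```python
import numpy as np

def shot_boundaries(frames, threshold=0.3):
    """Flag a shot change when the normalized gray-level histogram of a
    frame differs strongly from the previous one (hypothetical threshold)."""
    boundaries = []
    prev_hist = None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=16, range=(0, 256))
        hist = hist / hist.sum()
        # Half the L1 distance lies in [0, 1]; compare it to the threshold.
        if prev_hist is not None and 0.5 * np.abs(hist - prev_hist).sum() > threshold:
            boundaries.append(i)
        prev_hist = hist
    return boundaries

def audio_labels(samples, frame_len=160, silence_thresh=0.01):
    """Label fixed-length audio frames as 'silence' or 'active'
    by their short-time energy (illustrative parameters)."""
    labels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        energy = np.mean(samples[start:start + frame_len] ** 2)
        labels.append("silence" if energy < silence_thresh else "active")
    return labels
```

A joint scene-change decision could then, for instance, confirm a candidate video boundary only when the audio label also changes around the same instant.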
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
Cite this paper
Saraceno, C., Leonardi, R. (1997). Audio-visual processing for scene change detection. In: Del Bimbo, A. (eds) Image Analysis and Processing. ICIAP 1997. Lecture Notes in Computer Science, vol 1311. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63508-4_114
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63508-6
Online ISBN: 978-3-540-69586-8
eBook Packages: Springer Book Archive