A Framework for Dialogue Detection in Movies
In this paper, we investigate a novel framework for dialogue detection that is based on indicator functions. An indicator function specifies whether a particular actor is present at each time instant. Two dialogue detection rules are developed and assessed. The first rule compares the value of the cross-correlation function at zero time lag to a threshold. The second rule compares the cross-power in a particular frequency band to a threshold. Experiments are carried out to validate the feasibility of these rules, using ground-truth indicator functions determined by human observers from six different movies. A total of 25 dialogue scenes and 8 non-dialogue scenes are employed. The probabilities of false alarm and detection are estimated by cross-validation, where 70% of the available scenes are used to learn the thresholds employed in the dialogue detection rules and the remaining 30% are used for testing. Almost perfect dialogue detection is reported for every distinct threshold.
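The two detection rules can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the length normalisation, the periodogram-style cross-spectrum estimate, and the direction of the threshold comparison (score at or above the threshold means "dialogue") are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def zero_lag_cross_correlation(a, b):
    """Sample cross-correlation of two indicator functions at zero time lag.

    Assumption: normalised by the sequence length, i.e. the time
    average of a[n] * b[n].
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("indicator functions must have the same length")
    return float(np.mean(a * b))


def band_cross_power(a, b, f_lo, f_hi, fs=1.0):
    """Cross-power of two indicator functions inside [f_lo, f_hi].

    Assumption: the cross-spectrum is estimated with a plain
    cross-periodogram, and the band power is the sum of its
    magnitudes over the FFT bins that fall in the band.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("indicator functions must have the same length")
    n = a.size
    cross = np.fft.rfft(a) * np.conj(np.fft.rfft(b)) / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    in_band = (freqs >= f_lo) & (freqs <= f_hi)
    return float(np.sum(np.abs(cross[in_band])))


def exceeds_threshold(score, threshold):
    """Hypothetical decision rule: declare a dialogue when score >= threshold."""
    return score >= threshold


# Toy indicator functions: two actors alternating every two time instants.
actor_a = np.tile([1, 1, 0, 0], 8)
actor_b = np.tile([0, 0, 1, 1], 8)

rho0 = zero_lag_cross_correlation(actor_a, actor_b)
power = band_cross_power(actor_a, actor_b, f_lo=0.2, f_hi=0.3)
print(rho0, power, exceeds_threshold(power, threshold=1.0))
```

In the toy example the actors alternate with a period of four time instants, so the cross-power concentrates around 0.25 cycles per instant. In the paper's setting the thresholds would be learned from the training scenes rather than fixed by hand as in this sketch.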
Keywords: False Alarm, Indicator Function, Face Detection, Training Sequence, Audio Stream