Abstract
Successful speech recognition depends heavily on appropriate speech segmentation. Sequential detection of abrupt changes is inefficient for signals with relatively short stationary intervals, as is the case with speech signals; an off-line maximum likelihood segmentation algorithm can improve on it. This paper presents a new segmentation algorithm. When the number of segments is known a priori, the algorithm finds the signal partition that minimizes the sum of segment distortions. A generalized maximum likelihood distortion measure is introduced and proves particularly efficient on short signal segments. When the number of segments is unknown, it is estimated by comparing the reductions in distortion. The asymptotic properties of the distortion sequence were analyzed, and this analysis led to the definition of the presented segmentation algorithm. The introduced measure can be applied to both AR and ARMA models. The segmentation algorithm is verified on test signals as well as on natural speech, for which a pitch-synchronous framing scheme is applied. The experimental results also include a comparison of AR and ARMA model-based segmentation. Initial results show that ARMA model-based segmentation performs somewhat better than AR model-based segmentation.
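For a known number of segments K, one standard way to find the partition that minimizes a sum of per-segment distortions is dynamic programming over candidate boundaries; the abstract does not say which search the authors use. The sketch below is a minimal illustration under several assumptions that are not taken from the paper. Each segment is fitted with a Gaussian AR(p) model via the Yule-Walker equations and the Levinson-Durbin recursion. The distortion is the ordinary Gaussian ML cost n·log(residual power), a stand-in for the paper's generalized measure, which the abstract does not define. Boundaries are restricted to fixed-length frames instead of pitch-synchronous frames. For unknown K, a hypothetical threshold `min_gain` on the distortion reduction replaces the paper's asymptotic criterion. All function names and parameters are hypothetical.

```python
import math
import random


def levinson_error(seg, p):
    """Prediction-error power of an AR(p) fit to `seg`, using the biased
    autocorrelation (Yule-Walker) estimate and the Levinson-Durbin recursion."""
    n = len(seg)
    r = [sum(seg[t] * seg[t + k] for t in range(n - k)) / n for k in range(p + 1)]
    err, a = r[0], []  # a[i] multiplies x[t-1-i] in the linear predictor
    for m in range(p):
        if err <= 1e-12:
            break
        k = (r[m + 1] - sum(a[i] * r[m - i] for i in range(m))) / err
        a = [a[i] - k * a[m - 1 - i] for i in range(m)] + [k]
        err *= 1.0 - k * k
    return max(err, 1e-12)  # floor avoids log(0) on silent segments


def segment_cost(x, start, end, p):
    """Stand-in ML distortion (ASSUMED, not the paper's measure):
    n * log(residual power), i.e. -2 log-likelihood up to constants."""
    seg = x[start:end]
    return len(seg) * math.log(levinson_error(seg, p))


def optimal_segmentation(x, frame_len, k_max, p=2):
    """Exact DP search with boundaries on a fixed frame grid (trailing samples
    that do not fill a frame are ignored). Returns
    {k: (total_cost, internal_boundaries_in_samples)} for k = 1..k_max."""
    n_frames = len(x) // frame_len
    cost = {(i, j): segment_cost(x, i * frame_len, j * frame_len, p)
            for i in range(n_frames) for j in range(i + 1, n_frames + 1)}
    inf = float("inf")
    # best[k][j]: minimal cost of splitting frames 0..j-1 into k segments
    best = [[inf] * (n_frames + 1) for _ in range(k_max + 1)]
    back = [[0] * (n_frames + 1) for _ in range(k_max + 1)]
    best[0][0] = 0.0
    for k in range(1, k_max + 1):
        for j in range(k, n_frames + 1):
            for i in range(k - 1, j):  # i = start frame of the k-th segment
                c = best[k - 1][i] + cost[i, j]
                if c < best[k][j]:
                    best[k][j], back[k][j] = c, i
    result = {}
    for k in range(1, k_max + 1):
        starts, j = [], n_frames
        for kk in range(k, 0, -1):  # walk back-pointers from the last segment
            j = back[kk][j]
            starts.append(j * frame_len)
        result[k] = (best[k][n_frames], sorted(starts)[1:])  # drop start 0
    return result


def estimate_num_segments(results, min_gain):
    """Hypothetical stopping rule: keep adding segments while going from
    k to k+1 segments lowers the total distortion by at least `min_gain`."""
    k = 1
    while k + 1 in results and results[k][0] - results[k + 1][0] >= min_gain:
        k += 1
    return k


def piecewise_ar1(coeffs, noise_sd, seg_len, seed=0):
    """Synthetic test signal: consecutive AR(1) segments x[t] = a*x[t-1] + e."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for a, sd in zip(coeffs, noise_sd):
        for _ in range(seg_len):
            prev = a * prev + rng.gauss(0.0, sd)
            x.append(prev)
    return x


# Three 200-sample segments: AR(1) with a = 0.9, then a = -0.9,
# then louder white noise (a = 0, noise s.d. 3).
x = piecewise_ar1([0.9, -0.9, 0.0], [1.0, 1.0, 3.0], seg_len=200)
res = optimal_segmentation(x, frame_len=20, k_max=5, p=2)
k_hat = estimate_num_segments(res, min_gain=40.0)
print("estimated segments:", k_hat, "boundaries:", res[k_hat][1])
```

Restricting boundaries to a frame grid (fixed-length here, pitch-synchronous in the paper) keeps the search to O(F²) candidate segments and O(K·F²) DP steps for F frames. An ARMA variant would replace `levinson_error` with an ARMA fit; that is not attempted in this sketch.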
Additional information
Research supported in part by the Mathematical Institute of the Serbian Science Academy and the Serbian Science Foundation.
About this article
Šarić, Z.M., Turajlić, S.R. A new approach to speech segmentation based on the maximum likelihood. Circuits Systems and Signal Process 14, 615–632 (1995). https://doi.org/10.1007/BF01213958