A new approach to speech segmentation based on the maximum likelihood

Circuits, Systems and Signal Processing

Abstract

Successful speech recognition depends heavily on appropriate speech segmentation. Sequential detection of abrupt changes performs poorly on signals with relatively short stationary intervals, as is the case with speech, and this can be improved by an off-line maximum likelihood segmentation algorithm. This paper presents a new segmentation algorithm. For an a priori known number of segments, the algorithm determines the signal partition for which the sum of segment distortions is minimal. A generalized maximum likelihood distortion measure is introduced and has proven particularly efficient on short signal segments. When the number of segments is unknown, it is estimated by comparing the reduction in distortion. The asymptotic properties of the distortion sequence have been analyzed, which led to the definition of the presented segmentation algorithm. The introduced measure can be applied to both AR and ARMA models. The segmentation algorithm is verified on test signals as well as on a natural speech signal, for which a pitch-synchronous framing scheme is applied. The experimental results also include a comparison of AR and ARMA model-based segmentations. The first results show that ARMA model-based segmentation gives somewhat better results than the AR model-based algorithm.
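
The known-segment-count case described in the abstract, choosing the partition that minimizes the summed segment distortion, can be illustrated with a small dynamic-programming sketch. This is not the paper's algorithm: the generalized maximum likelihood distortion measure is not given in the abstract, so the sketch substitutes a simple squared-error distortion around each segment's mean as a stand-in. The names `make_segment_cost` and `optimal_partition` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def make_segment_cost(x: Sequence[float]) -> Callable[[int, int], float]:
    """Return cost(i, j): squared error of x[i:j] around its own mean.

    Assumption: this is a stand-in distortion, not the paper's
    generalized maximum likelihood measure.
    """
    s1 = [0.0]  # prefix sums of x
    s2 = [0.0]  # prefix sums of x squared
    for v in x:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i: int, j: int) -> float:
        n = j - i
        total = s1[j] - s1[i]
        # Clamp at zero to absorb floating-point round-off.
        return max((s2[j] - s2[i]) - total * total / n, 0.0)

    return cost


def optimal_partition(x: Sequence[float], k: int) -> Tuple[List[int], float]:
    """Split x into k contiguous segments with minimal total distortion.

    Returns the k - 1 interior boundaries (start index of every segment
    except the first) and the minimal total distortion.
    """
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(x)")
    cost = make_segment_cost(x)
    inf = float("inf")
    # best[m][j]: minimal distortion when x[:j] is split into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # back[m][j]: start index of the last segment in that optimal split.
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    back[m][j] = i
    # Walk the back-pointers from the end of the signal to recover boundaries.
    starts = []
    j = n
    for m in range(k, 0, -1):
        j = back[m][j]
        starts.append(j)
    starts.reverse()  # starts[0] is always 0
    return starts[1:], best[k][n]


if __name__ == "__main__":
    signal = [0, 0, 0, 5, 5, 5, 5, -2, -2]
    print(optimal_partition(signal, 3))
```

The search is exhaustive over all partitions, at a cost of roughly k·n² segment evaluations, which is feasible only off-line. Replacing `make_segment_cost` with a model-based distortion (for example, one derived from an AR fit per segment) would leave the partition search itself unchanged.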



Additional information

Research supported in part by the Mathematical Institute of the Serbian Science Academy and Serbian Science Foundation.

About this article

Cite this article

Šarić, Z.M., Turajlić, S.R. A new approach to speech segmentation based on the maximum likelihood. Circuits Systems and Signal Process 14, 615–632 (1995). https://doi.org/10.1007/BF01213958

  • DOI: https://doi.org/10.1007/BF01213958
