International Journal of Speech Technology, Volume 2, Issue 3, pp 215–225

Interpolation of pitch contour using temporal decomposition

  • Shahrokh Ghaemmaghami
  • Mohamed Deriche
  • Boualem Boashash

Abstract

This paper presents a new method for predicting the pitch contour of a speech signal from a small number of pitch values, for application in very low-rate speech coding. The method relies on the correlation between phonetic evolution and pitch variations during voiced speech segments. Temporal Decomposition (TD) is used to track the phonetic evolution and to identify perceptually significant time points. By detecting event functions, which serve as interpolation paths, and their centroids, which mark the most steady points in the spectral parameter space, TD supplies the information needed both to select the critical pitch values and to estimate the pitch contour. The proposed method is shown to reduce the amount of pitch information to about one-tenth of that required by conventional frame-by-frame techniques, with less than 5% error in pitch approximation.
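The abstract only outlines the idea, so what follows is a minimal Python sketch under stated assumptions, not the authors' algorithm. It assumes the TD event functions φ_k(n) are already available as a K × N non-negative array, takes the centroid of each event function as its most steady frame, keeps one pitch value per centroid, and rebuilds the frame-by-frame contour by weighting those K values with the event functions. The normalised weighting rule, the relative-error metric and all function names (`event_centroids`, `sample_at_centroids`, `interpolate_pitch`, `mean_relative_error`) are hypothetical choices made for illustration.

```python
import numpy as np


def event_centroids(phi: np.ndarray) -> np.ndarray:
    """Return the centroid (in frames) of each event function.

    phi: array of shape (K, N) holding K non-negative event functions
    sampled over N analysis frames.
    """
    frames = np.arange(phi.shape[1])
    return (phi @ frames) / phi.sum(axis=1)


def sample_at_centroids(pitch: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Read the frame-by-frame pitch at (possibly fractional) centroid positions."""
    return np.interp(centroids, np.arange(len(pitch)), pitch)


def interpolate_pitch(phi: np.ndarray, key_pitch: np.ndarray) -> np.ndarray:
    """Rebuild an N-frame pitch contour from K key pitch values.

    Assumed rule (illustrative only): each frame's pitch is the average of the
    key values weighted by the event functions active in that frame, so the
    event functions act as interpolation paths between centroids.
    """
    weight = phi.sum(axis=0)
    return (key_pitch @ phi) / np.maximum(weight, 1e-12)


def mean_relative_error(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Mean relative pitch error (an assumed metric, not necessarily the paper's)."""
    return float(np.mean(np.abs(estimate - reference) / reference))


if __name__ == "__main__":
    frames = np.arange(11)
    # Two overlapping, hypothetical event functions that sum to one per frame.
    phi = np.vstack([(10 - frames) / 10.0, frames / 10.0])
    true_pitch = 100.0 + 2.0 * frames  # synthetic contour in Hz
    centroids = event_centroids(phi)  # approx. [3, 7] frames
    key_pitch = sample_at_centroids(true_pitch, centroids)  # approx. [106, 114] Hz
    approx = interpolate_pitch(phi, key_pitch)
    print(centroids, key_pitch)
    print(f"mean relative error: {mean_relative_error(true_pitch, approx):.3f}")
```

In the toy example, two pitch values stand in for eleven frame-by-frame values. How closely the rebuilt contour follows the original depends on how well the event functions track the actual pitch movement, which is the correlation the method relies on.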

Keywords

very low-rate speech coding; pitch detection; pitch interpolation; temporal decomposition


Copyright information

© Kluwer Academic Publishers 1998

Authors and Affiliations

  • Shahrokh Ghaemmaghami (1)
  • Mohamed Deriche (1)
  • Boualem Boashash (1)

  1. Signal Processing Research Centre, School of Electrical and Electronic Systems Engineering, Queensland University of Technology, Brisbane, Australia
