Improving the Accuracy of the Speech Synthesis Based Phonetic Alignment Using Multiple Acoustic Features

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2003)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2721)

Abstract

Phonetic alignment of spoken utterances for speech research is commonly performed by HMM-based speech recognizers in forced-alignment mode, but training the phonetic segment models requires considerable amounts of annotated data. When no such material is available, a possible solution is to synthesize the same phonetic sequence and align the resulting speech signal with the spoken utterance. However, without a careful choice of the acoustic features used, this procedure can perform poorly on continuous speech. In this paper we propose a new method for selecting the best features to use in the alignment procedure for each pair of phonetic segment classes. The results show that this selection considerably reduces segment boundary location errors.
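As an illustration of the synthesis-based alignment idea described in the abstract, the sketch below aligns a synthetic feature sequence (whose segment boundaries are known) with a spoken one and transfers a boundary across the alignment. It assumes plain dynamic time warping with a Euclidean frame distance, which the abstract does not specify; the function names `dtw_path` and `map_boundary` are hypothetical, and the per-class feature selection proposed in the paper is not implemented here.

```python
import numpy as np

def dtw_path(ref, tgt):
    """Align two feature sequences (frames x dims) with plain DTW.

    Returns the optimal warping path as a list of (ref_frame, tgt_frame) pairs.
    Assumption: Euclidean local distance and unconstrained symmetric steps.
    """
    n, m = len(ref), len(tgt)
    # Local cost: distance between every synthetic frame and every spoken frame.
    cost = np.linalg.norm(ref[:, None, :] - tgt[None, :, :], axis=2)
    # Accumulated cost, padded with an infinite border so index 0 acts as "before start".
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the last cell to the first, always taking the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    return path[::-1]

def map_boundary(path, ref_boundary):
    """Map a boundary frame of the synthetic signal to the first spoken frame aligned with it."""
    return min(t for r, t in path if r == ref_boundary)

# Toy one-dimensional "features": two segments, the spoken version being slower.
synthetic = np.array([[0.0], [0.0], [1.0], [1.0]])
spoken = np.array([[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]])
path = dtw_path(synthetic, spoken)
boundary = map_boundary(path, 2)  # the second segment starts at synthetic frame 2
```

In a real setting the rows would be acoustic feature vectors computed per frame, and the choice of those features is exactly what the abstract identifies as critical for continuous speech.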

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Paulo, S., Oliveira, L.C. (2003). Improving the Accuracy of the Speech Synthesis Based Phonetic Alignment Using Multiple Acoustic Features. In: Mamede, N.J., Trancoso, I., Baptista, J., das Graças Volpe Nunes, M. (eds) Computational Processing of the Portuguese Language. PROPOR 2003. Lecture Notes in Computer Science, vol 2721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45011-4_5

  • DOI: https://doi.org/10.1007/3-540-45011-4_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40436-1

  • Online ISBN: 978-3-540-45011-5

  • eBook Packages: Springer Book Archive
