
Automatic Phone Alignment

A Comparison between Speaker-Independent Models and Models Trained on the Corpus to Align

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7614)

Abstract

Several automatic phonetic alignment tools have been proposed in the literature. They generally rely on speaker-independent acoustic models of the language to align new corpora. However, the range of available models is limited: it does not cover all languages and speaking styles (spontaneous, expressive, etc.). This study investigates training the statistical model directly on the corpus to be aligned. The main advantage of this approach is that it applies to any language and speaking style. Moreover, comparisons indicate that it performs as well as or better than speaker-independent models of the language: with a 20 ms threshold, our method yields a gain of about 2%. Experiments were carried out on neutral and expressive corpora in French and English. The study also shows that even a small neutral corpus of a few minutes can be used to train a model that provides high-quality alignment.
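The abstract does not define how the 20 ms threshold is applied. A common reading in the alignment literature is that a predicted phone boundary counts as correct when it lies within the threshold of the manually placed boundary, and the reported rate is the share of correct boundaries. The sketch below illustrates that reading only; the function name `boundary_accuracy`, the millisecond integer representation, and the sample values are assumptions made for illustration and are not taken from the paper.

```python
from typing import Sequence


def boundary_accuracy(
    reference_ms: Sequence[int],
    predicted_ms: Sequence[int],
    tolerance_ms: int = 20,
) -> float:
    """Share of predicted boundaries within `tolerance_ms` of the reference.

    Hypothetical helper: assumes both sequences list the same boundaries
    in the same order, with times given in whole milliseconds.
    """
    if len(reference_ms) != len(predicted_ms):
        raise ValueError("reference and prediction must have the same length")
    if not reference_ms:
        raise ValueError("at least one boundary is required")

    # Count the boundaries whose absolute deviation stays within the tolerance.
    hits = sum(
        1
        for ref, pred in zip(reference_ms, predicted_ms)
        if abs(ref - pred) <= tolerance_ms
    )
    return hits / len(reference_ms)


# Illustrative values only (not data from the study).
manual = [100, 250, 400, 610]
automatic = [105, 275, 395, 600]
rate = boundary_accuracy(manual, automatic)  # deviations: 5, 25, 5, 10 ms
```

Under this reading, a gain of about 2% would mean that roughly two more boundaries out of every hundred fall within 20 ms of the manual reference.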




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Brognaux, S., Roekhaut, S., Drugman, T., Beaufort, R. (2012). Automatic Phone Alignment. In: Isahara, H., Kanzaki, K. (eds) Advances in Natural Language Processing. JapTAL 2012. Lecture Notes in Computer Science, vol 7614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33983-7_30


  • DOI: https://doi.org/10.1007/978-3-642-33983-7_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33982-0

  • Online ISBN: 978-3-642-33983-7

  • eBook Packages: Computer Science, Computer Science (R0)
