Automatic Phone Alignment

Brognaux, Sandrine; Roekhaut, Sophie; Drugman, Thomas; Beaufort, Richard

doi:10.1007/978-3-642-33983-7_30

A Comparison between Speaker-Independent Models and Models Trained on the Corpus to Align

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7614)

Abstract

Several automatic phonetic alignment tools have been proposed in the literature. They generally rely on speaker-independent acoustic models of the language to align new corpora. However, the range of available models is limited and does not cover all languages and speaking styles (spontaneous, expressive, etc.). This study investigates the possibility of training the statistical model directly on the corpus to be aligned. The main advantage is that the approach is applicable to any language and speaking style. Moreover, comparisons indicate that it provides results as good as or better than those obtained with speaker-independent models of the language: about 2% is gained, with a 20 ms threshold, by using the proposed method. Experiments were carried out on neutral and expressive corpora in French and English. The study also shows that even a small neutral corpus of a few minutes can be exploited to train a model that provides high-quality alignment.
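
The abstract reports its gain "with a 20 ms threshold" but does not define the metric. A common reading in phone alignment work is the share of automatic boundaries that fall within the tolerance of the corresponding manual boundaries. The sketch below illustrates that reading only; the function name, the one-to-one pairing of boundaries, and the sample times are assumptions for illustration, not the authors' evaluation code.

```python
from typing import Sequence


def boundary_agreement(
    reference: Sequence[float],
    automatic: Sequence[float],
    tolerance: float = 0.020,
) -> float:
    """Return the fraction of automatic boundaries within `tolerance`
    seconds of the reference boundaries.

    Assumes both lists describe the same phone sequence, so boundary i in
    one list corresponds to boundary i in the other (as in forced alignment).
    """
    if len(reference) != len(automatic):
        raise ValueError("boundary lists must have the same length")
    if not reference:
        raise ValueError("no boundaries to compare")
    hits = sum(
        1 for ref, auto in zip(reference, automatic)
        if abs(ref - auto) <= tolerance
    )
    return hits / len(reference)


# Hypothetical boundary times in seconds (not data from the paper).
manual = [0.100, 0.250, 0.400, 0.620]
auto = [0.105, 0.285, 0.395, 0.630]

# Deviations are 5, 35, 5 and 10 ms, so three of four boundaries pass.
rate = boundary_agreement(manual, auto)
print(f"{rate:.0%} of boundaries within 20 ms")
```

Under this reading, a gain of about 2% would mean that roughly two more boundaries out of every hundred land inside the 20 ms window.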

Author information

Authors and Affiliations

ICTEAM, Université catholique de Louvain, Belgium
Sandrine Brognaux
CENTAL, Université catholique de Louvain, Belgium
Sandrine Brognaux & Sophie Roekhaut
TCTS Lab, Université de Mons, Belgium
Thomas Drugman
Nuance Communications, Inc., Belgium
Richard Beaufort

Editor information

Editors and Affiliations

Information and Media Center, Toyohashi University of Technology, 1-1 Hibarigaoka, Tenpakucho, 441-8580, Toyohashi, Japan
Hitoshi Isahara & Kyoko Kanzaki

About this paper

Cite this paper

Brognaux, S., Roekhaut, S., Drugman, T., Beaufort, R. (2012). Automatic Phone Alignment. In: Isahara, H., Kanzaki, K. (eds) Advances in Natural Language Processing. JapTAL 2012. Lecture Notes in Computer Science, vol 7614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33983-7_30

DOI: https://doi.org/10.1007/978-3-642-33983-7_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33982-0
Online ISBN: 978-3-642-33983-7
eBook Packages: Computer Science, Computer Science (R0)
