Automatic Pitch-Synchronous Phonetic Segmentation with Context-Independent HMMs

  • Jindřich Matoušek
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5729)


This paper deals with an HMM-based automatic phonetic segmentation (APS) system. In particular, it examines the use of a pitch-synchronous (PS) coding scheme within a context-independent (CI) HMM-based APS system and compares it with the “more traditional” pitch-asynchronous (PA) coding schemes for a given Czech male voice. For bootstrap-initialised CI-HMMs, which can be used when some manually pre-segmented data are available, the proposed PS coding scheme performed best, especially in combination with CART-based refinement of the automatically segmented boundaries. For flat-start-initialised CI-HMMs, an inferior initialisation method used when no pre-segmented data are available, standard PA coding schemes with longer parameterization shifts yielded better results. The results are also compared with those obtained for APS systems with context-dependent (CD) HMMs. At least for the male voice studied, multiple-mixture CI-HMMs were shown to outperform CD-HMMs in the APS task.
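The difference between the two coding schemes can be illustrated with a minimal sketch. The abstract does not say how frames are placed, so the code below rests on assumptions: PA frames sit on a fixed grid defined by the parameterization shift, and PS frames are centred on pitch-mark positions. The function names, the sample-based units and the example values are all hypothetical and are not taken from the paper.

```python
from typing import List


def pa_frame_centres(n_samples: int, shift: int) -> List[int]:
    """Pitch-asynchronous coding (assumed): one frame every `shift` samples."""
    if shift <= 0:
        raise ValueError("shift must be a positive number of samples")
    return list(range(0, n_samples, shift))


def ps_frame_centres(pitch_marks: List[int], n_samples: int) -> List[int]:
    """Pitch-synchronous coding (assumed): one frame per pitch mark.

    Duplicate marks and marks outside the signal are discarded. How
    unvoiced regions without pitch marks are handled is left open here.
    """
    return sorted(m for m in set(pitch_marks) if 0 <= m < n_samples)


def nearest_frame(centres: List[int], boundary: int) -> int:
    """Return the frame centre closest to a boundary position (in samples)."""
    if not centres:
        raise ValueError("no frame centres given")
    return min(centres, key=lambda c: abs(c - boundary))


if __name__ == "__main__":
    # Hypothetical 50 ms signal at 16 kHz with a 6 ms (96-sample) PA shift.
    n_samples = 800
    pa = pa_frame_centres(n_samples, shift=96)
    # Hypothetical pitch marks roughly 10 ms apart (about 100 Hz voice).
    ps = ps_frame_centres([40, 200, 360, 520, 680], n_samples)
    boundary = 350  # hypothetical boundary position in samples
    print("PA frame nearest to boundary:", nearest_frame(pa, boundary))
    print("PS frame nearest to boundary:", nearest_frame(ps, boundary))
```

In this sketch the PA grid is independent of the signal, whereas the PS frame positions follow the pitch marks, so any boundary snapped to a PS frame coincides with a pitch mark.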



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Jindřich Matoušek (1)
  1. Faculty of Applied Sciences, Dept. of Cybernetics, University of West Bohemia, Plzeň, Czech Republic
