A Phonetic Segmentation Procedure Based on Hidden Markov Models

  • Edvin Pakoci
  • Branislav Popović
  • Nikša Jakovljević
  • Darko Pekar
  • Fathy Yassa
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9811)

Abstract

This paper presents a novel variant of an automatic phonetic segmentation procedure that is especially useful when data is scarce. The procedure is built on the Kaldi speech recognition toolkit, and combines and modifies several existing methods and Kaldi recipes. The specifics of both model training and test data alignment are explained in detail. The effectiveness of artificially extending the initial amount of manually labeled material during training is examined as well. Experimental results show the admirable overall correctness of the proposed procedure in the given test environment. Several variants of the procedure are compared, and speaker-adapted context-dependent triphone models trained without the expanded manually checked data are shown to produce the best results. Possible further improvements to the procedure, as well as future work, are also discussed.

Keywords

Kaldi, Phonetic segmentation, Hidden Markov models

Notes

Acknowledgments

This research was supported in part by the Ministry of Education, Science and Technological Development of the Republic of Serbia, under Grant No. TR32035. The authors are grateful to the company “Speech Morphing, Inc.” from Campbell, CA, USA, for providing the speech corpora for the experiments.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Edvin Pakoci (1)
  • Branislav Popović (1, corresponding author)
  • Nikša Jakovljević (1)
  • Darko Pekar (2)
  • Fathy Yassa (3)

  1. Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia
  2. AlfaNum Speech Technologies, Novi Sad, Serbia
  3. Speech Morphing Inc., Campbell, USA
