A Revised Comparison of Polish Taggers in the Application for Automatic Speech Recognition

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9561)

Abstract

In this paper we investigate the performance of Polish taggers in the context of automatic speech recognition (ASR). We use a morphosyntactic language model to improve speech recognition in an ASR system and seek the best Polish tagger for our needs. Polish is an inflectional language, so an n-gram model built on morphosyntactic features, which reduces data sparsity, seems to be a good choice. We investigate how the morphosyntactic taggers differ in that context, comparing their tagging results with respect to the reduction of word error rate as well as tagging speed. As it turns out, taggers using conditional random field (CRF) models currently perform best in the context of ASR. A broader audience might also be interested in the other features of the taggers discussed here, such as ease of installation and use, which are usually not covered in the papers describing such systems. (This is a revised and extended version of the article A Comparison of Polish Taggers in the Application for Automatic Speech Recognition, which appeared in the Proceedings of the Language & Technology Conference, Poznań, 2013.)
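
The evaluation criterion named in the abstract, reduction of word error rate (WER), can be illustrated with a minimal sketch. It assumes the standard definition of WER as the word-level edit distance between a reference transcript and a recognition hypothesis, divided by the number of reference words. The function names and the toy sentences are hypothetical and are not taken from the paper, whose own scoring tool is not described in this section.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words yields an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Invented toy transcripts, for illustration only.
    reference = "the tagger assigns one tag per word"
    baseline = "the tagger signs one tag word"
    rescored = "the tagger assigns one tag word"
    wer_base = word_error_rate(reference, baseline)
    wer_new = word_error_rate(reference, rescored)
    print(wer_base, wer_new, relative_wer_reduction(wer_base, wer_new))
```

Under this definition, taggers can be compared by plugging each one into the same language-model pipeline and reporting the relative reduction against a common baseline.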

Keywords

Morphosyntactic tagger · Polish · Automatic speech recognition · Language model

References

  1. Acedański, S.: A morphosyntactic Brill tagger for inflectional languages. In: Loftsson, H., Rögnvaldsson, E., Helgadóttir, S. (eds.) IceTAL 2010. LNCS, vol. 6233, pp. 3–14. Springer, Heidelberg (2010)
  2. Brants, T.: TnT: a statistical part-of-speech tagger. In: Proceedings of the Sixth Conference on Applied Natural Language Processing, pp. 224–231. Association for Computational Linguistics (2000)
  3. Brill, E.: A simple rule-based part of speech tagger. In: Proceedings of the Workshop on Speech and Natural Language, pp. 112–116. Association for Computational Linguistics (1992)
  4. Brown, P.F., Desouza, P.V., Mercer, R.L., Pietra, V.J.D., Lai, J.C.: Class-based n-gram models of natural language. Comput. Linguist. 18(4), 467–479 (1992)
  5. Daelemans, W., Zavrel, J., van der Sloot, K., van den Bosch, A.: TiMBL: Tilburg Memory-Based Learner (2010)
  6. Daelemans, W., Van den Bosch, A.: Memory-Based Language Processing. Cambridge University Press, New York (2005)
  7. Gauvain, J.L., Lee, C.H.: Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Trans. Speech Audio Process. 2(2), 291–298 (1994)
  8. Kneser, R., Ney, H.: Improved backing-off for m-gram language modeling. In: International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 181–184. IEEE (1995)
  9. Lafferty, J., McCallum, A., Pereira, F.C.: Conditional random fields: probabilistic models for segmenting and labeling sequence data (2001)
  10. Marciniak, M.: Anotowany korpus dialogów telefonicznych. Akademicka Oficyna Wydawnicza EXIT, Warszawa (2011)
  11. Mohri, M., Pereira, F., Riley, M.: Weighted finite-state transducers in speech recognition. Comput. Speech Lang. 16(1), 69–88 (2002)
  12. Piasecki, M.: Polish tagger TaKIPI: Rule based construction and optimisation. Task Q. 11(1–2), 151–167 (2007)
  13. Pohl, A., Ziółko, B.: A comparison of Polish taggers in the application for automatic speech recognition. In: Proceedings of the 6th Language & Technology Conference, pp. 294–298 (2013)
  14. Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlíček, P., Qian, Y., Schwarz, P., et al.: The Kaldi speech recognition toolkit. In: Proceedings of Automatic Speech Recognition and Understanding (2011)
  15. Przepiórkowski, A., Bańko, M., Górski, R.L., Lewandowska-Tomaszczyk, B.: Narodowy Korpus Języka Polskiego. Wydawnictwo Naukowe PWN, Warsaw (2012)
  16. Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77(2), 257–286 (1989)
  17. Radziszewski, A., Śniatowski, T.: A memory-based tagger for Polish. In: Proceedings of the 5th Language & Technology Conference, Poznań, pp. 29–36 (2011)
  18. Radziszewski, A.: A tiered CRF tagger for Polish. In: Bembenik, R., Skonieczny, Ł., Rybiński, H., Kryszkiewicz, M., Niezgódka, M. (eds.) Intelligent Tools for Building a Scientific Information Platform. SCI, vol. 467, pp. 215–230. Springer, Heidelberg (2013)
  19. Radziszewski, A., Wardyński, A., Śniatowski, T.: WCCL: a morpho-syntactic feature toolkit. In: Habernal, I., Matoušek, V. (eds.) TSD 2011. LNCS, vol. 6836, pp. 434–441. Springer, Heidelberg (2011)
  20. Stolcke, A.: SRILM - an extensible language modeling toolkit. In: Proceedings of the International Conference on Spoken Language Processing, vol. 2, pp. 901–904 (2002)
  21. Sutton, C., McCallum, A.: An introduction to conditional random fields for relational learning. In: Introduction to Statistical Relational Learning, pp. 93–128 (2006)
  22. Tufis, D.: Tiered tagging and combined language models classifiers. In: Matoušek, V., Mautner, P., Ocelíková, J., Sojka, P. (eds.) TSD 1999. LNCS (LNAI), vol. 1692, pp. 28–33. Springer, Heidelberg (1999)
  23. Vu, N.T., Kraus, F., Schultz, T.: Multilingual a-stabil: a new confidence score for multilingual unsupervised training. In: 2010 IEEE Spoken Language Technology Workshop (SLT), pp. 183–188. IEEE (2010)
  24. Waszczuk, J.: Harnessing the CRF complexity with domain-specific constraints. The case of morphosyntactic tagging of a highly inflected language. In: Kay, M., Boitet, C. (eds.) Proceedings of COLING, pp. 2789–2804 (2012)
  25. Witten, I., Bell, T.: The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression. IEEE Trans. Inf. Theory 37(4), 1085–1094 (1991)
  26. Woliński, M.: Morfeusz - a practical tool for the morphological analysis of Polish. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds.) Intelligent Information Processing and Web Mining. Advances in Soft Computing, vol. 35, pp. 511–520. Springer, Heidelberg (2003)
  27. Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev, V., Woodland, P.: HTK Book. Cambridge University Engineering Department, UK (2005)
  28. Żelasko, P., Ziółko, B., Jadczyk, T., Skurzok, D.: AGH corpus of Polish speech. In: Language Resources and Evaluation, pp. 1–17 (2015)
  29. Ziółko, B., Ziółko, M.: Przetwarzanie mowy. Wydawnictwo AGH, Kraków (2011)

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Aleksander Smywiński-Pohl (1, 2, 3)
  • Bartosz Ziółko (2, 3)

  1. Faculty of Management and Social Communication, Jagiellonian University, Kraków, Poland
  2. Faculty of Computer Science, Electronics and Telecommunication, AGH University of Science and Technology, Kraków, Poland
  3. Techmo, Kraków, Poland
