Large Vocabulary Continuous Speech Recognizer for Slovenian Language

  • Andrej Žgank
  • Zdravko Kačič
  • Bogomir Horvat
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2166)

Abstract

The paper describes the development of a large vocabulary continuous speech recogniser for the Slovenian language using the SNABI database. The problems that inflectional languages pose for speech recognition are presented. The system is based on hidden Markov models. Biphones were used for acoustic modeling, and bigrams and trigrams for language modeling. To improve the recognition result and to enable fast operation of the recogniser, speaker adaptation is also used. The optimal system, with the adapted acoustic model and a bigram language model, achieved a word accuracy of 91.30% at nearly 10× real time. The unadapted system with the trigram language model achieved a word accuracy of 89.56% but was slower than the optimal system, with a run time of 15.3× real time.
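
The two figures of merit quoted in the abstract, word accuracy and the real-time factor, can be illustrated with a minimal sketch. It assumes the HTK-style definition of accuracy, (N − S − D − I) / N, where N is the number of reference words and S, D, I are the substitutions, deletions and insertions in a minimum-edit alignment. The abstract does not say which scoring tool was used, so this definition is an assumption, and the function names below are hypothetical.

```python
def edit_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) for a minimum-edit
    alignment of the hypothesis word list against the reference word list."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # dp[i][j] = (cost, S, D, I) for aligning ref[:i] with hyp[:j]
    dp = [[None] * cols for _ in range(rows)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, rows):
        dp[i][0] = (i, 0, i, 0)          # only deletions
    for j in range(1, cols):
        dp[0][j] = (j, 0, 0, j)          # only insertions
    for i in range(1, rows):
        for j in range(1, cols):
            cost, s, d, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                best = (cost, s, d, ins)             # match
            else:
                best = (cost + 1, s + 1, d, ins)     # substitution
            cost, s, d, ins = dp[i - 1][j]
            if cost + 1 < best[0]:
                best = (cost + 1, s, d + 1, ins)     # deletion
            cost, s, d, ins = dp[i][j - 1]
            if cost + 1 < best[0]:
                best = (cost + 1, s, d, ins + 1)     # insertion
            dp[i][j] = best
    _, s, d, ins = dp[-1][-1]
    return s, d, ins


def word_accuracy(ref, hyp):
    """Word accuracy in percent, assuming the (N - S - D - I) / N definition."""
    if not ref:
        raise ValueError("reference must contain at least one word")
    s, d, ins = edit_counts(ref, hyp)
    return 100.0 * (len(ref) - s - d - ins) / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """Decoding time divided by audio duration (e.g. 15.3 means 15.3x real time)."""
    return processing_seconds / audio_seconds
```

Under this definition, insertions are penalised as well as substitutions and deletions, so the accuracy can fall below the plain percentage of correctly recognised words. A real-time factor of 15.3 would correspond, for instance, to 153 seconds of decoding for 10 seconds of speech.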

Keywords

Speech Recognition, Acoustic Model, Text Corpus, Sentence Accuracy, Maximum Likelihood Linear Regression
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Andrej Žgank (1)
  • Zdravko Kačič (1)
  • Bogomir Horvat (1)
  1. Laboratory for Digital Signal Processing, Faculty of EE & CS, University of Maribor, Maribor, Slovenia
