Abstract
In this chapter, the basic concepts of speech recognition are presented. Acoustic processing, acoustic modeling and search algorithms are briefly described, while language modeling is explained in more detail. Afterwards, some features of inflective languages are described, together with why these features matter when designing speech recognition systems. Inflective languages are first discussed in general, and then Slovene is discussed in more detail as an example. Next, some typical methods for overcoming the difficulties of speech recognition in inflective languages and improving recognition accuracy are described: enlarging the vocabulary, using sub-word language models, and using other, more sophisticated language models. The last part of the chapter discusses morphosyntactic description tagging in inflective languages, which will be used in later chapters. This chapter does not give a comprehensive overview of speech recognition; it gives only basic descriptions and the additional information needed to understand the content of the later chapters.
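Sub-word language models are motivated by the large number of word forms that inflection produces. The following is a minimal sketch of that idea, not the method used in this chapter. It assumes a toy list of Slovene noun forms and a hypothetical hand-written list of endings. Splitting words into stems and endings shrinks the vocabulary, and it can also cover a word form that never appeared as a whole word.

```python
# Toy training data (hypothetical): inflected forms of two Slovene nouns,
# "hiša" (house) and "miza" (table). Note that "mizo" is deliberately absent.
train_words = ["hiša", "hiše", "hiši", "hišo", "miza", "mize", "mizi"]

# Hypothetical ending inventory. Real systems derive endings from a
# morphological lexicon or a data-driven segmenter, not a hand-written list.
ENDINGS = ["a", "e", "i", "o"]


def split_stem_ending(word, endings=ENDINGS):
    """Split a word into (stem, ending) using the longest matching ending.

    Endings are prefixed with '+' so they never collide with stems.
    Returns (word, None) when no known ending applies.
    """
    for ending in sorted(endings, key=len, reverse=True):
        if word.endswith(ending) and len(word) > len(ending):
            return word[: -len(ending)], "+" + ending
    return word, None


def is_covered(word, vocab):
    """True if every sub-word unit of `word` is in `vocab`."""
    stem, ending = split_stem_ending(word)
    units = [stem] + ([ending] if ending is not None else [])
    return all(unit in vocab for unit in units)


# Word-level vocabulary: one entry per distinct word form.
word_vocab = set(train_words)

# Sub-word vocabulary: stems and endings shared across word forms.
subword_vocab = set()
for w in train_words:
    stem, ending = split_stem_ending(w)
    subword_vocab.add(stem)
    if ending is not None:
        subword_vocab.add(ending)

print(len(word_vocab), len(subword_vocab), "mizo" in word_vocab,
      is_covered("mizo", subword_vocab))
# → 7 6 False True
```

In this example, "mizo" is out of vocabulary for a word-based model. Its units "miz" and "+o" were both seen in training, so a stem/ending model can still produce it. Real inflective languages have far more stems and endings, and real segmentation is much less trivial.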
Copyright information
© 2017 The Author(s) - SpringerBriefs
About this chapter
Cite this chapter
Donaj, G., Kačič, Z. (2017). Speech Recognition in Inflective Languages. In: Language Modeling for Automatic Speech Recognition of Inflective Languages. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-41607-6_2
DOI: https://doi.org/10.1007/978-3-319-41607-6_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41605-2
Online ISBN: 978-3-319-41607-6
eBook Packages: Engineering, Engineering (R0)