Speech Recognition in Inflective Languages

Donaj, Gregor; Kačič, Zdravko

doi:10.1007/978-3-319-41607-6_2

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)

Abstract

This chapter presents the basic concepts of speech recognition. Acoustic processing, acoustic modeling, and search algorithms are described briefly, and language modeling is explained in more detail. The chapter then describes some features of inflective languages and why they matter when designing speech recognition systems: inflective languages are first discussed in general, and Slovene is then examined in more detail as an example. Next, some typical methods for overcoming the difficulties of speech recognition in inflective languages and improving recognition accuracy are described, namely enlarging the vocabulary, using sub-word language models, and using other, more sophisticated language models. The last part of the chapter discusses morphosyntactic description tagging in inflective languages, which is used in later chapters. The chapter does not give a comprehensive overview of speech recognition; it provides only basic descriptions and the additional information needed to understand the later chapters.

Author information

Authors and Affiliations

Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
Gregor Donaj & Zdravko Kačič

About this chapter

Cite this chapter

Donaj, G., Kačič, Z. (2017). Speech Recognition in Inflective Languages. In: Language Modeling for Automatic Speech Recognition of Inflective Languages. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-41607-6_2

DOI: https://doi.org/10.1007/978-3-319-41607-6_2
Published: 30 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41605-2
Online ISBN: 978-3-319-41607-6
eBook Packages: Engineering, Engineering (R0)
