A Bilingual Kazakh-Russian System for Automatic Speech Recognition and Synthesis

Khomitsevich, Olga; Mendelev, Valentin; Tomashenko, Natalia; Rybin, Sergey; Medennikov, Ivan; Kudubayeva, Saule

doi:10.1007/978-3-319-23132-7_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9319)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

The paper presents a system for speech recognition and synthesis for the Kazakh and Russian languages. It is designed for use by speakers of Kazakh; due to the prevalence of bilingualism among Kazakh speakers, it was considered essential to design a bilingual Kazakh-Russian system. Developing our system involved building a text processing and transcription system that deals with both Kazakh and Russian text, and is used in both speech synthesis and recognition applications. We created a Kazakh TTS voice and an additional Russian voice using the recordings of the same bilingual voice artist. A Kazakh speech database was collected and used to train deep neural network acoustic models for the speech recognition system. The resulting models demonstrated sufficient performance for practical applications in interactive voice response and keyword spotting scenarios.

Acknowledgments

The work was financially supported by the Government of the Russian Federation, Grant 074-U01.

Author information

Authors and Affiliations

Speech Technology Center, Saint Petersburg, Russia
Olga Khomitsevich, Valentin Mendelev & Natalia Tomashenko
ITMO University, Saint Petersburg, Russia
Olga Khomitsevich, Valentin Mendelev, Natalia Tomashenko, Sergey Rybin & Ivan Medennikov
STC-innovations Ltd., Saint Petersburg, Russia
Ivan Medennikov
Kostanay State University named after A. Baytursynov, Kostanay, Kazakhstan
Saule Kudubayeva

Corresponding author

Correspondence to Natalia Tomashenko.

Editor information

Editors and Affiliations

SPIIRAS, Saint-Petersburg, Russia
Andrey Ronzhin
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
University of Patras, Patras, Greece
Nikos Fakotakis

About this paper

Cite this paper

Khomitsevich, O., Mendelev, V., Tomashenko, N., Rybin, S., Medennikov, I., Kudubayeva, S. (2015). A Bilingual Kazakh-Russian System for Automatic Speech Recognition and Synthesis. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science, vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_3

DOI: https://doi.org/10.1007/978-3-319-23132-7_3
Published: 04 September 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23131-0
Online ISBN: 978-3-319-23132-7
eBook Packages: Computer Science, Computer Science (R0)
