Generalized Structure of Active Speech Perception Based on Multiagent Intelligence

Nagoev, Zalimkhan; Gurtueva, Irina; Anchekov, Murat

doi:10.1007/978-3-030-96993-6_35

Part of the book series: Studies in Computational Intelligence (SCI, volume 1032)

Included in the following conference series:

Biologically Inspired Cognitive Architectures Meeting

Abstract

Recent progress in speech technology is undeniable. Developers at Microsoft and IBM have reported automatic speech recognition systems that reach human-level performance in transcribing conversational telephone speech. According to various estimates, their word error rate (WER) is now about 5.1–5.8%. However, the most challenging problems in speech recognition, diarization and noise cancellation, remain open. A comparative analysis of the most frequent errors made by systems and by people on the recognition task shows that the errors are broadly similar. Those made by humans, however, are much less critical: they seldom distort the meaning of a statement. In other words, these errors are not semantic. That is why the mechanisms of human speech perception are the most promising area of research. This paper proposes a model of the general structure of active auditory perception and outlines the neurobiological basis of the hypothesis put forward. The proposed concept is a basic platform for a general multiagent architecture. We assume that speech recognition is guided by attention even in its early stages, so that the early auditory code changes with context and experience. The model simulates the involuntary attention that children use in mastering their native language, based on an emotional assessment of perceptually significant auditory information. The multiagent internal dynamics of auditory speech coding can provide new insights into how hearing impairment can be treated. The formal description of the structure of speech perception can serve as a general theoretical basis for developing universal automatic speech recognition systems that remain highly effective in noisy conditions and cocktail-party situations. Multiagent systems are the formal means for implementing the model in software.
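
For readers unfamiliar with the metric quoted above, WER is conventionally defined as the minimum number of word substitutions, deletions, and insertions needed to turn the system transcript into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that conventional definition only. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, and the scoring protocols behind the reported figures may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    real evaluations usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a transcript with one wrong word against a 20-word reference has a WER of 0.05, which is the order of magnitude reported for conversational telephone speech. The metric weights every word equally, so it cannot distinguish an error that changes the meaning of a statement from one that does not.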

The research was supported by the Russian Foundation for Basic Research, grant No. 19-01-00648.

Author information

Authors and Affiliations

The Federal State Institution of Science Federal Scientific Center, Kabardino-Balkarian Scientific Center of Russian Academy of Sciences, I. Armand Street, 37-a, 360000, Nalchik, Russia
Zalimkhan Nagoev, Irina Gurtueva & Murat Anchekov

Corresponding author

Correspondence to Irina Gurtueva.

Editor information

Editors and Affiliations

MEPhI, National Research Nuclear University, Moscow, Russia
Valentin V. Klimov
Boston Consulting Group, Seattle, WA, USA
David J. Kelley

About this paper

Cite this paper

Nagoev, Z., Gurtueva, I., Anchekov, M. (2022). Generalized Structure of Active Speech Perception Based on Multiagent Intelligence. In: Klimov, V.V., Kelley, D.J. (eds) Biologically Inspired Cognitive Architectures 2021. BICA 2021. Studies in Computational Intelligence, vol 1032. Springer, Cham. https://doi.org/10.1007/978-3-030-96993-6_35

DOI: https://doi.org/10.1007/978-3-030-96993-6_35
Published: 25 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96992-9
Online ISBN: 978-3-030-96993-6
eBook Packages: Intelligent Technologies and Robotics (R0)
