Neural Network Control Interface of the Speaker Dependent Computer System «Deep Interactive Voice Assistant DIVA» to Help People with Speech Impairments

  • Tatiana Khorosheva
  • Marina Novoseltseva
  • Nazim Geidarov
  • Nikolay Krivosheev
  • Sergey Chernenko
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 874)

Abstract

As modern information and communication systems develop, voice control interfaces and speech recognition systems are finding applications in many fields. One such application serves people with special needs who have speech impairments and who therefore find speech-dependent voice interfaces challenging to use. Our research team is developing a speaker-dependent computer system, «Deep Interactive Voice Assistant» (DIVA), which recognizes an arbitrary set of commands for controlling a computing system. This article presents the results of testing several artificial neural network architectures for training the system to recognize voice input: associative memory, the multilayer perceptron, and the convolutional network. The results justify using a multilayer perceptron in DIVA, as it achieved high results when trained on a small sample. DIVA will be implemented in the voice user interfaces of systems such as «Smart House», mobile applications, and IT-based assistive systems.

Keywords

Voice interface technology; Speech recognition technology; Assistive technologies; Neural network; Multilayer perceptron; Pattern recognition; Associative memory

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Tatiana Khorosheva (1)
  • Marina Novoseltseva (1)
  • Nazim Geidarov (1)
  • Nikolay Krivosheev (1)
  • Sergey Chernenko (1)

  1. Kemerovo State University, Kemerovo, Russia
