
Two-Layer Perceptron for Voice Recognition of Speaker’s Identity

  • Conference paper
Advances in Computer Science for Engineering and Education III (ICCSEEA 2020)

Abstract

The article addresses the problem of ensuring reliable user authentication in information systems for various purposes. It shows the promise of solving this problem with voice signal analysis tools that recognize the speaker’s identity. The main advantages of such tools include the increased strength of the biometric access code, the use of widely available recording equipment, and the possibility of covert monitoring of the user’s identity. The article substantiates the relevance of developing low-resource means of recognizing a speaker’s identity from voice fragments of fixed duration, using only the computing power available on site. An analysis of the literature shows the promise of neural network solutions, whose development is complicated by uncertainty in choosing the type of neural network model and in determining the set of input parameters. The studies established that, for recognizing a speaker’s identity from voice fragments of fixed duration, it is advisable to use a two-layer perceptron whose input parameters are derived from the mel-frequency cepstral coefficients (MFCCs) characterizing each quasi-stationary fragment of the analyzed voice signal, and whose output parameters correspond to the speakers to be recognized. Computational experiments showed that each quasi-stationary fragment should be described by 20 mel-frequency cepstral coefficients. With this configuration, the speaker recognition accuracy of the two-layer perceptron is on par with the best modern tools for this purpose and is 8% higher than that of a LeNet-type convolutional neural network.
The need for further research on adapting the parameters of the two-layer perceptron to recognition conditions affected by various kinds of interference was also established.
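The abstract outlines a pipeline: cut a fixed-duration voice fragment into quasi-stationary frames, describe each frame with 20 mel-frequency cepstral coefficients, and feed the result to a two-layer perceptron whose outputs correspond to the known speakers. The plain-Python sketch below illustrates that pipeline under stated assumptions; it is not the authors' implementation. Everything except the 20 coefficients per fragment is an assumption not taken from the paper: 8 kHz sampling (`SAMPLE_RATE`), 25 ms Hamming-windowed frames with a 10 ms hop (`FRAME_LEN`, `HOP_LEN`), a 26-filter mel bank (`N_MELS`), concatenation of the per-frame coefficients into one input vector, a tanh hidden layer of hypothetical size 32, and a softmax output. All function names are hypothetical, and training (for example by backpropagation) is omitted, so the weights are random.

```python
import math
import random

# Hypothetical front-end settings; only N_MFCC = 20 comes from the abstract.
SAMPLE_RATE = 8000   # Hz (assumed)
FRAME_LEN = 200      # 25 ms fragment, assumed quasi-stationary
HOP_LEN = 80         # 10 ms step between fragments (assumed)
N_FFT = 256
N_MELS = 26          # number of triangular mel filters (assumed)
N_MFCC = 20          # coefficients per fragment, as reported in the abstract

# Twiddle tables so the direct DFT avoids recomputing cos/sin.
COS = [math.cos(2 * math.pi * i / N_FFT) for i in range(N_FFT)]
SIN = [math.sin(2 * math.pi * i / N_FFT) for i in range(N_FFT)]


def frames(signal):
    """Cut the signal into overlapping Hamming-windowed fragments."""
    win = [0.54 - 0.46 * math.cos(2 * math.pi * n / (FRAME_LEN - 1))
           for n in range(FRAME_LEN)]
    return [[signal[s + n] * win[n] for n in range(FRAME_LEN)]
            for s in range(0, len(signal) - FRAME_LEN + 1, HOP_LEN)]


def power_spectrum(frame):
    """Power spectrum of a zero-padded fragment via a direct DFT (bins 0..N_FFT/2)."""
    x = frame + [0.0] * (N_FFT - len(frame))
    spec = []
    for k in range(N_FFT // 2 + 1):
        re = sum(x[n] * COS[(k * n) % N_FFT] for n in range(N_FFT))
        im = sum(x[n] * SIN[(k * n) % N_FFT] for n in range(N_FFT))
        spec.append((re * re + im * im) / N_FFT)
    return spec


def mel_filterbank():
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    top_mel = 2595.0 * math.log10(1.0 + (SAMPLE_RATE / 2) / 700.0)
    edges = []
    for i in range(N_MELS + 2):
        hz = 700.0 * (10.0 ** (top_mel * i / (N_MELS + 1) / 2595.0) - 1.0)
        edges.append(int((N_FFT + 1) * hz / SAMPLE_RATE))
    bank = []
    for m in range(1, N_MELS + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        f = [0.0] * (N_FFT // 2 + 1)
        for k in range(lo, mid):
            f[k] = (k - lo) / (mid - lo)   # rising slope
        for k in range(mid, hi):
            f[k] = (hi - k) / (hi - mid)   # falling slope
        bank.append(f)
    return bank


def mfcc(frame, bank):
    """Log mel-filter energies followed by a DCT-II, keeping N_MFCC coefficients."""
    spec = power_spectrum(frame)
    log_e = [math.log(max(sum(w * p for w, p in zip(f, spec)), 1e-10)) for f in bank]
    return [sum(log_e[m] * math.cos(math.pi * c * (m + 0.5) / N_MELS)
                for m in range(N_MELS)) for c in range(N_MFCC)]


def speaker_features(signal, bank):
    """Concatenate the MFCCs of every fragment into one fixed-length input vector."""
    return [c for fr in frames(signal) for c in mfcc(fr, bank)]


def init_mlp(n_in, n_hidden, n_out, seed=0):
    """Random weights for a perceptron with one hidden layer and one output layer."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 1 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.gauss(0, 1 / math.sqrt(n_hidden)) for _ in range(n_hidden)] for _ in range(n_out)]
    return w1, [0.0] * n_hidden, w2, [0.0] * n_out


def forward(params, x):
    """tanh hidden layer, then softmax with one output per enrolled speaker."""
    w1, b1, w2, b2 = params
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(w1, b1)]
    z = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(w2, b2)]
    top = max(z)
    e = [math.exp(v - top) for v in z]   # shift by max for numerical stability
    total = sum(e)
    return [v / total for v in e]


if __name__ == "__main__":
    rng = random.Random(1)
    # 0.1 s synthetic stand-in for a voice fragment: 150 Hz tone plus noise.
    sig = [math.sin(2 * math.pi * 150 * n / SAMPLE_RATE) + 0.1 * rng.gauss(0, 1)
           for n in range(800)]
    x = speaker_features(sig, mel_filterbank())
    probs = forward(init_mlp(len(x), 32, 5), x)
    print(len(x), len(probs))  # → 160 5
```

Running the script prints `160 5`: eight 25 ms fragments times 20 coefficients give 160 inputs, and the five outputs stand for five hypothetical enrolled speakers. Because the voice fragments have a fixed duration, the input vector always has the same length, so a plain fully connected perceptron can consume it directly; in practice the coefficients would usually be normalized and the weights trained before the outputs carry any meaning.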




Author information

Corresponding author

Correspondence to Ihor Tereikovskyi.


Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Hu, Z., Tereikovskyi, I., Korystin, O., Mihaylenko, V., Tereikovska, L. (2021). Two-Layer Perceptron for Voice Recognition of Speaker’s Identity. In: Hu, Z., Petoukhov, S., Dychka, I., He, M. (eds) Advances in Computer Science for Engineering and Education III. ICCSEEA 2020. Advances in Intelligent Systems and Computing, vol 1247. Springer, Cham. https://doi.org/10.1007/978-3-030-55506-1_46
