Abstract
This article addresses the problem of reliably authenticating users of information systems of various purposes. It shows the promise of solving this problem with voice signal analysis tools that recognize the speaker’s identity. The main advantages of such tools include the increased robustness of the biometric access code, the use of commonly available recording tools, and the possibility of covert monitoring of the user’s identity. The article substantiates the relevance of research into low-resource means of recognizing the speaker’s identity from voice fragments of a fixed duration, using only locally available computing power. An analysis of the literature shows the promise of neural network solutions, whose development is complicated by uncertainty in choosing the type of neural network model and in determining the set of input parameters. The study finds that, for recognizing the speaker’s identity from voice fragments of a fixed duration, it is advisable to use a two-layer perceptron whose input parameters are the mel-cepstral coefficients characterizing each quasi-stationary fragment of the analyzed voice signal and whose output parameters correspond to the speakers to be recognized. Computational experiments show that each quasi-stationary fragment should be described by 20 mel-cepstral coefficients. With this configuration, the speaker recognition accuracy of the two-layer perceptron is on par with the best modern tools for this purpose and is 8% higher than that of a LeNet-type convolutional neural network.
The study also establishes the need for further research on adapting the parameters of the two-layer perceptron to recognition conditions affected by various kinds of interference.
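The architecture described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Only the 20 mel-cepstral inputs per quasi-stationary fragment and the one-output-per-speaker layout come from the abstract. The hidden-layer width, the tanh activation, the softmax output, the number of speakers, and the rule of averaging log-probabilities over frames are all assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

N_MFCC = 20      # from the abstract: 20 mel-cepstral coefficients per frame
N_HIDDEN = 32    # assumption: hidden-layer width is not given in the abstract
N_SPEAKERS = 4   # assumption: illustrative number of enrolled speakers


def init_params(rng, n_in=N_MFCC, n_hidden=N_HIDDEN, n_out=N_SPEAKERS):
    """Randomly initialise an (untrained) two-layer perceptron."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def frame_posteriors(params, frames):
    """Map (n_frames, 20) MFCC vectors to per-frame speaker probabilities."""
    hidden = np.tanh(frames @ params["W1"] + params["b1"])  # assumed activation
    logits = hidden @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)              # softmax (assumed)


def identify_speaker(params, frames):
    """Assumed decision rule: average log-probabilities over all frames."""
    log_probs = np.log(frame_posteriors(params, frames) + 1e-12)
    return int(np.argmax(log_probs.mean(axis=0)))


rng = np.random.default_rng(0)
params = init_params(rng)
# Synthetic stand-in for the MFCC vectors of one fixed-duration voice fragment.
fragment = rng.normal(size=(50, N_MFCC))
probs = frame_posteriors(params, fragment)
speaker = identify_speaker(params, fragment)
```

In a real system the `fragment` array would hold mel-cepstral coefficients extracted from recorded speech, and the weights would be fitted on labelled frames before `identify_speaker` is used.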
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hu, Z., Tereikovskyi, I., Korystin, O., Mihaylenko, V., Tereikovska, L. (2021). Two-Layer Perceptron for Voice Recognition of Speaker’s Identity. In: Hu, Z., Petoukhov, S., Dychka, I., He, M. (eds) Advances in Computer Science for Engineering and Education III. ICCSEEA 2020. Advances in Intelligent Systems and Computing, vol 1247. Springer, Cham. https://doi.org/10.1007/978-3-030-55506-1_46
DOI: https://doi.org/10.1007/978-3-030-55506-1_46
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55505-4
Online ISBN: 978-3-030-55506-1
eBook Packages: Intelligent Technologies and Robotics (R0)