Abstract
Voice signals collected in everyday settings are essentially mixed signals that carry several kinds of speaker-related information, such as gender, age, and emotional state. This paper analyzes the commonalities and distinguishing characteristics of traditional single-dimensional speaker information recognition, and carries out a child-specific analysis of common acoustic feature parameters, including prosodic, sound quality, and spectral-based features. To account for the temporal characteristics of voice, a multi-channel model that combines a Time-Delay Neural Network (TDNN), a Bidirectional Long Short-Term Memory (BiLSTM) network, and an attention mechanism is trained as a solution for children’s speaker recognition. Extensive experimental results show that, while maintaining the accuracy of age and gender recognition, the method achieves higher accuracy in children’s voiceprint recognition.
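As a rough illustration of the attention step named in the abstract, the sketch below shows one common way to collapse frame-level features into a single utterance-level vector. It is not the authors' implementation: the abstract does not specify the attention form, so dot-product scoring is assumed here, and the names `attention_pool` and `query` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Collapse frame-level feature vectors into one utterance-level vector.

    frames: list of T feature vectors (each a list of D floats), for example
            the per-frame outputs of a time-delay or recurrent layer.
    query:  hypothetical learned vector of length D used to score frames
            (an assumption; the source does not describe the scoring).
    """
    # Score each frame by its dot product with the query vector.
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted sum over time: higher-scoring frames contribute more.
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


# Toy input: three frames with two features each.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.0, 0.0]  # a zero query gives uniform weights, i.e. mean pooling
pooled, weights = attention_pool(frames, query)
```

With a zero query every frame receives the same weight, so the result reduces to the mean over time; a trained query would instead emphasize the frames most informative for the task.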
Acknowledgment
This paper is funded by the “Dalian Key Laboratory for the Application of Big Data and Data Science”.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Jia, N., Zheng, C., Sun, W. (2019). Children’s Speaker Recognition Method Based on Multi-dimensional Features. In: Li, J., Wang, S., Qin, S., Li, X., Wang, S. (eds) Advanced Data Mining and Applications. ADMA 2019. Lecture Notes in Computer Science, vol 11888. Springer, Cham. https://doi.org/10.1007/978-3-030-35231-8_33
DOI: https://doi.org/10.1007/978-3-030-35231-8_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35230-1
Online ISBN: 978-3-030-35231-8
eBook Packages: Computer Science, Computer Science (R0)