Children’s Speaker Recognition Method Based on Multi-dimensional Features

Jia, Ning; Zheng, Chunjun; Sun, Wei

doi:10.1007/978-3-030-35231-8_33

Conference paper
First Online: 15 November 2019

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11888)

Abstract

In everyday settings, collected voice signals are essentially mixed signals that carry information about several speaker characteristics, such as gender, age and emotional state. This paper analyzes the commonalities and distinctive characteristics of traditional single-dimensional speaker information recognition, and carries out a child-specific analysis of common acoustic feature parameters, including prosodic, voice quality and spectral features. Taking the temporal characteristics of speech into account, it then trains a multi-channel model that combines a Time-Delay Neural Network (TDNN), a Bidirectional Long Short-Term Memory (BiLSTM) network and an attention mechanism, yielding a solution for children’s speaker recognition. Extensive experiments show that the method achieves higher accuracy in children’s voiceprint recognition while maintaining the accuracy of age and gender recognition.
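The abstract names the model's building blocks but not their configuration. The sketch below is written in plain Python and illustrates two of them under stated assumptions. The first is a TDNN layer, which splices frames at fixed time offsets and applies one shared weight matrix. The second is attention pooling, which collapses frame-level vectors into a single utterance-level embedding. The embedding is then fed to separate per-task classification heads. The context offsets (-2, 0, 2), all weights, the head name `gender`, and the reading of "multi-channel" as several heads sharing one pooled embedding are hypothetical choices made for illustration only, not the authors' design. The BiLSTM stage is left out because the abstract does not say where it sits in the pipeline.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]  # one row per output unit


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def tdnn_layer(frames: List[Vector], w: Matrix,
               context: Tuple[int, ...] = (-2, 0, 2)) -> List[Vector]:
    """One time-delay layer: for each frame t, concatenate the frames at
    t + c for every offset c, apply the shared weights w, then ReLU.
    Frames whose context would fall outside the utterance are dropped,
    so the output is shorter than the input."""
    left, right = -min(context), max(context)
    out = []
    for t in range(left, len(frames) - right):
        spliced = [v for c in context for v in frames[t + c]]
        out.append([max(0.0, z) for z in matvec(w, spliced)])
    return out


def attention_pool(frames: List[Vector], v: Vector) -> Tuple[Vector, Vector]:
    """Score each frame with the (hypothetical, learned) vector v, normalise
    the scores over time with softmax, and return the weighted mean of the
    frames together with the attention weights."""
    weights = softmax([sum(a * b for a, b in zip(v, h)) for h in frames])
    dim = len(frames[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


def classify(embedding: Vector, heads: Dict[str, Matrix]) -> Dict[str, int]:
    """Apply one linear head per task to the shared embedding and return
    the index of the most probable class for each task."""
    result = {}
    for task, w in heads.items():
        probs = softmax(matvec(w, embedding))
        result[task] = max(range(len(probs)), key=probs.__getitem__)
    return result


# Toy usage: 6 frames of hand-picked 2-dimensional features.
frames = [[float(t), 1.0] for t in range(6)]
# Row 0 copies dim 0 of the centre frame; row 1 sums dim 1 over the 3 offsets.
w_tdnn = [[0, 0, 1, 0, 0, 0],
          [0, 1, 0, 1, 0, 1]]
hidden = tdnn_layer(frames, w_tdnn)            # [[2.0, 3.0], [3.0, 3.0]]
embedding, weights = attention_pool(hidden, v=[1.0, 0.0])
heads = {"gender": [[1.0, 0.0], [0.0, 1.0]]}   # hypothetical 2-class head
print(classify(embedding, heads))              # {'gender': 1}
```

Dropping edge frames instead of padding keeps the sketch short. TDNN layers can equivalently be implemented as dilated 1-D convolutions over the frame axis, which is how many deep-learning toolkits express them.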

Acknowledgment

This paper is funded by the “Dalian Key Laboratory for the Application of Big Data and Data Science”.

Author information

Authors and Affiliations

Dalian Neusoft University of Information, Dalian, Liaoning, China
Ning Jia, Chunjun Zheng & Wei Sun
Dalian Maritime University, Dalian, Liaoning, China
Chunjun Zheng

Corresponding author

Correspondence to Ning Jia.

Editor information

Editors and Affiliations

Deakin University, Burwood, VIC, Australia
Jianxin Li
The University of Queensland, St. Lucia, QLD, Australia
Sen Wang
Flinders University, Bedford Park, SA, Australia
Shaowen Qin
Dalian Neusoft University of Information, Dalian, China
Xue Li
Beijing Institute of Technology, Beijing, China
Shuliang Wang

About this paper

Cite this paper

Jia, N., Zheng, C., Sun, W. (2019). Children’s Speaker Recognition Method Based on Multi-dimensional Features. In: Li, J., Wang, S., Qin, S., Li, X., Wang, S. (eds) Advanced Data Mining and Applications. ADMA 2019. Lecture Notes in Computer Science, vol 11888. Springer, Cham. https://doi.org/10.1007/978-3-030-35231-8_33

DOI: https://doi.org/10.1007/978-3-030-35231-8_33
Published: 15 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35230-1
Online ISBN: 978-3-030-35231-8
eBook Packages: Computer Science, Computer Science (R0)
