Children’s Speaker Recognition Method Based on Multi-dimensional Features

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11888)

Abstract

Voice signals collected in everyday settings are essentially mixed signals that carry information about speaker characteristics such as gender, age and emotional state. This paper analyzes what traditional single-dimensional approaches to speaker information recognition have in common and how they differ, and carries out a child-specific analysis of common acoustic feature parameters, including prosodic, sound quality and spectral features. Taking the temporal characteristics of voice into account, a multi-channel model that combines a Time-Delay Neural Network (TDNN), a Bidirectional Long Short-Term Memory (BiLSTM) model and an attention mechanism is trained for children’s speaker recognition. Extensive experiments show that the method achieves higher accuracy in children’s voiceprint recognition while maintaining the accuracy of age and gender recognition.
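
The abstract names an attention mechanism over temporal features but gives no implementation details. As a rough illustration only, the sketch below shows one common form of attention pooling: frame-level feature vectors are scored against a query vector, the scores are normalized with a softmax, and the frames are averaged with those weights. The function names, the dot-product scoring and the toy inputs are all assumptions made for this example; they are not taken from the paper and may differ from the authors' model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Collapse a sequence of frame vectors into one utterance-level vector.

    frames: list of T feature vectors (each a list of D floats), standing in
            for per-frame outputs of a temporal model (hypothetical here).
    query:  list of D floats, standing in for a learned attention vector
            (an assumption; the paper does not specify the scoring function).
    Returns (pooled_vector, attention_weights).
    """
    # Dot-product score of each frame against the query.
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted average of the frames, one dimension at a time.
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


# Toy input: three 2-dimensional frames and a zero query, which gives
# every frame the same weight.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(frames, [0.0, 0.0])
```

With a non-zero query, frames that align with the query receive larger weights, so the pooled vector emphasizes those parts of the utterance.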

Acknowledgment

This paper is funded by the “Dalian Key Laboratory for the Application of Big Data and Data Science”.

Author information

Corresponding author

Correspondence to Ning Jia.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Jia, N., Zheng, C., Sun, W. (2019). Children’s Speaker Recognition Method Based on Multi-dimensional Features. In: Li, J., Wang, S., Qin, S., Li, X., Wang, S. (eds) Advanced Data Mining and Applications. ADMA 2019. Lecture Notes in Computer Science, vol 11888. Springer, Cham. https://doi.org/10.1007/978-3-030-35231-8_33

  • DOI: https://doi.org/10.1007/978-3-030-35231-8_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-35230-1

  • Online ISBN: 978-3-030-35231-8

  • eBook Packages: Computer Science, Computer Science (R0)
