Abstract
Many existing speaker recognition algorithms extract features from a single domain, and such features cannot represent speech characteristics well, which limits speaker recognition accuracy. To address this problem, we propose a time-frequency domain feature-enhanced Deep Speaker (TFDS). The proposed algorithm combines the time domain and the frequency domain to enhance traditional MFCC feature extraction, compensating for the shortcoming of algorithms that extract features in only a single domain. The Deep Speaker network architecture includes ResCNN, GRU, a time-averaging layer, a style transformation layer, and a length normalization layer, and it is trained with triplet loss. Experimental results on the LibriSpeech dataset show that TFDS achieves higher accuracy and a lower Equal Error Rate (EER) than Deep Speaker, and that the time-frequency domain feature enhancement method can also be combined with other networks to improve speaker recognition accuracy.
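To make the back-end stages named above more concrete, here is a minimal sketch, not the authors' TFDS code, of three generic Deep Speaker components (time averaging, length normalization, triplet loss) plus the EER metric used for evaluation. Assumptions: the triplet loss uses cosine similarity between length-normalized embeddings, the margin of 0.1 is an illustrative value, and EER is approximated by sweeping thresholds over the observed scores. The time-frequency feature enhancement itself, ResCNN, GRU, and the style transformation layer are not described in enough detail here to reproduce, so they are omitted, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def temporal_average(frames: Sequence[Sequence[float]]) -> Vector:
    """Pool frame-level features into one utterance-level vector by averaging over time."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


def length_normalize(v: Sequence[float], eps: float = 1e-12) -> Vector:
    """Scale a vector to unit L2 norm (eps avoids division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity computed as the dot product of length-normalized vectors."""
    na, nb = length_normalize(a), length_normalize(b)
    return sum(x * y for x, y in zip(na, nb))


def triplet_loss(anchor, positive, negative, margin: float = 0.1) -> float:
    """Hinge triplet loss on cosine similarities: max(0, s_an - s_ap + margin).

    The margin value is an illustrative assumption, not taken from the paper.
    """
    s_ap = cosine(anchor, positive)
    s_an = cosine(anchor, negative)
    return max(0.0, s_an - s_ap + margin)


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate EER by sweeping thresholds over the observed scores.

    labels: 1 = same-speaker (target) trial, 0 = different-speaker trial.
    A trial is accepted when its score >= threshold.
    """
    targets = [s for s, l in zip(scores, labels) if l == 1]
    nontargets = [s for s, l in zip(scores, labels) if l == 0]
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        far = sum(s >= t for s in nontargets) / len(nontargets)  # false acceptance rate
        frr = sum(s < t for s in targets) / len(targets)  # false rejection rate
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


if __name__ == "__main__":
    emb = length_normalize(temporal_average([[1.0, 2.0], [3.0, 4.0]]))
    print(emb)
    print(triplet_loss([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]))
    print(equal_error_rate([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
```

In this sketch the triplet loss becomes zero once the anchor is closer (in cosine similarity) to the positive than to the negative by at least the margin, which is what pushes embeddings of the same speaker together; a lower EER means the false acceptance and false rejection rates cross at a lower value.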
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Han, J., Zi, Y., Xiong, S. (2024). Improving Speaker Recognition by Time-Frequency Domain Feature Enhanced Method. In: Liu, F., Sadanandan, A.A., Pham, D.N., Mursanto, P., Lukose, D. (eds) PRICAI 2023: Trends in Artificial Intelligence. PRICAI 2023. Lecture Notes in Computer Science, vol 14326. Springer, Singapore. https://doi.org/10.1007/978-981-99-7022-3_33
DOI: https://doi.org/10.1007/978-981-99-7022-3_33
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7021-6
Online ISBN: 978-981-99-7022-3
eBook Packages: Computer Science, Computer Science (R0)