Improving Speaker Recognition by Time-Frequency Domain Feature Enhanced Method

Han, Jin; Zi, Yunfei; Xiong, Shengwu

doi:10.1007/978-981-99-7022-3_33

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14326)

Included in the following conference series:

Pacific Rim International Conference on Artificial Intelligence

Abstract

Many existing speaker recognition algorithms extract features from a single domain, which cannot represent speech characteristics well and therefore limits recognition accuracy. To solve this problem, we propose a time-frequency domain feature enhanced deep speaker (TFDS). The proposed algorithm combines the time domain and the frequency domain, enhances traditional MFCC feature extraction, and makes up for the shortcomings of algorithms that extract features in only a single domain. The deep speaker network architecture includes ResCNN, GRU, a time averaging layer, a style transformation layer, and a length normalization layer, and is trained with triplet loss. Experimental results on the LibriSpeech dataset show that TFDS has higher accuracy and a lower Equal Error Rate than deep speaker, and that the time-frequency domain feature enhanced method can also be combined with other networks to improve the accuracy of speaker recognition.
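
The abstract names three standard building blocks without defining them: length normalization, triplet loss, and the Equal Error Rate. The sketch below is a minimal illustration of them in plain Python and is not the authors' implementation. The function names, the cosine-similarity form of the loss, the margin of 0.1, and the threshold sweep used to estimate the EER are all assumptions made for this example.

```python
import math


def length_normalize(v):
    """Scale a vector to unit L2 norm (what a length normalization layer does)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """For unit-length vectors, cosine similarity is just the dot product."""
    return sum(x * y for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge loss that pushes the anchor-positive similarity above the
    anchor-negative similarity by at least `margin` (assumed value)."""
    a, p, n = map(length_normalize, (anchor, positive, negative))
    return max(0.0, cosine(a, n) - cosine(a, p) + margin)


def equal_error_rate(genuine, impostor):
    """Estimate the EER by sweeping a threshold over all observed scores and
    taking the point where false acceptance and false rejection are closest."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

Here `genuine` holds similarity scores for same-speaker trial pairs and `impostor` holds scores for different-speaker pairs. A lower EER means the two score distributions overlap less, which is the sense in which the abstract compares TFDS with deep speaker.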

Author information

Authors and Affiliations

College of Computer and Artificial Intelligence, Wuhan University of Technology, Wuhan, China
Jin Han, Yunfei Zi & Shengwu Xiong
Sanya Science and Education Innovation Park, Wuhan University of Technology, Sanya, 572000, Hainan, China
Shengwu Xiong

Corresponding author

Correspondence to Shengwu Xiong.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Fenrong Liu
SEEK Limited, Cremorne, NSW, Australia
Arun Anand Sadanandan
MIMOS (Malaysia), Kuala Lumpur, Malaysia
Duc Nghia Pham
Universitas Indonesia, Depok, Indonesia
Petrus Mursanto
Tabcorp Holdings Limited, Melbourne, VIC, Australia
Dickson Lukose

About this paper

Cite this paper

Han, J., Zi, Y., Xiong, S. (2024). Improving Speaker Recognition by Time-Frequency Domain Feature Enhanced Method. In: Liu, F., Sadanandan, A.A., Pham, D.N., Mursanto, P., Lukose, D. (eds) PRICAI 2023: Trends in Artificial Intelligence. PRICAI 2023. Lecture Notes in Computer Science, vol 14326. Springer, Singapore. https://doi.org/10.1007/978-981-99-7022-3_33

DOI: https://doi.org/10.1007/978-981-99-7022-3_33
Published: 10 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7021-6
Online ISBN: 978-981-99-7022-3
eBook Packages: Computer Science, Computer Science (R0)
