Abstract
This paper presents an integrated neural network for voiceprint verification. The system combines two deep architectures, ResNet-18 and SincNet, to extract acoustic features. During training, a triplet loss based on cosine similarity is used to separate same-speaker pairs from different-speaker pairs. Experiments on three datasets show that the integrated system outperforms the DNN-based i-vector baseline, reducing equal error rates (EERs) by 59.7%, 54.5%, and 58% on VoxCeleb1, LibriSpeech, and AISHELL-1, respectively. The integrated model also achieves lower EERs than either single model.
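The abstract names a cosine-similarity triplet loss and the equal error rate but does not give their exact form. The following is a minimal sketch of one common formulation, not the authors' implementation. The function names, the hinge form of the loss, the margin value of 0.2, and the threshold sweep used to approximate the EER are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on cosine similarity.

    The loss is zero once the same-speaker similarity exceeds the
    different-speaker similarity by at least `margin`.
    The margin of 0.2 is an assumed value, not taken from the paper.
    """
    s_ap = cosine_similarity(anchor, positive)  # same speaker
    s_an = cosine_similarity(anchor, negative)  # different speaker
    return max(0.0, s_an - s_ap + margin)


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Returns the mean of the false acceptance rate (FAR) and false
    rejection rate (FRR) at the threshold where the two are closest.
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(target_scores) | set(impostor_scores)):
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in target_scores) / len(target_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy two-dimensional embeddings, chosen only to exercise the functions.
    print(triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))
    print(equal_error_rate([0.9, 0.8, 0.4], [0.5, 0.3, 0.1]))
```

In this sketch a lower loss means the anchor embedding sits closer to the same-speaker embedding than to the different-speaker one, and the EER is the operating point at which false acceptances and false rejections occur at the same rate.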
Acknowledgment
This project is supported by the Sichuan Science and Technology Program (2018GZDZX0038).
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Jun, W., Yu, Z., Wenhao, S. (2021). A Novel Voiceprint Verification Technology Through Deep Neural Network. In: WU, C.H., PATNAIK, S., POPENTIU VLÃDICESCU, F., NAKAMATSU, K. (eds) Recent Developments in Intelligent Computing, Communication and Devices. ICCD 2019. Advances in Intelligent Systems and Computing, vol 1185. Springer, Singapore. https://doi.org/10.1007/978-981-15-5887-0_55
DOI: https://doi.org/10.1007/978-981-15-5887-0_55
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5886-3
Online ISBN: 978-981-15-5887-0
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)