Abstract
The problem of disguised voice recognition based on deep belief networks (DBNs) is studied. A hybrid feature extraction algorithm based on formants, Gammatone frequency cepstral coefficients (GFCCs), and their differential (delta) coefficients is proposed to extract more discriminative speaker features from the original voice data. Using the hybrid features as model input, a disguised voice library is constructed, and a disguised voice recognition model based on a deep belief network is proposed. A dropout strategy is introduced to prevent overfitting, which effectively addresses the shortcomings of traditional Gaussian mixture models, such as insufficient modeling ability and low discrimination. Experimental results show that the proposed disguised voice recognition method better fits the feature distribution and significantly improves classification performance and recognition rate.
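The hybrid feature construction described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 24-dimensional GFCC matrix and three formant frequencies are synthetic stand-ins, and the delta window width of 2 is an assumed value. Only the standard regression formula for delta coefficients and the per-frame concatenation step are shown.

```python
import numpy as np

def delta(features, width=2):
    """First-order regression (delta) coefficients over time frames.

    features: (n_frames, n_coeffs) array; width: regression half-window.
    """
    n = features.shape[0]
    denom = 2 * sum(k * k for k in range(1, width + 1))
    # Replicate edge frames so every frame has a full regression window.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    d = np.zeros_like(features, dtype=float)
    for t in range(n):
        acc = np.zeros(features.shape[1])
        for k in range(1, width + 1):
            acc += k * (padded[t + width + k] - padded[t + width - k])
        d[t] = acc / denom
    return d

# Hypothetical per-frame inputs: 24-dim GFCCs and 3 formant frequencies.
rng = np.random.default_rng(0)
gfcc = rng.standard_normal((100, 24))        # (frames, coefficients)
formants = rng.uniform(300, 3500, (100, 3))  # F1-F3 in Hz

# Hybrid feature vector per frame: GFCC + delta-GFCC + formants.
hybrid = np.hstack([gfcc, delta(gfcc), formants])
print(hybrid.shape)  # (100, 51)
```

The resulting 51-dimensional frame vectors would then serve as the input layer of the DBN classifier; a real pipeline would of course compute GFCCs from a Gammatone filterbank and formants from LPC analysis rather than sampling them randomly.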
Acknowledgements
This work was supported by the Natural Science Foundation of Liaoning Province (Nos. 2019-ZD-0168 and 2020-KF-12-11), the Major Training Program of the Criminal Investigation Police University of China (No. 3242019010), the Key Research and Development Projects of the Ministry of Science and Technology (No. 2017YFC0821005), and the Second Batch of New Engineering Research and Practice Projects (No. E-AQGABQ20202710).
Nan Jiang received the Ph.D. degree in engineering from Northeastern University, China in 2007, and completed her postdoctoral research in computer science and technology at Northeastern University, China in 2012. She is currently an associate professor in the Department of Acoustic and Imaging Information Inspection Technology, Criminal Investigation Police University of China. She has been recognized as a young talent in criminal science and technology in China and has been selected for the Million Talents Project of Liaoning Province. She has been engaged in teaching, scientific research, and casework in judicial voice examination and voice recognition, and has published more than 40 academic papers.
Her research interests include pattern recognition, speech recognition, speech emotion recognition, and multimodal emotion recognition.
Ting Liu received the B.Sc. degree in automation from Northeastern University, China in 2007, and the M.Sc. and Ph.D. degrees in control theory and control engineering from Northeastern University, China in 2009 and 2014, respectively. She is now a lecturer in electrical engineering and automation at Liaoning University, China, and has been engaged in research on nonlinear algorithms and speech recognition algorithms.
Her research interests include pattern recognition, feedback control systems, and control theory.
Cite this article
Jiang, N., Liu, T. Research on Voiceprint Recognition of Camouflage Voice Based on Deep Belief Network. Int. J. Autom. Comput. 18, 947–962 (2021). https://doi.org/10.1007/s11633-021-1283-2