Abstract
The problem of disguised voice recognition based on deep belief networks (DBNs) is studied. A hybrid feature extraction algorithm based on formants, Gammatone frequency cepstral coefficients (GFCCs), and their differential (delta) coefficients is proposed to extract more discriminative speaker features from the original voice data. Using the hybrid features as model input, a disguised voice library is constructed, and a disguised voice recognition model based on a deep belief network is proposed. A dropout strategy is introduced to prevent overfitting, which effectively addresses the shortcomings of traditional Gaussian mixture models, such as insufficient modeling ability and low discrimination. Experimental results show that the proposed disguised voice recognition method better fits the feature distribution and significantly improves classification performance and recognition rate.
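The hybrid feature construction described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 24-dimensional GFCC matrix and three formant frequencies are synthetic stand-ins, and the delta window width of 2 is an assumed value. Only the standard regression formula for delta coefficients and the per-frame concatenation step are shown.

```python
import numpy as np

def delta(features, width=2):
    """First-order regression (delta) coefficients over time frames.

    features: (n_frames, n_coeffs) array; width: regression half-window.
    """
    n = features.shape[0]
    denom = 2 * sum(k * k for k in range(1, width + 1))
    # Replicate edge frames so every frame has a full regression window.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    d = np.zeros_like(features, dtype=float)
    for t in range(n):
        acc = np.zeros(features.shape[1])
        for k in range(1, width + 1):
            acc += k * (padded[t + width + k] - padded[t + width - k])
        d[t] = acc / denom
    return d

# Hypothetical per-frame inputs: 24-dim GFCCs and 3 formant frequencies.
rng = np.random.default_rng(0)
gfcc = rng.standard_normal((100, 24))        # (frames, coefficients)
formants = rng.uniform(300, 3500, (100, 3))  # F1-F3 in Hz

# Hybrid feature vector per frame: GFCC + delta-GFCC + formants.
hybrid = np.hstack([gfcc, delta(gfcc), formants])
print(hybrid.shape)  # (100, 51)
```

The resulting 51-dimensional frame vectors would then serve as the input layer of the DBN classifier; a real pipeline would of course compute GFCCs from a Gammatone filterbank and formants from LPC analysis rather than sampling them randomly.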
Acknowledgements
This work was supported by the Natural Science Foundation of Liaoning Province (Nos. 2019-ZD-0168 and 2020-KF-12-11), the Major Training Program of the Criminal Investigation Police University of China (No. 3242019010), the Key Research and Development Projects of the Ministry of Science and Technology (No. 2017YFC0821005), and the Second Batch of New Engineering Research and Practice Projects (No. E-AQGABQ20202710).
Nan Jiang received the Ph.D. degree in engineering from Northeastern University, China in 2007, and completed her postdoctoral research in computer science and technology at Northeastern University, China in 2012. She is currently an associate professor in the Department of Acoustic and Imaging Information Inspection Technology, Criminal Investigation Police University of China. She has been recognized as a young talent in criminal science and technology in China and has been selected for the Million Talents Project of Liaoning Province. She has been engaged in teaching, scientific research, and casework in judicial voice examination and voice recognition, and has published more than 40 academic papers.
Her research interests include pattern recognition, speech recognition, speech emotion recognition, and multimodal emotion recognition.
Ting Liu received the B.Sc. degree in automation from Northeastern University, China in 2007, and the M.Sc. and Ph.D. degrees in control theory and control engineering from Northeastern University, China in 2009 and 2014, respectively. She is now a lecturer in electrical engineering and automation at Liaoning University, China, and has been engaged in research on nonlinear algorithms and speech recognition algorithms.
Her research interests include pattern recognition, feedback control systems, and control theory.
Cite this article
Jiang, N., Liu, T. Research on Voiceprint Recognition of Camouflage Voice Based on Deep Belief Network. Int. J. Autom. Comput. 18, 947–962 (2021). https://doi.org/10.1007/s11633-021-1283-2