
Research on Voiceprint Recognition of Camouflage Voice Based on Deep Belief Network

  • Research Article
  • Pattern Recognition
  • Published in: International Journal of Automation and Computing

Abstract

The problem of disguised voice recognition based on deep belief networks (DBNs) is studied. A hybrid feature extraction algorithm based on formants, Gammatone frequency cepstral coefficients (GFCC), and their delta (difference) coefficients is proposed to extract more discriminative speaker features from the raw voice data. Using the hybrid features as model input, a disguised voice library is constructed, and a disguised voice recognition model based on a deep belief network is proposed. A dropout strategy is introduced to prevent overfitting, which effectively addresses the weaknesses of traditional Gaussian mixture models, such as insufficient modeling capacity and low discrimination. Experimental results show that the proposed disguised voice recognition method fits the feature distribution better and significantly improves the classification effect and recognition rate.
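The full pipeline is not reproduced on this page, but the hybrid-feature idea described in the abstract — static GFCCs, their delta coefficients, and per-frame formant values concatenated into one vector — can be sketched as below. The function names, array shapes, and the simple regression-based delta formula are illustrative assumptions for this sketch, not the authors' exact implementation:

```python
import numpy as np

def delta(feats, width=2):
    """First-order delta (difference) coefficients over time, computed with
    the standard regression formula; captures frame-to-frame dynamics."""
    T, _ = feats.shape
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    num = sum(k * (padded[width + k:width + k + T] - padded[width - k:width - k + T])
              for k in range(1, width + 1))
    denom = 2 * sum(k * k for k in range(1, width + 1))
    return num / denom

def hybrid_features(gfcc, formants):
    """Concatenate static GFCCs, their delta coefficients, and per-frame
    formant estimates into one feature vector per frame."""
    return np.concatenate([gfcc, delta(gfcc), formants], axis=1)

# Toy example: 100 frames, 13 GFCCs, and 3 formant frequencies per frame.
rng = np.random.default_rng(0)
gfcc = rng.standard_normal((100, 13))
formants = rng.uniform(300.0, 3500.0, size=(100, 3))
feats = hybrid_features(gfcc, formants)
print(feats.shape)  # (100, 29)
```

The resulting per-frame vectors would then serve as the input layer of the DBN classifier; the dropout strategy mentioned in the abstract would be applied during its supervised fine-tuning stage.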



Acknowledgements

This work was supported by Natural Science Foundation of Liaoning Province (Nos. 2019-ZD-0168 and 2020-KF-12-11), Major Training Program of Criminal Investigation Police University of China (No. 3242019010), Key Research and Development Projects of Ministry of Science and Technology (No. 2017YFC0821005), and Second Batch of New Engineering Research and Practice Projects (No. E-AQGABQ20202710).

Author information

Corresponding author

Correspondence to Ting Liu.

Additional information

Colored figures are available in the online version at https://link.springer.com/journal/11633

Nan Jiang received the Ph.D. degree in engineering from Northeastern University, China in 2007, and completed her postdoctoral research in computer science and technology at Northeastern University, China in 2012. She is currently an associate professor in the Department of Acoustic and Imaging Information Inspection Technology, Criminal Investigation Police University of China. She has been recognized as a young talent in criminal science and technology in China and selected into the Millions of Talents Project of Liaoning Province. She has been engaged in teaching, scientific research, and casework in judicial voice examination and voice recognition, and has published more than 40 academic papers.

Her research interests include pattern recognition, speech recognition, speech emotion recognition, and multimodal emotion recognition.

Ting Liu received the B.Sc. degree in automation from Northeastern University, China in 2007, and the M.Sc. and Ph.D. degrees in control theory and control engineering from Northeastern University, China in 2009 and 2014, respectively. She is now a lecturer in electrical engineering and automation at Liaoning University, China, where she works on nonlinear algorithms and speech recognition algorithms.

Her research interests include pattern recognition, feedback control systems, and control theory.


About this article


Cite this article

Jiang, N., Liu, T. Research on Voiceprint Recognition of Camouflage Voice Based on Deep Belief Network. Int. J. Autom. Comput. 18, 947–962 (2021). https://doi.org/10.1007/s11633-021-1283-2

