Automatic Control and Computer Sciences, Volume 53, Issue 1, pp. 72–79

Detection of HMM Synthesized Speech by Wavelet Logarithmic Spectrum

  • Diqun Yan
  • Li Xiang
  • Zhifeng Wang
  • Rangding Wang


Automatic speaker verification systems have achieved strong performance and are widely adopted in security applications. An important requirement for such a system is resilience to spoofing attacks such as impersonation, replay, speech synthesis, and voice conversion. Among these attacks, speech synthesis poses a high risk to verification systems. This paper proposes a novel detection method for computer-generated speech, in particular HMM-synthesized speech. It is found that the wavelet coefficients at specific positions differ markedly between synthetic and natural speech. Logarithmic spectrum features are extracted from these wavelet coefficients, and a support vector machine is used as the classifier to evaluate the proposed algorithm. Experimental results on the SAS corpus show that the proposed algorithm achieves high detection accuracy and a low equal error rate.
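The pipeline summarized above (wavelet coefficients, then logarithmic spectrum features, then a classifier scored by equal error rate) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the wavelet, the decomposition level, or the coefficient positions used, so the Haar wavelet, the `level` parameter, and all function names below are assumptions chosen for illustration. The classifier itself is omitted; `equal_error_rate` only assumes that some classifier (for example an SVM) has already produced one score per utterance, with higher scores meaning "more likely natural".

```python
import cmath
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    signal = list(signal)
    if len(signal) % 2:
        signal.append(signal[-1])  # pad odd-length input by repeating the last sample
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def detail_at_level(signal, level):
    """Detail coefficients after `level` successive Haar decompositions.

    Which level carries the discriminative information is an assumption here.
    """
    approx, detail = list(signal), []
    for _ in range(level):
        approx, detail = haar_dwt(approx)
    return detail


def log_spectrum(coeffs, eps=1e-10):
    """Log magnitude of the DFT of `coeffs` for the non-negative frequency bins.

    Uses a naive O(n^2) DFT to stay dependency-free; `eps` avoids log(0).
    """
    n = len(coeffs)
    feats = []
    for k in range(n // 2 + 1):
        acc = sum(c * cmath.exp(-2j * math.pi * k * t / n) for t, c in enumerate(coeffs))
        feats.append(math.log(abs(acc) + eps))
    return feats


def equal_error_rate(natural_scores, synthetic_scores):
    """Approximate EER: sweep thresholds and take the point where the false
    acceptance rate and false rejection rate are closest."""
    best_gap, eer = float("inf"), 1.0
    for thr in sorted(set(natural_scores) | set(synthetic_scores)):
        frr = sum(s < thr for s in natural_scores) / len(natural_scores)       # natural rejected
        far = sum(s >= thr for s in synthetic_scores) / len(synthetic_scores)  # synthetic accepted
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

In use, each utterance would be framed, `log_spectrum(detail_at_level(frame, level))` would give one feature vector per frame, and the resulting vectors would be passed to the classifier. A production version would more likely rely on an FFT and a wavelet library than on these hand-written loops.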


Keywords: speech synthesis, spoofing attack, wavelet transform, classification



This work was supported by the National Natural Science Foundation of China (grant nos. 61300055, U1736215, 61672302), Zhejiang Natural Science Foundation (grant nos. LY17F020010, LZ15F020002), Ningbo Natural Science Foundation (grant no. 2017A610123), Ningbo University Fund (grant no. XKXL1503) and K.C. Wong Magna Fund in Ningbo University.



Copyright information

© Allerton Press, Inc. 2019

Authors and Affiliations

  • Diqun Yan (1, 2)
  • Li Xiang (1)
  • Zhifeng Wang (1)
  • Rangding Wang (1)

  1. College of Information Science and Engineering, Ningbo University, Ningbo, China
  2. Guangdong Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, Shenzhen, China
