Audio Replay Attack Detection for Speaker Verification System Using Convolutional Neural Networks

  • P. J. Kemanth
  • Sujata Supanekar
  • Shashidhar G. Koolagudi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11942)


An audio replay attack is one of the most popular spoofing attacks on speaker verification systems because it is inexpensive and requires little knowledge of signal processing. In this paper, we investigate the significance of non-voiced audio segments and of deep learning models such as Convolutional Neural Networks (CNNs) for audio replay attack detection. The non-voiced segments of the audio can be used to detect reverberation and channel noise. FFT spectrograms are generated and given as input to a CNN, which classifies the audio as genuine or replayed. An advantage of the proposed approach is that removing the voiced speech reduces the feature vector size without discarding the necessary features, which in turn significantly reduces the training time of the networks. The ASVspoof 2017 dataset is used to train and evaluate the model, and the Equal Error Rate (EER) is used as the metric of model performance. The proposed system achieves an EER of 5.62% on the development dataset and 12.47% on the evaluation dataset.
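The two measurable steps named in the abstract, building FFT spectrograms from the non-speech portion of a recording and scoring a detector by EER, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how non-voiced segments are selected, so a per-frame energy threshold is assumed here as a stand-in, and the function names, frame length, hop size and quantile are all hypothetical choices. The CNN itself is omitted.

```python
import numpy as np


def low_energy_log_spectrogram(signal, frame_len=512, hop=256, energy_quantile=0.5):
    """Log-magnitude FFT spectrogram of the low-energy frames only.

    Assumption: a per-frame energy threshold stands in for non-voiced
    segment selection, which the abstract does not specify.
    Requires len(signal) >= frame_len.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix of shape (n_frames, frame_len) for overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    energy = np.sum(frames ** 2, axis=1)
    # Keep only the quieter frames (hypothetical proxy for "non-voiced").
    keep = energy <= np.quantile(energy, energy_quantile)
    spectrum = np.abs(np.fft.rfft(frames[keep], axis=1))
    return np.log(spectrum + 1e-10)  # shape: (kept_frames, frame_len // 2 + 1)


def equal_error_rate(genuine_scores, replay_scores):
    """EER: the error rate at the threshold where the false-acceptance
    rate and the false-rejection rate are closest to equal.

    Convention assumed here: a higher score means "more likely genuine".
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    replay = np.asarray(replay_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, replay]))
    # A trial is accepted as genuine when its score >= threshold.
    far = np.array([(replay >= t).mean() for t in thresholds])  # replays accepted
    frr = np.array([(genuine < t).mean() for t in thresholds])  # genuine rejected
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = low_energy_log_spectrogram(rng.standard_normal(4096))
    print(spec.shape)
    print(equal_error_rate([0.9, 0.8, 0.3, 0.7], [0.1, 0.2, 0.75, 0.4]))
```

Dropping frames before the FFT is what shrinks the input: with the assumed quantile of 0.5, roughly half of the frames reach the classifier. The EER function searches only thresholds that occur in the score lists, so on small score sets it returns the nearest achievable operating point rather than an interpolated one.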


Audio replay, Audio playback, Deep learning, CNN, GMM



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal, India
