Double Compressed Wideband AMR Speech Detection Using Deep Neural Networks

Circuits, Systems, and Signal Processing

Abstract

Detecting double compressed (DC) speech signals is an important audio forensics task because it is closely tied to the integrity and authenticity of a recording. The adaptive multi-rate (AMR) speech codec is a popular audio compression technique optimized specifically for speech, and it is a standard audio recording format on the vast majority of smartphones. All previous studies on the detection of DC AMR signals report results for speech compressed with the narrowband AMR codec (AMR-NB). Although several mobile phone manufacturers use the wideband AMR codec (AMR-WB), detection performance for DC AMR-WB speech remains unknown. To the best of our knowledge, this is the first study on detecting DC signals compressed with the AMR-WB speech codec. To this end, we propose three deep neural network-based DC AMR-WB detection systems that take spectrogram representations of the speech signals as input features. Experiments conducted on the TIMIT database yield several important findings. First, DC AMR-WB detection is a more challenging task than DC AMR-NB detection: for example, the convolutional neural network (CNN)-based system yields detection rates of 74.83% and 99.93% on AMR-WB and AMR-NB coded signals, respectively. Second, capturing temporal information with a long short-term memory (LSTM) network is superior to the CNN system, reaching a DC AMR-WB detection accuracy of 86.25%. Third, combining the deep feature representations learned by the CNN and LSTM networks further improves performance. Fourth, detection rates deteriorate when the signals are first encoded with a different audio codec prior to AMR-WB compression. Finally, applying score-level or decision-level fusion to the three proposed systems generally improves the detection rates.
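
The abstract does not state which fusion rules are used, so the following is only a minimal sketch of the two general ideas it names. It assumes that each of the three systems outputs a per-utterance score in [0, 1] for the "double compressed" class, that score-level fusion averages those scores before thresholding, and that decision-level fusion takes a majority vote over per-system decisions. The function names, the 0.5 threshold, and the example scores are hypothetical and are not taken from the paper.

```python
from statistics import mean


def score_level_fusion(scores_per_system, threshold=0.5):
    """Average the per-system scores of each utterance, then threshold.

    Assumption: plain averaging; the paper may use a different rule.
    """
    fused = [mean(utt_scores) for utt_scores in zip(*scores_per_system)]
    return [1 if s >= threshold else 0 for s in fused]


def decision_level_fusion(scores_per_system, threshold=0.5):
    """Threshold each system separately, then take a majority vote.

    Assumption: simple majority voting; the paper may use a different rule.
    """
    decisions = []
    for utt_scores in zip(*scores_per_system):
        votes = sum(1 for s in utt_scores if s >= threshold)
        decisions.append(1 if votes * 2 > len(utt_scores) else 0)
    return decisions


# Hypothetical scores for four utterances (1 = double compressed).
cnn = [0.90, 0.40, 0.20, 0.55]
lstm = [0.80, 0.45, 0.10, 0.30]
cnn_lstm = [0.70, 0.95, 0.30, 0.60]
systems = [cnn, lstm, cnn_lstm]

print(score_level_fusion(systems))     # [1, 1, 0, 0]
print(decision_level_fusion(systems))  # [1, 0, 0, 1]
```

The second and fourth utterances show how the two rules can disagree: one very confident system can dominate an average, whereas a vote only counts how many systems cross the threshold.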

Notes

  1. https://www.goldwave.com/.

  2. https://www.audacityteam.org/download/.

  3. The datasets generated and analyzed during the current study are not publicly available because TIMIT is a licensed audio database. However, the features and the Python code of the models used in this work will be made publicly available soon.

Author information

Corresponding authors

Correspondence to Aykut Büker or Cemal Hanilçi.

About this article

Cite this article

Büker, A., Hanilçi, C. Double Compressed Wideband AMR Speech Detection Using Deep Neural Networks. Circuits Syst Signal Process (2024). https://doi.org/10.1007/s00034-024-02668-4

  • DOI: https://doi.org/10.1007/s00034-024-02668-4