Double Compressed Wideband AMR Speech Detection Using Deep Neural Networks

Büker, Aykut; Hanilçi, Cemal

doi:10.1007/s00034-024-02668-4

Published: 16 April 2024


Circuits, Systems, and Signal Processing

Abstract

Detecting double compressed (DC) speech signals is an important audio forensics task because it is closely tied to the integrity and authenticity of a recording. The adaptive multi-rate (AMR) speech codec is a popular audio compression technique optimized specifically for speech signals, and it is a standard audio recording format on the vast majority of smartphones. All previous studies on detecting DC AMR signals report their findings for speech compressed with the narrowband AMR codec (AMR-NB). Meanwhile, the wideband AMR codec (AMR-WB) has been used by several mobile phone manufacturers, yet the detection performance for DC AMR-WB speech remains unknown. To the best of our knowledge, this is the first study focusing on detecting DC signals compressed with the AMR-WB speech codec. To this end, we propose three deep neural network-based DC AMR-WB signal detection systems that take spectrogram representations of the speech signals as input features. Experiments conducted on the TIMIT database yield several important findings. First, DC AMR-WB detection is found to be more challenging than DC AMR-NB detection. For example, the convolutional neural network (CNN)-based system yields detection rates of 74.83% and 99.93% on AMR-WB and AMR-NB coded signals, respectively. Second, capturing temporal information with a long short-term memory (LSTM) network, which reaches a DC AMR-WB detection accuracy of 86.25%, is found to be superior to the CNN system. Third, combining the deep feature representations learned by the CNN and LSTM networks further improves performance. Fourth, the detection rates deteriorate when the signals are first encoded using different audio codecs prior to AMR-WB compression. Finally, applying score-level or decision-level fusion to the three proposed systems generally improves the detection rates.
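The abstract distinguishes score-level from decision-level fusion without saying how either is computed. The following is a minimal sketch of the difference, assuming each of the three systems outputs a posterior score for the "double compressed" class, that score-level fusion averages those scores, and that decision-level fusion takes a majority vote. The function names, the 0.5 threshold, and the example scores are hypothetical and are not taken from the article.

```python
from statistics import mean


def score_level_fusion(scores, threshold=0.5):
    """Average the per-system scores, then threshold the mean (assumed rule)."""
    return mean(scores) >= threshold


def decision_level_fusion(scores, threshold=0.5):
    """Threshold each system on its own, then take a majority vote (assumed rule)."""
    votes = [s >= threshold for s in scores]
    return sum(votes) > len(votes) / 2


# Made-up scores P(double compressed) from three systems for two recordings.
recording_a = [0.90, 0.40, 0.45]  # one very confident system, two mildly negative
recording_b = [0.55, 0.60, 0.10]  # two mildly positive systems, one very negative

for name, scores in (("a", recording_a), ("b", recording_b)):
    print(name, score_level_fusion(scores), decision_level_fusion(scores))
```

Under these assumptions the two rules can disagree: for recording a, a single confident system pulls the mean above the threshold while the vote goes the other way, and for recording b the opposite happens.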

Notes

https://www.goldwave.com/.
https://www.audacityteam.org/download/.
The datasets generated and analyzed during the current study are not publicly available because TIMIT is a licensed audio database. However, the features and the Python code for the models used in this work will be made publicly available soon.

Author information

Authors and Affiliations

Department of Electrical and Electronics Engineering, Bursa Technical University, 16310, Bursa, Turkey
Aykut Büker & Cemal Hanilçi

Corresponding authors

Correspondence to Aykut Büker or Cemal Hanilçi.

About this article

Cite this article

Büker, A., Hanilçi, C. Double Compressed Wideband AMR Speech Detection Using Deep Neural Networks. Circuits Syst Signal Process (2024). https://doi.org/10.1007/s00034-024-02668-4

Received: 25 August 2023
Revised: 14 March 2024
Accepted: 14 March 2024
Published: 16 April 2024
DOI: https://doi.org/10.1007/s00034-024-02668-4
