An Adaptive Embedding Approach for High Imperceptible and Robust Audio Watermarking Using Framelet Transform and SVD

Circuits, Systems, and Signal Processing

Abstract

This paper presents a highly imperceptible and robust audio watermarking algorithm that optimizes the scaling parameter to meet imperceptibility requirements while minimizing the bit error rate. Spread spectrum-based audio watermarking is used, and the scaling parameter is optimized to achieve a trade-off between the imperceptibility and the robustness of the watermarked audio. First, the effect of the scaling parameter on the perceptual quality of the watermarked audio is investigated, and the total error introduced into the host audio by embedding the watermark is computed. The scaling parameter is then optimized to maximize robustness, with the objective difference grade (ODG) score (for music signals) and the perceptual evaluation of speech quality (PESQ) score (for speech signals) acting as constraints that enforce the imperceptibility requirements. Two search algorithms are developed to solve this optimization problem. The watermark is embedded in the low-pass framelet transform coefficients through singular value decomposition (SVD) with the optimized scaling parameter. The experimental results show that the proposed algorithm achieves good imperceptibility, with an average ODG score of −0.32 for music signals and an average PESQ score of 3.86 for speech signals, under various payload conditions. The proposed algorithm also shows better robustness to common signal processing attacks such as noise addition, filtering, resampling, MP3 compression, amplitude scaling, cropping, and requantization.
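The constrained optimization described in the abstract can be illustrated with a generic search. The sketch below is not one of the two search algorithms developed in the paper, which are not given in this excerpt. It is a plain bisection that assumes the quality score is non-increasing in the scaling parameter, and the names `largest_feasible_alpha` and `toy_quality` are hypothetical. The toy score only mimics the ODG range from 0 (imperceptible) down to −4 (very annoying).

```python
from typing import Callable


def largest_feasible_alpha(quality: Callable[[float], float],
                           threshold: float,
                           lo: float,
                           hi: float,
                           tol: float = 1e-4) -> float:
    """Bisection for the largest alpha in [lo, hi] with quality(alpha) >= threshold.

    Assumption (not from the paper): quality is non-increasing in alpha,
    so the feasible set is an interval starting at lo.
    """
    if quality(lo) < threshold:
        raise ValueError("no feasible alpha in the interval")
    if quality(hi) >= threshold:
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if quality(mid) >= threshold:
            lo = mid   # mid is still imperceptible enough, try a stronger watermark
        else:
            hi = mid   # mid violates the quality constraint
    return lo


def toy_quality(alpha: float) -> float:
    """Hypothetical stand-in for an ODG-like score: 0 at alpha = 0, -4 at alpha = 1."""
    return -4.0 * alpha


alpha_star = largest_feasible_alpha(toy_quality, threshold=-1.0, lo=0.0, hi=1.0)
```

In a real system, `toy_quality` would be replaced by an actual ODG or PESQ evaluation of the watermarked audio, which is far more expensive per call and need not be exactly monotone.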

Data Availability

The datasets used in this manuscript are available at the following links. USAC database: http://www.voiceage.com/Audio-Samples-AMR-WB.html. NOIZEUS speech database: https://ecs.utdallas.edu/loizou/speech/noizeus. Audio steganalysis database: https://ieee-dataport.org/datasets. TIMIT speech database: https://catalog.ldc.upenn.edu/LDC93s1.

Author information

Corresponding author

Correspondence to Kasetty Praveen Kumar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Effect of Additive White Gaussian Noise

In the proposed algorithm, the watermark is extracted by computing the average of the non-diagonal elements of the matrix \([D_w]\) and applying the decision logic in Eq. (17). The extraction of the watermark therefore depends on the orthonormal matrices \(U_1\) and \(V_1\). If the singular-value matrix \(S_r\) at the extraction side is invariant, the watermark can be extracted with minimum error.

The effect of AWGN on the singular values is modeled intuitively as follows. Assume that the AWGN is an independent and identically distributed (i.i.d.) random process.

$$\begin{aligned} y=x_w+n, \end{aligned}$$
(A.1)

where \(x_w\) is the watermarked signal, \(n\) is the white Gaussian noise, and \(y\) is the noisy signal. Because the \(L_2\) norm is preserved, the energy of the watermarked signal \(E_w\) can be written as

$$\begin{aligned} \Vert x_w\Vert ^2=\Vert c\Vert ^2=\Vert USV^T\Vert ^2=\Vert S\Vert ^2, \end{aligned}$$
(A.2)

where \(c\) denotes the framelet transform coefficients, and \(U\) and \(V\) are unitary matrices.
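The last equality in Eq. (A.2) can be checked numerically. The following is a minimal sketch, assuming the coefficients are arranged in a real matrix and the norm is the Frobenius norm. The random matrix `A` is a hypothetical stand-in for a block of framelet coefficients, since the framelet transform itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for framelet coefficients arranged as a matrix.
A = rng.standard_normal((8, 8))

# Singular values only; U and V are not needed to compare energies.
s = np.linalg.svd(A, compute_uv=False)

energy_coeffs = np.sum(A ** 2)    # squared Frobenius norm of the coefficient matrix
energy_singular = np.sum(s ** 2)  # squared norm of the singular values
```

The two energies agree up to floating-point error because multiplication by the unitary matrices \(U\) and \(V^T\) does not change the Frobenius norm.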

Because the AWGN is i.i.d. and therefore approximately uncorrelated with the watermarked signal, the energy of the noisy signal can be approximated as

$$\begin{aligned} \begin{aligned} E_y&=E_w+E_n\\ \Vert S_r\Vert ^2&=\Vert S\Vert ^2+\Vert n\Vert ^2. \end{aligned} \end{aligned}$$
(A.3)

This shows that AWGN affects the singular values at the extraction side. If the energy of the watermarked signal is high compared to the noise energy, the singular values will be approximately invariant to the AWGN attack.

From Eq. (16), it can be observed that the energy of the watermarked signal depends on the scaling parameter \(\alpha \). Therefore, by maintaining a good SNR at the embedding side, the effect of AWGN on the watermarked signal can be minimized.
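The approximation in Eq. (A.3) and the claim about singular values at high SNR can be illustrated with a small experiment. This is a sketch under simplifying assumptions: noise is added directly to a hypothetical coefficient matrix rather than to a time-domain signal, and the helper `add_awgn` and the 30 dB SNR are illustrative choices, not values from the paper. The bound on the singular-value shift is Weyl's inequality for singular values, a standard matrix-analysis result rather than something derived in this appendix.

```python
import numpy as np


def add_awgn(x, snr_db, rng):
    """Return (x + n, n), where n is white Gaussian noise at roughly the given SNR in dB."""
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    n = rng.standard_normal(x.shape) * np.sqrt(noise_power)
    return x + n, n


rng = np.random.default_rng(1)
A = rng.standard_normal((64, 64))          # hypothetical watermarked coefficient matrix
Y, N = add_awgn(A, snr_db=30.0, rng=rng)   # noisy version and the noise itself

s_clean = np.linalg.svd(A, compute_uv=False)
s_noisy = np.linalg.svd(Y, compute_uv=False)

# Relative mismatch between E_y and E_w + E_n (the neglected cross term).
energy_gap = abs(np.sum(Y ** 2) - (np.sum(A ** 2) + np.sum(N ** 2))) / np.sum(Y ** 2)

# Largest change of any singular value, and its Weyl bound (spectral norm of the noise).
max_shift = np.max(np.abs(s_noisy - s_clean))
weyl_bound = np.linalg.norm(N, 2)
```

Lowering `snr_db` increases both `weyl_bound` and the typical `max_shift`, which matches the statement that the singular values stay close to their original values only when the signal energy dominates the noise energy.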

Appendix B: Effect of Amplitude Scaling

The effect of amplitude scaling on the extraction of the watermark is discussed here. Consider the watermarked signal \(x(t)\), whose framelet transform coefficients are obtained from Eq. (3) as

$$\begin{aligned} c_{j}(k)=\int x(t)2^{j/2}\phi (2^{j}t-k) \hbox {d}t \end{aligned}$$
(B.1)

The coefficients are arranged in a matrix [A], and SVD is performed to decompose it into the matrices [U], [S], and [V] as follows:

$$\begin{aligned} A=U\times S\times V^{T} \end{aligned}$$
(B.2)

If the amplitude of the watermarked signal is scaled by a factor \(\beta \), the corresponding framelet coefficients are scaled by the same factor, as shown below:

$$\begin{aligned} \begin{aligned} c^\prime _{j}(k)&=\int \beta (x(t))2^{j/2}\phi (2^{j}t-k) \hbox {d}t\\ c^\prime _{j}(k)&=\beta \int x(t)2^{j/2}\phi (2^{j}t-k) \hbox {d}t \end{aligned} \end{aligned}$$
(B.3)

The coefficients are arranged in a matrix [B] and its SVD can be expressed as

$$\begin{aligned} B=U\times \beta (S)\times V^{T} \end{aligned}$$
(B.4)

This shows that amplitude scaling changes only the singular values and leaves \(U\) and \(V\) unchanged, so the decision rule in Eq. (17) is not affected by the amplitude scaling attack. Hence, the watermark can be recovered with minimum error.
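Equation (B.4) can be verified numerically. The sketch below assumes a positive scaling factor \(\beta \) and uses a random matrix as a hypothetical stand-in for the coefficient matrix [A]. The value 0.7 is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 8))   # hypothetical coefficient matrix [A]
beta = 0.7                        # illustrative positive amplitude-scaling factor

U, s, Vt = np.linalg.svd(A)

B = beta * A                                  # coefficients after amplitude scaling
s_B = np.linalg.svd(B, compute_uv=False)      # singular values of the scaled matrix

# Rebuild B from the ORIGINAL U and V with only the singular values scaled.
B_rebuilt = U @ np.diag(beta * s) @ Vt
```

Here `s_B` equals `beta * s` and `B_rebuilt` equals `B` up to floating-point error, so the same \(U\) and \(V\) remain valid singular vectors after scaling. For a negative \(\beta \), the singular values would scale by \(|\beta |\) and the sign would be absorbed into \(U\) or \(V\).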

About this article

Cite this article

Kumar, K.P., Kanhe, A. An Adaptive Embedding Approach for High Imperceptible and Robust Audio Watermarking Using Framelet Transform and SVD. Circuits Syst Signal Process 42, 5684–5713 (2023). https://doi.org/10.1007/s00034-023-02382-7

  • DOI: https://doi.org/10.1007/s00034-023-02382-7