An Adaptive Embedding Approach for High Imperceptible and Robust Audio Watermarking Using Framelet Transform and SVD

Kumar, Kasetty Praveen; Kanhe, Aniruddha

doi:10.1007/s00034-023-02382-7

Published: 25 April 2023

Circuits, Systems, and Signal Processing, Volume 42, pages 5684–5713 (2023)

Abstract

This paper presents a highly imperceptible and robust audio watermarking algorithm that optimizes the scaling parameter to meet imperceptibility requirements while minimizing the bit error rate. Spread spectrum-based audio watermarking is used, and the scaling parameter is optimized to achieve a trade-off between the imperceptibility and the robustness of the watermarked audio. First, the effect of the scaling parameter on the perceptual quality of the watermarked audio is investigated, and the total error introduced in the host audio by embedding the watermark is computed. The scaling parameter is then optimized to maximize robustness, with the objective difference grade (ODG) score (for music signals) and the perceptual evaluation of speech quality (PESQ) score (for speech signals) as constraints that enforce the imperceptibility requirements. Two search algorithms are developed to solve this optimization problem. The embedding is performed in the low-pass framelet transform coefficients through singular value decomposition (SVD) with the optimized scaling parameter. The experimental results show that the proposed algorithm achieves good imperceptibility, with an average ODG score of −0.32 for music signals and an average PESQ score of 3.86 for speech signals, under various payload conditions. The proposed algorithm also shows better robustness to common signal processing attacks such as noise addition, filtering, resampling, MP3 compression, amplitude scaling, cropping, and requantization.
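The constrained choice of the scaling parameter can be illustrated with a minimal Python sketch. It is not one of the paper's two search algorithms, which are not described in this excerpt. It assumes a plain additive spread-spectrum embedding in the time domain, uses SNR as a stand-in for the ODG/PESQ constraint, and assumes the quality measure decreases monotonically with the scaling parameter. The names `snr_db`, `largest_alpha`, `host`, and `pn` are hypothetical.

```python
import numpy as np


def snr_db(host, alpha, pn):
    """SNR in dB of the additive embedding host + alpha * pn (stand-in quality score)."""
    distortion = alpha * pn
    return 10.0 * np.log10(np.sum(host ** 2) / np.sum(distortion ** 2))


def largest_alpha(host, pn, min_snr_db, hi=1.0, iters=60):
    """Bisection for the largest alpha that still satisfies snr_db >= min_snr_db.

    Assumes the quality score decreases monotonically as alpha grows.
    """
    lo = 0.0
    # Grow the upper bracket until it violates the quality constraint.
    while snr_db(host, hi, pn) >= min_snr_db:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if snr_db(host, mid, pn) >= min_snr_db:
            lo = mid   # constraint met: a stronger watermark may still be allowed
        else:
            hi = mid   # constraint violated: weaken the watermark
    return lo


rng = np.random.default_rng(0)
host = rng.standard_normal(4096)               # hypothetical host audio frame
pn = rng.choice([-1.0, 1.0], size=host.size)   # hypothetical +/-1 spreading sequence
alpha = largest_alpha(host, pn, min_snr_db=30.0)

# For this toy model the answer is also available in closed form.
closed_form = np.linalg.norm(host) / np.sqrt(host.size) * 10 ** (-30.0 / 20.0)
```

A larger scaling parameter generally makes the watermark easier to detect after attacks, so under these assumptions the search returns the strongest embedding that still meets the quality constraint. With a perceptual score such as ODG or PESQ, no closed form is available and a search of this kind would have to evaluate the score at each candidate value.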

Data Availability

The datasets used in this manuscript can be accessed at the following links. USAC database: http://www.voiceage.com/Audio-Samples-AMR-WB.html. NOIZEUS speech database: https://ecs.utdallas.edu/loizou/speech/noizeus. Audio steganalysis database: https://ieee-dataport.org/datasets. TIMIT speech database: https://catalog.ldc.upenn.edu/LDC93s1.

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, National Institute of Technology Puducherry, Thiruvettakudy, Karaikal, Puducherry, 609609, India
Kasetty Praveen Kumar & Aniruddha Kanhe

Corresponding author

Correspondence to Kasetty Praveen Kumar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Effect of Additive White Gaussian Noise

In the proposed algorithm, the watermark is extracted by computing the average of the non-diagonal elements of the matrix $[D_w]$ and applying the decision logic in Eq. (17). The extraction therefore depends on the orthonormal matrices $U_1$ and $V_1$. If the singular value matrix $S_r$ at the extraction side remains invariant, the watermark can be extracted with minimum error.

The effect of AWGN on the singular values is modeled intuitively as follows. Assume that the AWGN is an independent and identically distributed (i.i.d.) random process that is added to the watermarked signal:

$$\begin{aligned} y=x_w+n, \end{aligned} \tag{A.1}$$

where $x_w$ is the watermarked signal, $n$ is the white Gaussian noise, and $y$ is the noisy signal. Because the $L_2$ norm is preserved by the transform and by the unitary factors of the SVD, the energy of the watermarked signal $E_w$ can be written as

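$$\begin{aligned} \Vert x_w\Vert ^2=\Vert c\Vert ^2=\Vert USV^T\Vert ^2=\Vert S\Vert ^2, \end{aligned} \tag{A.2}$$

where $c$ denotes the framelet transform coefficients, $U$ and $V$ are unitary matrices, and $S$ is the diagonal matrix of singular values.

Because the AWGN is zero-mean and independent of the watermarked signal, the cross term between $x_w$ and $n$ vanishes on average, and the energy of the noisy signal can be approximated as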

$$\begin{aligned} E_y&=E_w+E_n,\\ \Vert S_r\Vert ^2&=\Vert S\Vert ^2+\Vert n\Vert ^2. \end{aligned} \tag{A.3}$$

This shows that AWGN affects the singular values at the extraction side. If the energy of the watermarked signal is high compared with the noise energy, the singular values remain approximately invariant under the AWGN attack.

From Eq. (16), it can be observed that the energy of the watermarked signal depends on the scaling parameter $\alpha$. Therefore, by maintaining a good SNR at the embedding side, the effect of AWGN on the watermarked signal can be minimized.
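As a rough numerical check of Eqs. (A.2) and (A.3), the following Python/NumPy sketch uses a random matrix as a hypothetical stand-in for the matrix of framelet coefficients; the paper's transform and embedding rule are not reproduced. It checks that the energy of the matrix equals the sum of its squared singular values, that the energies add up to a cross term, and that no singular value moves by more than the spectral norm of the noise (Weyl's inequality, a standard matrix result that is not stated in the appendix). The noise level `sigma` is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the coefficient matrix of one watermarked frame.
A = rng.standard_normal((16, 16))

# Additive white Gaussian noise with an assumed standard deviation.
sigma = 0.01
N = sigma * rng.standard_normal(A.shape)

s_clean = np.linalg.svd(A, compute_uv=False)      # singular values before the attack
s_noisy = np.linalg.svd(A + N, compute_uv=False)  # singular values after the attack

# Eq. (A.2): the energy of the matrix equals the sum of squared singular values.
energy_matrix = np.sum(A ** 2)
energy_singular = np.sum(s_clean ** 2)

# Eq. (A.3): energies add up to a cross term, which is zero on average
# for zero-mean noise that is independent of the signal.
energy_noisy = np.sum(s_noisy ** 2)
energy_noise = np.sum(N ** 2)
cross_term = 2.0 * np.sum(A * N)

# Weyl's inequality: each singular value shifts by at most the spectral norm of N.
max_shift = np.max(np.abs(s_noisy - s_clean))
noise_spectral_norm = np.linalg.norm(N, 2)

print(f"relative cross term: {abs(cross_term) / energy_matrix:.2e}")
print(f"largest singular value shift: {max_shift:.2e} (bound {noise_spectral_norm:.2e})")
```

The bound on the shift makes the appendix's conclusion concrete for this toy setting: when the noise is weak relative to the singular values, the singular values seen at the extractor stay close to the embedded ones.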

Appendix B: Effect of Amplitude Scaling

The effect of amplitude scaling on the extraction of the watermark is discussed here. Consider the watermarked signal $x(t)$, whose framelet transform coefficients are obtained by Eq. (3):

$$\begin{aligned} c_{j}(k)=\int x(t)\,2^{j/2}\phi (2^{j}t-k)\,\mathrm{d}t. \end{aligned} \tag{B.1}$$

The coefficients are arranged in a matrix $[A]$, and SVD is performed to decompose it into the matrices $[U]$, $[S]$, and $[V]$:

$$\begin{aligned} A=USV^{T}. \end{aligned} \tag{B.2}$$

If the amplitude of the watermarked signal is scaled by a factor $\beta$, the corresponding framelet coefficients are scaled by the same factor, because the transform is linear:

$$\begin{aligned} c^\prime _{j}(k)&=\int \beta x(t)\,2^{j/2}\phi (2^{j}t-k)\,\mathrm{d}t\\ &=\beta \int x(t)\,2^{j/2}\phi (2^{j}t-k)\,\mathrm{d}t=\beta c_{j}(k). \end{aligned} \tag{B.3}$$

The scaled coefficients are arranged in a matrix $[B]$, whose SVD can be expressed as

$$\begin{aligned} B=U(\beta S)V^{T}. \end{aligned} \tag{B.4}$$

Only the singular values are scaled by $\beta$; for $\beta > 0$ the orthonormal matrices $U$ and $V$ are unchanged. This shows that the decision rule in Eq. (17) is not affected by the amplitude scaling attack. Hence, the watermark can be recovered with minimum error.
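The scaling argument of Eqs. (B.2) and (B.4) can be checked numerically with a short Python/NumPy sketch. A random matrix serves as a hypothetical stand-in for the coefficient matrix, and the value of `beta` is an arbitrary positive assumption; the paper's framelet transform and the decision rule of Eq. (17) are not reproduced. Singular vectors are compared up to sign, because an SVD routine may return either sign for a matching pair of left and right vectors.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the coefficient matrix [A] of Eq. (B.2).
A = rng.standard_normal((8, 8))

# Amplitude scaling attack with an assumed positive factor.
beta = 0.7
B = beta * A

U, s, Vt = np.linalg.svd(A)
Ub, sb, Vbt = np.linalg.svd(B)

# Eq. (B.4): the singular values are scaled by beta.
values_scaled = np.allclose(sb, beta * s)

# The singular vectors are unchanged up to sign: the inner product of each
# matching pair of columns of U (rows of V^T) has magnitude one.
left_match = np.allclose(np.abs(np.sum(U * Ub, axis=0)), 1.0)
right_match = np.allclose(np.abs(np.sum(Vt * Vbt, axis=1)), 1.0)

print(values_scaled, left_match, right_match)
```

Any extraction rule that relies only on the orthonormal factors, or on ratios of singular values, is therefore insensitive to a positive gain change in this toy setting.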

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kumar, K.P., Kanhe, A. An Adaptive Embedding Approach for High Imperceptible and Robust Audio Watermarking Using Framelet Transform and SVD. Circuits Syst Signal Process 42, 5684–5713 (2023). https://doi.org/10.1007/s00034-023-02382-7

Received: 25 March 2022
Revised: 05 April 2023
Accepted: 06 April 2023
Published: 25 April 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s00034-023-02382-7
