Abstract
This paper presents a highly imperceptible and robust audio watermarking algorithm that optimizes the scaling parameter to meet the imperceptibility requirements with minimum bit error rate. Spread spectrum-based audio watermarking is used, and the scaling parameter is optimized to achieve a trade-off between the imperceptibility and the robustness of the watermarked audio. First, the effect of the scaling parameter on the perceptual quality of the watermarked audio is investigated, and the total error introduced in the host audio by embedding the watermark is computed. The scaling parameter is then optimized to maximize robustness, with the objective difference grade (ODG) score (for music signals) or the perceptual evaluation of speech quality (PESQ) score (for speech signals) acting as a constraint that enforces the imperceptibility requirements. Two search algorithms are developed to solve this optimization problem. The embedding is performed in the low-pass framelet transform coefficients through singular value decomposition (SVD) with the optimized scaling parameter. The experimental results show that the proposed algorithm achieves good imperceptibility, with an average ODG score of −0.32 for music signals and an average PESQ score of 3.86 for speech signals, under various payload conditions. The proposed algorithm is also robust to common signal processing attacks such as noise addition, filtering, resampling, MP3 compression, amplitude scaling, cropping, and requantization.
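The constrained optimization described in the abstract can be illustrated with a minimal sketch. It is not one of the two search algorithms developed in the paper; it is a plain bisection, under the assumption that perceptual quality does not increase as the scaling parameter grows. The names `largest_feasible_alpha` and `toy_odg`, the search interval, and the threshold are hypothetical, and `toy_odg` is only a stand-in for a real ODG or PESQ measurement.

```python
from typing import Callable


def largest_feasible_alpha(
    quality: Callable[[float], float],
    q_min: float,
    lo: float = 0.0,
    hi: float = 1.0,
    tol: float = 1e-6,
) -> float:
    """Bisection for the largest alpha in [lo, hi] with quality(alpha) >= q_min.

    Assumes quality is non-increasing in alpha and that lo is feasible.
    """
    if quality(lo) < q_min:
        raise ValueError("lower bound violates the quality constraint")
    if quality(hi) >= q_min:
        return hi  # the whole interval satisfies the constraint
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if quality(mid) >= q_min:
            lo = mid  # constraint still met: a stronger watermark is allowed
        else:
            hi = mid  # constraint violated: the watermark must be weaker
    return lo  # lo is always a feasible value


def toy_odg(alpha: float) -> float:
    # Hypothetical stand-in for a perceptual score:
    # 0 means transparent, more negative means more audible distortion.
    return -10.0 * alpha


alpha_star = largest_feasible_alpha(toy_odg, q_min=-1.0)
print(f"largest feasible alpha: {alpha_star:.4f}")
```

A larger scaling parameter generally makes the embedded watermark stronger, so returning the largest value that still satisfies the quality constraint mirrors the stated goal of maximizing robustness subject to imperceptibility.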
Data Availability
The datasets used in this manuscript are available at the following web links. USAC database: http://www.voiceage.com/Audio-Samples-AMR-WB.html. NOIZEUS speech database: https://ecs.utdallas.edu/loizou/speech/noizeus. Audio steganalysis database: https://ieee-dataport.org/datasets. TIMIT speech database: https://catalog.ldc.upenn.edu/LDC93s1.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendices
Appendix A: Effect of Additive White Gaussian Noise
In the proposed algorithm, the watermark is extracted by computing the average of the non-diagonal elements of the matrix \([D_w]\) and applying the decision logic in Eq. (17). The extraction therefore depends on the orthonormal matrices \(U_1\) and \(V_1\). If the singular value matrix \(S_r\) at the extraction side is invariant, the watermark can be extracted with minimum error.
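The effect of AWGN on the singular values is modeled intuitively as follows. Assume that the AWGN is an independent and identically distributed (i.i.d.) random process, so that
\[ y = x_w + n \]
where \(x_w\) is the watermarked signal, \(n\) is the white Gaussian noise, and \(y\) is the noisy signal. As the \(L_2\) norm is preserved, the energy of the watermarked signal \(E_w\) can be written as
\[ E_w = \Vert x_w \Vert_2^2 = \Vert c \Vert_2^2 = \Vert U S V^T \Vert_F^2 = \sum_i \sigma_i^2 \]
where \(c\) denotes the framelet transform coefficients, \(U\) and \(V\) are unitary matrices, and \(\sigma_i\) are the singular values on the diagonal of \(S\).
Because the noise is i.i.d., zero-mean, and independent of the signal, the cross term between signal and noise is negligible, and the energy of the noisy signal can be approximated as
\[ E_y = \Vert x_w + n \Vert_2^2 \approx E_w + E_n \]
where \(E_n\) is the energy of the noise.
This shows that AWGN affects the singular values at the extraction side. If the energy of the watermarked signal is high compared with the noise energy, the singular values are approximately invariant to the AWGN attack.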
From Eq. (16), it can be observed that the energy of the watermarked signal depends on the scaling parameter \(\alpha \). Therefore, by maintaining a good SNR at the embedding side, the effect of AWGN on the watermarked signal can be minimized.
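The argument of this appendix can be checked numerically with a short sketch. It is only an illustration under stated assumptions: the matrix `A` is a random stand-in for the coefficient matrix of a watermarked frame (no framelet transform or embedding is performed), and the SNR values are arbitrary. The sketch verifies that the energy equals the sum of the squared singular values and that the shift of each singular value under additive noise is bounded by the spectral norm of the noise matrix (Weyl's inequality for singular values).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: a random matrix stands in for the coefficient matrix of a
# watermarked frame; no framelet transform or embedding is applied here.
A = rng.standard_normal((8, 8))
s_clean = np.linalg.svd(A, compute_uv=False)

# Energy of the matrix equals the sum of its squared singular values.
energy = float(np.sum(A ** 2))
energy_from_svd = float(np.sum(s_clean ** 2))

results = []
for snr_db in (10, 20, 40):
    noise = rng.standard_normal(A.shape)
    # Rescale the noise so that energy / noise_energy matches the target SNR.
    noise_energy = energy / 10 ** (snr_db / 10)
    noise *= np.sqrt(noise_energy / np.sum(noise ** 2))

    s_noisy = np.linalg.svd(A + noise, compute_uv=False)
    max_shift = float(np.max(np.abs(s_noisy - s_clean)))
    bound = float(np.linalg.norm(noise, 2))  # spectral norm of the noise
    results.append((snr_db, max_shift, bound))
    print(f"SNR {snr_db:2d} dB: max singular value shift {max_shift:.4f}, "
          f"bound {bound:.4f}")
```

Since the bound shrinks with the noise energy, a higher SNR leaves less room for the singular values to move, which is the behavior the appendix relies on.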
Appendix B: Effect of Amplitude Scaling
The effect of amplitude scaling on watermark extraction is discussed here. Consider the watermarked signal \(x(t)\), whose framelet transform coefficients are obtained by Eq. (3).
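The coefficients are arranged in a matrix \([A]\), and SVD is performed to decompose it into the matrices \([U]\), \([S]\), and \([V]\) as below:
\[ [A] = [U][S][V]^T \]
If the amplitude of the watermarked signal is scaled by a factor of \(\beta \), the corresponding framelet coefficients \(c\) are also scaled by \(\beta \), because the framelet transform is linear:
\[ x(t) \rightarrow \beta x(t) \quad \Rightarrow \quad c \rightarrow \beta c \]
The scaled coefficients are arranged in a matrix \([B]\), and its SVD can be expressed as
\[ [B] = \beta [A] = [U](\beta [S])[V]^T \]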
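This shows that amplitude scaling (with \(\beta > 0\)) multiplies the singular values by \(\beta \) but leaves the orthonormal matrices \([U]\) and \([V]\) unchanged, so the decision rule in Eq. (17) is not affected by the amplitude scaling attack. Hence, the watermark can be recovered with minimum error.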
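The scaling property can be verified with a few lines of NumPy. This is a sketch under assumptions, not the extraction procedure of the paper: `A` is a random stand-in for the coefficient matrix \([A]\), and `beta` is an arbitrary positive scaling factor. It checks that the singular values of the scaled matrix are `beta` times the original ones and that the singular vectors of \([A]\) still diagonalize \([B]\).

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumption: a random matrix stands in for the coefficient matrix [A].
A = rng.standard_normal((6, 6))
beta = 0.5          # amplitude-scaling factor, assumed positive
B = beta * A        # coefficients of the amplitude-scaled signal

U, s_A, Vt = np.linalg.svd(A)
s_B = np.linalg.svd(B, compute_uv=False)

# Project [B] onto the singular vectors of [A]; the result should be
# the diagonal matrix beta * [S] with (numerically) zero off-diagonals.
D = U.T @ B @ Vt.T
off_diagonal = D - np.diag(np.diag(D))

print("singular values scaled by beta:", np.allclose(s_B, beta * s_A))
print("largest off-diagonal magnitude:", float(np.max(np.abs(off_diagonal))))
```

Because the projection onto the original singular vectors stays diagonal, any decision that depends on the structure given by \([U]\) and \([V]\), rather than on the absolute size of the singular values, is unchanged by the scaling.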
About this article
Cite this article
Kumar, K.P., Kanhe, A. An Adaptive Embedding Approach for High Imperceptible and Robust Audio Watermarking Using Framelet Transform and SVD. Circuits Syst Signal Process 42, 5684–5713 (2023). https://doi.org/10.1007/s00034-023-02382-7
DOI: https://doi.org/10.1007/s00034-023-02382-7