Abstract
The ever-increasing use of artificial intelligence (AI) to optimize embedding a watermark or find the optimal location in the host image for inserting a watermark might solve some problems in this field. However, the main problem, which is finding the optimum trade-off point among several watermarking criteria, has still not been investigated by researchers in this field, especially for speech signals. This paper aims to find the best trade-off among the watermarking requirements such as capacity, inaudibility, and robustness by applying an AI model. Moreover, a novel watermarking technique is proposed by modification of the probability density function (PDF) of the linear predictive (LP) residual and wavelet detail coefficient. For this method, a mathematical model is developed based on applying higher-order statistics for embedding and extracting the watermark. Sinh-arcsinh is used to shape the skewness and kurtosis of normal distribution for the LP residual or the wavelet high-frequency sub-bands, respectively, based on the watermark bits. Experimental results will show that although LP residual is not robust and it shows random behavior for modeling its PDF, the wavelet high-frequency band is quite robust and it can model the PDF of the wavelet. Moreover, it is demonstrated that AI has the capability to compromise among the watermarking criteria. Conclusions are drawn based on theoretical (maximum likelihood) and AI (machine learning) approaches, which confirm the effectiveness of the proposed model. Finally, in conclusion, several potential areas are discussed for further exploration.
Similar content being viewed by others
References
M.A. Akhaee, N.K. Kalantari, F. Marvasti, Robust audio and speech watermarking using Gaussian and Laplacian modeling. Signal Process. 90(8), 2487–2497 (2010)
L. Alzubaidi et al., A survey on deep learning tools dealing with data scarcity: definitions, challenges, solutions, tips, and applications. J. Big Data 10(1), 46 (2023)
P. Amrit, A.K. Singh, Survey on watermarking methods in the artificial intelligence domain and beyond. Comput. Commun. 188, 52–65 (2022)
P. Bhinder, N. Jindal, K. Singh, An improved robust image-adaptive watermarking with two watermarks using statistical decoder. Multimed. Tools Appl. 79, 183–217 (2020)
F. Deeba et al., Digital watermarking using deep neural network. Int. J. Mach. Learn. Comput. 10(2), 277–282 (2020)
S. Gazor, W. Zhang, Speech probability distribution. IEEE Signal Process. Lett. 10(7), 204–207 (2003)
C. Gu et al., Watermarking pre-trained language models with backdooring. arXiv preprint arXiv:2210.07543 (2022)
C.C. Hsu, Synthesizing personalized non-speech vocalization from discrete speech representations. arXiv preprint arXiv:2206.12662 (2022). Available: https://www.resemble.ai/neural-speech-watermarker
M.C. Jones, A. Pewsey, Sinh-arcsinh distributions. Biometrika 96(4), 761–780 (2009)
C.T. Leondes, Stochastic Digital Control System Techniques: Advances in Theory and Applications (Academic Press, 1996)
Y. Li, H. Wang, M. Barni, A survey of deep neural network watermarking techniques. Neurocomputing 461, 171–193 (2021)
X. Liang, S. Xiang, Robust reversible audio watermarking based on high-order difference statistics. Signal Process. 173, 107584 (2020)
S. Lounici et al. Yes we can: watermarking machine learning models beyond classification, in 2021 IEEE 34th Computer Security Foundations Symposium (CSF). IEEE (2021)
C.O. Mawalim, M. Unoki, Speech watermarking method using McAdams coefficient based on random forest learning. Entropy 23(10), 1246 (2021)
I. Miller, Probability, Random Variables, and Stochastic Processes (JSTOR, 1966)
S.-M. Mun et al., Finding robust domain from attacks: a learning framework for blind watermarking. Neurocomputing 337, 191–202 (2019)
M.A. Nematollahi, Digital speech watermarking for online speaker recognition systems (2015)
M.A. Nematollahi, A machine learning approach for digital watermarking. Aust. J. Multi Discipl. Eng. (2023). https://doi.org/10.1080/14488388.2023.2200051
M.A. Nematollahi et al., Speaker frame selection for digital speech watermarking. Natl. Acad. Sci. Lett. 39, 197–201 (2016)
M.A. Nematollahi, S.A.R. Al-Haddad, An overview of digital speech watermarking. Int. J. Speech Technol. 16, 471–488 (2013)
M.A. Nematollahi et al., Multi-factor authentication model based on multipurpose speech watermarking and online speaker recognition. Multimed. Tools Appl. 76, 7251–7281 (2017)
M.A. Nematollahi, C. Vorakulpipat, H. Gamboa Rosales, Optimization of a blind speech watermarking technique against amplitude scaling. Secur. Commun. Netw. 2017 (2017)
M.A. Nematollahi, C. Vorakulpipat, H. Gamboa Rosales, Semifragile speech watermarking based on least significant bit replacement of line spectral frequencies. Math. Probl. Eng. 2017 (2017)
M.A. Nematollahi et al., Digital speech watermarking based on linear predictive analysis and singular value decomposition. Proc. Natl. Acad. Sci. India Sect. A 87, 433–446 (2017)
M.A. Nematollahi, C. Vorakulpipat, H.G. Rosales, Digital Watermarking (Springer, 2017)
K. Pavlović et al., Robust speech watermarking by a jointly trained embedder and detector using a DNN. Digital Signal Process. 122, 103381 (2022)
M. Płachta et al., Detection of image steganography using deep learning and ensemble classifiers. Electronics 11(10), 1565 (2022)
P. Rathi, S. Bhadauria, S. Rathi, Watermarking of deep recurrent neural network using adversarial examples to protect intellectual property. Appl. Artif. Intell. 36(1), 2008613 (2022)
M. Steinebach et al. StirMark benchmark: audio watermarking attacks, in Proceedings International Conference on Information Technology: Coding and Computing. IEEE (2001)
S. Sun et al. Detect and remove watermark in deep neural networks via generative adversarial networks. in Information Security: 24th International Conference, ISC 2021, Virtual Event, November 10–12, 2021, Proceedings 24. Springer (2021)
L. Tegendal, Watermarking in audio using deep learning (2019)
S. Verdú, A general formula for channel capacity. IEEE Trans. Inf. Theory 40(4), 1147–1157 (1994)
J. Zhang et al., An integrated multi-head dual sparse self-attention network for remaining useful life prediction. Reliab. Eng. Syst. Saf. 233, 109096 (2023)
J. Zhang et al., Lifetime extension approach based on Levenberg-Marquardt neural network and power routing of DC–DC converters. IEEE Trans. Power Electron. (2023). https://doi.org/10.1109/TPEL.2023.3275791
J. Zhang et al., A parallel hybrid neural network with integration of spatial and temporal features for remaining useful life prediction in prognostics. IEEE Trans. Instrum. Meas. 72, 1–12 (2022)
J. Zhang et al., An integrated multitasking intelligent bearing fault diagnosis scheme based on representation learning under imbalanced sample condition. IEEE Trans. Neural Netw. Learn. Syst. (2023). https://doi.org/10.1109/TNNLS.2022.3232147
W.R. Zwet, Convex Transformations of Random Variables (Mathematisch Centrum, Amsterdam, 1964)
Acknowledgements
The authors would like to appreciate the anonymous reviewers for all useful and constructive comments on the manuscript. All comments have been considered, and the paper is revised accordingly. This research was funded by Ministry of Education Humanities and Social Sciences Planning Fund Research Project,"Internet+Alzheimers Disease Personnel Safety Protection" Implementation Countermeasure Integration Research, Item Number: 16YJAZH040.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, X., Nematollahi, M.A. Artificial Intelligence Approach for Tuning Speech-Adaptive Watermarking using Higher-Order Statistics (HOS). Circuits Syst Signal Process 43, 3297–3323 (2024). https://doi.org/10.1007/s00034-024-02618-0
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00034-024-02618-0