Abstract
Digital speech watermarking is a robust way to hide data and thus secure content such as audio and video against intentional or unintentional manipulation during transmission. Digital speech signals differ from audio, music and other signals in characteristics such as bandwidth, voice/non-voice content and production model. Although various review articles on image, audio and video watermarking are available, there are still few review papers on digital speech watermarking. This article therefore presents an overview of digital speech watermarking, covering the issues of robustness, capacity and imperceptibility. It also discusses the types of digital speech watermarking and its applications, models and masking methods. Finally, it highlights real-world challenges, research opportunities and directions for future work in this area, which has yet to be fully explored.
Cite this article
Nematollahi, M.A., Al-Haddad, S.A.R. An overview of digital speech watermarking. Int J Speech Technol 16, 471–488 (2013). https://doi.org/10.1007/s10772-013-9192-6