Abstract
With the wide application of low bit-rate codecs in speech communication systems, low bit-rate speech streams have become a promising new cover medium for steganography. In this paper, by analyzing the pitch period prediction process in the G.729 codec, we find that the pitch parameter of the second speech subframe is suitable for embedding. We then propose a novel triple-layer steganography method for low bit-rate speech streams. In this method, the modification direction (adding or subtracting one) of the pitch parameter is selected adaptively to achieve a high embedding efficiency. Based on the “Hamming + 1” scheme, we apply the matrix encoding method twice to increase the hiding capacity. Experimental results show that the proposed method preserves good perceived quality of the synthetic speech while achieving good real-time performance and satisfactory steganographic security.
Appendix
Assume that there are $2^k - 1$ speech frames. In the first layer, $k$ secret bits can be embedded in $2^k - 1$ pitch parameters using the ME method. After calculating the value of $r$, there are two cases:
Case 1: $r \neq 0$, $P(\text{Case 1}) = (2^k - 1)/2^k$. In the third layer, we can embed another $k$ secret bits into $2^k - 1$ L3-0 bits using the ME method. After calculating the value of $h$, there are also two cases:
Case 1.1: $h \neq 0$, $P(\text{Case 1.1}) = (2^k - 1)/2^k$. By controlling the modification direction of $p_r^2$, make the result of Eq. (12) differ from the secret bit to be embedded, namely $\lambda \neq \beta$. Then flip the bit $\text{L3-0}_h$. The total number of cover bits changed is 2.
Case 1.2: $h = 0$, $P(\text{Case 1.2}) = 1/2^k$. By controlling the modification direction of $p_r^2$, make the result of Eq. (12) equal to the secret bit to be embedded, namely $\lambda = \beta$. The total number of cover bits changed is 1.
Case 2: $r = 0$, $P(\text{Case 2}) = 1/2^k$. According to the values of $\lambda$ and $\beta$, there are also two cases:
Case 2.1: $\lambda = \beta$, $P(\text{Case 2.1}) = 1/2$.
Case 2.1.1: $h \neq 0$, $P(\text{Case 2.1.1}) = P(\text{Case 1.1})$. Flip the bits $\text{L3-0}_f$ and $\text{L3-0}_g$. The total number of cover bits changed is 2.
Case 2.1.2: $h = 0$, $P(\text{Case 2.1.2}) = P(\text{Case 1.2})$. All three layers are already satisfied, so the total number of cover bits changed is 0.
Case 2.2: $\lambda \neq \beta$, $P(\text{Case 2.2}) = 1/2$.
Case 2.2.1: $h \neq 0$, $P(\text{Case 2.2.1}) = P(\text{Case 1.1})$. Flip the bit $\text{L3-0}_h$. The total number of cover bits changed is 1.
Case 2.2.2: $h = 0$, $P(\text{Case 2.2.2}) = P(\text{Case 1.2})$. Flip the bits $\text{L3-0}_{h'}$, $\text{L3-0}_{f'}$ and $\text{L3-0}_{g'}$. The total number of cover bits changed is 3.
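The ME step used in the first and third layers can be illustrated with standard Hamming-style matrix encoding. The following is a minimal sketch, not the authors' implementation: the function names `me_syndrome`, `me_embed` and `me_extract` are hypothetical, the cover is modeled as a plain list of bits, and positions are 1-indexed so that a result of 0 means no change is needed. For the first layer, the cover bits would presumably be the parities of the pitch parameters, with a "flip" realized by adding or subtracting one; that mapping is an assumption here.

```python
from functools import reduce
from operator import xor


def me_syndrome(bits):
    """XOR of the 1-based positions whose cover bit is 1."""
    return reduce(xor, (i for i, b in enumerate(bits, start=1) if b), 0)


def me_embed(bits, message, k):
    """Embed a k-bit integer `message` into 2**k - 1 cover bits.

    Returns (stego_bits, r), where r is the 1-based position that was
    flipped, or 0 if the cover already carried the message.
    """
    if len(bits) != 2**k - 1:
        raise ValueError("expected 2**k - 1 cover bits")
    if not 0 <= message < 2**k:
        raise ValueError("message must fit in k bits")
    r = me_syndrome(bits) ^ message
    stego = list(bits)
    if r != 0:
        stego[r - 1] ^= 1  # at most one cover bit changes
    return stego, r


def me_extract(bits):
    """Recover the embedded k-bit message from the stego bits."""
    return me_syndrome(bits)


if __name__ == "__main__":
    cover = [1, 0, 1, 1, 0, 0, 1]          # 2**3 - 1 = 7 cover bits
    stego, r = me_embed(cover, 0b101, k=3)
    print(r, stego, me_extract(stego))
```

For a fixed cover and a uniformly random message, exactly one of the $2^k$ possible messages yields a result of 0, which matches the probabilities $(2^k - 1)/2^k$ and $1/2^k$ used in the case analysis.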
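In conclusion, when embedding $2k + 1$ secret bits, the average number of cover bits changed, written here as $\bar{D}$, is obtained by weighting the number of changes in each case by its probability:

$$\bar{D} = \frac{2^k-1}{2^k}\left(2\cdot\frac{2^k-1}{2^k} + 1\cdot\frac{1}{2^k}\right) + \frac{1}{2^k}\left[\frac{1}{2}\left(2\cdot\frac{2^k-1}{2^k} + 0\cdot\frac{1}{2^k}\right) + \frac{1}{2}\left(1\cdot\frac{2^k-1}{2^k} + 3\cdot\frac{1}{2^k}\right)\right] = 2 - \frac{3}{2^{k+1}} + \frac{1}{2^{2k}}$$

Thus the bit-change rate (the average rate of being changed per cover bit) of the proposed method is:

$$\frac{\bar{D}}{N} = \frac{1}{2\,(2^k-1)}\left(2 - \frac{3}{2^{k+1}} + \frac{1}{2^{2k}}\right)$$

where $N = 2 \cdot (2^k - 1)$ is the total number of cover bits. Therefore, the embedding rate (secret bits per cover bit) and the embedding efficiency (secret bits per changed cover bit) are:

$$\frac{2k+1}{N} = \frac{2k+1}{2\,(2^k-1)}, \qquad \frac{2k+1}{\bar{D}} = \frac{2k+1}{2 - \dfrac{3}{2^{k+1}} + \dfrac{1}{2^{2k}}}$$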
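The case probabilities and change counts listed in this appendix can be tallied exactly with rational arithmetic. The sketch below is an illustration only: the function names `average_changes` and `metrics` are hypothetical, and the embedding rate and embedding efficiency are computed under the usual definitions (secret bits per cover bit, and secret bits per changed cover bit), which are assumed here.

```python
from fractions import Fraction


def average_changes(k):
    """Expected number of changed cover bits, summed over the case tree."""
    p = Fraction(2**k - 1, 2**k)   # P(r != 0), also P(h != 0)
    q = 1 - p                      # P(r == 0), also P(h == 0)
    half = Fraction(1, 2)          # P(lambda == beta) within Case 2
    leaves = [
        (p * p, 2),          # Case 1.1
        (p * q, 1),          # Case 1.2
        (q * half * p, 2),   # Case 2.1.1
        (q * half * q, 0),   # Case 2.1.2
        (q * half * p, 1),   # Case 2.2.1
        (q * half * q, 3),   # Case 2.2.2
    ]
    assert sum(prob for prob, _ in leaves) == 1  # the cases are exhaustive
    return sum(prob * changes for prob, changes in leaves)


def metrics(k):
    """Bit-change rate, embedding rate and embedding efficiency for a given k."""
    n_cover = 2 * (2**k - 1)       # N: pitch parameters plus L3-0 bits
    n_secret = 2 * k + 1           # k + 1 + k secret bits over three layers
    avg = average_changes(k)
    return {
        "average_changes": avg,
        "bit_change_rate": avg / n_cover,
        "embedding_rate": Fraction(n_secret, n_cover),
        "embedding_efficiency": n_secret / avg,
    }


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(k, {name: float(value) for name, value in metrics(k).items()})
```

For $k = 2$, the tally gives an average of 27/16 changed cover bits and an embedding efficiency of 80/27, roughly 2.96 secret bits per change.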
Yan, S., Tang, G., Sun, Y. et al. A triple-layer steganography scheme for low bit-rate speech streams. Multimed Tools Appl 74, 11763–11782 (2015). https://doi.org/10.1007/s11042-014-2265-y
DOI: https://doi.org/10.1007/s11042-014-2265-y