A triple-layer steganography scheme for low bit-rate speech streams

Multimedia Tools and Applications

Abstract

With the wide application of low bit-rate codecs in speech communication systems, low bit-rate speech streams have become a promising new cover medium for steganography. In this paper, by analyzing the pitch period prediction process in the G.729 codec, we find that the pitch parameter of the second speech subframe is suitable for embedding. We then propose a novel triple-layer steganography method for low bit-rate speech streams. In this method, the modification direction (adding or subtracting one) of the pitch parameter is selected adaptively to achieve a high embedding efficiency. Based on the “Hamming + 1” scheme, we apply the matrix encoding method twice to increase the hiding capacity. Experimental results show that the proposed method preserves good perceived quality of the synthesized speech while achieving good real-time performance and satisfactory steganographic security.

Author information

Corresponding author

Correspondence to Shufan Yan.

Appendix

Assume that there are 2^k − 1 speech frames. In the first layer, k secret bits can be embedded in the 2^k − 1 pitch parameters using the matrix encoding (ME) method. After calculating the value of r, there are two cases:

Case 1: r ≠ 0, P (Case 1) = (2^k − 1)/2^k. In the third layer, we can embed another k secret bits into 2^k − 1 L3-0 bits using the ME method. After calculating the value of h, there are again two cases:

Case 1.1: h ≠ 0, P (Case 1.1) = (2^k − 1)/2^k. By controlling the modification direction of p_r^2, make the result of Eq. (12) differ from the secret bit to be embedded, namely λ ≠ β. Then flip the L3-0_h bit. The total number of cover bits changed is 2.

Case 1.2: h = 0, P (Case 1.2) = 1/2^k. By controlling the modification direction of p_r^2, make the result of Eq. (12) equal to the secret bit to be embedded, namely λ = β. The total number of cover bits changed is 1.

Case 2: r = 0, P (Case 2) = 1/2^k. According to the values of λ and β, there are again two cases:

Case 2.1: λ = β, P (Case 2.1) = 1/2.

Case 2.1.1: h ≠ 0, P (Case 2.1.1) = P (Case 1.1). Flip the L3-0_f and L3-0_g bits. The total number of cover bits changed is 2.

Case 2.1.2: h = 0, P (Case 2.1.2) = P (Case 1.2). All three layers are already satisfied, so the total number of cover bits changed is 0.

Case 2.2: λ ≠ β, P (Case 2.2) = 1/2.

Case 2.2.1: h ≠ 0, P (Case 2.2.1) = P (Case 1.1). Flip the L3-0_h bit. The total number of cover bits changed is 1.

Case 2.2.2: h = 0, P (Case 2.2.2) = P (Case 1.2). Flip the L3-0_h′, L3-0_f′ and L3-0_g′ bits. The total number of cover bits changed is 3.
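
The ME step used in the first and third layers can be illustrated with a minimal Python sketch. It assumes the standard Hamming-based matrix encoding, in which the syndrome is the XOR of the 1-based positions of the set cover bits. The function names `syndrome`, `me_embed` and `me_extract` are hypothetical, and the way each cover bit is obtained from a pitch parameter or an L3-0 bit is not modeled here.

```python
from functools import reduce
from operator import xor


def syndrome(cover_bits):
    """XOR of the 1-based positions of all set bits (Hamming syndrome)."""
    return reduce(xor, (i for i, b in enumerate(cover_bits, start=1) if b), 0)


def me_embed(cover_bits, message, k):
    """Embed a k-bit integer into 2^k - 1 cover bits, changing at most one bit.

    Returns the stego bits and the index r (r == 0 means nothing was changed).
    """
    assert len(cover_bits) == 2**k - 1 and 0 <= message < 2**k
    r = syndrome(cover_bits) ^ message
    stego = list(cover_bits)
    if r != 0:
        stego[r - 1] ^= 1  # flip the r-th cover bit (1-based)
    return stego, r


def me_extract(stego_bits):
    """Recover the embedded k-bit integer from the stego bits."""
    return syndrome(stego_bits)
```

With k = 3, embedding the message 5 into the cover bits [1, 0, 1, 1, 0, 0, 1] gives r = 4, so only the fourth bit is flipped. For a uniformly random message, r = 0 occurs with probability 1/2^k, which matches P (Case 2) above.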

In conclusion, when embedding 2k + 1 secret bits, the average number of cover bits changed is:

$$ \begin{aligned}
D &= P(\mathrm{Case}\ 1)\times\left[P(\mathrm{Case}\ 1.1)\times 2 + P(\mathrm{Case}\ 1.2)\times 1\right] \\
&\quad + P(\mathrm{Case}\ 2)\times P(\mathrm{Case}\ 2.1)\times\left[P(\mathrm{Case}\ 2.1.1)\times 2 + P(\mathrm{Case}\ 2.1.2)\times 0\right] \\
&\quad + P(\mathrm{Case}\ 2)\times P(\mathrm{Case}\ 2.2)\times\left[P(\mathrm{Case}\ 2.2.1)\times 1 + P(\mathrm{Case}\ 2.2.2)\times 3\right] \\
&= \frac{4^{k+1}-3\cdot 2^{k}+2}{2^{2k+1}}
\end{aligned} $$
(22)
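
The closed form in Eq. (22) can be checked against the case tree with exact rational arithmetic. The following is a small verification sketch, not part of the original method; the function names `expected_changes` and `closed_form` are hypothetical, and the probabilities are exactly those listed for the cases above.

```python
from fractions import Fraction


def expected_changes(k):
    """Average number of changed cover bits, summed over the appendix case tree."""
    n = 2**k
    p_nonzero = Fraction(n - 1, n)  # P(r != 0) and P(h != 0)
    p_zero = Fraction(1, n)         # P(r == 0) and P(h == 0)
    half = Fraction(1, 2)           # P(lambda == beta) and P(lambda != beta)
    case_1 = p_nonzero * (p_nonzero * 2 + p_zero * 1)
    case_2_1 = p_zero * half * (p_nonzero * 2 + p_zero * 0)
    case_2_2 = p_zero * half * (p_nonzero * 1 + p_zero * 3)
    return case_1 + case_2_1 + case_2_2


def closed_form(k):
    """Right-hand side of Eq. (22)."""
    return Fraction(4**(k + 1) - 3 * 2**k + 2, 2**(2 * k + 1))


if __name__ == "__main__":
    for k in range(1, 9):
        # The two expressions should agree exactly for every k.
        assert expected_changes(k) == closed_form(k)
        print(k, closed_form(k), float(closed_form(k)))
```

Using `Fraction` rather than floating point keeps the comparison exact, so any mismatch would indicate an algebra error rather than rounding.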

Thus the bit-change rate (the average probability that a given cover bit is changed) of the proposed method is:

$$ C=\frac{D}{N}=\frac{4^{k+1}-3\cdot {2}^k+2}{4\cdot \left({2}^{3k}-{2}^{2k}\right)} $$
(23)

where N = 2 · (2^k − 1) is the total number of cover bits. The embedding rate and the embedding efficiency then follow:

$$ R=\frac{2k+1}{N}=\frac{2k+1}{2\cdot \left({2}^k-1\right)} $$
(24)
$$ E=\frac{R}{C}=\frac{\left(2k+1\right)\cdot {2}^{2k+1}}{4^{k+1}-3\cdot {2}^k+2} $$
(25)
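
As a quick numerical illustration of Eqs. (22) to (25), take k = 3 (a value chosen here only as an example):

$$ N = 2\cdot\left(2^{3}-1\right) = 14,\qquad D = \frac{4^{4}-3\cdot 2^{3}+2}{2^{7}} = \frac{234}{128} \approx 1.83 $$

$$ C = \frac{D}{N} = \frac{234}{1792} \approx 0.131,\qquad R = \frac{7}{14} = 0.5,\qquad E = \frac{R}{C} = \frac{7\cdot 128}{234} \approx 3.83 $$

In this example, 7 secret bits are embedded in 14 cover bits while fewer than two cover bits are changed on average.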

Cite this article

Yan, S., Tang, G., Sun, Y. et al. A triple-layer steganography scheme for low bit-rate speech streams. Multimed Tools Appl 74, 11763–11782 (2015). https://doi.org/10.1007/s11042-014-2265-y

  • DOI: https://doi.org/10.1007/s11042-014-2265-y
