High-Performance Linguistic Steganalysis, Capacity Estimation and Steganographic Positioning

Zou, Jiajun; Yang, Zhongliang; Zhang, Siyu; Rehman, Sadaqat ur; Huang, Yongfeng

doi:10.1007/978-3-030-69449-4_7

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 12617)

Included in the following conference series:

International Workshop on Digital Watermarking

Abstract

With the rapid development of natural language processing, a growing number of linguistic steganographic methods have been proposed, which poses serious challenges for the governance of cyberspace security. Previous neural linguistic steganalysis methods that rely on a word embedding layer extract only context-independent, word-level features. These features are insufficient for capturing the complex semantic dependencies within sentences and may therefore limit the performance of text steganalysis. In this paper, we propose a novel linguistic steganalysis model. We first employ a BERT or GloVe component to extract the contextualized association relationships among the words in a sentence. We then feed these features into a BiLSTM to capture further context information, and use an attention mechanism to find local parts of the text that may be discordant. Finally, based on the extracted features, a softmax classifier decides whether the input sentence is cover or stego. Experimental results show that the proposed model achieves the best performance to date in text steganalysis and hidden capacity estimation. Further experiments show that the proposed model can even locate, to a certain extent, where the secret information may be embedded in the text. To the best of our knowledge, this is the first attempt at steganographic positioning in the field of text steganalysis. (Code and datasets are available at https://github.com/YangzlTHU/Linguistic-Steganography-and-Steganalysis.)
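The last two stages of the pipeline described above (attention over per-token features, then a softmax decision) can be illustrated with a minimal sketch. This is not the authors' implementation: the hidden states stand in for BiLSTM outputs, the dot-product attention with a single query vector is an assumed form of "the attention mechanism", and all names and numeric values (`attention_pool`, `classify`, `query`, `class_weights`) are hypothetical toy choices rather than trained parameters.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_pool(hidden_states: List[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Score each token against a query vector (assumed dot-product attention),
    then return the weighted sum of the hidden states and the per-token weights."""
    weights = softmax([dot(h, query) for h in hidden_states])
    dim = len(hidden_states[0])
    pooled = [sum(w * h[i] for w, h in zip(weights, hidden_states)) for i in range(dim)]
    return pooled, weights


def classify(pooled: Vector, class_weights: List[Vector], class_bias: Vector) -> List[float]:
    """Linear layer followed by softmax; index 0 = cover, index 1 = stego (assumed order)."""
    logits = [dot(w, pooled) + b for w, b in zip(class_weights, class_bias)]
    return softmax(logits)


# Toy per-token features standing in for BiLSTM outputs (three tokens, two dimensions).
hidden_states = [[0.1, 0.2], [0.9, 0.8], [0.0, 0.1]]
query = [1.0, 1.0]                       # hypothetical attention query
class_weights = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical classifier weights
class_bias = [0.0, 0.0]

pooled, weights = attention_pool(hidden_states, query)
probs = classify(pooled, class_weights, class_bias)

# The token with the largest attention weight is the one the model attends to most;
# reading it as a candidate embedding position is an interpretation, not a guarantee.
suspect_index = max(range(len(weights)), key=weights.__getitem__)
```

In this sketch the attention weights serve two purposes: they pool the sequence into one vector for the classifier, and they give a per-token score that can be inspected when asking which part of a sentence looks discordant.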

Supported by the National Key Research and Development Program of China under Grant No. 2018YFB0804103 and the National Natural Science Foundation of China (No. U1936208, No. 61862002 and No. U1936216).

Notes

1. Pre-trained GloVe word embeddings can be downloaded from http://nlp.stanford.edu/projects/glove/.
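For readers who want to use the downloaded vectors, the following is a minimal loading sketch. It assumes the plain-text GloVe layout, in which each line holds a token followed by its space-separated vector components, and it assumes tokens contain no spaces. The function name `parse_glove_lines` and the sample lines are hypothetical and are not taken from the paper or its code release.

```python
from typing import Dict, Iterable, List


def parse_glove_lines(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Build a word -> vector table from lines of the form 'word v1 v2 ... vn'."""
    table: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        table[word] = [float(v) for v in values]
    return table


# Two made-up three-dimensional entries, used only to exercise the parser.
sample = ["the 0.1 -0.2 0.3", "stego 0.4 0.5 -0.6"]
embeddings = parse_glove_lines(sample)
```

Because the function accepts any iterable of strings, an open text file handle for a downloaded vector file can be passed to it directly.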

Author information

Authors and Affiliations

Beijing National Research Center for Information Science and Technology, Beijing, China
Jiajun Zou, Zhongliang Yang, Siyu Zhang & Yongfeng Huang
Department of Electronic Engineering, Tsinghua University, Beijing, 100084, China
Jiajun Zou, Zhongliang Yang, Siyu Zhang & Yongfeng Huang
Department of Computer Science, Namal Institute, Mianwali, 42250, Pakistan
Sadaqat ur Rehman

Corresponding author

Correspondence to Zhongliang Yang.

Editor information

Editors and Affiliations

Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Xianfeng Zhao
Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ, USA
Yun-Qing Shi
Department of Information Engineering, University of Florence, Florence, Italy
Alessandro Piva
School of Cybersecurity, Korea University, Seoul, Korea (Republic of)
Hyoung Joong Kim

About this paper

Cite this paper

Zou, J., Yang, Z., Zhang, S., Rehman, S.u., Huang, Y. (2021). High-Performance Linguistic Steganalysis, Capacity Estimation and Steganographic Positioning. In: Zhao, X., Shi, YQ., Piva, A., Kim, H.J. (eds) Digital Forensics and Watermarking. IWDW 2020. Lecture Notes in Computer Science, vol 12617. Springer, Cham. https://doi.org/10.1007/978-3-030-69449-4_7

DOI: https://doi.org/10.1007/978-3-030-69449-4_7
Published: 12 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69448-7
Online ISBN: 978-3-030-69449-4
eBook Packages: Computer Science, Computer Science (R0)
