Abstract
With the rapid development of natural language processing technology, a growing number of linguistic steganographic methods have been proposed, which may pose great challenges to the governance of cyberspace security. Previous linguistic steganalysis methods based on neural networks with a word embedding layer could only extract context-independent word-level features. Such features are insufficient for capturing the complex semantic dependencies in sentences, which may limit the performance of text steganalysis. In this paper, we propose a novel linguistic steganalysis model. We first employ a BERT or GloVe component to extract the contextualized associations between words in a sentence. We then feed these features into a BiLSTM to capture further context information, and use an attention mechanism to find local parts of the text that may be discordant. Finally, based on the extracted features, a softmax classifier decides whether the input sentence is cover or stego. Experimental results show that the proposed model achieves state-of-the-art performance in text steganalysis and hidden capacity estimation. Further experiments show that the proposed model can even locate, to a certain extent, where the secret information may be embedded in the text. To the best of our knowledge, this is the first attempt at steganographic positioning in the field of text steganalysis (code and datasets are available at https://github.com/YangzlTHU/Linguistic-Steganography-and-Steganalysis).
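The later stages of the pipeline described above (attention over per-token features, then a softmax decision between cover and stego) can be illustrated with a minimal sketch. This is not the authors' implementation: the dot-product form of the attention, the function names `attention_pool` and `classify`, and the random, untrained parameters are all assumptions made for illustration, and the `hidden` vectors merely stand in for BiLSTM outputs. Reading the largest attention weight as a hint about where information might be embedded is likewise an assumption of this sketch.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden, query):
    """Weight each token vector by its dot product with a query vector.

    hidden: list of T vectors (hypothetical stand-ins for BiLSTM outputs)
    query:  vector of the same dimension (hypothetical attention parameter)
    Returns the weighted sum of the token vectors and the T attention weights.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return pooled, weights


def classify(pooled, W, b):
    """Linear layer followed by softmax over the two classes (cover, stego)."""
    logits = [
        sum(w_d * x_d for w_d, x_d in zip(row, pooled)) + b_k
        for row, b_k in zip(W, b)
    ]
    return softmax(logits)


# Toy inputs: T tokens with D features each, all drawn at random (untrained).
rng = random.Random(0)
T, D = 6, 4
hidden = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
query = [rng.uniform(-1, 1) for _ in range(D)]
W = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(2)]
b = [0.0, 0.0]

pooled, weights = attention_pool(hidden, query)
probs = classify(pooled, W, b)

# Token position with the largest attention weight (illustrative only).
suspect = max(range(T), key=lambda t: weights[t])

print("attention weights:", [round(w, 3) for w in weights])
print("P(cover), P(stego):", [round(p, 3) for p in probs])
print("highest-weight token index:", suspect)
```

With trained parameters, the two probabilities would give the cover/stego decision and the attention weights would indicate which positions contributed most to it. With the random values used here, the numbers carry no meaning beyond showing the data flow.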
Supported by the National Key Research and Development Program of China under Grant No. 2018YFB0804103 and the National Natural Science Foundation of China (No. U1936208, No. 61862002 and No. U1936216).
Notes
1. Pre-trained GloVe word embeddings can be downloaded from http://nlp.stanford.edu/projects/glove/.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Zou, J., Yang, Z., Zhang, S., Rehman, S.u., Huang, Y. (2021). High-Performance Linguistic Steganalysis, Capacity Estimation and Steganographic Positioning. In: Zhao, X., Shi, YQ., Piva, A., Kim, H.J. (eds) Digital Forensics and Watermarking. IWDW 2020. Lecture Notes in Computer Science, vol 12617. Springer, Cham. https://doi.org/10.1007/978-3-030-69449-4_7
DOI: https://doi.org/10.1007/978-3-030-69449-4_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69448-7
Online ISBN: 978-3-030-69449-4
eBook Packages: Computer Science, Computer Science (R0)