Linguistic Steganalysis Based on Clustering and Ensemble Learning in Imbalanced Scenario

Guo, Shengnan; Chen, Xuekai; Wang, Zhuang; Yang, Zhongliang; Zhou, Linna

doi:10.1007/978-981-97-2585-4_22

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14511)

Included in the following conference series:

International Workshop on Digital Watermarking

Abstract

With the rapid development of the Internet, more and more text steganography methods have emerged. However, these methods are easily abused on public networks for malicious purposes, which poses a serious threat to cyberspace security. A large number of text steganalysis methods have been proposed to counter text steganography, but existing methods typically assume a balanced class distribution. In reality, stego texts are far fewer than cover texts, so accurately detecting stego texts among massive volumes of text is a challenge. In this paper, we propose a text steganalysis method for imbalanced scenarios that combines under-sampling with ensemble learning. Specifically, we use clustering to under-sample the majority class (cover texts) according to the detection difficulty of each sample, so that information-rich samples are selected. Ensemble learning is then used to combine the detection results of multiple base classifiers and to guide the sampling process. We designed several experiments to test the detection performance of the proposed model. Experimental results show that the proposed model effectively compensates for the deficiencies of existing methods and can still detect stego texts effectively even on highly imbalanced datasets.
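
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea and should not be read as the authors' implementation. It assumes that the "detection difficulty" of a cover text is the stego probability assigned to it by the current ensemble (averaged over base classifiers), that difficulty scores are grouped with a simple 1-D k-means, and that the retained covers are drawn evenly across the difficulty clusters. The names `ensemble_score`, `kmeans_1d`, and `undersample_by_difficulty` are hypothetical.

```python
import random


def ensemble_score(classifiers, x):
    """Average the stego probabilities returned by the base classifiers.

    Assumption: each classifier is a callable mapping a sample to a
    probability in [0, 1]. For a cover text (true label 0) this average
    doubles as a difficulty score: higher means harder to classify.
    """
    return sum(clf(x) for clf in classifiers) / len(classifiers)


def kmeans_1d(values, k, iters=20):
    """Tiny 1-D k-means; returns one cluster index per value."""
    lo, hi = min(values), max(values)
    # Spread the initial centers evenly over the observed range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c]))
                  for v in values]
        # Move each center to the mean of its members (if it has any).
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def undersample_by_difficulty(difficulty, n_keep, k=3, seed=0):
    """Select n_keep majority-class indices, spread over difficulty clusters."""
    rng = random.Random(seed)
    labels = kmeans_1d(difficulty, k)
    clusters = [[i for i, lab in enumerate(labels) if lab == c]
                for c in range(k)]
    clusters = [c for c in clusters if c]  # drop empty clusters
    for c in clusters:
        rng.shuffle(c)
    chosen = []
    # Round-robin over clusters so every difficulty level is represented.
    while len(chosen) < n_keep and any(clusters):
        for c in clusters:
            if c and len(chosen) < n_keep:
                chosen.append(c.pop())
    return sorted(chosen)
```

In a training loop under these assumptions, one would score every cover text with `ensemble_score`, call `undersample_by_difficulty` with `n_keep` set to the number of stego texts, train a new base classifier on the resulting balanced subset, add it to the ensemble, and repeat. Even allocation across clusters is one possible policy; weighting clusters differently is equally plausible given what the abstract states.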

Acknowledgments

This work was supported in part by the National Key Research and Development Program of China under Grant 2022YFC3303301 and in part by the National Natural Science Foundation of China under Grant 62172053 and Grant 62302059.

Author information

Authors and Affiliations

School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, 100876, China
Shengnan Guo, Xuekai Chen, Zhuang Wang, Zhongliang Yang & Linna Zhou

Corresponding author

Correspondence to Zhongliang Yang.

Editor information

Editors and Affiliations

School of Cyber Security, Qilu University of Technology, Jinan, China
Bin Ma
Qilu University of Technology, Jinan, China
Jian Li
Qilu University of Technology, Jinan, China
Qi Li

About this paper

Cite this paper

Guo, S., Chen, X., Wang, Z., Yang, Z., Zhou, L. (2024). Linguistic Steganalysis Based on Clustering and Ensemble Learning in Imbalanced Scenario. In: Ma, B., Li, J., Li, Q. (eds) Digital Forensics and Watermarking. IWDW 2023. Lecture Notes in Computer Science, vol 14511. Springer, Singapore. https://doi.org/10.1007/978-981-97-2585-4_22

DOI: https://doi.org/10.1007/978-981-97-2585-4_22
Published: 25 April 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2584-7
Online ISBN: 978-981-97-2585-4
eBook Packages: Computer Science, Computer Science (R0)
