PDMTT: A Plagiarism Detection Model Towards Multi-turn Text Back-Translation

He, Xiaoling; Zhou, Yuanding; Qin, Chuan; Qian, Zhenxing; Zhang, Xinpeng

doi:10.1007/978-981-97-2585-4_6

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14511)

Included in the following conference series:

International Workshop on Digital Watermarking


Abstract

With the development of communication technologies, creating new texts by manipulating the sentence structure of an original through multi-turn machine translation has become widespread across many domains. Existing plagiarism detection models often treat all features uniformly and overlook the differing significance of features within a high-dimensional representation. This paper therefore proposes a plagiarism detection model towards multi-turn text back-translation (PDMTT), built on a novel mechanism that combines local and global features and enhances them. This grouping enhancement fusion (GEF) mechanism assigns importance coefficients to sub-features, reinforcing critical ones while suppressing less relevant ones. The enhanced features produced by GEF are used to extract high-quality text representations, which improves the model's precision in distinguishing original content from back-translated text. Furthermore, we improve the model's back-translation plagiarism detection capability by optimizing the contrastive loss function and using the fused translated representations as targets. To validate the model, we also construct a multi-tuple back-translation plagiarism dataset for training and validation. Experimental results show that PDMTT outperforms previous methods in back-translation plagiarism detection and yields better text representations. An ablation study further confirms that incorporating the GEF mechanism improves the model's discrimination capability.
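The abstract does not specify how GEF computes its importance coefficients or how the translated representations are fused, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes GEF behaves like spatial group-wise enhance (SGE; Li, Hu and Yang, 2019) applied to token features: the feature dimension is split into groups, each token's sub-feature (local) is scored by its similarity to the group's mean over all tokens (global), and the normalized score gates that sub-feature through a sigmoid. It further assumes that the fused target is the mean of an original's back-translated embeddings and that the contrastive loss takes an InfoNCE form with cosine similarity. The names `group_enhance`, `contrastive_loss`, `gamma`, `beta`, and `tau` are hypothetical; in a trained model `gamma` and `beta` would typically be learned per group.

```python
# Hypothetical sketch: SGE-style gating stands in for GEF; mean fusion of
# back-translations and an InfoNCE loss are assumptions, not the paper's method.
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]  # rows = tokens, columns = feature dimensions


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def group_enhance(tokens: Matrix, groups: int, gamma: float = 1.0,
                  beta: float = 0.0, eps: float = 1e-5) -> Matrix:
    """Scale each token's sub-features by a sigmoid importance weight per group."""
    dim = len(tokens[0])
    if dim % groups != 0:
        raise ValueError("feature dimension must be divisible by groups")
    size = dim // groups
    n = len(tokens)
    out = [row[:] for row in tokens]
    for g in range(groups):
        lo, hi = g * size, (g + 1) * size
        sub = [row[lo:hi] for row in tokens]              # local sub-features per token
        glob = [sum(col) / n for col in zip(*sub)]        # global sub-feature (mean over tokens)
        coef = [sum(a * b for a, b in zip(glob, x)) for x in sub]  # local-global similarity
        mean = sum(coef) / n
        std = math.sqrt(sum((c - mean) ** 2 for c in coef) / n)
        for i, c in enumerate(coef):
            weight = sigmoid(gamma * (c - mean) / (std + eps) + beta)  # in (0, 1)
            for j in range(lo, hi):
                out[i][j] = tokens[i][j] * weight         # reinforce or damp this sub-feature
    return out


def mean_pool(rows: Matrix) -> Vector:
    """Average a list of vectors element-wise (token pooling or translation fusion)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def contrastive_loss(originals: List[Vector], translations: List[Matrix],
                     tau: float = 0.05) -> float:
    """InfoNCE: each original should be closest to its own fused back-translations."""
    fused = [mean_pool(group) for group in translations]  # one target per original
    total = 0.0
    for k, o in enumerate(originals):
        logits = [cosine(o, f) / tau for f in fused]
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))  # stable log-sum-exp
        total += log_denom - logits[k]
    return total / len(originals)


if __name__ == "__main__":
    sentence = [[0.2, 0.1, 0.9, 0.4], [0.3, 0.0, 0.8, 0.5], [1.0, 0.7, 0.1, 0.2]]
    print(mean_pool(group_enhance(sentence, groups=2)))
    originals = [[1.0, 0.0], [0.0, 1.0]]
    translations = [[[0.9, 0.1], [0.8, 0.2]], [[0.1, 0.9], [0.2, 0.8]]]
    print(contrastive_loss(originals, translations))
    print(contrastive_loss(originals, translations[::-1]))
```

Pairing each original with the wrong set of back-translations (the reversed list in the usage example) makes the loss much larger than the correctly paired case, which is the separation a back-translation detector trained this way would rely on.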



Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants U20B2051 and 62172280; in part by the Natural Science Foundation of Shanghai under Grant 21ZR1444600; and in part by the Shanghai Science and Technology Committee Capability Construction Project for Shanghai Municipal Universities under Grant 20060502300.

Author information

Authors and Affiliations

University of Shanghai for Science and Technology, Shanghai, China
Xiaoling He, Yuanding Zhou & Chuan Qin
Fudan University, Shanghai, China
Zhenxing Qian & Xinpeng Zhang


Corresponding author

Correspondence to Chuan Qin.

Editor information

Editors and Affiliations

School of Cyber Security, Qilu University of Technology, Jinan, China
Bin Ma
Qilu University of Technology, Jinan, China
Jian Li
Qilu University of Technology, Jinan, China
Qi Li


About this paper

Cite this paper

He, X., Zhou, Y., Qin, C., Qian, Z., Zhang, X. (2024). PDMTT: A Plagiarism Detection Model Towards Multi-turn Text Back-Translation. In: Ma, B., Li, J., Li, Q. (eds) Digital Forensics and Watermarking. IWDW 2023. Lecture Notes in Computer Science, vol 14511. Springer, Singapore. https://doi.org/10.1007/978-981-97-2585-4_6

Published: 25 April 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2584-7
Online ISBN: 978-981-97-2585-4
eBook Packages: Computer Science, Computer Science (R0)
