Utilizing Deep Natural Language Processing to Detect Plagiarism

  • Conference paper
Proceedings of the 2nd International Conference on Cognitive and Intelligent Computing (ICCIC 2022)

Part of the book series: Cognitive Science and Technology (CSAT)

Abstract

Plagiarism is now pervasive in many spheres of life, including academia and research. As plagiarists continually refine their strategies, existing approaches find it increasingly difficult to detect plagiarism accurately. Plagiarism can be checked using a variety of features, including syntactic, lexical, semantic, and structural ones. This study examines novel and contemporary plagiarism detection tasks, particularly text-based, monolingual plagiarism detection. We propose an innovative four-stage approach to detecting plagiarism. Rather than relying on conventional string-matching methods, this framework uses natural language processing (NLP). The system measures text similarity by combining two metrics, skip-gram and the Dice coefficient, within a corpus-based approach. Both deep and shallow NLP techniques are used to investigate the deeper meaning of the text. Our findings indicate that deep NLP quickly recognizes heavy revision, while shallow NLP efficiently prepares text for further processing. The results of Word2vec are comparable to those of straightforward deep NLP techniques; however, Word2vec also highlights documents that other methods might miss. Deep NLP also captures changes in synonyms and phrases.
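
The abstract does not specify how the skip-gram and Dice coefficient metrics are computed or combined, so the following is only a minimal Python sketch under stated assumptions. It reads "skip gram" as word-level k-skip-bigrams (ordered word pairs separated by at most k intervening words), not the Word2vec skip-gram training model, and scores two documents with the set-based Dice coefficient, Dice(A, B) = 2|A ∩ B| / (|A| + |B|). The shallow preprocessing step (lowercasing, tokenizing, stopword removal), the tiny stopword list, the default k = 2, and the function names `preprocess`, `skip_bigrams`, `dice`, and `similarity` are hypothetical illustrations, not the authors' implementation.

```python
import re

# Hypothetical, deliberately tiny stopword list used only for illustration.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "are", "for"}


def preprocess(text: str) -> list[str]:
    """Assumed shallow NLP step: lowercase, keep alphabetic tokens, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def skip_bigrams(tokens: list[str], k: int = 2) -> set[tuple[str, str]]:
    """Return k-skip-bigrams: ordered pairs (w_i, w_j) with at most k words between them."""
    grams = set()
    for i in range(len(tokens)):
        # j - i - 1 words are skipped, so j may go up to i + k + 1 inclusive.
        for j in range(i + 1, min(i + k + 2, len(tokens))):
            grams.add((tokens[i], tokens[j]))
    return grams


def dice(a: set, b: set) -> float:
    """Set-based Dice coefficient: 2|A & B| / (|A| + |B|); 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def similarity(doc1: str, doc2: str, k: int = 2) -> float:
    """Hypothetical combination: Dice overlap of the two documents' skip-bigram sets."""
    return dice(skip_bigrams(preprocess(doc1), k), skip_bigrams(preprocess(doc2), k))


if __name__ == "__main__":
    original = "The cat sat on the mat"
    print(similarity(original, "The cat sat on the mat"))  # identical text
    print(similarity(original, "The mat the cat sat on"))  # reordered text
    print(similarity(original, "Dogs bark loudly"))        # unrelated text
```

Under these assumptions, an identical sentence scores 1.0, while "The mat the cat sat on" scores 0.5 against "The cat sat on the mat" because only the pairs (cat, sat), (cat, on), and (sat, on) survive the reordering. Skip-bigrams tolerate inserted or dropped words better than contiguous n-grams, which is one plausible reason to pair them with an overlap measure such as Dice; any decision threshold for flagging plagiarism would still need to be tuned on a corpus.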

Author information

Correspondence to K. Praveen Kumar.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Praveen Kumar, K., Jaya Kumari, D., Uma Sankar, P. (2023). Utilizing Deep Natural Language Processing to Detect Plagiarism. In: Kumar, A., Ghinea, G., Merugu, S. (eds) Proceedings of the 2nd International Conference on Cognitive and Intelligent Computing. ICCIC 2022. Cognitive Science and Technology. Springer, Singapore. https://doi.org/10.1007/978-981-99-2742-5_29

  • DOI: https://doi.org/10.1007/978-981-99-2742-5_29

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-2741-8

  • Online ISBN: 978-981-99-2742-5

  • eBook Packages: Computer Science, Computer Science (R0)
