Using Grammar-Profiles to Intrinsically Expose Plagiarism in Text Documents

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNISA, volume 7934)

Abstract

Intrinsic plagiarism detection is the task of finding plagiarized sections in a text document without using a reference corpus. This paper describes a novel approach in this field that analyzes the grammar used by authors and applies sliding windows to find significant differences in writing style. To find suspicious text passages, the algorithm splits a document into single sentences, calculates their syntax (grammar) trees, and builds profiles based on frequently used grammar patterns. The text is then traversed window by window, and each window is compared to the document profile using a distance metric. Finally, all sentences with a significantly higher distance, as judged against a Gaussian (normal) distribution, are marked as suspicious. A preliminary evaluation of the algorithm shows very promising results.
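The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the grammar patterns of each sentence have already been extracted by some parser and are given as plain strings, and the names `profile`, `distance`, and `suspicious_sentences`, the absolute-difference distance, the window size, and the cutoff factor `k` are all hypothetical choices made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def profile(pattern_lists):
    """Relative frequency of each grammar pattern in the given sentences."""
    counts = Counter(p for patterns in pattern_lists for p in patterns)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()} if total else {}


def distance(window_profile, doc_profile):
    """Sum of absolute frequency differences (an assumed metric)."""
    keys = set(window_profile) | set(doc_profile)
    return sum(
        abs(window_profile.get(key, 0.0) - doc_profile.get(key, 0.0))
        for key in keys
    )


def suspicious_sentences(sentence_patterns, window_size=3, step=1, k=1.5):
    """Return indices of sentences in windows that deviate strongly.

    sentence_patterns: one list of pattern strings per sentence
    (assumed to come from a syntax parser, which is not shown here).
    A window is flagged if its distance exceeds mean + k * std of all
    window distances; k is an assumed cutoff, not taken from the paper.
    """
    n = len(sentence_patterns)
    doc_profile = profile(sentence_patterns)
    starts = list(range(0, max(n - window_size, 0) + 1, step))
    distances = [
        distance(profile(sentence_patterns[s:s + window_size]), doc_profile)
        for s in starts
    ]
    mu, sigma = mean(distances), pstdev(distances)
    flagged = set()
    for start, dist in zip(starts, distances):
        if sigma > 0 and dist > mu + k * sigma:
            flagged.update(range(start, min(start + window_size, n)))
    return sorted(flagged)


# Toy input: sentences 5 and 6 use different patterns than the rest.
toy = [["A", "B"]] * 5 + [["X", "Y"]] * 2 + [["A", "B"]] * 5
print(suspicious_sentences(toy, window_size=2, step=1, k=1.5))
```

In this toy input, only the window covering sentences 5 and 6 lies far enough from the document profile to exceed the threshold. With real text, the pattern strings would be derived from parsed syntax trees, and the choice of distance metric and cutoff would need to be tuned.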


© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tschuggnall, M., Specht, G. (2013). Using Grammar-Profiles to Intrinsically Expose Plagiarism in Text Documents. In: Métais, E., Meziane, F., Saraee, M., Sugumaran, V., Vadera, S. (eds) Natural Language Processing and Information Systems. NLDB 2013. Lecture Notes in Computer Science, vol 7934. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38824-8_28

  • DOI: https://doi.org/10.1007/978-3-642-38824-8_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38823-1

  • Online ISBN: 978-3-642-38824-8

  • eBook Packages: Computer Science, Computer Science (R0)