Intrinsic Plagiarism Detection

  • Conference paper
Advances in Information Retrieval (ECIR 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3936)

Abstract

Current research in the field of automatic plagiarism detection for text documents focuses on algorithms that compare plagiarized documents against potential original documents. Though these approaches perform well in identifying copied or even modified passages, they assume a closed world: a reference collection must be given against which a plagiarized document can be compared.

This raises the question of whether plagiarized passages within a document can be detected automatically if no reference is given, e.g., if the plagiarized passages stem from a book that is not available in digital form. We call this problem class intrinsic plagiarism detection. The paper is devoted to this problem class; it shows that it is possible to identify potentially plagiarized passages by analyzing a single document with respect to variations in writing style.
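The idea of detecting style variation within a single document can be illustrated with a minimal Python sketch. It is not the method of the paper: the two features (average word length and average sentence length), the z-score test, and the names `style_features`, `flag_style_outliers`, and `threshold` are assumptions chosen only for illustration.

```python
import re
from statistics import mean, pstdev


def style_features(text):
    """Return an illustrative style vector: (avg word length, avg sentence length)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return (0.0, 0.0)
    avg_word_len = mean(len(w) for w in words)
    avg_sent_len = len(words) / len(sentences)
    return (avg_word_len, avg_sent_len)


def flag_style_outliers(passages, threshold=1.5):
    """Return indices of passages whose style deviates from the document-wide mean.

    A passage is flagged if, for any feature, its z-score against all
    passages of the same document exceeds the (hypothetical) threshold.
    """
    if not passages:
        return []
    vectors = [style_features(p) for p in passages]
    flagged = set()
    for dim in range(len(vectors[0])):
        column = [v[dim] for v in vectors]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # feature is constant across the document, no signal
        for i, x in enumerate(column):
            if abs(x - mu) / sigma > threshold:
                flagged.add(i)
    return sorted(flagged)
```

No reference collection is consulted here: each passage is compared only against the other passages of the same document, which is the defining property of the intrinsic setting.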

Our contributions are fourfold: (i) a taxonomy of plagiarism delicts along with detection methods, (ii) new features for the quantification of style aspects, (iii) a publicly available plagiarism corpus for benchmark comparisons, and (iv) promising results in non-trivial plagiarism detection settings: in our experiments we achieved recall values of 85% with a precision of 75% and better.
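For readers unfamiliar with the two measures reported above, the following Python sketch shows how recall and precision are conventionally computed from a set of flagged passages and a set of truly plagiarized ones. The function name `precision_recall` and the choice of passage indices as the unit of evaluation are assumptions; the abstract does not state the unit used in the experiments.

```python
def precision_recall(flagged, plagiarized):
    """Compute (precision, recall) for flagged vs. truly plagiarized passage ids."""
    flagged, plagiarized = set(flagged), set(plagiarized)
    true_positives = len(flagged & plagiarized)
    # Precision: share of flagged passages that are actually plagiarized.
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: share of plagiarized passages that were flagged.
    recall = true_positives / len(plagiarized) if plagiarized else 0.0
    return precision, recall
```

With made-up inputs, flagging passages 1 to 4 when passages 2 to 6 are plagiarized yields three true positives, so precision is 3/4 and recall is 3/5.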

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Eissen, S.M.z., Stein, B. (2006). Intrinsic Plagiarism Detection. In: Lalmas, M., MacFarlane, A., Rüger, S., Tombros, A., Tsikrika, T., Yavlinsky, A. (eds) Advances in Information Retrieval. ECIR 2006. Lecture Notes in Computer Science, vol 3936. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11735106_66

  • DOI: https://doi.org/10.1007/11735106_66

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33347-0

  • Online ISBN: 978-3-540-33348-7

  • eBook Packages: Computer Science, Computer Science (R0)
