Towards Document Plagiarism Detection Based on the Relevance and Fragmentation of the Reused Text

  • Conference paper
Advances in Artificial Intelligence (MICAI 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6437)

Abstract

Traditionally, External Plagiarism Detection has been carried out by determining and measuring the similar sections between a given pair of documents, known as the source and suspicious documents. One of the main difficulties of this task resides in the fact that not all similar text sections are examples of plagiarism, since thematic coincidences also tend to produce portions of common text. To address this problem, we propose to represent the common (possibly reused) text by means of a set of features that denote its relevance and fragmentation. This new representation, used in conjunction with supervised learning algorithms, provides more elements for the automatic detection of document plagiarism; in particular, our experimental results show that it clearly outperformed the accuracy achieved by traditional n-gram based approaches.
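The idea of describing the common text by its fragmentation can be illustrated with a minimal sketch. It is not the feature set used in the paper, which the abstract does not specify: the function names, the `min_len` threshold, and the four features below are all assumptions made for illustration, and Python's `difflib.SequenceMatcher` stands in for whatever matching procedure the authors used. Relevance features are not modeled here.

```python
from difflib import SequenceMatcher


def common_fragments(source, suspicious, min_len=2):
    """Return word sequences shared by both documents.

    Hypothetical helper: SequenceMatcher yields a non-crossing alignment,
    so this is an approximation of the common text, not an exhaustive search.
    """
    src = source.lower().split()
    sus = suspicious.lower().split()
    matcher = SequenceMatcher(None, src, sus, autojunk=False)
    return [
        sus[m.b:m.b + m.size]
        for m in matcher.get_matching_blocks()
        if m.size >= min_len  # ignore very short coincidental matches
    ]


def fragment_features(source, suspicious, min_len=2):
    """Summarize how much text is shared and how fragmented it is.

    The feature names are illustrative assumptions, not the paper's.
    """
    frags = common_fragments(source, suspicious, min_len)
    lengths = [len(f) for f in frags]
    total = sum(lengths)
    sus_len = len(suspicious.split())
    return {
        "num_fragments": len(frags),
        "longest_fragment": max(lengths, default=0),
        "mean_fragment_length": total / len(frags) if frags else 0.0,
        # share of the suspicious document covered by common text
        "coverage": total / sus_len if sus_len else 0.0,
    }


if __name__ == "__main__":
    features = fragment_features(
        "the quick brown fox jumps over the lazy dog",
        "a quick brown fox sleeps near the lazy dog",
    )
    print(features)
```

A feature dictionary of this kind, computed for each source and suspicious pair and labeled as plagiarized or not, is the sort of input a supervised classifier could be trained on. A few long fragments and many short scattered ones can yield the same coverage, which is why fragmentation is described separately from the amount of overlap.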

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sánchez-Vega, F., Villaseñor-Pineda, L., Montes-y-Gómez, M., Rosso, P. (2010). Towards Document Plagiarism Detection Based on the Relevance and Fragmentation of the Reused Text. In: Sidorov, G., Hernández Aguirre, A., Reyes García, C.A. (eds) Advances in Artificial Intelligence. MICAI 2010. Lecture Notes in Computer Science, vol 6437. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16761-4_3

  • DOI: https://doi.org/10.1007/978-3-642-16761-4_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16760-7

  • Online ISBN: 978-3-642-16761-4

  • eBook Packages: Computer Science, Computer Science (R0)
