
Retrieving Candidate Plagiarised Documents Using Query Expansion

  • Conference paper
Advances in Information Retrieval (ECIR 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7224)

Abstract

External plagiarism detection systems compare suspicious texts against a reference collection to identify the original source(s). A suspicious text may not contain verbatim copies of passages from the reference collection, since plagiarists often try to disguise their behaviour by altering the text. For large reference collections, such as those accessible via the internet, it is not practical to compare the suspicious text with every document in the collection. Consequently, many approaches to plagiarism detection begin by identifying a set of candidate documents from the reference collection. We report an IR-based approach to the candidate document selection problem that uses query expansion to identify candidates whose text has been altered. The reported system outperforms a previously reported approach and is also robust to changes in the reference collection text.
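
The candidate selection idea in the abstract can be illustrated with a minimal sketch. This is not the system reported in the paper: it assumes a plain TF-IDF cosine ranking, and it uses a hypothetical hand-written synonym table (`SYNONYMS`) as the expansion source. All function names are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical toy synonym table (an assumption for illustration only).
# A real system would draw expansion terms from a lexical resource.
SYNONYMS = {
    "car": ["automobile"],
    "quick": ["fast"],
}


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def expand(tokens, synonyms=SYNONYMS):
    """Query expansion: append the listed synonyms of each query token."""
    expanded = list(tokens)
    for token in tokens:
        expanded.extend(synonyms.get(token, []))
    return expanded


def build_index(docs):
    """Return TF-IDF vectors for each document plus the IDF table."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    n_docs = len(docs)
    # Smoothed IDF so that every indexed term has a positive weight.
    idf = {t: math.log((n_docs + 1) / (df + 1)) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        doc_id: {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for doc_id, tokens in tokenized.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(suspicious, docs, k=2, use_expansion=True):
    """Rank reference documents against the suspicious text.

    Returns up to k (doc_id, score) pairs, best first, skipping
    documents that share no weighted term with the query.
    """
    vectors, idf = build_index(docs)
    tokens = tokenize(suspicious)
    if use_expansion:
        tokens = expand(tokens)
    # Terms absent from the collection get weight 0 and are ignored.
    query = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokens).items()}
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in vectors.items()]
    scored = [(doc_id, s) for doc_id, s in scored if s > 0.0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    reference = {
        "d1": "the cat sat on the mat",
        "d2": "the automobile was fast on the road",
        "d3": "stock markets fell sharply today",
    }
    print(retrieve_candidates("a quick car", reference))
    print(retrieve_candidates("a quick car", reference, use_expansion=False))
```

In the toy collection, the suspicious text "a quick car" shares no term with its reworded source `d2`, so the unexpanded query retrieves nothing. The expanded query matches `d2` through "fast" and "automobile". Only the retrieved candidates would then go on to the more expensive detailed comparison.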



Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nawab, R.M.A., Stevenson, M., Clough, P. (2012). Retrieving Candidate Plagiarised Documents Using Query Expansion. In: Baeza-Yates, R., et al. Advances in Information Retrieval. ECIR 2012. Lecture Notes in Computer Science, vol 7224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28997-2_18

  • DOI: https://doi.org/10.1007/978-3-642-28997-2_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28996-5

  • Online ISBN: 978-3-642-28997-2

  • eBook Packages: Computer Science, Computer Science (R0)
