A New Corpus for the Evaluation of Arabic Intrinsic Plagiarism Detection

  • Conference paper
Information Access Evaluation. Multilinguality, Multimodality, and Visualization (CLEF 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8138)

Abstract

The present paper introduces the first corpus for the evaluation of Arabic intrinsic plagiarism detection. The corpus consists of 1024 artificial suspicious documents in which 2833 plagiarism cases have been inserted automatically from source documents.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bensalem, I., Rosso, P., Chikhi, S. (2013). A New Corpus for the Evaluation of Arabic Intrinsic Plagiarism Detection. In: Forner, P., Müller, H., Paredes, R., Rosso, P., Stein, B. (eds) Information Access Evaluation. Multilinguality, Multimodality, and Visualization. CLEF 2013. Lecture Notes in Computer Science, vol 8138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40802-1_6

  • DOI: https://doi.org/10.1007/978-3-642-40802-1_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40801-4

  • Online ISBN: 978-3-642-40802-1

  • eBook Packages: Computer Science, Computer Science (R0)
