Using Sentence Similarity Measure for Plagiarism Detection of Arabic Documents

  • Wafa Wali
  • Bilel Gargouri
  • Abdelmajid Ben Hamadou
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 736)


Plagiarism detection is a challenging task, particularly for natural language texts. Several plagiarism detection tools have been developed for various natural languages, especially English. In this paper, we propose a new plagiarism detection system for Arabic text documents. The system is based on an algorithm that uses a semantic sentence similarity measure. This measure combines three components in a linear function: the lexical component LS, based on common words; the semantic component SS, based on synonymy relationships; and the syntactico-semantic component SSS, based on semantic argument properties, notably semantic arguments and thematic roles. The SSS component measures the semantic similarity between words that play the same syntactic role. For word-level semantic similarity, an information content-based measure is used to estimate the SS degree between words by exploiting ElMadar, an Arabic dictionary standardized according to the Lexical Markup Framework (LMF). Experiments on student thesis reports confirmed the performance of the proposed system, which showed promising capabilities in identifying literal plagiarism and some types of intelligent plagiarism. We also demonstrate its advantages over other plagiarism detection tools, including Aplag.
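The three-component aggregation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Everything here is hypothetical: a toy is-a taxonomy and frequency table (`PARENT`, `FREQ`) stand in for ElMadar (whose synonymy relations are not modeled); Lin's measure is swapped in as a standard information content-based word similarity; LS is taken as Jaccard word overlap; SS averages best-match word similarities in both directions; SSS compares only words that fill the same thematic role; and the weights (0.3, 0.4, 0.3) and the 0.5 threshold are illustrative values. Tokens are English placeholders for lemmatized Arabic words.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical toy "is-a" taxonomy standing in for ElMadar: concept -> parent.
PARENT: Dict[str, Optional[str]] = {
    "entity": None, "document": "entity", "report": "document",
    "thesis": "document", "person": "entity", "student": "person",
}
# Hypothetical corpus frequencies of each concept (not from the paper).
FREQ: Dict[str, int] = {
    "entity": 0, "document": 5, "report": 10,
    "thesis": 10, "person": 5, "student": 20,
}


def ancestors(concept: str) -> List[str]:
    """Return the concept followed by its ancestors, nearest first."""
    chain: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


# Propagate each concept's frequency to all of its ancestors.
COUNTS: Dict[str, int] = {c: 0 for c in PARENT}
for _concept, _freq in FREQ.items():
    for _anc in ancestors(_concept):
        COUNTS[_anc] += _freq
TOTAL = COUNTS["entity"]


def ic(concept: str) -> float:
    """Information content: -log p(concept)."""
    return -math.log(COUNTS[concept] / TOTAL)


def lin_similarity(a: str, b: str) -> float:
    """Lin's IC-based word similarity; 0 for words outside the taxonomy."""
    if a == b:
        return 1.0
    if a not in PARENT or b not in PARENT:
        return 0.0
    anc_b = set(ancestors(b))
    lcs = next(c for c in ancestors(a) if c in anc_b)  # lowest common subsumer
    denom = ic(a) + ic(b)
    return 2 * ic(lcs) / denom if denom > 0 else 0.0


def lexical_sim(s1: Sequence[str], s2: Sequence[str]) -> float:
    """LS (assumed form): Jaccard overlap of the two word sets."""
    w1, w2 = set(s1), set(s2)
    union = w1 | w2
    return len(w1 & w2) / len(union) if union else 0.0


def semantic_sim(s1: Sequence[str], s2: Sequence[str]) -> float:
    """SS (assumed form): best-match word similarity averaged both ways."""
    if not s1 or not s2:
        return 0.0

    def directed(x: Sequence[str], y: Sequence[str]) -> float:
        return sum(max(lin_similarity(u, v) for v in y) for u in x) / len(x)

    return (directed(s1, s2) + directed(s2, s1)) / 2


def syntactico_semantic_sim(r1: Dict[str, str], r2: Dict[str, str]) -> float:
    """SSS (assumed form): compare words filling the same role; missing roles score 0."""
    roles = set(r1) | set(r2)
    if not roles:
        return 0.0
    shared = (lin_similarity(r1[k], r2[k]) for k in roles if k in r1 and k in r2)
    return sum(shared) / len(roles)


def sentence_similarity(words1, words2, roles1, roles2,
                        weights: Tuple[float, float, float] = (0.3, 0.4, 0.3)) -> float:
    """Linear combination a*LS + b*SS + c*SSS (weights are illustrative)."""
    a, b, c = weights
    return (a * lexical_sim(words1, words2)
            + b * semantic_sim(words1, words2)
            + c * syntactico_semantic_sim(roles1, roles2))


def is_plagiarized(score: float, threshold: float = 0.5) -> bool:
    """Flag a sentence pair whose similarity reaches a hypothetical threshold."""
    return score >= threshold


# English placeholders for lemmatized Arabic sentences and their thematic roles.
words1 = ["student", "wrote", "thesis"]
roles1 = {"agent": "student", "theme": "thesis"}
words2 = ["student", "wrote", "report"]
roles2 = {"agent": "student", "theme": "report"}

score = sentence_similarity(words1, words2, roles1, roles2)
print(f"{score:.3f}", is_plagiarized(score))
```

With these toy inputs the two sentences share their agent and differ only in a related theme ("thesis" vs "report"), so the combined score is about 0.689 and the pair is flagged. In practice the weights and the threshold would have to be tuned on annotated data rather than fixed by hand.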


Keywords: Plagiarism · Sentence similarity · Arabic language · Lexical Markup Framework · Semantic information · Syntactico-semantic information

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Wafa Wali (1)
  • Bilel Gargouri (1)
  • Abdelmajid Ben Hamadou (1)
  1. MIRACL Laboratory, Sfax University, Sfax, Tunisia