Identification of Plagiarism Using Syntactic and Semantic Filters

  • R. Vijay Sundar Ram
  • Efstathios Stamatatos
  • Sobha Lalitha Devi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8404)


We present a method for detecting manually paraphrased passages in a document by comparing it against a set of source documents. Manual paraphrasing is a realistic type of plagiarism in which the obfuscation is introduced by hand. We used the PAN-PC-10 data set to develop and evaluate our algorithm. The proposed approach works at the sentence level and consists of two steps: identifying probable plagiarized passages with the Dice similarity measure, and then filtering the obtained passages using syntactic rules and lexical semantic features extracted from obfuscation patterns. The results are encouraging in difficult cases of plagiarism that most existing approaches fail to detect.
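
As an illustration of the first step only, the following minimal sketch scores sentence pairs with the Dice coefficient, 2|A ∩ B| / (|A| + |B|), and keeps the pairs above a cut-off. The abstract does not say which units are compared or which threshold is used, so the word-set tokenization, the function names, and the 0.5 threshold are all assumptions made for this example and not the authors' implementation.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens (assumed unit)."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def dice_similarity(sentence_a, sentence_b):
    """Dice coefficient over word sets: 2|A & B| / (|A| + |B|)."""
    tokens_a, tokens_b = tokenize(sentence_a), tokenize(sentence_b)
    if not tokens_a and not tokens_b:
        return 0.0  # two empty sentences carry no evidence of overlap
    return 2 * len(tokens_a & tokens_b) / (len(tokens_a) + len(tokens_b))


def candidate_pairs(suspicious, sources, threshold=0.5):
    """Return (suspicious_index, source_index, score) for pairs at or above
    the threshold. The threshold value is hypothetical."""
    pairs = []
    for i, susp_sentence in enumerate(suspicious):
        for j, src_sentence in enumerate(sources):
            score = dice_similarity(susp_sentence, src_sentence)
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs
```

Pairs returned by `candidate_pairs` would then be passed to the second step, the syntactic and lexical semantic filter, which is not sketched here because the abstract gives no details of its rules.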


Manual paraphrasing; Syntactic rules and lexical semantics; Plagiarism detection





Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • R. Vijay Sundar Ram (1)
  • Efstathios Stamatatos (2)
  • Sobha Lalitha Devi (1)
  1. AU-KBC Research Centre, MIT Campus of Anna University, Chennai, India
  2. Dept. of Information and Communication Systems Eng., University of the Aegean, Karlovassi, Greece
