Cross-Language Plagiarism Detection Using a Multilingual Semantic Network

  • Marc Franco-Salvador
  • Parth Gupta
  • Paolo Rosso
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7814)


Cross-language plagiarism is plagiarism in which the source and suspicious documents are written in different languages. Plagiarism detection across languages is still in its infancy. In this article, we propose a new graph-based approach that uses a multilingual semantic network to compare document paragraphs in different languages. To evaluate the proposed approach, we used the German-English and Spanish-English cross-language plagiarism cases of the PAN-PC’11 corpus and compared the results with those of two state-of-the-art models. The experimental results indicate that our graph-based approach is a good alternative for cross-language plagiarism detection.
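The abstract does not spell out the comparison procedure, so the following is only a minimal sketch of the general idea of comparing paragraphs through language-independent concepts. It is not the authors' model. The toy lexicon, the relation set, the function names (paragraph_graph, dice, graph_similarity) and the node_weight parameter are all hypothetical, and a Dice overlap over nodes and edges stands in for whatever graph similarity the paper actually uses.

```python
from itertools import combinations

# Hypothetical toy lexicon: (language, word) -> language-independent concept ID.
# A real system would query a multilingual semantic network instead.
LEXICON = {
    ("en", "dog"): "c:dog", ("es", "perro"): "c:dog",
    ("en", "barks"): "c:bark", ("es", "ladra"): "c:bark",
    ("en", "house"): "c:house", ("es", "casa"): "c:house",
    ("en", "car"): "c:car", ("es", "coche"): "c:car",
}

# Hypothetical undirected semantic relations between concepts.
RELATIONS = {
    frozenset({"c:dog", "c:bark"}),
    frozenset({"c:dog", "c:house"}),
}


def paragraph_graph(text, lang):
    """Map a paragraph to (concept nodes, relation edges among those nodes)."""
    tokens = text.lower().split()
    nodes = {LEXICON[(lang, t)] for t in tokens if (lang, t) in LEXICON}
    edges = {
        frozenset(pair)
        for pair in combinations(sorted(nodes), 2)
        if frozenset(pair) in RELATIONS
    }
    return nodes, edges


def dice(a, b):
    """Dice coefficient of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def graph_similarity(g1, g2, node_weight=0.5):
    """Weighted mix of node overlap and edge overlap (an assumed measure)."""
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    return node_weight * dice(nodes1, nodes2) + (1 - node_weight) * dice(edges1, edges2)


english = paragraph_graph("the dog barks at the house", "en")
spanish = paragraph_graph("el perro ladra en la casa", "es")
unrelated = paragraph_graph("el coche", "es")

print(graph_similarity(english, spanish))
print(graph_similarity(english, unrelated))
```

Because both paragraphs are projected onto the same concept identifiers before comparison, no translation step is needed in this sketch; the quality of such a comparison would depend on the coverage of the underlying semantic network.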


Keywords: Machine Translation, Context Model, Statistical Machine Translation, Knowledge Graph, Plagiarism Detection





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Marc Franco-Salvador (1)
  • Parth Gupta (1)
  • Paolo Rosso (1)
  1. Natural Language Engineering Lab. - ELiRF, DSIC, Universitat Politècnica de València, Valencia, Spain
