Searching for Translated Plagiarism with the Help of Desktop Grids

Journal of Grid Computing

Abstract

Translated, or cross-lingual, plagiarism is the translation of someone else’s work or words without marking it as such or giving credit to the original author. Cross-lingual plagiarism is not new, but only in recent years, with the rapid development of natural language processing, have the first algorithms appeared that tackle the difficult task of detecting it. Most of these algorithms use machine translation to compare texts written in different languages. We propose a different method, one that can effectively detect translations between language pairs for which machine translation still produces low-quality results. The new algorithm presented in this paper is based on information retrieval (IR) and a dictionary-based similarity metric. Preprocessing the candidate documents for the IR step is computationally intensive but easily parallelizable, so we propose a desktop Grid solution for this task. Because the application is time-sensitive and desktop Grid peers are unreliable, a resubmission mechanism is used to ensure that all jobs of a batch finish within a reasonable time without dramatically increasing the load on the whole system.
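
The abstract does not spell out the dictionary-based similarity metric, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a bilingual dictionary that maps each source-language word to a set of possible translations, and scores a candidate passage by the fraction of source words that have at least one translation present in it. The names `tokenize`, `dictionary_similarity`, and `TOY_DICT`, as well as the toy Hungarian-to-English entries, are hypothetical.

```python
from typing import Dict, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return {w for w in words if w}


def dictionary_similarity(source: str, candidate: str,
                          dictionary: Dict[str, Set[str]]) -> float:
    """Fraction of source words with at least one translation in the candidate.

    This is an illustrative overlap score, assumed for this sketch only.
    """
    src_words = tokenize(source)
    cand_words = tokenize(candidate)
    if not src_words:
        return 0.0
    # A source word counts as matched if any of its dictionary translations
    # occurs among the candidate's words.
    matched = sum(1 for w in src_words if dictionary.get(w, set()) & cand_words)
    return matched / len(src_words)


# Hypothetical toy dictionary (Hungarian -> English), for illustration only.
TOY_DICT: Dict[str, Set[str]] = {
    "a": {"the"},
    "kutya": {"dog", "hound"},
    "fut": {"runs", "run"},
    "gyorsan": {"quickly", "fast"},
}

score = dictionary_similarity("A kutya gyorsan fut.", "The dog runs fast.", TOY_DICT)
unrelated = dictionary_similarity("A kutya gyorsan fut.", "Grid jobs finish late.", TOY_DICT)
print(score, unrelated)
```

Because the score is computed over word sets, it ignores word order, which is one reason a dictionary lookup can remain usable for language pairs where full machine translation is unreliable. A realistic system would presumably also need stemming and some handling of ambiguous translations, neither of which this sketch attempts.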

Correspondence to Attila Csaba Marosi.

Cite this article

Pataki, M., Marosi, A.C. Searching for Translated Plagiarism with the Help of Desktop Grids. J Grid Computing 11, 149–166 (2013). https://doi.org/10.1007/s10723-012-9224-5

  • DOI: https://doi.org/10.1007/s10723-012-9224-5
