Fast Plagiarism Detection by Sentence Hashing

Ceglarek, Dariusz; Haniewicz, Konstanty

doi:10.1007/978-3-642-29350-4_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7268)

Included in the following conference series:

International Conference on Artificial Intelligence and Soft Computing

Abstract

This work presents SHAPD, a Sentence Hashing Algorithm for Plagiarism Detection. To give the user the best results, the algorithm exploits a special trait of written texts, their natural fragmentation into sentences, and then applies a set of special techniques for text representation. The results obtained demonstrate that the algorithm delivers a solution faster than the alternatives. Its algorithmic complexity is logarithmic, so it performs better than most dynamic-programming algorithms used to find the longest common subsequence.
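
The general idea of comparing documents through per-sentence hashes can be illustrated with a minimal sketch. This is not the authors' SHAPD implementation, whose text-representation techniques are not described in the abstract. The naive regex sentence splitter, the normalization step, the choice of SHA-1, and all function names (`split_sentences`, `normalize`, `sentence_hashes`, `shared_sentence_ratio`) are assumptions made only for illustration.

```python
import hashlib
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def normalize(sentence):
    """Lowercase and keep only word characters, so case and punctuation do not matter."""
    return " ".join(re.findall(r"\w+", sentence.lower()))


def sentence_hashes(text):
    """Return the set of SHA-1 hex digests of the normalized sentences in `text`."""
    hashes = set()
    for sentence in split_sentences(text):
        norm = normalize(sentence)
        if norm:  # skip fragments that contain no word characters
            hashes.add(hashlib.sha1(norm.encode("utf-8")).hexdigest())
    return hashes


def shared_sentence_ratio(suspect, source):
    """Fraction of the suspect's distinct sentences whose hash also occurs in the source."""
    suspect_hashes = sentence_hashes(suspect)
    if not suspect_hashes:
        return 0.0
    return len(suspect_hashes & sentence_hashes(source)) / len(suspect_hashes)


if __name__ == "__main__":
    source = "The cat sat on the mat. Dogs bark loudly! Is it raining?"
    suspect = "Dogs bark loudly. The CAT sat on the mat. Something new here."
    print(shared_sentence_ratio(suspect, source))
```

In this sketch each sentence is reduced to a fixed-size digest, so matching a suspect sentence against a source is a set lookup rather than a character-by-character alignment of the kind a dynamic-programming longest-common-subsequence search performs. It only detects sentences that are identical after normalization; how SHAPD handles reworded text is not covered here.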

Author information

Authors and Affiliations

Poznan School of Banking, Poland
Dariusz Ceglarek
Poznan University of Economics, Poland
Konstanty Haniewicz

Editor information

Editors and Affiliations

Częstochowa University of Technology, Armii Krajowej 36, 42-200, Częstochowa, Poland
Leszek Rutkowski, Marcin Korytkowski & Rafał Scherer
AGH University of Science and Technology, Mickiewicza 30, 30-059, Kraków, Poland
Ryszard Tadeusiewicz
Department of Electrical Engineering and Computer Sciences, Computer Science Division, University of California Berkeley, 94720-1776, Berkeley, CA, USA
Lotfi A. Zadeh
Computational Intelligence Laboratory, Electrical and Computer Engineering, University of Louisville, 405 Lutz Hall, 40292, Louisville, KY, USA
Jacek M. Zurada

Cite this paper

Ceglarek, D., Haniewicz, K. (2012). Fast Plagiarism Detection by Sentence Hashing. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2012. Lecture Notes in Computer Science, vol 7268. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29350-4_4

DOI: https://doi.org/10.1007/978-3-642-29350-4_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29349-8
Online ISBN: 978-3-642-29350-4
eBook Packages: Computer Science, Computer Science (R0)
