Abstract
This work presents a Sentence Hashing Algorithm for Plagiarism Detection - SHAPD. To present the user with the best results, the algorithm exploits a special trait of written texts - their natural fragmentation into sentences - and then applies a set of special techniques for text representation. The results obtained demonstrate that the algorithm delivers a solution faster than the alternatives. Its algorithmic complexity is logarithmic, so it performs better than most dynamic-programming algorithms used to find the longest common subsequence.
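The abstract does not spell out the internals of SHAPD, so the following Python is only a minimal sketch of the general idea of sentence-level hashing, not the authors' algorithm. The function names (`sentence_hashes`, `shared_sentences`), the naive punctuation-based sentence splitter, the lowercase normalisation, and the choice of SHA-1 are all assumptions made for illustration.

```python
import hashlib
import re


def sentence_hashes(text):
    """Map the hash of each normalised sentence in `text` to the sentence.

    Hypothetical helper: the splitting and normalisation rules are
    assumptions, not taken from the paper.
    """
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hashes = {}
    for sentence in sentences:
        # Normalise: lowercase and keep only the word tokens.
        words = re.findall(r"\w+", sentence.lower())
        if not words:
            continue
        digest = hashlib.sha1(" ".join(words).encode("utf-8")).hexdigest()
        hashes[digest] = sentence
    return hashes


def shared_sentences(suspect, source):
    """Return sentences of `suspect` whose hash also occurs in `source`."""
    source_hashes = sentence_hashes(source)
    return [s for h, s in sentence_hashes(suspect).items() if h in source_hashes]


if __name__ == "__main__":
    source = "The cat sat on the mat. Dogs bark loudly."
    suspect = "Something new here. The CAT sat on the mat!"
    print(shared_sentences(suspect, source))
```

In this sketch each sentence is compared through a single dictionary lookup on its hash, rather than through a character-by-character alignment of the kind a longest-common-subsequence computation performs. It detects only sentences that are identical after normalisation; handling reordered or partially changed sentences would need the additional text-representation techniques the abstract alludes to.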
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ceglarek, D., Haniewicz, K. (2012). Fast Plagiarism Detection by Sentence Hashing. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2012. Lecture Notes in Computer Science, vol 7268. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29350-4_4
DOI: https://doi.org/10.1007/978-3-642-29350-4_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29349-8
Online ISBN: 978-3-642-29350-4
eBook Packages: Computer Science, Computer Science (R0)