Fast Plagiarism Detection by Sentence Hashing

  • Conference paper
Artificial Intelligence and Soft Computing (ICAISC 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7268)

Abstract

This work presents a Sentence Hashing Algorithm for Plagiarism Detection (SHAPD). To give the user the best results, the algorithm exploits a special trait of written texts, their natural fragmentation into sentences, and then applies a set of dedicated techniques for text representation. The results obtained demonstrate that the algorithm delivers a solution faster than the alternatives. Its algorithmic complexity is logarithmic, so it performs better than most dynamic-programming algorithms used to find the longest common subsequence.
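The abstract does not give SHAPD's internals, so the following is only a generic illustration of the idea of sentence hashing, not the authors' algorithm. It assumes a naive punctuation-based sentence splitter, SHA-1 as the hash, and a plain dictionary as the index; the function names (`split_sentences`, `sentence_hash`, `build_index`, `find_shared_sentences`) are hypothetical, and the sketch makes no attempt to reproduce the paper's text-representation techniques or its complexity claims.

```python
import hashlib
import re


def split_sentences(text):
    """Split text on sentence-ending punctuation (a naive, assumed rule)."""
    parts = re.split(r"[.!?]+", text)
    return [p.strip() for p in parts if p.strip()]


def sentence_hash(sentence):
    """Hash a sentence after lowercasing it and dropping punctuation."""
    words = re.findall(r"\w+", sentence.lower())
    return hashlib.sha1(" ".join(words).encode("utf-8")).hexdigest()


def build_index(documents):
    """Map each sentence hash to the set of document ids containing it."""
    index = {}
    for doc_id, text in documents.items():
        for sentence in split_sentences(text):
            index.setdefault(sentence_hash(sentence), set()).add(doc_id)
    return index


def find_shared_sentences(index, suspect_text):
    """Return (sentence, matching doc ids) for suspect sentences in the index."""
    matches = []
    for sentence in split_sentences(suspect_text):
        doc_ids = index.get(sentence_hash(sentence))
        if doc_ids:
            matches.append((sentence, sorted(doc_ids)))
    return matches
```

In this sketch each sentence of a suspect document is compared through a single hash lookup rather than aligned character by character against every source, which is the kind of saving over longest-common-subsequence comparison that the abstract alludes to. Exact hashing of this sort only catches sentences that are identical after normalisation; paraphrased text would need a richer representation.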

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ceglarek, D., Haniewicz, K. (2012). Fast Plagiarism Detection by Sentence Hashing. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2012. Lecture Notes in Computer Science, vol 7268. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29350-4_4

  • DOI: https://doi.org/10.1007/978-3-642-29350-4_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29349-8

  • Online ISBN: 978-3-642-29350-4

  • eBook Packages: Computer Science, Computer Science (R0)
