
Plagiarism Detection in Software Using Efficient String Matching

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7336)

Abstract

String matching is the problem of finding the occurrences of a pattern string within another string or body of text. It plays a vital role in detecting plagiarism in software code, where similar programs must be identified within large populations of programs. String matching has also been used as a tool in software metrics, which measure the quality of the software development process. Many algorithms have been proposed for the string matching problem. Among them, the Berry–Ravindran algorithm has been found to be fairly efficient, and it was further refined in the TVSBS and SSABS algorithms. However, these algorithms do not give the best possible shift in the search phase. In this paper, we propose an algorithm that gives the best possible shift in the search phase and is faster than previously known algorithms. In the worst case, it behaves like Berry–Ravindran. We further extend this algorithm to parameterized string matching, which can detect plagiarism in software code.
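
For reference, here is a minimal Python sketch of the classic Berry–Ravindran search (reference 6). It follows the two-character shift table described in the Charras–Lecroq handbook (reference 12). It is not the improved algorithm proposed in this paper, whose shift rule is not reproduced here. After each window is checked, the shift is determined by the two text characters just to the right of the window. The function names `build_pair_shifts`, `br_shift` and `berry_ravindran_search` are hypothetical.

```python
from typing import Optional


def build_pair_shifts(pattern: str) -> dict:
    """Map each adjacent pattern pair (x[i], x[i+1]) to the shift m - i.

    Scanning left to right lets a later (rightmost) occurrence of the same
    pair overwrite an earlier one, so the smallest safe shift is kept.
    """
    m = len(pattern)
    shifts = {}
    for i in range(m - 1):
        shifts[(pattern[i], pattern[i + 1])] = m - i
    return shifts


def br_shift(pattern: str, pair_shifts: dict, a: str, b: Optional[str]) -> int:
    """Berry-Ravindran shift for the text pair (a, b) just right of the window.

    brBc[a][b] is the minimum of:
      1      if x[m-1] == a
      m - i  if x[i] x[i+1] == a b   (rightmost such i)
      m + 1  if x[0] == b
      m + 2  otherwise
    The checks below go from the smallest candidate to the largest.
    b is None when the window ends one character before the end of the text.
    """
    m = len(pattern)
    if a == pattern[-1]:
        return 1
    if (a, b) in pair_shifts:
        return pair_shifts[(a, b)]
    if b == pattern[0]:
        return m + 1
    return m + 2


def berry_ravindran_search(pattern: str, text: str) -> list:
    """Return the start index of every exact occurrence of pattern in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m, n = len(pattern), len(text)
    pair_shifts = build_pair_shifts(pattern)
    matches = []
    j = 0
    while j <= n - m:
        if text[j:j + m] == pattern:  # verify the current window
            matches.append(j)
        if j + m >= n:  # no text right of the window, so no further window fits
            break
        a = text[j + m]
        b = text[j + m + 1] if j + m + 1 < n else None
        j += br_shift(pattern, pair_shifts, a, b)
    return matches
```

For example, `berry_ravindran_search("aba", "abababa")` returns `[0, 2, 4]`. Every shift is at least 1 and can be as large as m + 2. This is why examining two characters beyond the window can skip more text than the single-character rules of Horspool (reference 9) or Sunday (reference 5).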
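
Parameterized matching is what makes the approach applicable to code plagiarism. Two code fragments parameterize-match (p-match) when one can be turned into the other by consistently renaming identifiers. A standard way to test this, from Baker's work on parameterized matching (e.g. reference 14), is the prev encoding: each parameter token is replaced by the distance to its previous occurrence, with 0 for a first occurrence. Two token sequences p-match exactly when their prev encodings are equal. The sketch below uses this encoding with a naive window scan rather than the shift-based search proposed by the authors. The tokenizer, the `KEYWORDS` set and all function names are hypothetical simplifications.

```python
import re

# Hypothetical, deliberately tiny keyword set: these tokens are "fixed" symbols
# that must match exactly. Every other identifier is a renamable parameter.
KEYWORDS = {"if", "else", "while", "for", "return", "int", "def"}

# Identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code: str) -> list:
    return TOKEN_RE.findall(code)


def is_parameter(tok: str) -> bool:
    return (tok[0].isalpha() or tok[0] == "_") and tok not in KEYWORDS


def prev_encode(tokens: list) -> list:
    """Replace each parameter by the distance to its previous occurrence
    (0 if first); fixed tokens are kept as strings."""
    last_seen = {}
    encoded = []
    for pos, tok in enumerate(tokens):
        if is_parameter(tok):
            encoded.append(pos - last_seen[tok] if tok in last_seen else 0)
            last_seen[tok] = pos
        else:
            encoded.append(tok)
    return encoded


def p_match_positions(pattern_tokens: list, text_tokens: list) -> list:
    """Start indices of text windows that p-match the pattern (naive O(nm) scan)."""
    m = len(pattern_tokens)
    p_enc = prev_encode(pattern_tokens)
    t_enc = prev_encode(text_tokens)  # encode the whole text once
    hits = []
    for j in range(len(text_tokens) - m + 1):
        # A distance larger than its offset k points before the window, so
        # inside the window that token is a first occurrence: reset it to 0.
        window = [0 if isinstance(v, int) and v > k else v
                  for k, v in enumerate(t_enc[j:j + m])]
        if window == p_enc:
            hits.append(j)
    return hits
```

For example, `p_match_positions(tokenize("a = a + b ;"), tokenize("int x ; x = x + y ; y = y + y ;"))` returns `[3]`. The window `x = x + y ;` is a consistent renaming of the pattern. The window `y = y + y ;` is not, because both of the pattern's distinct identifiers would have to map to `y`.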


References

  1. Roy, C.K., Cordy, J.R.: A survey on software clone detection research. Technical Report (2007)

  2. Whale, G.: Software Metrics and Plagiarism Detection. Journal of Systems and Software 13, 131–138 (1990)

  3. Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Communications of the ACM 20, 762–772 (1977)

  4. Amir, A., Navarro, G.: Parameterized Matching on Non-linear Structures. Information Processing Letters (IPL) 109(15), 864–867 (2009)

  5. Sunday, D.M.: A very fast substring search algorithm. Communications of the ACM 33, 132–142 (1990)

  6. Berry, T., Ravindran, S.: A fast string matching algorithm and experimental results. In: Holub, J., Simánek, M. (eds.) Proceedings of the Prague Stringology Club Workshop 1999, Collaborative Report DC-99-05, Czech Technical University, Prague, Czech Republic, pp. 16–26 (2001)

  7. Baker, B.S.: A program for identifying duplicated code. In: Computing Science and Statistics: Proceedings of the 24th Symposium on the Interface, vol. 24, pp. 49–57 (1992)

  8. Sheik, S.S., Aggarwal, S.K., Poddar, A., Balakrishnan, N., Sekar, K.: A FAST pattern matching algorithm. J. Chem. Inf. Comput. Sci. 44, 1251–1256 (2004)

  9. Horspool, R.N.: Practical fast searching in strings. Software – Practice and Experience 10, 501–506 (1980)

  10. Raita, T.: Tuning the Boyer–Moore–Horspool string-searching algorithm. Software – Practice and Experience 22, 879–884 (1992)

  11. Abrahamson, K.: Generalized String Matching. SIAM Journal on Computing 16, 1039–1051 (1987)

  12. Charras, C., Lecroq, T.: Handbook of Exact String Matching Algorithms, http://www-igm.univ-mlv.fr/~lecroq/string/

  13. Thathoo, R., Virmani, A., Laxmi, S.S., Balakrishnan, N., Sekar, K.: TVSBS: A fast exact pattern matching algorithm for biological sequences. Current Science 91(1) (2006)

  14. Baker, B.S.: Parameterized pattern matching by Boyer-Moore type algorithms. In: Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, CA, pp. 541–550 (1995)

  15. Salmela, L., Tarhio, J.: Fast Parameterized Matching with q-Grams. Journal of Discrete Algorithms 6(3), 408–419 (2008)

  16. Prasad, R., Agarwal, S., Misra, S.: Parameterized String Matching Algorithms with Application to Molecular Biology. Nigerian Journal of Technological Research, 5 (2010)


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pandey, K.L., Agarwal, S., Misra, S., Prasad, R. (2012). Plagiarism Detection in Software Using Efficient String Matching. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2012. ICCSA 2012. Lecture Notes in Computer Science, vol 7336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31128-4_11


  • DOI: https://doi.org/10.1007/978-3-642-31128-4_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31127-7

  • Online ISBN: 978-3-642-31128-4

  • eBook Packages: Computer Science, Computer Science (R0)
