Abstract
String matching is the problem of finding the occurrences of a pattern string within another string or body of text. It plays a vital role in plagiarism detection in software code, where similar programs must be identified within a large population. String matching has also been used as a tool in software metrics, which measure the quality of the software development process. Many algorithms have been proposed in recent years for solving the string matching problem. Among them, the Berry–Ravindran algorithm has been found to be fairly efficient. The TVSBS and SSABS algorithms refine it further. However, these algorithms do not give the best possible shift in the search phase. In this paper, we propose an algorithm that gives the best possible shift in the search phase and is faster than the previously known algorithms. In the worst case, it behaves like the Berry–Ravindran algorithm. We also extend the algorithm to parameterized string matching, which makes it possible to detect plagiarism in software code.
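To make the notion of a "shift in the search phase" concrete, the following is a minimal sketch of the baseline Berry–Ravindran search as it is commonly described: after each attempt, the window moves by an amount determined by the two text characters immediately to the right of the window. This is not the algorithm proposed in the paper; the function name `br_search` and the dictionary-based shift table are illustrative assumptions.

```python
def br_search(text, pattern):
    """Return the start indices of all occurrences of pattern in text.

    Sketch of the Berry-Ravindran shift rule (baseline only, not the
    paper's proposed algorithm). Names here are illustrative.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Shift for a character pair that occurs inside the pattern at
    # positions (i, i + 1): align pattern[i] with the first character
    # after the window. Later i overwrites earlier i, so the rightmost
    # occurrence (smallest shift) wins.
    pair_shift = {}
    for i in range(m - 1):
        pair_shift[(pattern[i], pattern[i + 1])] = m - i

    matches = []
    j = 0
    while j <= n - m:
        if text[j:j + m] == pattern:
            matches.append(j)

        # The two characters just past the window; None past the end of text.
        a = text[j + m] if j + m < n else None
        b = text[j + m + 1] if j + m + 1 < n else None

        if a == pattern[-1]:
            j += 1                      # last pattern char could align with a
        elif (a, b) in pair_shift:
            j += pair_shift[(a, b)]     # pair occurs inside the pattern
        elif b == pattern[0]:
            j += m + 1                  # pattern could start at b
        else:
            j += m + 2                  # neither character can take part in a match
    return matches


print(br_search("abracadabra", "abra"))
```

The four cases are tested in order of increasing shift length, so the first one that applies is also the smallest safe shift. The maximum shift is m + 2, where m is the pattern length.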
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pandey, K.L., Agarwal, S., Misra, S., Prasad, R. (2012). Plagiarism Detection in Software Using Efficient String Matching. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2012. ICCSA 2012. Lecture Notes in Computer Science, vol 7336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31128-4_11
DOI: https://doi.org/10.1007/978-3-642-31128-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31127-7
Online ISBN: 978-3-642-31128-4
eBook Packages: Computer Science (R0)