Detecting Similarity in Multi-procedure Student Programs Using only Static Code Structure
Plagiarism is prevalent in most undergraduate programming courses, including those that teach more advanced programming. Typical strategies used to avoid detection include renaming variables and adding whitespace or comments to the code. Although these changes alter the appearance of the source code, its underlying structure remains the same. This similarity in structure can indicate the presence of plagiarism.
A system has been developed to detect similarity in the structure of student programs. The detection system works in two phases. The first phase parses the source code of each program and builds a syntax tree representing its syntactic structure. The second phase takes two program syntax trees as input and applies several comparison algorithms to measure their similarity. The outcome of the comparison places each pair of programs in one of four similarity categories: identical structure, isomorphic structure, containing many structural similarities, and containing few structural similarities. Empirical tests on small sample programs show that the prototype implementation is effective in detecting plagiarism in source code, although in some cases manual checking is needed to confirm the presence of plagiarism.
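The two phases can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Python's standard `ast` module as a stand-in parser, interprets "isomorphic" as "equal once the order of child nodes is ignored", and uses a multiset Jaccard score over subtrees with an arbitrary 0.5 threshold to separate "many" from "few" similarities. The function names `shape`, `canonical`, `subtrees`, `similarity`, and `classify` are hypothetical.

```python
import ast
from collections import Counter


def shape(node):
    """Phase 1: reduce a parsed node to (type name, child shapes).

    Identifiers, literal values, comments, and whitespace are not part of
    the result, so renaming variables leaves the shape unchanged.
    """
    return (type(node).__name__,
            tuple(shape(child) for child in ast.iter_child_nodes(node)))


def canonical(tree):
    """Sort children recursively so that sibling order no longer matters."""
    name, children = tree
    return (name, tuple(sorted(canonical(child) for child in children)))


def subtrees(tree):
    """Count every subtree shape that occurs in the tree."""
    counts = Counter()
    stack = [tree]
    while stack:
        current = stack.pop()
        counts[current] += 1
        stack.extend(current[1])
    return counts


def similarity(tree_a, tree_b):
    """Multiset Jaccard score over subtree shapes (assumed metric)."""
    a, b = subtrees(tree_a), subtrees(tree_b)
    return sum((a & b).values()) / sum((a | b).values())


def classify(source_a, source_b, threshold=0.5):
    """Phase 2: compare two syntax trees and pick one of four categories."""
    tree_a = shape(ast.parse(source_a))
    tree_b = shape(ast.parse(source_b))
    if tree_a == tree_b:
        return "identical structure"
    if canonical(tree_a) == canonical(tree_b):
        return "isomorphic structure"
    # The threshold is an assumption made for this sketch only.
    if similarity(tree_a, tree_b) >= threshold:
        return "many structural similarities"
    return "few structural similarities"
```

Calling `classify` on a program and a copy with renamed variables and added comments would fall into the first category under these assumptions, while a copy whose procedures are merely reordered would fall into the second. As in the abstract, a high score only flags a pair for manual checking.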
Keywords: Plagiarism detection, Code structure, Student code