Detecting Similarity in Multi-procedure Student Programs Using only Static Code Structure

  • Karen Bradshaw
  • Vongai Chindeka
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1136)

Abstract

Plagiarism is prevalent in most undergraduate programming courses, including those where more advanced programming is taught. Typical strategies used to avoid detection include changing variable names and adding empty spaces or comments to the code. Although these changes affect the visual components of the source code, the underlying structure of the code remains the same. This similarity in structure can indicate the presence of plagiarism.

A system has been developed to detect similarity in the structure of student programs. The detection system works in two phases: the first parses the source code and creates a syntax tree representing the syntactic structure of each program, while the second takes two program syntax trees as input and applies various comparison algorithms to detect their similarity. The outcome of the comparison allows the system to report a result from one of four similarity categories: identical structure, isomorphic structure, containing many structural similarities, and containing few structural similarities. Empirical tests on small sample programs show that the prototype implementation is effective in detecting plagiarism in source code, although in some cases manual checking is needed to confirm the presence of plagiarism.
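The two-phase idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the comparison algorithms, so everything below is an assumption chosen for illustration. The sketch uses Python's standard `ast` module as the parser, treats two trees as isomorphic when they match after sorting children recursively, and separates "many" from "few" similarities with a hypothetical Dice score over shared subtrees and an arbitrary threshold of 0.5. The names `structure`, `canonical`, `all_subtrees` and `classify` are likewise hypothetical.

```python
import ast
from collections import Counter


def structure(node):
    """Phase 1: reduce an AST node to (type name, child structures).

    Identifiers, literal values and comments are not kept, so renaming
    variables or adding comments does not change the result.
    """
    children = tuple(structure(c) for c in ast.iter_child_nodes(node))
    return (type(node).__name__, children)


def canonical(tree):
    """Order-independent form of a tree: children are sorted recursively.

    Simplification (assumption): every reordering of children counts as
    isomorphic, including reordered statements and swapped operands.
    """
    label, children = tree
    return (label, tuple(sorted(canonical(c) for c in children)))


def all_subtrees(canon):
    """Yield a canonical tree and every subtree below it."""
    yield canon
    for child in canon[1]:
        yield from all_subtrees(child)


def classify(src_a, src_b, threshold=0.5):
    """Phase 2: compare two programs and return one of four categories.

    The threshold and the Dice score are illustrative assumptions.
    """
    a = structure(ast.parse(src_a))
    b = structure(ast.parse(src_b))
    if a == b:
        return "identical structure"
    ca, cb = canonical(a), canonical(b)
    if ca == cb:
        return "isomorphic structure"
    bag_a = Counter(all_subtrees(ca))
    bag_b = Counter(all_subtrees(cb))
    shared = sum((bag_a & bag_b).values())  # multiset intersection size
    score = 2 * shared / (sum(bag_a.values()) + sum(bag_b.values()))
    if score >= threshold:
        return "many structural similarities"
    return "few structural similarities"


ORIGINAL = """
def area(w, h):
    return w * h

def show(x):
    print(x)
"""

# Same code with renamed identifiers and an added comment.
RENAMED = """
# computes the size
def size(a, b):
    return a * b

def display(value):
    print(value)
"""

# Same procedures, defined in the opposite order.
REORDERED = """
def show(x):
    print(x)

def area(w, h):
    return w * h
"""

if __name__ == "__main__":
    print(classify(ORIGINAL, RENAMED))
    print(classify(ORIGINAL, REORDERED))
```

Under these assumptions, the renamed copy is reported as having identical structure and the reordered copy as having isomorphic structure. A real detector would need a parser for the language the students use and a more careful notion of isomorphism than sorting all children.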

Keywords

Plagiarism detection, Code structure, Student code

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Computer Science, Rhodes University, Grahamstown, South Africa
