Measuring Similarity Between Data Structures for Detecting Plagiarized Source Codes

Lee, Kihwa; Kim, Yeoneo; Woo, Gyun

doi:10.1007/978-981-13-1799-6_36

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 520)

Abstract

A program consists of data structures and algorithms. However, most studies to date on detecting source code plagiarism offer lopsided analyses that consider only the algorithms (or instructions) of the source code. This paper introduces a method for measuring the similarity between data structures in order to detect plagiarized source code. The proposed method was evaluated on test sets that include plagiarized source code. The experimental results show that, as expected, the data structures of plagiarized source code are highly similar. This result implies that similarity of data structures, along with similarity of algorithms, is one of the main factors that can decrease false alarms by lowering the threshold for plagiarism.
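
The abstract does not spell out how the similarity is computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes (hypothetically) that each data structure is reduced to a list of field type names, that two structures are compared by multiset overlap (a Dice coefficient), and that the structures of two programs are paired one-to-one by brute force. The names `struct_similarity` and `program_similarity` and the sample programs are illustrative only.

```python
from collections import Counter
from itertools import permutations


def struct_similarity(a, b):
    """Dice overlap of two field-type lists, in [0, 1] (assumed measure)."""
    if not a and not b:
        return 1.0
    # Counter intersection keeps the minimum count of each shared type.
    common = sum((Counter(a) & Counter(b)).values())
    return 2 * common / (len(a) + len(b))


def program_similarity(p, q):
    """Score the best one-to-one pairing of structures between two programs.

    p and q map structure names to field-type lists. Names are ignored,
    so renaming a structure or reordering its fields does not lower the score.
    """
    small, large = sorted((list(p.values()), list(q.values())), key=len)
    if not large:
        return 1.0
    # Brute force over all pairings; adequate only for a handful of structures.
    best = max(
        sum(struct_similarity(s, t) for s, t in zip(small, perm))
        for perm in permutations(large, len(small))
    )
    # Unmatched structures in the larger program count as zero similarity.
    return best / len(large)


original = {"Node": ["int", "ptr", "ptr"], "Stack": ["ptr", "int"]}
suspect = {"Elem": ["ptr", "int", "ptr"], "Stk": ["int", "ptr"]}
unrelated = {"Point": ["float", "float"]}

print(program_similarity(original, suspect))
print(program_similarity(original, unrelated))
```

In this sketch the renamed and reordered copy scores as identical to the original, while the unrelated program scores zero. For programs with many structures, the brute-force pairing could be replaced by an assignment algorithm such as the Hungarian method.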

Acknowledgements

This work was supported by BK21PLUS, Creative Human Resource Development Program for IT Convergence.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Pusan National University, 30 Jangjeon-Dong, Geumjeong-Gu, Busan, 609-735, Republic of Korea
Kihwa Lee & Yeoneo Kim
Department of Electrical and Computer Engineering, Smart Control Center of LG Electronics, Pusan Nat’l University, Busan, 609-735, Republic of Korea
Gyun Woo

Corresponding author

Correspondence to Gyun Woo.

Editor information

Editors and Affiliations

School of Information Technology, Deakin University, Geelong, VIC, Australia
Jemal H. Abawajy
Department of Communication Technology and Network, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Serdang, Malaysia
Mohamed Othman
Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Johor, Malaysia
Rozaida Ghazali
Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Johor, Malaysia
Mustafa Mat Deris
Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Johor, Malaysia
Hairulnizam Mahdin
Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
Tutut Herawan

About this paper

Cite this paper

Lee, K., Kim, Y., Woo, G. (2019). Measuring Similarity Between Data Structures for Detecting Plagiarized Source Codes. In: Abawajy, J., Othman, M., Ghazali, R., Deris, M., Mahdin, H., Herawan, T. (eds) Proceedings of the International Conference on Data Engineering 2015 (DaEng-2015). Lecture Notes in Electrical Engineering, vol 520. Springer, Singapore. https://doi.org/10.1007/978-981-13-1799-6_36

DOI: https://doi.org/10.1007/978-981-13-1799-6_36
Published: 10 August 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1797-2
Online ISBN: 978-981-13-1799-6
eBook Packages: Intelligent Technologies and Robotics (R0)
