Measuring Similarity Between Data Structures for Detecting Plagiarized Source Codes

  • Conference paper
Proceedings of the International Conference on Data Engineering 2015 (DaEng-2015)

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 520)

Abstract

A program consists of data structures and algorithms. However, most studies to date on detecting source code plagiarism offer lopsided analyses that consider only the algorithms (or instructions) of the source code. This paper introduces a method for measuring the similarity between data structures in order to detect plagiarized source code. The proposed method was evaluated on test sets that include plagiarized source code. The experimental results show that, as expected, the data structures of plagiarized source codes are highly similar. This result implies that similarity of data structures, along with similarity of algorithms, is one of the main factors in decreasing false alarms by lowering the threshold for plagiarism.
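The abstract does not spell out how the similarity is computed, so the following is only an illustrative sketch of what a data-structure similarity measure could look like, not the authors' method. It assumes each data structure has already been reduced to a list of field type names (a hypothetical preprocessing step), scores a pair of structures with a multiset Jaccard index, and pairs the structures of two programs one-to-one so that the total score is maximal. The function names `struct_similarity` and `program_similarity` are made up for this example.

```python
from collections import Counter
from itertools import permutations


def struct_similarity(fields_a, fields_b):
    """Multiset Jaccard similarity between two lists of field type names."""
    a, b = Counter(fields_a), Counter(fields_b)
    union = sum((a | b).values())
    if union == 0:
        return 1.0  # assumption: two empty structures count as identical
    return sum((a & b).values()) / union


def program_similarity(structs_a, structs_b):
    """Average score of the best one-to-one pairing of structures.

    Unmatched structures in the larger program contribute zero.
    The search is brute force over permutations, so it is only
    practical for a handful of structures per program.
    """
    small, large = sorted((structs_a, structs_b), key=len)
    if not large:
        return 1.0  # assumption: two programs without structures are identical
    best = max(
        sum(struct_similarity(s, l) for s, l in zip(small, perm))
        for perm in permutations(large, len(small))
    )
    return best / len(large)


# Field names are ignored and field order does not matter, so renaming
# and reordering members does not lower the score.
original = [["int", "char*", "float"], ["int", "int"]]
suspect = [["int", "int"], ["float", "int", "char*"]]
print(program_similarity(original, suspect))
```

Because only the multiset of field types is compared, this sketch is insensitive to renamed identifiers and reordered members, two common disguises in plagiarized code. For realistic program sizes, the brute-force pairing would need to be replaced by a polynomial-time assignment algorithm.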

Acknowledgements

This work was supported by BK21PLUS, Creative Human Resource Development Program for IT Convergence.

Author information

Corresponding author

Correspondence to Gyun Woo.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Lee, K., Kim, Y., Woo, G. (2019). Measuring Similarity Between Data Structures for Detecting Plagiarized Source Codes. In: Abawajy, J., Othman, M., Ghazali, R., Deris, M., Mahdin, H., Herawan, T. (eds) Proceedings of the International Conference on Data Engineering 2015 (DaEng-2015). Lecture Notes in Electrical Engineering, vol 520. Springer, Singapore. https://doi.org/10.1007/978-981-13-1799-6_36
