Source Code Clone Search

Code Clone Analysis

Abstract

Identifying similarities in source code is the main challenge for reuse, plagiarism, and code clone detection. Code clone search has emerged as a new research branch in clone detection, aiming to provide similarity search functionality for code snippets. While clone search shares its fundamentals with clone detection, both its objective and its requirements differ significantly. Clone search focuses on search engines that are designed to find clones of a single input code snippet (i.e., the query) in a large set of code snippets (i.e., the corpus). Scalability, short response time, and the ability to rank result sets are among the major challenges that a clone search engine has to deal with. In this chapter, we identify and define major concepts related to clone search. We then present a framework that summarizes the architecture of a clone search engine and gives a systematic view of the internals of such an engine. Finally, we discuss how to benchmark and evaluate the performance of clone search engines, including a set of measures that are helpful in such evaluations.
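
To make the query/corpus/ranking vocabulary of the abstract concrete, the following is a minimal sketch of a clone search engine, not the framework presented in the chapter. It rests on several assumptions made purely for illustration: snippets are compared by Jaccard similarity over token 3-grams, candidates are retrieved through an in-memory inverted index, and ranked results are scored with precision at k. The names `shingles`, `CloneSearchIndex`, and `precision_at_k` are hypothetical.

```python
import re
from collections import defaultdict


def shingles(code, n=3):
    """Return the set of token n-grams of a code snippet (assumed representation)."""
    # Crude lexer: identifiers, integer literals, or any single non-space character.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    if len(tokens) < n:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


class CloneSearchIndex:
    """Inverted index from token n-grams to the corpus snippets containing them."""

    def __init__(self, n=3):
        self.n = n
        self.postings = defaultdict(set)  # n-gram -> ids of snippets containing it
        self.sizes = {}                   # snippet id -> number of distinct n-grams

    def add(self, snippet_id, code):
        """Index one corpus snippet."""
        grams = shingles(code, self.n)
        self.sizes[snippet_id] = len(grams)
        for gram in grams:
            self.postings[gram].add(snippet_id)

    def search(self, query, k=5):
        """Return up to k (snippet_id, score) pairs, best match first."""
        grams = shingles(query, self.n)
        shared = defaultdict(int)
        # Only snippets sharing at least one n-gram with the query are scored,
        # so the cost depends on the posting lists touched, not the corpus size.
        for gram in grams:
            for snippet_id in self.postings.get(gram, ()):
                shared[snippet_id] += 1
        scored = [
            (count / (len(grams) + self.sizes[sid] - count), sid)  # Jaccard similarity
            for sid, count in shared.items()
        ]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(sid, score) for score, sid in scored[:k]]


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are true clones of the query."""
    top = ranked_ids[:k]
    if not top:
        return 0.0
    return sum(1 for sid in top if sid in relevant_ids) / len(top)
```

Indexing is done once per corpus snippet with `add`, while `search` is called per query; separating the two is what allows short response times, since the query only visits the posting lists of its own n-grams. A production engine would likely replace the crude lexer, the in-memory dictionary, and the similarity function with more robust choices.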

Iman Keivanloo—this work was done prior to joining Amazon.



Author information

Corresponding author

Correspondence to Iman Keivanloo.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Keivanloo, I., Rilling, J. (2021). Source Code Clone Search. In: Inoue, K., Roy, C.K. (eds) Code Clone Analysis. Springer, Singapore. https://doi.org/10.1007/978-981-16-1927-4_9

  • DOI: https://doi.org/10.1007/978-981-16-1927-4_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-1926-7

  • Online ISBN: 978-981-16-1927-4

  • eBook Packages: Computer Science, Computer Science (R0)
