Source Code Clone Search

Keivanloo, Iman; Rilling, Juergen

doi:10.1007/978-981-16-1927-4_9

Abstract

Identifying similarities in source code is the main challenge for reuse, plagiarism, and code clone detection. Code clone search has emerged as a new research branch in clone detection, aiming to provide similarity search functionality for code snippets. While clone search shares its fundamentals with clone detection, both its objective and its requirements differ significantly. Clone search focuses on search engines that are designed to find clones of a single input code snippet (i.e., the query) in a large set of code snippets (i.e., the corpus). Scalability, short response time, and the ability to rank result sets are among the major challenges that a clone search engine has to deal with. In this chapter, we identify and define major concepts related to clone search. We then present a framework that summarizes the architecture of a clone search engine and enables us to provide a systematic view of the internals of such an engine. Finally, we discuss how to benchmark and evaluate the performance of clone search engines. The discussion includes a set of measures that are helpful in evaluating clone search engines.
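The query-against-corpus setting described above can be illustrated with a minimal sketch. It is not the framework presented in the chapter: the function names (`tokenize`, `build_index`, `search`), the token-level inverted index, and the Jaccard similarity used for ranking are all assumptions chosen only to show the general shape of a clone search (index the corpus once, then retrieve and rank candidates for a single query snippet).

```python
import re
from collections import defaultdict


def tokenize(snippet):
    """Split a snippet into identifiers, numbers, and single-character symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", snippet)


def build_index(corpus):
    """Build an inverted index (token -> snippet ids) plus per-snippet token sets.

    `corpus` is a hypothetical dict mapping snippet ids to source text.
    """
    index = defaultdict(set)
    token_sets = {}
    for snippet_id, snippet in corpus.items():
        tokens = set(tokenize(snippet))
        token_sets[snippet_id] = tokens
        for token in tokens:
            index[token].add(snippet_id)
    return index, token_sets


def search(query, index, token_sets, top_k=3):
    """Return up to `top_k` (snippet_id, score) pairs, best match first."""
    query_tokens = set(tokenize(query))

    # Candidate retrieval: only snippets sharing at least one token are scored,
    # so the whole corpus is never scanned at query time.
    candidates = set()
    for token in query_tokens:
        candidates |= index.get(token, set())

    # Ranking: Jaccard similarity of token sets (an assumed, simple measure).
    scored = []
    for snippet_id in candidates:
        tokens = token_sets[snippet_id]
        score = len(query_tokens & tokens) / len(query_tokens | tokens)
        scored.append((score, snippet_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(snippet_id, score) for score, snippet_id in scored[:top_k]]


corpus = {
    "a": "int sum = 0; for (int i = 0; i < n; i++) sum += a[i];",
    "b": "int total = 0; for (int j = 0; j < n; j++) total += a[j];",
    "c": "print('hello world')",
}
index, token_sets = build_index(corpus)
results = search("int sum = 0; for (int i = 0; i < n; i++) sum += a[i];",
                 index, token_sets)
```

In this toy corpus the exact copy ranks first and the renamed variant second. A real engine would need a similarity measure that tolerates renaming and an index that scales to a large corpus, which is where the challenges named in the abstract arise.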

Iman Keivanloo: this work was done prior to joining Amazon.

Author information

Authors and Affiliations

Amazon, Seattle, USA
Iman Keivanloo
Concordia University, Montreal, Canada
Juergen Rilling

Corresponding author

Correspondence to Iman Keivanloo.

Editor information

Editors and Affiliations

Graduate School of Information Science and Technology, Osaka University, Suita, Osaka, Japan
Katsuro Inoue
Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
Chanchal K. Roy

About this chapter

Cite this chapter

Keivanloo, I., Rilling, J. (2021). Source Code Clone Search. In: Inoue, K., Roy, C.K. (eds) Code Clone Analysis. Springer, Singapore. https://doi.org/10.1007/978-981-16-1927-4_9

DOI: https://doi.org/10.1007/978-981-16-1927-4_9
Published: 04 August 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1926-7
Online ISBN: 978-981-16-1927-4
eBook Packages: Computer Science, Computer Science (R0)