Abstract
Identifying similarities in source code is the main challenge in reuse, plagiarism, and code clone detection. Code clone search has emerged as a new research branch in clone detection, aiming to provide similarity search functionality for code snippets. While clone search shares its fundamentals with clone detection, its objectives and requirements differ significantly. Clone search focuses on search engines that are designed to find clones of a single input code snippet (i.e., the query) in a large set of code snippets (i.e., the corpus). Scalability, short response time, and the ability to rank result sets are among the major challenges that a clone search engine has to deal with. In this chapter, we identify and define major concepts related to clone search. We then present a framework that summarizes the architecture of a clone search engine and enables us to provide a systematic view of the internals of such an engine. Finally, we discuss how to benchmark and evaluate the performance of clone search engines. The discussion includes a set of measures that are helpful in evaluating clone search engines.
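To make the query/corpus distinction concrete, the following is a minimal sketch of a clone search engine. It is an illustration under stated assumptions, not the framework presented in the chapter: snippets are reduced to sets of token 3-grams, an inverted index maps each 3-gram to the snippets containing it, and candidates are ranked by Jaccard similarity. The names `shingles` and `CloneSearchIndex`, the tokenizer, and the choice of Jaccard similarity are all hypothetical choices made for this example.

```python
import re
from collections import defaultdict


def shingles(code, n=3):
    """Tokenize a snippet and return its set of token n-grams (assumed representation)."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    if len(tokens) < n:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


class CloneSearchIndex:
    """Toy inverted index: built once over the corpus, queried per snippet."""

    def __init__(self, n=3):
        self.n = n
        self.postings = defaultdict(set)  # n-gram -> ids of snippets containing it
        self.sizes = {}                   # snippet id -> number of distinct n-grams

    def add(self, snippet_id, code):
        grams = shingles(code, self.n)
        self.sizes[snippet_id] = len(grams)
        for gram in grams:
            self.postings[gram].add(snippet_id)

    def search(self, query, top_k=5):
        """Return up to top_k (snippet_id, score) pairs, best match first."""
        grams = shingles(query, self.n)
        shared = defaultdict(int)
        # Only snippets sharing at least one n-gram with the query are visited,
        # which is what keeps the lookup cheap on a large corpus.
        for gram in grams:
            for sid in self.postings.get(gram, ()):
                shared[sid] += 1
        ranked = []
        for sid, common in shared.items():
            union = len(grams) + self.sizes[sid] - common
            ranked.append((common / union, sid))  # Jaccard similarity
        ranked.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(sid, round(score, 3)) for score, sid in ranked[:top_k]]


# Usage: a three-snippet corpus and one query.
index = CloneSearchIndex()
index.add("a", "int sum = 0; for (int i = 0; i < n; i++) sum += a[i];")
index.add("b", "int total = 0; for (int i = 0; i < n; i++) total += a[i];")
index.add("c", "return x * y;")
results = index.search("int sum = 0; for (int i = 0; i < n; i++) sum += a[i];")
```

In this sketch the identical snippet ranks first, the renamed-variable variant ranks second with a lower score, and the unrelated snippet is never visited because it shares no 3-gram with the query. A production engine would need normalization, a more compact index, and a ranking model suited to its clone types.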
Iman Keivanloo—this work was done prior to joining Amazon.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Keivanloo, I., Rilling, J. (2021). Source Code Clone Search. In: Inoue, K., Roy, C.K. (eds) Code Clone Analysis. Springer, Singapore. https://doi.org/10.1007/978-981-16-1927-4_9
DOI: https://doi.org/10.1007/978-981-16-1927-4_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1926-7
Online ISBN: 978-981-16-1927-4
eBook Packages: Computer Science, Computer Science (R0)