Raters’ reliability in clone benchmarks construction

Charpentier, Alan; Falleri, Jean-Rémy; Morandat, Floréal; Ben Hadj Yahia, Elyas; Réveillère, Laurent

doi:10.1007/s10664-015-9419-z

Published: 09 February 2016

Empirical Software Engineering, volume 22, pages 235–258 (2017)

Abstract

Cloned code often complicates code maintenance and evolution and must therefore be effectively detected. One of the biggest challenges for clone detectors is to reduce the number of irrelevant clones they find, called false positives. Several benchmarks of true and false positive clones have been introduced, enabling tool developers to compare, assess, and fine-tune their tools. In these benchmarks, manual inspection of clone candidates is performed by raters who do not have expertise in the underlying code. This way of building benchmarks might be unreliable when considering context-dependent clones, i.e., clones valid for a specific purpose. Our goal is to investigate the reliability of rater judgments about context-dependent clones. We randomly select about 600 clones from two projects and ask several raters, including experts on the projects, to manually classify these clones. We observe that the judgments of non-expert raters are not always repeatable. We also observe that they seldom agree with each other or with the expert. Finally, we find that the project and whether a clone is a true or false positive might influence the agreement between the expert and non-experts. Therefore, using non-experts to produce clone benchmarks could be unreliable.
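To make the notion of agreement between raters concrete, the following minimal sketch computes Cohen's kappa for two raters who classify the same clone candidates as true or false positives. The abstract does not state which agreement statistic the study uses, so the choice of Cohen's kappa is an assumption, and the function name `cohen_kappa` and the sample judgments are hypothetical illustrations, not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items.

    Assumption: kappa is used here only as an illustration; the
    abstract does not name the statistic applied in the study.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must label the same non-empty item list")

    n = len(ratings_a)
    # Observed agreement: share of items on which both raters give the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: probability that both pick the same label independently,
    # estimated from each rater's own label frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both raters always use the same single label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical judgments (True = true positive clone, False = false positive).
expert = [True, True, False, False, True, False, True, False]
non_expert = [True, False, False, True, True, False, False, False]

kappa = cohen_kappa(expert, non_expert)
print(f"kappa = {kappa:.2f}")
```

The same function could be applied to two rating sessions by the same rater to gauge repeatability. A value near 1 indicates strong agreement, while a value near 0 indicates agreement no better than chance.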

Notes

These studies provide no information about the raters’ expertise in the analyzed code.
http://www.labri.fr/perso/acharpen/ese15/materials.zip
https://github.com/allr/fastr/tree/v0.168
https://github.com/jrfaller/gumtree/tree/v1.0.0


Author information

Authors and Affiliations

LaBRI, UMR 5800, University of Bordeaux, F-33400, Talence, France
Alan Charpentier, Jean-Rémy Falleri, Floréal Morandat, Elyas Ben Hadj Yahia & Laurent Réveillère

Corresponding author

Correspondence to Alan Charpentier.

Additional information

Communicated by: Audris Mockus

About this article

Cite this article

Charpentier, A., Falleri, JR., Morandat, F. et al. Raters’ reliability in clone benchmarks construction. Empir Software Eng 22, 235–258 (2017). https://doi.org/10.1007/s10664-015-9419-z

Issue Date: February 2017