Skip to main content
Log in

Raters’ reliability in clone benchmarks construction

  • Published:
Empirical Software Engineering Aims and scope Submit manuscript

Abstract

Cloned code often complicates code maintenance and evolution and must therefore be effectively detected. One of the biggest challenges for clone detectors is to reduce the amount of irrelevant clones they found, called false positives. Several benchmarks of true and false positive clones have been introduced, enabling tool developers to compare, assess and fine-tune their tools. Manual inspection of clone candidates is performed by raters that do not have expertise on the underlying code. This way of building benchmarks might be unreliable when considering context-dependent clones i.e., clones valid for a specific purpose. Our goal is to investigate the reliability of rater judgments about context-dependent clones. We randomly select about 600 clones from two projects and ask several raters, including experts of the projects, to manually classify these clones. We observe that judgments of non expert raters are not always repeatable. We also observe that they seldomly agree with each others and with the expert. Finally, we find that the project and the fact that a clone is a true or false positive might have an influence on the agreement between the expert and non experts. Therefore, using non experts to produce clone benchmarks could be unreliable.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

Notes

  1. These studies mentioned no information about any raters’ expertise of the analyzed code.

  2. http://www.labri.fr/perso/acharpen/ese15/materials.zip

  3. https://github.com/allr/fastr/tree/v0.168

  4. https://github.com/jrfaller/gumtree/tree/v1.0.0

References

  • Baker BS (1995) On finding duplication and near-duplication in large software systems. In: Proceedings of the Second Working Conference on Reverse Engineering, WCRE ’95, pp 86–. IEEE Computer Society, Washington, DC, USA

  • Baxter I, Yahin A, Moura L, Sant’Anna M, Bier L (1998) Clone detection using abstract syntax trees. In: International conference on software maintenance, 1998. Proceedings, pp 368–377. doi:10.1109/ICSM.1998.738528

  • Bellon S, Koschke R, Antoniol G, Krinke J, Merlo E (2007) Comparison and evaluation of clone detection tools. IEEE Trans Softw Eng 33 (9):577–591. doi:10.1109/TSE.2007.70725

    Article  Google Scholar 

  • Bissyandé TF, Thung F, Wang S, Lo D, Jiang L, Réveillère L (2013) Empirical Evaluation of Bug Linking. In: Proceedings of the 17th european conference on software maintenance and reengineering (CSMR 2013), pp 1–10, Genova, Italy

  • Charpentier A, Falleri JR, Lo D, Réveillère L (2015) An empirical assessment of bellon’s clone benchmark. In: Proceedings of the 19th international conference on evaluation and assessment in software engineering, EASE ’15, pp 20:1–20:10. ACM, New York, NY, USA. doi:10.1145/2745802.2745821

  • Ducasse S, Rieger M, Demeyer S (1999) A language independent approach for detecting duplicated code. In: IEEE international conference on software maintenance, 1999. (ICSM ’99) Proceedings, pp 109–118. doi:10.1109/ICSM.1999.792593

  • Falleri JR, Morandat F, Blanc X, Martinez M, Monperrus M (2014) Fine-grained and accurate source code differencing. In: Proceedings of the international conference on automated software engineering, pp –. Sweden

  • Faust D, Verhoef C (2003) Software product line migration and deployment. Software Practice and Experience, vol 33. Wiley, pp 933–955

  • Gode N, Koschke R (2009) Incremental clone detection. In: 13th european conference on software maintenance and reengineering, 2009. CSMR ’09, pp 219–228. doi:10.1109/CSMR.2009.20

  • Ihaka R, Gentleman R (1996) R: a language for data analysis and graphics. J Comput Graph Stat 5(3):299–314

    Google Scholar 

  • Jiang L, Misherghi G, Su Z (2007) Deckard: Scalable and accurate tree-based detection of code clones. In: ICSE, pp 96–105

  • Kalibera T, Maj P, Morandat F, Vitek J (2014) A fast abstract syntax tree interpreter for R. In: 10th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, VEE ’14, Salt Lake City, UT, USA, March 01 - 02, 2014, pp 89–102. doi:10.1145/2576195.2576205

  • Kamiya T, Kusumoto S, Inoue K (2002) Ccfinder: a multilinguistic token-based code clone detection system for large scale source code. IEEE Trans Softw Eng 28 (7):654–670. doi:10.1109/TSE.2002.1019480

    Article  Google Scholar 

  • Kapser C, Anderson P, Godfrey M, Koschke R, Rieger M, van Rysselberghe F, Weis̈gerber P (2006) Subjectivity in clone judgment: Can we ever agree?. In: Duplication, Redundancy, and Similarity in Software, no. 06301 in Dagstuhl Seminar Proceedings. IBFI, Schloss Dagstuhl, Germany

  • Kapser C, Godfrey M (2006) cloning considered harmful considered harmful. In: 13th working conference on reverse engineering, 2006. WCRE ’06, pp 19–28. doi:10.1109/WCRE.2006.1

  • Koschke R, Falke R, Frenzel P (2006) Clone detection using abstract syntax suffix trees. In: Proceedings of the 13th working conference on reverse engineering, WCRE ’06, pp 253–262. IEEE computer society, Washington, DC, USA. doi:10.1109/WCRE.2006.18

  • Krutz DE, Le W (2014) A code clone oracle. In: Proceedings of the 11th working conference on mining software repositories, MSR 2014, pp 388–391. ACM, New York, NY, USA. doi:10.1145/2597073.2597127

  • Lague B, Proulx D, Mayrand J, Merlo EM, Hudepohl J (1997) Assessing the benefits of incorporating function clone detection in a development process. In: Proceedings of the international conference on software maintenance, ICSM ’97, pp 314–. IEEE computer society, Washington, DC, USA. http://dl.acm.org/citation.cfm?id=645545.853273

  • Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. biometrics:159–174

  • Li Z, Lu S, Myagmar S, Zhou Y (2006) Cp-miner: finding copy-paste and related bugs in large-scale software code. IEEE Trans Softw Eng 32(3):176–192. doi:10.1109/TSE.2006.28

    Article  Google Scholar 

  • Martinetz T, Schulten K (1991) A Neural-Gas Network Learns Topologies. In: Artificial Neural Networks, vol. I, pp 397–402

  • Mende T, Koschke R, Beckwermert F (2009) An evaluation of code similarity identification for the grow-and-prune model. J Softw Maint Evol 21(2):143–169

    Article  Google Scholar 

  • Nguyen HA, Nguyen TT, Pham N, Al-Kofahi J, Nguyen T (2012) Clone management for evolving software. IEEE Trans Softw Eng 38(5):1008–1026. doi:10.1109/TSE.2011.90

    Article  Google Scholar 

  • Roy C, Cordy J (2008) Nicad: Accurate detection of near-miss intentional clones using flexible pretty-printing and code normalization. In: The 16th IEEE international conference on program comprehension, 2008. ICPC 2008, pp 172–181. doi:10.1109/ICPC.2008.41

  • Roy C, Cordy J (2009) A mutation/injection-based automatic framework for evaluating code clone detection tools. In: International conference on software testing, verification and validation workshops, 2009. ICSTW ’09, pp 157–166. doi:10.1109/ICSTW.2009.18

  • Selim G, Foo K, Zou Y (2010) Enhancing source-based clone detection using intermediate representation. In: 2010 17th Working Conference on Reverse Engineering (WCRE), pp 227–236. doi:10.1109/WCRE.2010.33

  • Svajlenko J, Islam JF, Keivanloo I, Roy CK, Mia MM (2014) Towards a big data curated benchmark of inter-project code clones. ICSME:5

  • Walenstein A, Jyoti N, Li J, Yang Y, Lakhotia A (2003) Problems creating task-relevant clone detection reference data. In: 10th working conference on reverse engineering, 2003. WCRE 2003. Proceedings, pp 285–294. doi:10.1109/WCRE.2003.1287259

  • Wang T, Harman M, Jia Y, Krinke J (2013) Searching for better configurations: A rigorous approach to clone evaluation. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering, ESEC/FSE 2013, pp 455–465. ACM, New York, NY, USA. doi:10.1145/2491411.2491420

  • Yang J, Hotta K, Higo Y, Igaki H, Kusumoto S (2014) Classification model for code clones based on machine learning. Empir Softw Eng:1–31

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Alan Charpentier.

Additional information

Communicated by: Audris Mockus

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Charpentier, A., Falleri, JR., Morandat, F. et al. Raters’ reliability in clone benchmarks construction. Empir Software Eng 22, 235–258 (2017). https://doi.org/10.1007/s10664-015-9419-z

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10664-015-9419-z

Keywords

Navigation