Semantic code clone detection for Internet of Things applications using reaching definition and liveness analysis

  • Rajkumar Tekchandani
  • Rajesh Bhatia
  • Maninder Singh


Knowledge extraction from existing software resources for maintenance, re-engineering, and bug removal through code clone detection is an integral part of most Internet-enabled devices. Similar code fragments that occur at different locations are called code clones. Internet-enabled devices are used for knowledge sharing and data extraction to execute various applications related to code clone detection. However, most existing semantic code clone detection techniques cannot provide a heuristic solution for problems such as statement reordering, inversion of control predicates, and insertion of irrelevant statements, which may cause a performance bottleneck in this environment. To address these issues, we propose a novel approach that finds semantic code clones in a program or procedure using data flow analysis based on reaching definition and liveness analysis. The algorithm is designed to find similar code fragments that are structurally divergent but semantically equivalent. The results demonstrate that the proposed approach is effective in detecting semantic code clones for various applications running on Internet-enabled devices. On subject systems taken from the DaCapo benchmark, comprising 216,579 lines of code (LOC), we found 5831 semantically equivalent clone pairs after eliminating 29,029 dead statements.
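To illustrate the idea behind the abstract, the following is a minimal sketch in Python, not the authors' algorithm. It assumes a single straight-line block with no branches, so it says nothing about inverted control predicates. The names `Stmt`, `remove_dead`, `reaching_defs`, and `signature` are hypothetical and introduced only for this example. Liveness analysis drops statements whose results are never used, reaching definitions link each use to the definition that supplies it, and two fragments are compared by the resulting data-flow expressions rather than by their text.

```python
from typing import NamedTuple


class Stmt(NamedTuple):
    target: str   # variable defined by the statement
    op: str       # operator label, e.g. "+", "*"
    uses: tuple   # variables read by the statement, in operand order


def remove_dead(stmts, live_out):
    """Backward liveness over a straight-line block.

    A statement is kept only if its target is live just after it.
    """
    live = set(live_out)
    kept = []
    for s in reversed(stmts):
        if s.target in live:
            live.discard(s.target)   # the definition kills the target
            live.update(s.uses)      # its operands become live
            kept.append(s)
    kept.reverse()
    return kept


def reaching_defs(stmts):
    """For each statement, map every used variable to the index of the
    definition that reaches it, or None if it is an input of the block."""
    last_def = {}
    edges = []
    for i, s in enumerate(stmts):
        edges.append({v: last_def.get(v) for v in s.uses})
        last_def[s.target] = i
    return edges


def signature(stmts, live_out):
    """Data-flow expression for each live-out variable.

    Independent of statement order, temporary names, and dead statements.
    Operand order is preserved, so commutativity is NOT normalised.
    """
    stmts = remove_dead(stmts, live_out)
    edges = reaching_defs(stmts)
    last_def = {s.target: i for i, s in enumerate(stmts)}

    def expr(i):
        s = stmts[i]
        args = tuple(
            ("in", v) if edges[i][v] is None else expr(edges[i][v])
            for v in s.uses
        )
        return (s.op, args)

    return {v: expr(last_def[v]) if v in last_def else ("in", v)
            for v in live_out}


# Fragment A: r = (a + b) - (c * d)
frag_a = [
    Stmt("s", "+", ("a", "b")),
    Stmt("p", "*", ("c", "d")),
    Stmt("r", "-", ("s", "p")),
]

# Fragment B: same computation, reordered, renamed temporaries,
# plus one irrelevant statement whose result is never used.
frag_b = [
    Stmt("p2", "*", ("c", "d")),
    Stmt("junk", "*", ("a", "a")),
    Stmt("s2", "+", ("a", "b")),
    Stmt("r", "-", ("s2", "p2")),
]

# Fragment C: structurally similar but computes something else.
frag_c = [
    Stmt("s", "+", ("a", "b")),
    Stmt("p", "*", ("c", "d")),
    Stmt("r", "+", ("s", "p")),
]

print(signature(frag_a, ["r"]) == signature(frag_b, ["r"]))
print(signature(frag_a, ["r"]) == signature(frag_c, ["r"]))
```

In this sketch, fragments A and B compare equal even though B reorders the statements and contains a dead one, while C does not match. A realistic detector would run both analyses over a full control flow graph with iterative fixed-point computation, which this example deliberately omits.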


Keywords: Code clones, Control flow, Data flow, Reaching definition, Liveness analysis



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Rajkumar Tekchandani (1)
  • Rajesh Bhatia (2)
  • Maninder Singh (1)

  1. Thapar University, Patiala, India
  2. PEC University of Science and Technology, Chandigarh, India
