Measuring Similarity of Large Software Systems Based on Source Code Correspondence

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNPSE, volume 3547)

Abstract

Knowing the quantitative similarity of large software systems is an important and intriguing problem. This paper proposes a similarity metric between two sets of source code files, based on the overall correspondence of their source code lines. A software similarity measurement tool, SMAT (Software similarity MeAsurement Tool), was developed and applied to various versions of an operating system (BSD UNIX). The resulting similarity values clearly revealed characteristics of the evolutionary history of the BSD UNIX operating system.
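The abstract does not state the metric's formula, so the following is only a minimal sketch of one way a line-correspondence similarity between two sets of files could be computed. It is not SMAT's implementation. The function names, the pairing of files by identical name, and the ratio used (lines in correspondence on both sides divided by all lines on both sides) are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def matched_line_count(a_lines, b_lines):
    """Count lines of a_lines that correspond to lines of b_lines.

    Correspondence here means membership in a matching block found by
    difflib's SequenceMatcher (an assumption, not the paper's definition).
    """
    matcher = SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    return sum(block.size for block in matcher.get_matching_blocks())


def line_similarity(system_a, system_b):
    """Similarity in [0, 1] between two systems.

    Each system is a dict mapping a file name to its list of lines.
    Files are paired by identical name (a simplifying assumption);
    unpaired files contribute lines to the total but none to the match.
    """
    total = sum(len(lines) for lines in system_a.values())
    total += sum(len(lines) for lines in system_b.values())
    if total == 0:
        return 1.0  # two empty systems are treated as identical
    matched = 0
    for name in system_a.keys() & system_b.keys():
        matched += matched_line_count(system_a[name], system_b[name])
    # Each corresponding pair covers one line in each system.
    return 2 * matched / total


# Hypothetical two-version example.
v1 = {"main.c": ["int main() {", "return 0;", "}"]}
v2 = {
    "main.c": ["int main() {", "return 1;", "}"],
    "util.c": ["void f() {}"],
}
print(line_similarity(v1, v2))
```

Pairing files by name keeps the sketch short, but it would miss renamed or moved files. A tool aimed at whole operating system versions would presumably need a more robust file-matching step before comparing lines.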

Keywords

  • Source Code
  • Code Block
  • Software Product Line
  • White Space
  • Software Maintenance

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yamamoto, T., Matsushita, M., Kamiya, T., Inoue, K. (2005). Measuring Similarity of Large Software Systems Based on Source Code Correspondence. In: Bomarius, F., Komi-Sirviö, S. (eds) Product Focused Software Process Improvement. PROFES 2005. Lecture Notes in Computer Science, vol 3547. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11497455_41

  • DOI: https://doi.org/10.1007/11497455_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26200-8

  • Online ISBN: 978-3-540-31640-4

  • eBook Packages: Computer Science, Computer Science (R0)
