Abstract
Quantifying the similarity of large software systems is an important and intriguing problem. This paper proposes a similarity metric between two sets of source code files, based on the correspondence of source code lines across the two sets. A Software similarity MeAsurement Tool (SMAT) was developed and applied to various versions of an operating system (BSD UNIX). The resulting similarity values clearly reflected the evolutionary history of the BSD UNIX operating system.
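The abstract does not give the formula, so the following is only a minimal sketch of one way a line-correspondence similarity could be computed. It is not SMAT's actual algorithm. It assumes that files are paired by identical name, that lines are compared after stripping surrounding white space, and that similarity is the fraction of lines in both systems that take part in a correspondence. The names `normalize`, `matched_lines`, and `similarity` are hypothetical, and Python's `difflib` stands in for whatever line-matching procedure the tool uses.

```python
from difflib import SequenceMatcher


def normalize(lines):
    """Strip surrounding white space and drop blank lines (an assumption)."""
    return [ln.strip() for ln in lines if ln.strip()]


def matched_lines(lines_a, lines_b):
    """Count line pairs that difflib puts in correspondence."""
    matcher = SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    return sum(block.size for block in matcher.get_matching_blocks())


def similarity(system_a, system_b):
    """Return the fraction of lines in both systems that have a counterpart.

    Each system is a dict mapping a file name to its list of lines.
    Files are paired by name only, which is a simplifying assumption.
    """
    a = {name: normalize(lines) for name, lines in system_a.items()}
    b = {name: normalize(lines) for name, lines in system_b.items()}
    total = sum(len(v) for v in a.values()) + sum(len(v) for v in b.values())
    if total == 0:
        return 1.0  # two empty systems are treated as identical
    matched = 0
    for name, lines_a in a.items():
        if name in b:
            # Each corresponding pair accounts for one line on each side.
            matched += 2 * matched_lines(lines_a, b[name])
    return matched / total


old = {"main.c": ["int main() {", "    return 0;", "}"]}
new = {"main.c": ["int main() {", "    return 1;", "}"]}
print(similarity(old, new))
```

Under these assumptions the value ranges from 0.0 (no corresponding lines) to 1.0 (every line has a counterpart). Pairing files by name would miss renamed or moved files, so a tool aimed at tracking system evolution would likely need a more robust file-matching step.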
Keywords
- Source Code
- Code Block
- Software Product Line
- White Space
- Software Maintenance
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yamamoto, T., Matsushita, M., Kamiya, T., Inoue, K. (2005). Measuring Similarity of Large Software Systems Based on Source Code Correspondence. In: Bomarius, F., Komi-Sirviö, S. (eds) Product Focused Software Process Improvement. PROFES 2005. Lecture Notes in Computer Science, vol 3547. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11497455_41
DOI: https://doi.org/10.1007/11497455_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26200-8
Online ISBN: 978-3-540-31640-4
eBook Packages: Computer Science, Computer Science (R0)
