Pattern matching for clone and concept detection

Kontogiannis, K. A.; Demori, R.; Merlo, E.; Galler, M.; Bernstein, M.

doi:10.1007/BF00126960


Published: June 1996

Automated Software Engineering, Volume 3, pages 77–108 (1996)


Abstract

A legacy system is an operational, large-scale software system that is maintained beyond its first generation of programmers. It typically represents a massive economic investment and is critical to the mission of the organization it serves. As such systems age, they become increasingly complex and brittle, and hence harder to maintain. They also become even more critical to the survival of their organization because the business rules encoded within the system are seldom documented elsewhere.

Our research is concerned with developing a suite of tools to aid the maintainers of legacy systems in recovering the knowledge embodied within the system. The activities, known collectively as “program understanding”, are essential preludes for several key processes, including maintenance and design recovery for reengineering.

In this paper we present three pattern-matching techniques: source code metrics, a dynamic programming algorithm for finding the best alignment between two code fragments, and a statistical matching algorithm between abstract code descriptions represented in an abstract language and actual source code. The methods are applied to detect instances of code cloning in several moderately-sized production systems including tcsh, bash, and CLIPS.

The programmer's skill and experience are essential elements of our approach. Selection of particular tools and analysis methods depends on the needs of the particular task to be accomplished. Integration of the tools provides opportunities for synergy, allowing the programmer to select the most appropriate tool for a given task.
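The dynamic programming alignment named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each code fragment has already been reduced to a sequence of statement-kind labels (a hypothetical abstraction chosen here for illustration), and it uses a plain edit-distance recurrence with unit costs. The names `align_fragments`, `gap_cost`, and `sub_cost` are illustrative only.

```python
from typing import List, Sequence, Tuple

GAP = "-"  # placeholder for a statement with no counterpart in the other fragment


def align_fragments(
    a: Sequence[str],
    b: Sequence[str],
    gap_cost: int = 1,   # assumed cost of inserting or deleting a statement
    sub_cost: int = 1,   # assumed cost of pairing two different statements
) -> Tuple[int, List[Tuple[str, str]]]:
    """Return (minimum alignment cost, aligned statement pairs)."""
    n, m = len(a), len(b)
    # dist[i][j] = cheapest alignment of a[:i] with b[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * gap_cost
    for j in range(1, m + 1):
        dist[0][j] = j * gap_cost

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = dist[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else sub_cost)
            delete = dist[i - 1][j] + gap_cost
            insert = dist[i][j - 1] + gap_cost
            dist[i][j] = min(pair, delete, insert)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (
            0 if a[i - 1] == b[j - 1] else sub_cost
        ):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + gap_cost:
            pairs.append((a[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, b[j - 1]))
            j -= 1
    pairs.reverse()
    return dist[n][m], pairs


# Two hypothetical fragments, abstracted to statement kinds.
fragment_a = ["assign", "while", "call", "assign", "return"]
fragment_b = ["assign", "while", "assign", "return"]
cost, alignment = align_fragments(fragment_a, fragment_b)
```

A low cost relative to fragment length suggests a candidate clone that a maintainer can then inspect. In practice the per-statement comparison would likely draw on richer features than a single label, such as the source code metrics the abstract also mentions.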



Author information

Authors and Affiliations

McGill University School of Computer Science, 3480 University St., Room 318, H3A 2A7, Montréal, Canada
K. A. Kontogiannis, R. Demori, E. Merlo, M. Galler & M. Bernstein


Additional information

This work is supported in part by IBM Canada Ltd.; the Institute for Robotics and Intelligent Systems, a Canadian Network of Centers of Excellence; and the Natural Sciences and Engineering Research Council of Canada. Based on “Pattern Matching for Design Concept Localization” by K. A. Kontogiannis, R. DeMori, M. Bernstein, M. Galler, and E. Merlo, which first appeared in Proceedings of the Second Working Conference on Reverse Engineering, pp. 96–103, July 1995, © IEEE, 1995.


About this article

Cite this article

Kontogiannis, K.A., Demori, R., Merlo, E. et al. Pattern matching for clone and concept detection. Automated Software Engineering 3, 77–108 (1996). https://doi.org/10.1007/BF00126960


Issue Date: June 1996
DOI: https://doi.org/10.1007/BF00126960
