Approximate Clone Detection in Repositories of Business Process Models

  • Chathura C. Ekanayake
  • Marlon Dumas
  • Luciano García-Bañuelos
  • Marcello La Rosa
  • Arthur H. M. ter Hofstede
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7481)


Evidence exists that repositories of business process models used in industrial practice contain significant amounts of duplication. This duplication may arise because the repository describes variants of the same processes, because of copy/paste activity throughout the lifetime of the repository, or both. Previous work has put forward techniques for identifying duplicate fragments (clones) that can be refactored into shared subprocesses. However, these techniques are limited to finding exact clones. This paper analyzes the problem of approximate clone detection and puts forward two techniques for detecting clusters of approximate clones. Experiments show that the proposed techniques are able to accurately retrieve clusters of approximate clones that originate from copy/pasting followed by independent modifications to the copied fragments.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Chathura C. Ekanayake (1)
  • Marlon Dumas (2)
  • Luciano García-Bañuelos (2)
  • Marcello La Rosa (1)
  • Arthur H. M. ter Hofstede (1, 3)
  1. Queensland University of Technology, Australia
  2. University of Tartu, Estonia
  3. Eindhoven University of Technology, The Netherlands
