Business Process Modeling Abstraction Based on Semi-Supervised Clustering Analysis

  • Research Paper
Business & Information Systems Engineering

Abstract

The most prominent use case of Business Process Model Abstraction (BPMA) is the construction of a process “quick view” for rapidly comprehending a complex process. Some researchers have proposed process abstraction methods that aggregate activities on the basis of their semantic similarity. An important clustering technique used in these methods is traditional k-means cluster analysis, which so far has been applied as an unsupervised process without any a priori information. Moreover, most of these techniques aggregate activities only according to business semantics, without considering the requirement of an order-preserving model transformation. This paper proposes a BPMA method based on semi-supervised clustering. The method chooses the initial clusters based on the refined process structure tree and designs constraints that combine the control flow consistency of the process with the semantic similarity of the activities to guide the clustering process. More precisely, the constraint function is mined from a process model collection enriched with subprocess relations. The proposed method is validated by applying it to a process model repository in use. In an experimental validation, it is compared with traditional k-means clustering (parameterized with randomly chosen initial clusters and a purely semantics-based distance measure); the results show that the approach closely approximates the decisions of the involved modelers to cluster activities. As such, the paper contributes to the development of modeling support for effective process model abstraction, facilitating the use of business process models in practice.
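The general idea of seeding k-means and mixing semantic similarity with control flow order can be illustrated with a small sketch. It is not the method of the paper: the abstract states that the constraint function is mined from a process model collection, whereas the sketch below simply assumes a fixed weighted sum. The function names, the `weight` parameter, the toy term vectors, and the use of a single ordinal position per activity as a stand-in for control flow information are all hypothetical, and the seed clusters are given by hand rather than derived from a refined process structure tree.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

def centroid(members, vectors, positions):
    """Mean term vector and mean control-flow position of a cluster."""
    n = len(members)
    dim = len(vectors[0])
    mean_vec = [sum(vectors[i][d] for i in members) / n for d in range(dim)]
    mean_pos = sum(positions[i] for i in members) / n
    return mean_vec, mean_pos

def seeded_kmeans(vectors, positions, seeds, weight=0.5, max_iter=20):
    """k-means started from given seed clusters instead of random ones.

    Assumed distance (hypothetical, not taken from the paper):
    weight * semantic distance + (1 - weight) * normalized position gap.
    """
    span = (max(positions) - min(positions)) or 1
    centroids = [centroid(s, vectors, positions) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = []
        for vec, pos in zip(vectors, positions):
            dists = [
                weight * cosine_distance(vec, c_vec)
                + (1 - weight) * abs(pos - c_pos) / span
                for c_vec, c_pos in centroids
            ]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [i for i, lab in enumerate(labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = centroid(members, vectors, positions)
    return labels

# Toy data: six activities in sequence, each with a 2-term vector.
vectors = [[1, 0], [1, 0.1], [0.9, 0.2], [0.1, 1], [0, 1], [0.2, 1]]
positions = [0, 1, 2, 3, 4, 5]
seeds = [[0], [5]]  # hand-picked seeds, one activity per initial cluster

labels = seeded_kmeans(vectors, positions, seeds)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Setting `weight=1.0` reduces the distance to the purely semantic measure, which corresponds to the kind of baseline the abstract contrasts against, apart from the random choice of initial clusters.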


Acknowledgements

This work is supported in part by NSFC under Grant Nos. 61402193, 61272208, 61133011, 60973089, 61003101, 61170092, by the Jilin Province Science and Technology Development Plan under Grant No. 20130522177JH, by the Jilin Provincial Department of Education “Twelfth/Thirteenth Five Year Plan” Science and Technology Development Plan under Grant Nos. 2014160, 2016105, and by the Jilin Province Education Science “Twelfth Five Year Plan” under Grant Nos. GH150285, GH16249.

Author information

Corresponding author

Correspondence to Shanwu Sun.

Additional information

Accepted after three revisions by Prof. Dr. Becker.

About this article

Cite this article

Wang, N., Sun, S. & OuYang, D. Business Process Modeling Abstraction Based on Semi-Supervised Clustering Analysis. Bus Inf Syst Eng 60, 525–542 (2018). https://doi.org/10.1007/s12599-016-0457-x

  • DOI: https://doi.org/10.1007/s12599-016-0457-x
