Abstract
Variability models are used to build configurators, which guide users through the configuration process until they reach a setting that fulfils their requirements. The same variability model can be used to design different configurators employing different techniques. One of the design options that can change in a configurator is the configuration workflow, i.e., the order and sequence in which the different configuration elements are presented to the configuration stakeholders. When developing a configurator, a challenge is to decide which configuration workflow best suits stakeholders according to previous configurations. For example, when configuring a Linux distribution, the configuration process starts by choosing the network or the graphics card and then continues with other packages in a given sequence. In this paper, we present COnfiguration workfLOw proceSS mIning (COLOSSI), a framework that, given a set of logs of previous configurations and a variability model, can automatically assist in determining the configuration workflow that best fits the configuration logs generated by user activities. COLOSSI is based on process discovery, commonly used in the process mining area, with an adaptation to configuration contexts. Because both the logs and the discovered processes can be complex, it is often necessary to divide the traces into smaller groups. This yields a configuration workflow that is easier for the user to understand and follow during the configuration process. In this paper, we apply and compare four different techniques for trace clustering: greedy, backtracking, genetic and hierarchical algorithms. Our proposal is validated in three different scenarios to show its feasibility: an ERP configuration, a Smart Farming configuration, and a Computer Configuration.
Furthermore, we open the door to new applications of process mining techniques in different areas of software product line engineering along with the necessity to apply clustering techniques for the trace preparation in the context of configuration workflows.
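To make the idea of discovering a workflow from configuration logs more concrete, the following minimal sketch counts directly-follows relations between configured features, which is a common first step in many process discovery techniques. It is an illustration under assumptions, not COLOSSI's actual discovery algorithm: the function name `directly_follows`, the trace format (a list of feature names in selection order), and the example feature names are all hypothetical.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often feature `a` is configured immediately before feature `b`."""
    counts = Counter()
    for trace in traces:
        # Pair each configuration step with the step that follows it.
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical configuration logs: each trace is the order in which one user
# selected features (illustrative names, not taken from the paper's data sets).
logs = [
    ["network", "graphics", "audio"],
    ["network", "graphics", "printer"],
    ["graphics", "network", "audio"],
]

dfg = directly_follows(logs)

# List the most frequent orderings first; frequent pairs suggest which
# configuration elements a configurator might present consecutively.
for (a, b), n in dfg.most_common():
    print(f"{a} -> {b}: {n}")
```

In this toy log, the pair (network, graphics) occurs most often, so a workflow derived from it would plausibly present the network choice before the graphics card. With many heterogeneous traces such a graph quickly becomes hard to read, which is the motivation for clustering the traces first.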
Notes
BPMN: Business Process Model and Notation
Dendrogram: a branching diagram that represents the arrangement of the clusters produced by the corresponding analyses
Acknowledgements
This work has been partially funded by the Ministry of Science and Technology of Spain through ECLIPSE (RTI2018-094283-B-C33) and OPHELIA (RTI2018-101204-B-C22) projects; the TASOVA network (MCIU-AEI TIN2017-90644-REDT); and the Junta de Andalucía via METAMORFOSIS projects, the European Regional Development Fund (ERDF/FEDER), and the MINECO Juan de la Cierva postdoctoral program.
Additional information
Communicated by: Laurence Duchien, Thomas Thüm and Paul Grünbacher
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article belongs to the Topical Collection: Configurable Systems
Appendix: Quality metrics results
This appendix presents, in Tables 9, 10, 11, 12, and 13, the metric data represented in Figs. 10, 11, and 12. To facilitate interpretation, the values of each metric have been normalised so that all results lie between 0 and 1, allowing comparisons to be made. In addition, Table 8 shows the metric values for the original logs. Comparing against it shows that, in most cases, the values are closer to 0 after clustering, meaning that the resulting configuration workflows are also less complex. Still, it is important to note that it is very difficult to generalise from these data, since they are too domain-specific.
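As an illustration of the per-metric normalisation described above, the sketch below rescales each metric independently to the range 0 to 1. The appendix does not state which normalisation was used, so min-max scaling is an assumption here (dividing by the maximum would be another option); the metric names and raw values are hypothetical and are not taken from the tables.

```python
def min_max_normalise(values):
    """Rescale a list of numbers to [0, 1] using min-max scaling (assumed method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw metric values, one list per metric and one entry per
# clustering result; each metric is normalised on its own scale.
raw = {
    "metric_a": [12.0, 30.0, 21.0],
    "metric_b": [2.0, 8.0, 5.0],
}

normalised = {name: min_max_normalise(vals) for name, vals in raw.items()}
```

Normalising each metric separately makes results comparable within a metric, but a value of 0.5 in one metric does not carry the same meaning as 0.5 in another.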
Cite this article
Ramos-Gutiérrez, B., Varela-Vaca, Á.J., Galindo, J.A. et al. Discovering configuration workflows from existing logs using process mining. Empir Software Eng 26, 11 (2021). https://doi.org/10.1007/s10664-020-09911-x
DOI: https://doi.org/10.1007/s10664-020-09911-x