Abstract
Process mining aims at discovering process models from event logs. Complex constructs, noise and infrequent behavior are issues that make process mining a complex problem. A genetic mining algorithm, which applies genetic operators to search in the space of all possible process models, can successfully deal with the aforementioned challenges. In this paper, we reduce the computation time by using a distributed setting. The population is distributed between the islands of a computer network (e.g. a grid). To further accelerate the method we use sample-based fitness evaluations, i.e. we evaluate the individuals on a sample of the event log instead of the entire event log, gradually increasing the sample size if necessary. Our experiments show that both sampling and distributing the event log significantly improve the performance. The actual speed-up is highly dependent of the combination of the population size and sample size.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
van der Aalst, W.M.P., Ter Hofstede, A.H.M., Kiepuszewski, B., Barros, A.P.: Workflow patterns. Distrib. Parallel Databases 14(1), 5–51 (2003)
van der Aalst, W.M.P., Weijters, A.J.M.M., Maruster, L.: Workflow mining: Discovering process models from event logs. IEEE Transactions on Knowledge and Data Engineering 16(9), 1128–1142 (2004)
Alves de Medeiros, A.K.: Genetic Process Mining. PhD thesis, Technische Universiteit Eindhoven, Eindhoven, The Netherlands (2006)
Alves de Medeiros, A.K., Weijters, A.J.M.M., van der Aalst, W.M.P.: Genetic process mining: An experimental evaluation. Data Mining and Knowledge Discovery 14(2), 245–304 (2007)
Bratosin, C.C., Sidorova, N., van der Aalst, W.M.P.: Distributed genetic miner. In: Proceedings of the 2010 IEEE World Congress on Computational Intelligence (IEEE CEC 2010), Barcelona, Spain, July 18-23, pp. 1951–1958. IEEE, Los Alamitos (2010)
Cantú-Paz, E.: A survey of parallel genetic algorithms. Calculateurs Paralleles, Reseaux et Systems Repartis 10(2), 141–171 (1998)
Chen, J.-H., Goldberg, D.E., Ho, S.-Y., Sastry, K.: Fitness inheritance in multi-objective optimization. In: GECCO, pp. 319–326 (2002)
Fitzpatrick, J.M., Grefenstette, J.J.: Genetic algorithms in noisy environments. Machine Learning 3, 101–120 (1988)
Günther, C.W., Rozinat, A., van der Aalst, W., van Uden, K.: Monitoring deployed application usage with process mining. Technical report, BPM Center Report BPM-08- 11, BPMcenter.org (2008)
Jin, Y.: A comprehensive survey of fitness approximation in evolutionary computation. Soft Computing 9(1), 3–12 (2005)
Jin, Y., Branke, J.: Evolutionary optimization in uncertain environments-a survey. IEEE Trans. Evolutionary Computation 9(3), 303–317 (2005)
Kivinen, J., Mannila, H.: The power of sampling in knowledge discovery. In: PODS 1994: Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 77–85. ACM, New York (1994)
Rozinat, A., de Jong, I.S.M., Günther, C.W., van der Aalst, W.M.P.: Process mining applied to the test process of wafer scanners in asml. IEEE Tran. on Syst., Man, and Cybernetics 39(4), 474–479 (2009)
Tan, P.-N., Steinbach, M., Kumar, V.: Introduction to Data Mining. Addison-Wesley Longman Publishing Co., Inc., Boston (2005)
Tomassini, M.: Spatially Structured Evolutionary Algorithms: Artificial Evolution in Space and Time. Springer-Verlag New York, Inc., Secaucus (2005)
Weijters, A.J.M.M., van der Aalst, W.: Rediscovering workflow models from event-based data using little thumb. Integr. Comput.-Aided Eng. 10(2), 151–162 (2003)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bratosin, C., Sidorova, N., van der Aalst, W. (2011). Distributed Genetic Process Mining Using Sampling. In: Malyshkin, V. (eds) Parallel Computing Technologies. PaCT 2011. Lecture Notes in Computer Science, vol 6873. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23178-0_20
Download citation
DOI: https://doi.org/10.1007/978-3-642-23178-0_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23177-3
Online ISBN: 978-3-642-23178-0
eBook Packages: Computer ScienceComputer Science (R0)