Distributed Genetic Process Mining Using Sampling

  • Conference paper
Parallel Computing Technologies (PaCT 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6873)

Abstract

Process mining aims at discovering process models from event logs. Complex constructs, noise and infrequent behavior make process mining a hard problem. A genetic mining algorithm, which applies genetic operators to search the space of all possible process models, can successfully deal with these challenges. In this paper, we reduce the computation time by using a distributed setting: the population is distributed between the islands of a computer network (e.g. a grid). To further accelerate the method, we use sample-based fitness evaluations, i.e. we evaluate the individuals on a sample of the event log instead of the entire event log, gradually increasing the sample size if necessary. Our experiments show that both sampling and distributing the event log significantly improve the performance. The actual speed-up is highly dependent on the combination of population size and sample size.
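The sample-based fitness evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything named here is an assumption made for illustration: the toy model representation (a set of allowed directly-follows pairs), the helpers `trace_score` and `mutate`, the truncation-style selection, and in particular the rule for growing the sample (multiply the size by `growth` after `patience` generations without improvement), since the abstract only says the sample grows "if necessary".

```python
import random
from typing import Callable, FrozenSet, List, Sequence, Tuple

Trace = Tuple[str, ...]
Model = FrozenSet[Tuple[str, str]]  # hypothetical: allowed directly-follows pairs

ALPHABET = ("a", "b", "c")  # hypothetical activity names


def trace_score(model: Model, trace: Trace) -> float:
    """Toy fitness: fraction of consecutive activity pairs the model allows."""
    pairs = list(zip(trace, trace[1:]))
    if not pairs:
        return 1.0
    return sum(p in model for p in pairs) / len(pairs)


def mutate(model: Model, rng: random.Random) -> Model:
    """Toy genetic operator: add one random directly-follows pair."""
    return model | {(rng.choice(ALPHABET), rng.choice(ALPHABET))}


def sample_fitness(model: Model, sample: Sequence[Trace],
                   score: Callable[[Model, Trace], float]) -> float:
    """Mean per-trace score over the sampled traces only."""
    return sum(score(model, t) for t in sample) / len(sample)


def evolve(population: List[Model], log: List[Trace],
           score: Callable[[Model, Trace], float],
           mutate_fn: Callable[[Model, random.Random], Model],
           generations: int = 30, start_size: int = 4,
           growth: int = 2, patience: int = 3, seed: int = 0):
    """Evolve on a log sample; enlarge the sample when progress stalls."""
    rng = random.Random(seed)
    size = min(start_size, len(log))
    best_seen, stalled = float("-inf"), 0
    for _ in range(generations):
        sample = rng.sample(log, size)  # fresh sample each generation
        ranked = sorted(population,
                        key=lambda m: sample_fitness(m, sample, score),
                        reverse=True)
        best = sample_fitness(ranked[0], sample, score)
        if best > best_seen:
            best_seen, stalled = best, 0
        else:
            stalled += 1
        # Assumed growth rule: enlarge the sample after `patience` stalls.
        if stalled >= patience and size < len(log):
            size = min(len(log), size * growth)
            best_seen, stalled = float("-inf"), 0
        # Keep the better half, refill with mutated copies of survivors.
        parents = ranked[:max(1, len(ranked) // 2)]
        children = [mutate_fn(rng.choice(parents), rng)
                    for _ in range(len(ranked) - len(parents))]
        population = parents + children
    # Final choice uses the entire log, not a sample.
    winner = max(population, key=lambda m: sample_fitness(m, log, score))
    return winner, size
```

In a distributed setting, each island would run such a loop on its own subpopulation and periodically exchange individuals; that migration step is omitted here. The cost per generation scales with the population size times the sample size, which is consistent with the abstract's remark that the speed-up depends on the combination of the two.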

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bratosin, C., Sidorova, N., van der Aalst, W. (2011). Distributed Genetic Process Mining Using Sampling. In: Malyshkin, V. (eds) Parallel Computing Technologies. PaCT 2011. Lecture Notes in Computer Science, vol 6873. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23178-0_20

  • DOI: https://doi.org/10.1007/978-3-642-23178-0_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23177-3

  • Online ISBN: 978-3-642-23178-0

  • eBook Packages: Computer Science, Computer Science (R0)
