DB-XES: Enabling Process Discovery in the Large
Abstract
Dealing with the abundance of event data is one of the main challenges in process discovery. Current process discovery techniques can efficiently handle imported event log files that fit in the computer’s memory. Once data files grow beyond that, scalability drops quickly because the speed of data access becomes the limiting factor. This paper proposes a new technique based on relational database technology as a solution for scalable process discovery. A relational database is used both for storing event data (i.e., we move the location of the data) and for pre-processing the event data (i.e., we move some computations from analysis time to insertion time). To this end, we first introduce DB-XES, a database schema that resembles the standard XES structure, provide a transparent way to access event data stored in DB-XES, and show how this greatly reduces the memory requirements of state-of-the-art process discovery techniques. Second, we show how to move the computation of intermediate data structures to the database engine, reducing the time required during process discovery. The work presented in this paper is implemented in the ProM tool, and a range of experiments demonstrates the feasibility of our approach.
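The idea of moving computation from analysis time to insertion time can be illustrated with a minimal sketch. It takes the directly-follows relation as an example of an intermediate data structure, which is an assumption, since the abstract does not name one. The tables `event` and `dfr` and the helpers `open_store`, `insert_event`, and `directly_follows` are hypothetical and are not the DB-XES schema or the ProM implementation. The sketch also assumes that the events of a trace arrive in order.

```python
import sqlite3


def open_store():
    """Create an in-memory database with a simplified event table and a
    precomputed directly-follows table (hypothetical schema, not DB-XES)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE event (
            id       INTEGER PRIMARY KEY AUTOINCREMENT,
            trace_id TEXT NOT NULL,
            activity TEXT NOT NULL
        );
        CREATE TABLE dfr (
            source TEXT NOT NULL,
            target TEXT NOT NULL,
            freq   INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (source, target)
        );
        """
    )
    return conn


def insert_event(conn, trace_id, activity):
    """Store one event and update the directly-follows counts right away."""
    # Look up the latest activity already stored for this trace, if any.
    previous = conn.execute(
        "SELECT activity FROM event WHERE trace_id = ? ORDER BY id DESC LIMIT 1",
        (trace_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO event (trace_id, activity) VALUES (?, ?)",
        (trace_id, activity),
    )
    if previous is not None:
        # Make sure the (source, target) row exists, then increment it.
        conn.execute(
            "INSERT OR IGNORE INTO dfr (source, target, freq) VALUES (?, ?, 0)",
            (previous[0], activity),
        )
        conn.execute(
            "UPDATE dfr SET freq = freq + 1 WHERE source = ? AND target = ?",
            (previous[0], activity),
        )
    conn.commit()


def directly_follows(conn):
    """Read the precomputed relation; no scan over the event table is needed."""
    rows = conn.execute("SELECT source, target, freq FROM dfr")
    return {(source, target): freq for source, target, freq in rows}


def build_demo():
    """Insert two interleaved traces and return the resulting relation."""
    conn = open_store()
    for trace_id, activity in [
        ("t1", "a"), ("t2", "a"), ("t1", "b"), ("t2", "c"), ("t1", "c"),
    ]:
        insert_event(conn, trace_id, activity)
    return directly_follows(conn)
```

In this sketch, a discovery step that needs the relation only reads the small `dfr` table, so the full event data never has to be loaded into memory at analysis time. The price is a small amount of extra work on every insertion.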
Keywords
Process discovery, Process mining, Big event data, Relational database