Towards Event Log Querying for Data Quality

Let’s Start with Detecting Log Imperfections
  • Robert Andrews (Email author)
  • Suriadi Suriadi
  • Chun Ouyang
  • Erik Poppe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11229)

Abstract

Process mining is, by now, a well-established discipline focussing on process-oriented data analysis. As with other forms of data analysis, the quality and reliability of insights derived through analysis are directly related to the quality of the input (garbage in, garbage out). In the case of process mining, the input is an event log composed of event data captured (in information systems) during the execution of the process. It is crucial, then, that the event log be treated as a first-class citizen. While data quality is an easily understood concept, little effort has been directed towards systematically detecting data quality issues in event logs. Analysts still spend a large proportion of any project on ‘data cleaning’, which often involves manual and ad hoc tasks and requires more than one tool. While there are existing tools and languages that query event logs, the problem of needing different approaches for different log imperfections remains. In this paper we take the first steps towards developing QUELI (Querying Event Log for Imperfections), a log query language that provides direct support for detecting log imperfections. We develop an approach that identifies the capabilities required of QUELI and illustrate the approach by applying it to 5 of the 11 event log imperfection patterns described in [29]. We view this as a first step towards operationalising systematic, automated support for log cleaning.

Keywords

  • Process mining
  • Event log query language
  • Data quality
  • Event log imperfection patterns

Notes

Acknowledgement

The contributions to this paper of Robert Andrews and Chun Ouyang were supported through ARC Discovery Grant DP150103356.

References

  1. ISO/IEC 25010:2011: Systems and software engineering - Systems and software product Quality Requirements and Evaluation (SQuaRE) - System and software quality models (2011)
  2. van der Aalst, W., et al.: Process mining manifesto. In: Daniel, F., Barkaoui, K., Dustdar, S. (eds.) BPM 2011. LNBIP, vol. 99, pp. 169–194. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28108-2_19
  3. van der Aalst, W.: Process Mining: Discovery Conformance and Enhancement of Business Processes. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19345-3
  4. Batini, C., Palmonari, M., Viscusi, G.: Opening the closed world: a survey of information quality research in the wild. In: Floridi, L., Illari, P. (eds.) The Philosophy of Information Quality. SL, vol. 358, pp. 43–73. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07121-3_4
  5. Batini, C., Scannapieco, M.: Data Quality: Concepts, Methodologies and Techniques. Springer, Heidelberg (2006). https://doi.org/10.1007/3-540-33173-5
  6. Beheshti, S.-M.-R., Benatallah, B., Motahari-Nezhad, H.R.: Scalable graph-based OLAP analytics over process execution data. Distrib. Parallel Datab. 34(3), 379–423 (2016)
  7. Beheshti, S.-M.-R., Benatallah, B., Motahari-Nezhad, H.R., Sakr, S.: A query language for analyzing business processes execution. In: Rinderle-Ma, S., Toumani, F., Wolf, K. (eds.) BPM 2011. LNCS, vol. 6896, pp. 281–297. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23059-2_22
  8. Jagadeesh Chandra Bose, R.P., van der Aalst, W.M.P.: Abstractions in process mining: a taxonomy of patterns. In: Dayal, U., Eder, J., Koehler, J., Reijers, H.A. (eds.) BPM 2009. LNCS, vol. 5701, pp. 159–175. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03848-8_12
  9. Jagadeesh Chandra Bose, R.P., Mans, R.S., van der Aalst, W.M.: Wanna improve process mining results? CIDM 2013, 127–134 (2013)
  10. Christen, P.: Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31164-2
  11. CrowdFlower: 2017 Data Scientist Report (2017). https://visit.crowdflower.com. Accessed 25 July 2018
  12. Dijkman, R., Gao, J., Grefen, P., ter Hofstede, A.: Relational algebra for in-database process mining. arXiv preprint arXiv:1706.08259 (2017)
  13. Dixit, P.M., et al.: Detection and interactive repair of event ordering imperfection in process logs. In: Krogstie, J., Reijers, H.A. (eds.) CAiSE 2018. LNCS, vol. 10816, pp. 274–290. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91563-0_17
  14. Durand, J., Cho, H., Moberg, D., Woo, J.: XTemp: event-driven testing and monitoring of business processes. In: Proceedings of Balisage, The Markup Conference 2011, vol. 7. Balisage Series on Markup Technologies (2011)
  15. Günther, C.W., Rozinat, A.: Disco: discover your processes. BPM (Demos) 940, 40–44 (2012)
  16. Laranjeiro, N., Soydemir, S.N., Bernardino, J.: A survey on data quality: classifying poor data. In: PRDC 2015, pp. 179–188. IEEE (2015)
  17. Leemans, M., van der Aalst, W.M.P.: Discovery of frequent episodes in event logs. In: Ceravolo, P., Russo, B., Accorsi, R. (eds.) SIMPDA 2014. LNBIP, vol. 237, pp. 1–31. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-27243-6_1
  18. Lohr, S.: For big-data scientists, ‘janitor work’ is key hurdle to insights. New York Times, 17 August 2014
  19. Lu, X., et al.: Semi-supervised log pattern detection and exploration using event concurrence and contextual information. In: Panetto, H., et al. (eds.) OTM On the Move to Meaningful Internet Systems, pp. 154–174. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69462-7_11
  20. Mannhardt, F., de Leoni, M., Reijers, H.A., van der Aalst, W.M.P., Toussaint, P.J.: From low-level events to activities - a pattern-based approach. In: La Rosa, M., Loos, P., Pastor, O. (eds.) BPM 2016. LNCS, vol. 9850, pp. 125–141. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45348-4_8
  21. Mans, R.S., van der Aalst, W.M., Vanwersch, R., Moleman, A.: Process Support and Knowledge Representation in Health Care. LNCS, vol. 7738, pp. 140–153. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36438-9
  22. González López de Murillas, E., Reijers, H.A., van der Aalst, W.M.P.: Everything you always wanted to know about your process, but did not know how to ask. In: Dumas, M., Fantinato, M. (eds.) BPM 2016. LNBIP, vol. 281, pp. 296–309. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-58457-7_22
  23. Perez-Alvarez, J.M., Gomez-Lopez, M.T., Parody, L., Gasca, R.M.: Process instance query language to include process performance indicators in DMN. In: EDOCW 2016, pp. 1–8. IEEE (2016)
  24. Prud’hommeaux, E., Seaborne, A.: SPARQL query language for RDF. W3C recommendation, January 2008 (2008)
  25. Schönig, S., Rogge-Solti, A., Cabanillas, C., Jablonski, S., Mendling, J.: Efficient and customisable declarative process mining with SQL. In: Nurcan, S., Soffer, P., Bajec, M., Eder, J. (eds.) CAiSE 2016. LNCS, vol. 9694, pp. 290–305. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-39696-5_18
  26. Shabani, S., et al.: Relational XES: data management for process mining. In: CAiSE 2015. CEUR-WS.org (2015)
  27. Shvaiko, P., Euzenat, J.: Ontology matching: state of the art and future challenges. IEEE Trans. Knowl. Data Eng. 25(1), 158–176 (2013)
  28. Strong, D.M., Lee, Y.W., Wang, R.Y.: Data quality in context. Commun. ACM 40(5), 103–110 (1997)
  29. Suriadi, S., Andrews, R., ter Hofstede, A., Wynn, M.: Event log imperfection patterns for process mining: towards a systematic approach to cleaning event logs. Inf. Syst. 64, 132–150 (2017)
  30. Suriadi, S., Wynn, M.T., Ouyang, C., ter Hofstede, A.H.M., van Dijk, N.J.: Understanding process behaviours in a large insurance company in Australia: a case study. In: Salinesi, C., Norrie, M.C., Pastor, Ó. (eds.) CAiSE 2013. LNCS, vol. 7908, pp. 449–464. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38709-8_29
  31. Vázquez-Barreiros, B., Mucientes, M., Lama, M.: Mining duplicate tasks from discovered processes. In: ATAED@Petri Nets/ACSD, pp. 78–82 (2015)
  32. Verhulst, R.: Evaluating quality of event data within event logs: an extensible framework. Ph.D. thesis, Technische Universiteit Eindhoven (2016)
  33. Wand, Y., Wang, R.Y.: Anchoring data quality dimensions in ontological foundations. Commun. ACM 39(11), 86–95 (1996)
  34. Wang, R.Y., Storey, V., Firth, C.: A framework for analysis of data quality research. IEEE Trans. Knowl. Data Eng. 7(4), 623–640 (1995)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Robert Andrews (1), Email author
  • Suriadi Suriadi (1)
  • Chun Ouyang (1)
  • Erik Poppe (1)
  1. Queensland University of Technology, Brisbane, Australia
