Visualizing and exploring event databases: a methodology to benefit from process analytics

Abstract

Events, routinely broadcast by news media all over the world, are captured and recorded in event databases in standardized formats. This wealth of information can be aggregated and visualized in several ways, resulting in alluring illustrations. However, existing aggregation techniques tend to treat events either as fragmentary or as parts of a strictly sequential chain. Yet event occurrences may follow varying structures (i.e., other than a sequence), reflecting elements of a larger, implicit process. In this work, we propose a methodology that helps analysts gain richer insights from event datasets by enabling a process perspective. Through a case study of a political phenomenon, we provide concrete recommendations on data review, process discovery, and visually facilitated interpretation. Furthermore, we discuss the methodological and epistemological aspects needed to make our approach applicable to event analytics.

Acknowledgements

We would like to thank our graduate students Zafeiris Papavaritis and Christianna Pantermali, who spent many hours checking every event of the original dataset for relevance and manually filtering out the irrelevant ones.

Author information

Corresponding author

Correspondence to Pavlos Delias.

About this article

Cite this article

Delias, P., Zoumpoulidis, V. & Kazanidis, I. Visualizing and exploring event databases: a methodology to benefit from process analytics. Oper Res Int J 19, 887–908 (2019). https://doi.org/10.1007/s12351-018-00447-z

Keywords

  • Event data
  • Process mining
  • Process analytics