## Abstract

The increasing volume and value of data is an important enabler for data science. In this study, we consider event data, i.e. information on things that happen in organizations, machines, systems and people’s lives. Each event refers to a well-defined activity in a certain business process execution, the resource (i.e. person or device) executing or initiating the activity, the timestamp of the event, as well as various data elements recorded with the event (e.g. the geo-location of an activity). Process mining aims to analyze event data in order to mine knowledge that can contribute to improving the behavior of a business process. In particular, the focus of this study is on organizational mining, a sub-field of process mining that aims at understanding the life cycle of a dynamic organizational structure (i.e. a configuration of organization units) and the interactions among co-workers (resources), as they arise from the analysis of real-world event logs. The innovative contribution of this study is that the organizational mining goal is achieved by combining concepts from process mining, stream mining and social network analysis, a combination not yet explored in the organizational mining field. In an assessment, benchmark event data are explored in order to understand how the presented solution allows us to identify the life cycle of a dynamic organizational structure.

## Notes

Alternative metrics, e.g. handover of work or subcontracting (Song and van der Aalst 2008), can be equally considered to determine relationships between resources, without changing the general contribution of the theory described in this study.
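As an illustration of one such alternative metric, the following minimal sketch counts direct handovers of work between resources. The event representation as `(case_id, timestamp, resource)` tuples, the function name `handover_of_work` and the choice to skip self-handovers are assumptions of this sketch, not details taken from the study.

```python
from collections import Counter, defaultdict


def handover_of_work(events):
    """Count direct handovers of work between resources.

    `events` is an iterable of (case_id, timestamp, resource) tuples
    (a hypothetical, simplified event format). A handover from a to b is
    counted whenever, within one case, an event executed by a is directly
    followed by an event executed by a different resource b.
    """
    by_case = defaultdict(list)
    for case_id, timestamp, resource in events:
        by_case[case_id].append((timestamp, resource))

    counts = Counter()
    for trace in by_case.values():
        trace.sort(key=lambda event: event[0])  # order events by timestamp
        for (_, src), (_, dst) in zip(trace, trace[1:]):
            if src != dst:  # assumption: self-handovers are ignored
                counts[(src, dst)] += 1
    return counts


events = [
    ("c1", 1, "ann"), ("c1", 2, "bob"), ("c1", 3, "bob"), ("c1", 4, "cat"),
    ("c2", 1, "ann"), ("c2", 2, "bob"),
]
arcs = handover_of_work(events)
```

The resulting counts can serve as arc weights of a resource social network, in place of the metric used in the main text.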

Song and van der Aalst (2008) describe the use of several distance/similarity measures (e.g. Minkowski distance, Hamming distance, Pearson’s correlation coefficient), in order to quantify the “weight” associated with the arcs of a resource social network.
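A minimal sketch of the three measures named above, applied to resource profiles, may help fix ideas. Here a profile is assumed to be a vector counting how often a resource executed each activity; this representation, the function names and the binarization used for the Hamming distance are assumptions of the sketch rather than the exact definitions of Song and van der Aalst (2008).

```python
import math


def minkowski(x, y, p=2):
    """Minkowski distance of order p (p=2 gives the Euclidean distance)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def hamming(x, y):
    """Fraction of activities that one resource performed and the other did not.

    Profiles are binarized first (count > 0), which is an assumption here.
    """
    return sum((a > 0) != (b > 0) for a, b in zip(x, y)) / len(x)


def pearson(x, y):
    """Pearson's correlation coefficient; 0.0 if either profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)
```

Distances (Minkowski, Hamming) decrease as two resources behave more alike, whereas the correlation increases, so a distance would typically be converted to a similarity before being used as an arc weight.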

The idea of discovering overlapping communities by processing the linear network associated with a social network is mainly based on the considerations reported in Evans and Lambiotte (2010), which can be easily applied to the setting in this study. In fact, although resources may also belong to various organization units simultaneously, the arcs between them represent, in this formulation, a single type of interaction. This makes it reasonable to discover disjoint communities when the focus is on networks representing these interactions in the nodes. On the other hand, transforming detected linear communities into resource communities would naturally represent overlaps.
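The last step, turning a disjoint partition of the linear network into overlapping resource communities, can be sketched as follows. The input format (a dictionary mapping each arc `(u, v)` of the resource network to a community label) and both function names are hypothetical; the community detection step itself is not shown.

```python
from collections import defaultdict


def resource_communities(line_partition):
    """Map a disjoint partition of arcs to overlapping sets of resources.

    `line_partition` maps each arc (u, v) of the resource social network,
    i.e. each node of the linear network, to one community label. A
    resource joins every community that contains one of its arcs.
    """
    communities = defaultdict(set)
    for (u, v), label in line_partition.items():
        communities[label].update((u, v))
    return dict(communities)


def memberships(communities):
    """Invert the mapping: resource -> set of community labels."""
    member_of = defaultdict(set)
    for label, members in communities.items():
        for resource in members:
            member_of[resource].add(label)
    return dict(member_of)


partition = {
    ("ann", "bob"): 0, ("bob", "cat"): 0,
    ("cat", "dan"): 1, ("dan", "eve"): 1,
}
units = resource_communities(partition)
labels = memberships(units)
```

In this toy input every arc has exactly one label, yet the resource `cat` ends up in both communities because its two arcs were assigned to different ones.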

The life cycle discovery algorithm is independent of the algorithm used on-line to discover the instantaneous organizational structures of the business process under analysis.

Similar behavior can be observed by performing this pairwise comparison for the overlapping and disjoint organization structures discovered with *γ* = 0.5, 1, 1.5 and 2.

## References

Appice, A., & Malerba, D. (2015). A co-training strategy for multiple view clustering in process mining. *IEEE Transactions on Services Computing*, PP(99).

Appice, A., Pietro, M.D., Greco, C., & Malerba, D. (2016). Discovering and tracking organizational structures in event logs. In M. Ceci, C. Loglisci, G. Manco, E. Masciari & Z.W. Ras (Eds.), *New Frontiers in Mining Complex Patterns - 4th International Workshop, NFMCP 2015, Held in Conjunction with ECML-PKDD 2015, Revised Selected Papers, Springer, Lecture Notes in Computer Science* (Vol. 9607, pp. 46–60).

Aynaud, T., Blondel, V.D., Guillaume, J.L., & Lambiotte, R. (2013). *Multilevel Local Optimization of Modularity* (pp. 315–345). John Wiley and Sons, Inc.

Blondel, V., Guillaume, J.L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. *Journal of Statistical Mechanics: Theory and Experiment*, *10*, P10008.

Clauset, A., Newman, M.E.J., & Moore, C. (2004). Finding community structure in very large networks. *Physical Review E*, *70*(6), 1–6.

Dhouioui, Z., & Akaichi, J. (2014). Tracking dynamic community evolution in social networks. In X. Wu, M. Ester & G. Xu (Eds.), *2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2014, IEEE Computer Society* (pp. 764–770).

Evans, T., & Lambiotte, R. (2010). Line graphs of weighted networks for overlapping communities. *The European Physical Journal B*, *77*(2), 265–272.

Ferreira, D.R., & Alves, C. (2012). Discovering user communities in large event logs. In F. Daniel, K. Barkaoui & S. Dustdar (Eds.), *Business Process Management Workshops - BPM 2011 International Workshops, Revised Selected Papers, Part I, Springer, Lecture Notes in Business Information Processing* (Vol. 99, pp. 123–134).

Gaber, M.M., Zaslavsky, A., & Krishnaswamy, S. (2005). Mining data streams: a review. *ACM SIGMOD Record*, *34*(2), 18–26.

Greene, D., Doyle, D., & Cunningham, P. (2010). Tracking the evolution of communities in dynamic social networks. *ASONAM 2010* (pp. 176–183).

Hilbert, M., & Lopez, P. (2011). The world’s technological capacity to store, communicate, and compute information. *Science*, *332*(6025), 60–65.

Lei, T., & Huan, L. (2010). *Community Detection and Mining in Social Media*. Morgan and Claypool Publishers.

Nguyen, N.P., Dinh, T.N., Shen, Y., & Thai, M.T. (2014). Dynamic social community detection and its applications. *PLOS One*, 9(4): Open Access.

Oliveira, M.D.B., Guerreiro, A., & Gama, J. (2014). Dynamic communities in evolving customer networks: an analysis using landmark and sliding windows. *Social Netw Analys Mining*, *4*(1), 208.

Palla, G., Pollner, P., Barabási, A.L., & Vicsek, T. (2009). Social group dynamics in networks. In T. Gross & H. Sayama (Eds.), *Adaptive Networks: Theory, Models and Applications* (pp. 11–38). Springer Berlin Heidelberg.

Reichardt, J., & Bornholdt, S. (2006). Statistical mechanics of community detection. *Physical Review E*, *74*(1), 016110.

Rousseeuw, P. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. *J Comput Appl Math*, *20*(1), 53–65.

Saravanan, M., & Rama Sree, R. (2011). Process mining in dyeing unit using control flow perspective: A case study. *Data Mining and Knowledge Engineering*, *3*(6), 351–356.

Shen, H., Cheng, X., Cai, K., & Hu, M. (2009). Detect overlapping and hierarchical community structure in networks. *Physica A*, *388*(2009), 3888:1706–1712.

Song, M., & van der Aalst, W.M.P. (2008). Towards comprehensive support for organizational mining. *Decision Support Systems*, *46*(1), 300–317.

Song, M., Günther, C.W., & van der Aalst, W.M.P. (2009). Trace clustering in process mining. In D. Ardagna, M. Mecella & J. Yang (Eds.), *Business Process Management Workshops, BPM 2008 International Workshops, Revised Papers, Springer, Lecture Notes in Business Information Processing* (Vol. 17, pp. 109–120).

Spiliopoulou, M. (2011). Evolution in social networks: A survey. *Social Network Data Analytics* (pp. 149–175). Springer US.

Sunindyo, W.D., Moser, T., Winkler, D., & Biffl, S. (2010). *Process analysis and organizational mining in production automation systems engineering*. Tech. rep.

van der Aalst, W.M.P. (2011). *Process mining - discovery, conformance and enhancement of business processes*. Springer.

van der Aalst, W.M.P. (2014). No knowledge without processes - process mining as a tool to find out what people and organizations really do. In J. Filipe, J.L.G. Dietz & D. Aveiro (Eds.), *Proceedings of the International Conference on Knowledge Engineering and Ontology Development, KEOD 2014, SciTePress* (pp. IS–11).

van der Aalst, W.M.P. (2016). *Process mining - data science in action*, 2nd Edition. Springer.

van der Aalst, W.M.P., & Song, M. (2004). Mining social networks: Uncovering interaction patterns in business processes. *BPM 2004* (Vol. 3080, pp. 244–260). Springer: LNCS.

van der Aalst, W.M.P., Reijers, H.A., & Song, M. (2005). Discovering social networks from event logs. *Computer Supported Cooperative Work*, *14*(6), 549–593.

van Zelst, S.J., van Dongen, B.F., & van der Aalst, W.M.P. (2015). Know what you stream: Generating event streams from CPN models in prom 6. In F. Daniel & S. Zugal (Eds.), *Proceedings of the BPM Demo Session 2015 Co-located with the 13th International Conference on Business Process Management (BPM 2015), CEUR-WS.org, CEUR Workshop Proceedings* (Vol. 1418, pp. 85–89).

Ward, J. Jr. (1963). Hierarchical grouping to optimize an objective function. *Journal of the American Statistical Association*, *58*(301), 236–244.

## Acknowledgments

This work fulfills the research objectives of the project MAESTRA “Learning from Massive, Incompletely annotated, and Structured Data” (Grant number ICT-2013-612944) funded by the European Commission, as well as the ATENEO 2014 project “Mining of network data” funded by the University of Bari Aldo Moro. The authors wish to thank Marco Di Pietro and Claudio Greco for their support in developing the software and Lynn Rudd for her help in reading the manuscript.

## Appendix A

Let us start from the formulation of the Reichardt-Bornholdt measure \(\mathcal {RB}(\mathcal {C_{L}})=\frac {1}{2m} \sum \nolimits _{i,j\in \mathcal {N_{L}}}{\left [(A_{ij}-\gamma \frac {deg^{in}(i)deg^{out}(j)}{2m})\delta (C_{\mathcal {L}}^{i},C_{\mathcal {L}}^{j})\right ]} \) as reported in Formula 5. Let us consider that the Kronecker delta function \(\delta (C_{\mathcal {L}}^{i},C_{\mathcal {L}}^{j})\) can also be written as \(\delta \left (C_{\mathcal {L}}^{i},C_{\mathcal {L}}^{j}\right )= \sum \limits _{C_{\mathcal {L}}^{h} \in \mathcal {C_{L}}}{\delta \left (C_{\mathcal {L}}^{i}, C_{\mathcal {L}}^{h}\right )\delta \left (C_{\mathcal {L}}^{j}, C_{\mathcal {L}}^{h}\right )}\), where *δ*(*X*, *Y*) = 1 iff *X* = *Y*, and 0 otherwise. Therefore, \(\mathcal {RB}(\mathcal {C_{L}})\) can be rewritten as follows:

Introducing the following notation:

the Reichardt-Bornholdt measure can be written as \(\mathcal {RB}(\mathcal {C_{L}})= \sum \limits _{C_{\mathcal {L}}^{h} \in \mathcal {C_{L}}}{\left (e_{hh}-\gamma a_{h}^{in}a_{h}^{out}\right )}\), as reported in Formula 6.
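The intermediate expressions of this derivation are not reproduced above. A sketch that is consistent with both the starting formulation (Formula 5) and the final result (Formula 6) is given below; the definitions of \(e_{hh}\), \(a_{h}^{in}\) and \(a_{h}^{out}\) are assumptions inferred from those two formulas and follow the usual decomposition of modularity-like measures.

```latex
% Step 1: substitute the expansion of the Kronecker delta and swap the sums.
\begin{align*}
\mathcal{RB}(\mathcal{C_{L}})
 &= \frac{1}{2m}\sum_{i,j\in\mathcal{N_{L}}}
    \left[\left(A_{ij}-\gamma\,\frac{deg^{in}(i)\,deg^{out}(j)}{2m}\right)
    \sum_{C_{\mathcal{L}}^{h}\in\mathcal{C_{L}}}
    \delta\!\left(C_{\mathcal{L}}^{i},C_{\mathcal{L}}^{h}\right)
    \delta\!\left(C_{\mathcal{L}}^{j},C_{\mathcal{L}}^{h}\right)\right] \\
 &= \sum_{C_{\mathcal{L}}^{h}\in\mathcal{C_{L}}}
    \Bigg[\frac{1}{2m}\sum_{i,j\in\mathcal{N_{L}}} A_{ij}\,
    \delta\!\left(C_{\mathcal{L}}^{i},C_{\mathcal{L}}^{h}\right)
    \delta\!\left(C_{\mathcal{L}}^{j},C_{\mathcal{L}}^{h}\right) \\
 &\qquad\qquad
    -\gamma\left(\frac{1}{2m}\sum_{i\in\mathcal{N_{L}}} deg^{in}(i)\,
    \delta\!\left(C_{\mathcal{L}}^{i},C_{\mathcal{L}}^{h}\right)\right)
    \left(\frac{1}{2m}\sum_{j\in\mathcal{N_{L}}} deg^{out}(j)\,
    \delta\!\left(C_{\mathcal{L}}^{j},C_{\mathcal{L}}^{h}\right)\right)\Bigg]
\end{align*}

% Step 2: name the three per-community quantities (assumed definitions).
\begin{align*}
e_{hh} &= \frac{1}{2m}\sum_{i,j\in\mathcal{N_{L}}} A_{ij}\,
          \delta\!\left(C_{\mathcal{L}}^{i},C_{\mathcal{L}}^{h}\right)
          \delta\!\left(C_{\mathcal{L}}^{j},C_{\mathcal{L}}^{h}\right), \\
a_{h}^{in} &= \frac{1}{2m}\sum_{i\in\mathcal{N_{L}}} deg^{in}(i)\,
          \delta\!\left(C_{\mathcal{L}}^{i},C_{\mathcal{L}}^{h}\right), \qquad
a_{h}^{out} = \frac{1}{2m}\sum_{j\in\mathcal{N_{L}}} deg^{out}(j)\,
          \delta\!\left(C_{\mathcal{L}}^{j},C_{\mathcal{L}}^{h}\right).
\end{align*}
```

Under these definitions, \(e_{hh}\) is the fraction of arc weight that falls inside community \(C_{\mathcal {L}}^{h}\), while \(a_{h}^{in}\) and \(a_{h}^{out}\) are the fractions of in-degree and out-degree attached to its nodes. Substituting them into the second line gives the per-community sum of Formula 6.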

## About this article

### Cite this article

Appice, A. Towards mining the organizational structure of a dynamic event scenario.
*J Intell Inf Syst* **50**, 165–193 (2018). https://doi.org/10.1007/s10844-017-0451-x
