Analyzing Digital Trace Data to Promote Discovery – The Case of Heatmapping

Business Process Management and Routine Dynamics are two streams of research that both explore process. To this end, Business Process Management has developed a rich array of methods that can be used to analyze digital trace data. Routine Dynamics has put less emphasis on the analysis of digital trace data, but it has advanced a methodological approach that promotes discovery, i.e., the process that actors perform and experience as they develop novel insights. This paper argues that the analysis of digital trace data can promote the process of discovery. It uses heatmapping as a specific example to show how analyzing digital trace data can promote discovery. The paper thus emphasizes a specific way in which Business Process Management and Routine Dynamics can fertilize each other.


Introduction
Organizational processes have attracted considerable attention from scholars in different scientific fields. Two streams of research in which process has played a key role are Routine Dynamics and Business Process Management. Routine Dynamics (RD) is a stream of research that takes a processual perspective on organizational routines [1,2], and focuses on the role of repetitive action patterns in organizations. RD scholars seek to understand how routines matter for central organizational phenomena, such as change [3,4] and innovation [5]. Similarly, Business Process Management (BPM) is concerned with the identification, analysis, redesign, implementation, and monitoring of business processes [6]. In this vein, these two research streams have been described as "two islands of process research" [7,8].
However, from a methodological point of view, we can observe some differences between these islands. On the one hand, BPM has developed sophisticated methods to analyze digital trace data. Trace data consist of a series of discrete events that are ordered sequentially [9]. Due to the increasing use of software tools in organizations, digital trace data are increasingly available, providing many opportunities to study organizational processes. On the other hand, while the analysis of digital trace data also plays a role in RD research [10], most scholars in this domain have conducted ethnographic studies. These ethnographic studies typically follow a process of discovery [11–13]. Rather than focusing on the application of a rigorous scientific method aiming at the generation of universally "valid" insights, scholars strive to follow surprises and hunches and, thus, deviate from pre-established conceptions.
In this paper, I argue that joining the forces of BPM and RD can lead to methodological advances in analyzing digital trace data. Business Process Management provides a rich array of methods that can be used to analyze digital trace data, whereas Routine Dynamics can contribute a methodological orientation towards discovery. This paper suggests that it is valuable to bring these aspects together, i.e., to promote discovery through the analysis of digital trace data. To illustrate this argument, the paper draws on "heatmapping" as a specific method of analyzing digital trace data. A heatmap is a graphical visualization of a matrix that assigns different colors to areas according to a certain logic. For instance, higher values can take darker colors while lower values take lighter colors. In this vein, heatmaps can visualize patterns in the data and facilitate the discovery process of the researcher. In this paper, heatmapping serves as a "boundary object" to show how both streams of research can join forces.
The data set used for illustration is derived from a larger study of software development teams in a medium-sized high-tech manufacturing company; the teams develop software tools to control complex cabin-like machines [14,15]. The software development teams enact routines based on the Scrum framework, which splits software development into distinct phases of two weeks (i.e., sprints), and defines events, rules, and roles that support the sprints [16]. The study combines ethnographic field work (i.e., observation of the teams in conjunction with interviews and documents) with digital trace data extracted from a software tool that the teams use to organize their work. This paper uses a sub-set of the digital trace data that covers a period of two years. The data set consists of approximately 41,000 actions. The aim of this paper is not to produce novel insights into organizational processes, but to illustrate how the analysis of digital trace data can promote discovery.

Discovery in Routine Dynamics and Business Process Management Research
From a methodological point of view, detailed, ethnographic studies play a key role in RD research. Scholars have performed such ethnographic studies in a variety of different contexts such as student housing organizations [17], start-ups [18] or artificial intelligence labs [19]. In this vein, Dittrich [11] identifies 45 different ethnographic papers in the RD field. The analysis of ethnographic data does not only entail the application of suitable methods to pre-specified questions, but it involves a process of discovery [13]. By discovery I refer to the process that actors perform and experience as they develop novel insights. Dittrich [11, p. 3] characterizes the process of discovery, stating that "[e]thnographers are open to surprises in the field and they work with these surprising observations to develop new ideas, concepts and/or explanations." As scholars engage with a phenomenon, they see things that they did not expect, and they are subsequently drawn towards these surprising or puzzling things. While most Routine Dynamics scholars have performed ethnographic studies, some scholars have also analyzed (digital) trace data. Pentland and Feldman [20] introduce the notion of narrative networks. In a narrative network, the nodes represent actions, and the edges represent sequential relations between those actions. Narrative networks can be used to compare organizational processes across time and space. While narrative networks can be used to analyze ethnographic data [18,21], they are also useful for analyzing (digital) trace data [22–24]. Based on the notion of narrative networks, Goh and Pentland [25] analyze the paths of how routines are and could be performed. Identifying such paths enables unpacking the dynamics of a process by tracing how paths change over time [26].
In Business Process Management the analysis of digital trace data has been a more central focus. As Wurm, Grisold, Mendling and vom Brocke [7] note, this stream of research is closely connected to the emergence and development of software tools that accumulate digital trace data. As a consequence, a variety of methods, such as algorithms [27] or process modelling [28], has been developed that can be used to analyze digital trace data. BPM research, however, has put less emphasis on how using such methods can drive the process of discovery as defined above. In BPM, the term "discovery" seems to be used in a different sense. The business process management lifecycle [6], for instance, includes "process discovery" as a key phase, but here discovery means the identification of "descriptive information about how the process is performed" [29, p. 5494], and not the process of following hunches and surprises. In BPM, discovery refers to the technical methods that can inductively be used to identify a "unique underlying process model" [30, p. 56] in a complex data set, while in RD discovery refers to the process that actors perform and experience as they develop novel insights, which is shaped by hunches and surprises. This paper follows the latter definition of the term. Notably, both aspects are interrelated, as BPM methods can be used to promote the discovery process as conceived in RD.
Visualizations play a key role in analyzing digital trace data. Even though puzzles can become apparent from observation per se (e.g., observing a situation that does not fit the theory during fieldwork) or coding [12], visualizations are useful because they can bring a specific point of view to the foreground and reduce complexity. While visualizations can also be constructed from ethnographic data, they are a key method of identifying patterns in processes represented by digital trace data. Goh and Pentland [25], for instance, visualize how paths change over the sprints of an agile project. This enables them to identify different "phases of patterning" in the project. Similarly, BPM scholars have developed various visualizations such as different kinds of process diagrams [31], Petri nets [32] or heatmaps [33–36].

Heatmapping
A heatmap is a graphical visualization of a matrix that assigns different colors to areas according to a certain logic. For instance, higher values attain darker colors while lower values attain lighter colors. Figure 1 shows such a heatmap. In this example, the matrix represents transition probabilities between actions that make up the Scrum routines of a software development team. The lexicon (i.e., the set of actions considered) consists of 51 distinct actions. Each cell represents the probability that a specific action X follows another action Y. As shown in the legend on the right-hand side, transition probabilities of 10% and below are colored light blue, probabilities of 30% and above are colored dark blue, and probabilities between 10 and 30% are distributed on the color spectrum ranging from light to dark blue. For instance, the red-framed row indicates actions that follow the assignment of an issue (e.g., a software bug or a new feature requirement) to a specific software developer in the software tool. The heatmap shows that there are two actions that typically follow assigning, i.e., developing a resolution for the issue and indicating that the issue has been resolved. This insight can facilitate discovery, as we strive to understand these patterns. In this case, the reason why assigning an issue was followed by resolving it was that the developers assigned their own account to an issue before they started to resolve it, so that other developers knew that the issue was "reserved". This prevented multiple developers from working on the same issue simultaneously. When a developer was done resolving an issue, the developer assigned a tester, who set the resolution of the issue to "fixed" if it satisfied the tester's requirements. This explains why the assignment of an actor (in this case a tester) was oftentimes followed by the indication of the resolution.

How to Use Heatmapping to Promote Discovery?
As the prior example reveals, heatmapping (as any other type of visualization) requires a profound sensitivity towards the data. In the following, I describe how to use heatmapping in studying organizational processes. Complementary instructions for the analysis of digital trace data have been provided elsewhere [6,10,37].
Step 1: Defining the Lexicon. The first step in the analysis is the definition of the set of actions to be considered (i.e., the lexicon). The 51 distinct actions shown above are a choice of the researcher. I selected a lexicon that closely resembles how the actors in the field see the Scrum routines. This was possible due to the deep contextualization of the digital trace data through fieldwork and interviews. A larger lexicon has more potential to facilitate the discovery process, because it provides more opportunities to ask questions about the relationships between actions. However, when we compare multiple heatmaps across time and space, a large lexicon can easily become overwhelming. A smaller lexicon yields a smaller heatmap, which reduces complexity and helps to mitigate this challenge. Hence, there is a need to balance richness and complexity in a way that promotes discovery.
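To make the trade-off between a larger and a smaller lexicon concrete, the following is a minimal Python sketch (not the R workflow used for this paper). All event labels, action names, and the helper `apply_lexicon` are hypothetical and do not stem from the study's data; the sketch only assumes that a lexicon can be expressed as a mapping from raw event labels to actions.

```python
# Hypothetical raw event labels as they might appear in a tool export;
# none of these names come from the study's data.
FINE_LEXICON = {
    "issue_created": "create issue",
    "description_updated": "describe issue",
    "assignee_changed": "assign issue",
    "resolution_set": "indicate resolution",
}

# A coarser (equally hypothetical) lexicon merges several raw events into one action.
COARSE_LEXICON = {
    "issue_created": "set up issue",
    "description_updated": "set up issue",
    "assignee_changed": "work on issue",
    "resolution_set": "work on issue",
}


def apply_lexicon(raw_events, lexicon):
    """Translate raw event labels into lexicon actions, dropping unmapped events."""
    return [lexicon[event] for event in raw_events if event in lexicon]


raw = ["issue_created", "description_updated", "comment_added",
       "assignee_changed", "resolution_set"]
fine = apply_lexicon(raw, FINE_LEXICON)
coarse = apply_lexicon(raw, COARSE_LEXICON)
print(len(set(fine)), "distinct actions with the fine lexicon")
print(len(set(coarse)), "distinct actions with the coarse lexicon")
```

A heatmap built on the fine lexicon would have more rows and columns, and thus more relationships to ask questions about, whereas the coarse lexicon yields a smaller matrix that is easier to compare across time and space.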
Step 2: Calculating the Transition Matrix. Heatmapping requires a matrix that can be transformed into a heatmap visualization. I used a transition matrix that shows how actions follow each other. To compute the matrix I used the "TraMineR" package in R [38], and to create the heatmaps I used the "ComplexHeatmap" package [39]. However, we could also apply heatmapping to other applications, such as a matrix that shows how different actors or roles interact with each other. Yeshchenko et al. [33], for example, use heatmaps to detect process drift. Burattin, Kaiser, Neurauter and Weber [35] use heatmaps to visualize eye tracking results in process modelling.
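The logic of a transition matrix can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a reproduction of the TraMineR computation: it assumes one action sequence per issue, counts only direct successions, and uses hypothetical action names and data. Rows stand for the current action and columns for the action that follows.

```python
from collections import Counter


def transition_counts(sequences):
    """Count how often each action is directly followed by each other action."""
    counts = Counter()
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[(current, following)] += 1
    return counts


def to_matrix(counts, lexicon):
    """Arrange counts as a square matrix: rows = current action, columns = next action."""
    return [[counts[(row, col)] for col in lexicon] for row in lexicon]


# Hypothetical action sequences, one per issue.
sequences = [
    ["create", "describe", "assign", "resolve"],
    ["create", "assign", "resolve"],
    ["create", "describe", "assign", "describe"],
]
lexicon = ["create", "describe", "assign", "resolve"]
matrix = to_matrix(transition_counts(sequences), lexicon)
for action, row in zip(lexicon, matrix):
    print(f"{action:>8}", row)
```

The same counting logic would apply to other units, for instance actors or roles instead of actions, as long as the data can be arranged as ordered sequences.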
Another important decision is whether to use transition probabilities or absolute transitions. Absolute transitions sum up how often an action X follows an action Y, while transition probabilities represent the probability that an action X follows an action Y. Transition probabilities can be biased through rare values. For example, when an action occurs only once in the data set, there is only one other action that follows, leading to a transition probability of 100%. On the other hand, absolute transitions make it difficult to compare processes across time and space when the compared matrices differ substantially in the total number of actions. Figure 2 shows two heatmaps for the same data set. The heatmap on the left-hand side is based on transition probabilities and the heatmap on the right-hand side is based on absolute transitions. As shown by the red frame on the left-hand side, rare values (in this case changing the project assigned to an issue, which occurs only 4 times in the data set) result in high transition probabilities. On the right-hand side, the absolute count of describing an issue (n = 2928) leads to highlights that do not appear in the probability-based heatmap. However, we can also use these sensitivities to ask new questions about the process, for instance, whether transition probabilities of low or high frequency events are significant for process dynamics.
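The difference between the two options, and the bias that rare actions introduce, can be illustrated with a small Python sketch. The counts below are invented for illustration, and the `min_total` cut-off for flagging rare rows is an assumption of this sketch rather than a step taken in the analysis reported here.

```python
def row_probabilities(count_matrix):
    """Divide each row by its total so that every non-empty row sums to 1."""
    probs = []
    for row in count_matrix:
        total = sum(row)
        probs.append([count / total if total else 0.0 for count in row])
    return probs


def rare_rows(count_matrix, min_total):
    """Return indices of non-empty rows with fewer outgoing transitions than min_total."""
    return [i for i, row in enumerate(count_matrix) if 0 < sum(row) < min_total]


# Hypothetical absolute transitions: rows = current action, columns = next action.
actions = ["describe", "assign", "change project"]
counts = [
    [10, 30, 0],  # "describe" is followed by another action 40 times
    [25, 15, 0],  # "assign" is followed by another action 40 times
    [1, 0, 0],    # "change project" occurs only once
]
probs = row_probabilities(counts)
print(probs[2])              # the single observed transition gets a probability of 1.0
print(rare_rows(counts, 5))  # rows that rest on very few observations
```

Flagging such rows does not remove them; it merely marks cells whose dark color rests on few observations, so that the researcher can decide whether they are noise or a puzzle worth following.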
Step 3: Defining the Color Scheme. The definition of the color scheme is another crucial step in the process, because the heatmap visualization is influenced by the color thresholds. Figure 3 illustrates three different thresholds for the same process. In the heatmap on the left-hand side, colors become darker starting from transition probabilities of 1%, in the middle from 10%, and on the right-hand side from 20%. A low threshold is sensitive towards small transition probabilities but can also be overwhelmingly complex. A higher threshold supports the identification of more typical transitions in the action pattern. Here, heatmapping plays out one of its core strengths because it is possible to adjust the sensitivity of the heatmap in a way that accounts for the question at hand. I selected only one color with different shades, which is helpful to highlight frequent transitions, but it is also possible to use a color gradient across multiple colors, which may lead to different insights.
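How thresholds shape a single-color scale can be sketched as follows in Python. The sketch assumes a linear scale that is clamped at a lower and an upper threshold; the default values of 10% and 30% follow the legend described for Figure 1, whereas the two RGB values for light and dark blue and the function names are assumptions made for illustration.

```python
def shade(probability, low=0.10, high=0.30):
    """Map a probability to a shade in [0, 1]: 0 = lightest, 1 = darkest."""
    if high <= low:
        raise ValueError("high must be greater than low")
    position = (probability - low) / (high - low)
    return min(1.0, max(0.0, position))


def to_hex(shade_value, light=(222, 235, 247), dark=(8, 48, 107)):
    """Interpolate between a light and a dark RGB color (both assumed values)."""
    channels = [round(l + (d - l) * shade_value) for l, d in zip(light, dark)]
    return "#{:02x}{:02x}{:02x}".format(*channels)


# Compare the default lower threshold (10%) with a more sensitive one (1%).
for p in (0.05, 0.10, 0.20, 0.30, 0.60):
    print(p, to_hex(shade(p)), to_hex(shade(p, low=0.01, high=0.30)))
```

Lowering `low` makes small probabilities visible at the cost of a busier picture, while raising it leaves only the more typical transitions colored.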
Step 4: Comparing Across Time and Space. Finally, we can compare heatmaps across time and space to generate new insights into process dynamics. For example, we could split the data set into different time frames and create heatmaps for each time frame. Subsequently, it is possible to compare how the heatmaps (and thus the process) change over time. In this sense, heatmapping can be used to better understand how processes change over time, which is a key question in Routine Dynamics research [40,41].
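Splitting the data into time frames could look roughly like the following Python sketch. It assumes that each event carries a timestamp, a case identifier (e.g., an issue), and an action, and it assigns every transition to the time frame of the action that follows; this assignment rule, the yearly windows, and all data are assumptions of the sketch.

```python
from collections import Counter, defaultdict


def counts_by_window(events, window_of):
    """Count transitions separately for each time window.

    events: list of (timestamp, case_id, action) tuples.
    window_of: function that maps a timestamp to a window label.
    """
    # Order the events of each case chronologically.
    by_case = defaultdict(list)
    for timestamp, case_id, action in sorted(events):
        by_case[case_id].append((timestamp, action))
    # Pair consecutive actions and file each pair under the window of the later action.
    windows = defaultdict(Counter)
    for steps in by_case.values():
        for (_, current), (t_next, following) in zip(steps, steps[1:]):
            windows[window_of(t_next)][(current, following)] += 1
    return windows


# Hypothetical events with ISO dates, so that the first four characters give the year.
events = [
    ("2019-03-01", "issue-1", "create"),
    ("2019-03-02", "issue-1", "assign"),
    ("2019-03-05", "issue-1", "resolve"),
    ("2020-06-01", "issue-2", "create"),
    ("2020-06-02", "issue-2", "describe"),
    ("2020-06-03", "issue-2", "assign"),
]
windows = counts_by_window(events, lambda timestamp: timestamp[:4])
for label in sorted(windows):
    print(label, dict(windows[label]))
```

Each window's counts can then be turned into its own matrix and heatmap, using the same lexicon and color scheme throughout so that the heatmaps remain comparable.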
Alternatively, we can compare how a process is enacted in different places. This can be relevant for different questions such as the role of context for routine enactment [42] or routine replication [43,44]. BPM scholars have also emphasized the need to consider context [7,45]. Figure 4 shows the differences between transition probabilities of an old and a new software development team. The matrix reveals that the routines in both teams were not similar, even though the new team replicated the routines of the old team. We can use the heatmap to examine how the routines in both teams differed. For instance, the red highlight in the figure signifies a difference in how creating a task issue (a specific kind of issue as opposed to bugs and stories) was followed by describing the issue. Following this surprising puzzle revealed that only the new team used this specific type of issue. This leads to further questions, such as why this was the case. In this vein, the heatmap drives discovery.
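A difference matrix of the kind described for Figure 4 can be sketched in Python as a cell-wise subtraction of two transition-probability matrices. The sketch assumes that both matrices use the same lexicon in the same order; the probabilities and helper names are hypothetical and only mimic the situation in which one team never uses a particular type of issue.

```python
def difference_matrix(probs_a, probs_b):
    """Cell-wise difference of two transition-probability matrices (a minus b)."""
    return [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(probs_a, probs_b)]


def largest_differences(diff, actions, top=3):
    """List the transitions whose probabilities differ most, largest first."""
    cells = [(abs(value), actions[i], actions[j], value)
             for i, row in enumerate(diff) for j, value in enumerate(row)]
    cells.sort(key=lambda cell: cell[0], reverse=True)
    return [(current, following, value)
            for _, current, following, value in cells[:top]]


# Hypothetical probabilities: rows = current action, columns = next action.
actions = ["create task", "describe", "assign"]
old_team = [[0.0, 0.0, 0.0], [0.0, 0.2, 0.8], [0.0, 0.5, 0.5]]
new_team = [[0.0, 0.9, 0.1], [0.0, 0.3, 0.7], [0.0, 0.5, 0.5]]
diff = difference_matrix(new_team, old_team)
print(largest_differences(diff, actions, top=1))
```

Ranking the cells is merely a convenience for locating candidates; whether a large difference is a meaningful puzzle still has to be judged against knowledge of the field.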

Conclusion
This paper suggests that there is ample potential for research on Business Process Management and Routine Dynamics to learn from each other. It has focused on a specific aspect, that is, the combination of methods for analyzing digital trace data with a methodological orientation toward discovery. Put differently, this paper suggests the fusion of BPM methods and RD methodology. In this vein, it addresses scholars who have emphasized the synergetic potential of both streams of research [7,29,46,47].

Promoting Discovery in Business Process Management
BPM research is "interested in designing, analyzing, monitoring and improving business process work in organizations" [7, p. 4]. However, the methods developed in this stream of research can also be used to facilitate discovery. This paper shows how this is possible with heatmapping, but most other methods can serve similar functions. Considering discovery can potentially shift the questions asked when developing new methods and tools to manage business processes. While the key questions in BPM seem to focus on methods that produce valid results, this paper suggests adding another question: how can we support discovery through BPM methods and tools?
A holistic approach to BPM involves both a technical and a social dimension. On the one hand, scholars need to develop deep insights into methods and tools that can be used to identify and manage business processes. On the other hand, BPM scholars need to consider how people (e.g., managers and researchers) apply these methods and tools in their everyday work practices. This argumentation aligns with scholars who have suggested a social approach to BPM [48,49]. This paper advances this line of argumentation by emphasizing the need to connect the technical methods that have been at the core of BPM with the notion of discovery, which represents a process that unfolds in social practice. There are many possibilities to further expand on this line of argumentation, for example, by analyzing which BPM methods managers and scholars use to promote discovery, how they iterate between different methods as they engage in discovery, or whether different methods can promote discovery in different ways.

Enriching the Repertoire of Methods in Routine Dynamics
Discovery is a key process in RD research, but it has mostly been emphasized in the context of ethnographic studies. This paper suggests that the analysis of digital trace data can also be a driver of discovery: just as researchers code field notes and interviews, they can apply methods for the analysis of digital trace data. The resulting metrics and visualizations can then lead to new questions that lead to the application of different methods, propelling the discovery process forward. The analysis of digital trace data can most likely lead to different questions than the analysis of field notes and interview data because digital trace data oftentimes extend across longer periods of time and across multiple locations [10]. When we apply multiple methods (e.g., analysis of digital trace data, field notes, and interview data) in conjunction, it is possible to further reinforce the momentum of discovery.
RD research has developed some methods that can help analyze (digital) trace data in a way that promotes discovery, such as narrative networks [20,22], path-based analysis [25], or analysis based on clock- and event-time [50]. BPM research also provides many tools that could be used to analyze digital trace data in a way that promotes discovery [6,31,33]. This paper complements this set of methods by adding heatmapping to the repertoire of possible methods.
A common way of visualizing the results is a network graph, which displays the nodes and edges of a narrative network. However, there are notable differences between network graphs and heatmaps. Network graphs play out their strengths when we strive to understand the overall structure of a routine pattern and how this structure changes over time, or when we want to get a visual impression of the complexity of a routine. By contrast, heatmaps (as they have been applied in this paper) foreground differences in transitions between actions. This can be useful for Routine Dynamics research, because (a) heatmaps help to see more likely transitions (i.e., which action typically comes next?), (b) the nuances of color differences help generate a more fine-grained picture of the routine pattern, and (c) heatmaps also visualize transitions that have not happened, i.e., the "in-between" or "white" space [23]. Moreover, heatmaps could be used beyond visualizing transitions between actions. For example, we could visualize transitions between routines or degrees of variation of routines in a routine ecology. Depending on how heatmaps are applied, they may guide the discovery process in different directions.

Driving Discovery Through Visualization
This paper has elaborated heatmapping as a specific way of analyzing and visualizing organizational processes to promote the process of discovery. The specific examples from the data reveal that discovery depends on how visualizations are crafted. Hence, visualizations play an important role in the discovery process that goes beyond the structuring and representation of results. This argument echoes Feldman [51], who argues that making process visible requires different approaches to visualization. She also explains that different visualizations offer different points of view on a phenomenon and may influence theorizing in different ways. This paper adds to this line of argumentation by emphasizing that visualizations can play a fundamental role in driving discovery.