To investigate the existence of different modeling styles, we apply cluster analysis to the collected PPM instances and analyze whether groups of PPM instances exhibiting similar characteristics can be identified. The applied clustering procedure is described in Sects. 6.1 and 6.2. The identified clusters are then visualized and analyzed to determine whether they indeed represent different modeling styles. To check whether the identified modeling styles persist across tasks, clustering is applied to two tasks with different characteristics. Results of clustering the Pre-Flight task are discussed in Sect. 6.3, while the clustering results of the NFL Draft task are discussed in Sect. 6.4.
PPM profile for clustering
First and foremost, we need a representation of all collected PPM instances that is suited for clustering. Based on our previous experience, we decided to focus on four aspects: the addition of content, the removal of content, reconciliation of the model, and comprehension time, i.e., the time when the modeler does not work on the process model. To also reflect that modeling is a time-dependent process, we do not just look at the total amount of modeling actions and comprehension, but at their distribution over time. We sampled every PPM instance into segments of 10 s length. For each segment, we compute its profile \((a,d,r,c)\), i.e., the numbers \(a\), \(d\), and \(r\) of add, delete, and reconciliation events, and the time \(c\) spent on comprehension. The profile of one PPM instance is the sequence \((a_1,d_1,r_1,c_1)(a_2,d_2,r_2,c_2)\ldots \) of its segments’ profiles. The values \(a\), \(d\), and \(r\) are obtained per segment by classifying each event according to Table 1. Adding a condition to an edge was considered part of creating an edge. The comprehension time \(c\) was computed as follows. First, events were grouped into intervals, i.e., sequences of events where two consecutive events are \(\le \)1 s apart. Second, the interval duration was calculated as the time difference between its first and its last event (intervals consisting of a single event were assigned a duration of 1 s). Comprehension time \(c\) is calculated as the length of the segment (10 s) minus the duration of all intervals in the segment. For example, if the modeler moved activity A after 3 s, activity B after 3.5 s, and activity C after 4.2 s, the three events form one interval of 1.2 s, and the comprehension time would be 8.8 s. To give all PPM profiles equal length, we normalized profiles by extending them with segments of no interaction.
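The profile computation described above can be sketched as follows. This is a minimal illustration, not the CEP implementation: the event representation, the function names, and the kind labels "add", "delete", and "reconcile" are hypothetical, and we assume that intervals are formed within each segment separately, which the text does not specify.

```python
from typing import List, Optional, Tuple

SEGMENT_LEN = 10.0   # segment length in seconds (from the text)
MAX_GAP = 1.0        # events at most 1 s apart belong to the same interval
SINGLE_EVENT = 1.0   # duration assigned to an interval with a single event

# Hypothetical event format: (timestamp in seconds, kind), where kind is
# "add", "delete", or "reconcile" (the classification itself follows Table 1).
Event = Tuple[float, str]
Profile = Tuple[int, int, int, float]  # (a, d, r, c)


def active_time(timestamps: List[float]) -> float:
    """Sum the durations of all intervals formed by the given timestamps."""
    total = 0.0
    start: Optional[float] = None
    prev = 0.0
    for t in sorted(timestamps):
        if start is None:
            start = prev = t
        elif t - prev <= MAX_GAP:
            prev = t  # extend the current interval
        else:
            # close the current interval and open a new one
            total += (prev - start) if prev > start else SINGLE_EVENT
            start = prev = t
    if start is not None:
        total += (prev - start) if prev > start else SINGLE_EVENT
    return total


def ppm_profile(events: List[Event], n_segments: int) -> List[Profile]:
    """Return one (a, d, r, c) tuple per segment.

    Segments after the last event come out as (0, 0, 0, 10.0), which
    corresponds to padding with segments of no interaction.
    """
    profiles = []
    for i in range(n_segments):
        lo, hi = i * SEGMENT_LEN, (i + 1) * SEGMENT_LEN
        seg = [(t, k) for t, k in events if lo <= t < hi]
        a = sum(1 for _, k in seg if k == "add")
        d = sum(1 for _, k in seg if k == "delete")
        r = sum(1 for _, k in seg if k == "reconcile")
        c = max(0.0, SEGMENT_LEN - active_time([t for t, _ in seg]))
        profiles.append((a, d, r, c))
    return profiles
```

Applied to the example from the text (three move events at 3 s, 3.5 s, and 4.2 s, treated here as reconciliation events for illustration), the first segment yields a comprehension time of about 8.8 s, and any further requested segment is a padding segment.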
Table 1 Classification of CEP’s user interactions
Performing the clustering
The PPM profiles were exported from CEP [35] and subsequently clustered using Weka (see footnote 2). The K-Means algorithm [18] with a Euclidean distance measure was chosen for clustering as it constitutes a well-known means for cluster analysis. As K-Means might converge to a local minimum [13], the obtained clustering has to be validated. If the identified clusters exhibit significant differences with regard to the measures described in Sect. 3, we conclude that different modeling styles were identified. K-Means requires the number of clusters to be known a priori. Thus, we started with two expected clusters, gradually increasing the number of expected clusters. Similarly, several different values for the seed of the clustering were investigated.
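To illustrate the procedure of varying the number of expected clusters and the seed, the following sketch implements a plain K-Means (Lloyd's algorithm) with Euclidean distance in pure Python. It is not Weka's implementation and will not reproduce Weka's results for a given seed; the function names and the assumption that each PPM profile has been flattened into one numeric vector are ours.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def k_means(points: List[Vector], k: int, seed: int,
            max_iter: int = 100) -> List[int]:
    """Return a cluster label (0..k-1) for every point."""
    rng = random.Random(seed)
    # Initial centroids: k distinct points drawn with the given seed.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged (possibly locally)
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


def cluster_sizes(labels: List[int], k: int) -> List[int]:
    """Cluster sizes in descending order, to spot major and small clusters."""
    return sorted((labels.count(j) for j in range(k)), reverse=True)
```

A search over settings could then loop over, e.g., `k in range(2, 7)` and a handful of seeds, inspecting `cluster_sizes` for each run to see whether additional clusters are major ones or only split off a few instances.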
Clustering of Pre-Flight task
For the first modeling task, we start by presenting the result of clustering. Then, we illustrate the clusters visually, conduct a statistical validation of the clustering, interpret their differences, and report on findings from replaying representative PPM instances.
Result of clustering
Setting the number of expected clusters to 2 resulted in only one major cluster. For a value of 3, we obtained two major clusters and one cluster of 2 PPM instances. The most promising results were achieved with 4 expected clusters and a seed of 10, returning three major clusters and one small cluster of 2 PPM instances. We considered these three major clusters for further analysis; increasing the number of expected clusters only generated further small clusters. The three major clusters comprise 42, 22, and 49 instances, called C1, C2, and C3 in the sequel.
Cluster visualization
In order to visualize the obtained clusters, we calculate the average number of adding, deleting, and reconciliation operations per segment for each cluster. To obtain a smooth representation, we also calculate the moving average over six segments, presented in Figs. 1, 2, and 3 for clusters C1, C2, and C3. The horizontal axis denotes the segments derived by sampling the PPM instances. The vertical axis indicates the average number of operations that were performed per segment. For example, a value of 0.8 for segment 9 (cf. Fig. 2) indicates that modelers in C2 performed, on average, 0.8 adding operations within this 10 s segment.
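The two averaging steps can be sketched as follows. The text does not state whether the moving average is trailing or centered; this sketch assumes a trailing window that simply uses the segments available so far at the start of the series. Function names are hypothetical.

```python
from typing import List, Sequence


def cluster_average(series_per_instance: Sequence[Sequence[float]]) -> List[float]:
    """Average a per-segment count over all PPM instances of one cluster.

    All series are expected to have equal length (profiles are padded).
    """
    n = len(series_per_instance)
    return [sum(vals) / n for vals in zip(*series_per_instance)]


def moving_average(values: Sequence[float], window: int = 6) -> List[float]:
    """Trailing moving average over at most `window` segments (assumption)."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / min(i + 1, window))
    return out
```

For one cluster, `moving_average(cluster_average(adding_series))` would give the smoothed adding curve, where `adding_series` holds one list of per-segment add counts per PPM instance.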
C1 (cf. Fig. 1) is characterized by long PPM instances, as the first time the adding series reaches 0 is after about 205 segments. Additionally, the delete series indicates more delete operations compared to the other clusters. Several fairly large spikes of reconciliation activity can be observed, the most prominent one after about 117 segments.
C2, as illustrated in Fig. 2, is characterized by a fast start, as a peak in adding activity is reached after 13 segments. In general, the adding series is most of the time between 0.5 and 0.9 operations, which is higher compared to the other two clusters. The fast modeling behavior results in short PPM instances, as the adding series is 0 for the first time after about 110 segments.
At first sight, C3 (cf. Fig. 3) seems to be somewhere between C1 and C2. The adding curve is mostly situated between 0.4 and 0.7, a little lower than for C2, but still higher compared to C1. Similar values can be observed for the reconciliation curve. The deleting curve remains below 0.1. The duration of the PPM instances is also between the duration of C1 and C2, as the adding series is 0 for the first time after about 137 segments.
Cluster validation
Next, we validated the clusters by testing whether they indeed expose significant differences.
Table 2 presents general statistics on the number of adding operations, the number of deleting operations, and the number of reconciliation operations for each cluster. Modelers in C1 carried out more add and delete operations and, most notable, almost twice as many reconciliation operations compared to C2 and C3. The numbers for C2 and C3 appear to be similar.
Table 2 Statistics per cluster Pre-Flight task
We conducted the statistical analysis as follows. If the data were normally distributed and homogeneity of variances was given, we used one-way ANOVA to test for differences between the groups. Pairwise comparisons were done using the Bonferroni post hoc test. Note that the Bonferroni post hoc test uses an adapted significance level, so that \(p\) values \(<\) 0.05 are considered to be significant; i.e., there is no need to divide the significance level by the number of groups. In case a normal distribution or homogeneity of variance was not given, a nonparametric alternative to ANOVA, i.e., Kruskal–Wallis, was utilized to test for differences between the groups. Pairwise comparisons were done using the t test for (un)equal variances (depending on the data) if a normal distribution was given. If no normal distribution could be identified, the Mann–Whitney test was utilized. In either case, i.e., t test or Mann–Whitney test, the Bonferroni correction was applied; i.e., the significance level was divided by the number of clusters.
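The decision procedure above can be summarized as a small lookup. This sketch only encodes which tests and which significance level apply; it does not run any test, and the type and function names are hypothetical. The normality and homogeneity checks are assumed to have been carried out beforehand.

```python
from typing import NamedTuple


class AnalysisPlan(NamedTuple):
    omnibus: str    # test for differences between all groups
    pairwise: str   # test for pairwise comparisons
    alpha: float    # significance level applied to pairwise p values


def choose_tests(normal: bool, equal_variances: bool,
                 n_clusters: int = 3, alpha: float = 0.05) -> AnalysisPlan:
    """Select tests following the procedure described in the text."""
    if normal and equal_variances:
        # The Bonferroni post hoc test already adapts the significance
        # level, so p < alpha is used directly.
        return AnalysisPlan("one-way ANOVA", "Bonferroni post hoc", alpha)
    # Otherwise: nonparametric omnibus test, then pairwise tests with the
    # significance level divided by the number of clusters.
    pairwise = ("t test (equal or unequal variances, per pair)"
                if normal else "Mann-Whitney")
    return AnalysisPlan("Kruskal-Wallis", pairwise, alpha / n_clusters)
```

With three clusters, the corrected level in the second branch is 0.05 / 3, i.e., roughly 0.0167.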
As shown in Table 3, we observe significant differences between C1 and C2 and C1 and C3, but not between C2 and C3. Only significant differences are reported in this and all following tables.
Table 3 Significant differences for statistics Pre-Flight task
To further distill the properties of the three clusters, we calculated the measures described in Sect. 3.4 for each PPM instance. Table 4 provides an overview of the obtained average values. As indicated in Fig. 1, C1 exhibits the highest number of PPM iterations. Tightly connected to this observation is the average iteration chunk size: modelers in C2 added by far the most content per iteration to the process model. Also, the number of iterations containing delete operations is higher for C1 than for the other clusters. The amount of time spent on comprehending the task description and developing a plan on how to incorporate it into the process model seems to be far larger for C1 compared to C2, which has the lowest share of comprehension, but also larger compared to C3. When considering reconciliation breaks, C3 sets itself apart, posting the lowest number of reconciliation breaks. C1 has the highest number of reconciliation breaks.
Table 4 Measures per cluster Pre-Flight task
The results of a statistical analysis of the differences between the groups are presented in Table 5. In contrast to the statistics presented in Table 3, we were able to identify significant differences between C2 and C3.
Table 5 Significant differences for measures Pre-Flight task
Interpretation of clusters
Our results clearly indicate that C1 can be distinguished from C2 and C3. Modelers in C1 had rather long PPM instances (cf. number of PPM iterations), spent more time on comprehension compared to C2, started rather slowly (cf. number of adding operations and chunk size), and showed a high number of delete and reconciliation operations. This suggests that modelers in C1 were not as goal-oriented as their colleagues in other clusters, since they spent a great amount of time on comprehension, added more modeling elements that were subsequently removed, and put significantly more effort into improving the visual appearance of the model.
Focusing on C2, we observe a very steep start of the adding curve in Fig. 2, indicating that modelers started creating the process model right away. The measures described in Sect. 3.4 further indicate high chunk sizes, a low number of PPM iterations, and little comprehension time. Thus, modelers of C2 appear to be focused and goal-oriented when creating the model. They are quick in making decisions about how to proceed and only slow down from time to time for some reconciliation.
The PPM instances of C3 are shorter compared to C1 and longer compared to C2. The reconciliation curve is close to the adding curve. Notably, there is no reconciliation spike once the number of adding operations decreases. Albeit close to C2, C3 is characterized by slower and more balanced model creation (smaller chunk size, higher number of iterations, more comprehension time). Thus, C3 follows a rather structured approach to modeling.
Analysis of cluster representatives
We gained further insights into the cluster differences by manually comparing representative PPM instances. Clustering with K-Means yields cluster centroids, i.e., the mean for add, delete, reconciliation, and comprehension over all PPM profiles inside a cluster. For each cluster, we chose the PPM instance with the smallest distance to this centroid as a representative and compared the representatives using the replay functionality of CEP [35]. Then, we repeated the procedure with the PPM instances showing the second-smallest distance to the centroids.
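Selecting representatives by distance to the centroid can be sketched as follows. The sketch assumes that every PPM profile of a cluster is available as a flat numeric vector of equal length and that Euclidean distance is used, as in the clustering; the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(profiles: Sequence[Vector]) -> List[float]:
    """Component-wise mean over all profiles of one cluster."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]


def representatives(profiles: Sequence[Vector], how_many: int = 2) -> List[int]:
    """Indices of the profiles closest to the cluster centroid.

    Index 0 of the result is the representative, index 1 the instance
    with the second-smallest distance, and so on.
    """
    c = centroid(profiles)
    order = sorted(range(len(profiles)),
                   key=lambda i: math.dist(profiles[i], c))
    return order[:how_many]
```

The returned indices identify the PPM instances that would then be inspected manually, e.g., by replaying them.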
The representative for C1 is very volatile in terms of speed and locality of modeling. Adding elements is done in an unsteady way with intermediate layouting, conducted in short phases. The aspect of locality relates primarily to reconciliation. The modeler frequently touched not only the last elements added, but also distant parts of the process model. These observations are largely confirmed by the second representative for C1, which further shows long reconciliation phases to gain space on the canvas.
The representative for C2 follows a rather straight, steady, and quick modeling approach. A group of elements is placed first and only later connected by edges. There is little reconciliation since the layout appears to be considered when adding elements. If applied, reconciliation refers to the last added elements only. The second representative follows the same approach until two-thirds of the model have been created. Then, it deviates by relayouting the model to gain space on the canvas.
For C3, the representative PPM instance is also steady, but slower than those investigated for C2. At most two elements are added at a time before they get connected. Reconciliation is done continuously, but restricted locally. Model parts that are distant from the last added elements are not changed. These observations are confirmed by the second representative.
In essence, the representatives of the clusters appear to be distinguished by two aspects in particular, the steadiness of the PPM instance in terms of adding elements, and the characteristics of the reconciliation phases. The latter are characterized by their length and their locality.
Clustering of NFL Draft task
To test whether the identified clusters persist over different modeling tasks, we repeated the cluster analysis procedure for the second modeling task.
Result of clustering
Again, we conducted the clustering by gradually increasing the number of expected clusters and investigating different seeds. The most promising results were obtained with a seed of 30 and 5 expected clusters. We obtained three major clusters of 30, 31, and 42 PPM instances. Two smaller clusters, 4 and 8 PPM instances, were not further considered.
Cluster visualization
The cluster visualizations are presented in Figs. 4, 5, and 6, respectively.
Figure 4 depicts cluster C1, which is characterized by long PPM instances, exhibiting a slow start and a low adding curve. The adding curve is closely followed by a reconciliation curve indicating several spikes of reconciliation and much reconciliation after the adding curve starts to decrease. The deleting curve is generally higher compared to the other clusters.
Cluster C2 (cf. Fig. 5) shows short PPM instances and a high adding curve, showing a decrease after 60 segments before reaching 0 after 77 segments. Also, there is a fast increase right at the beginning of the modeling process. The reconciliation curve follows the adding curve with some additional reconciliation at the end. The deleting curve is rather low.
Cluster C3 (cf. Fig. 6) seems to be situated between cluster C1 and cluster C2. It does not exhibit the fast start of cluster C2, but shares similarities for the deleting curve. The PPM instances in C3 are considerably shorter than those in C1, but not as short as in cluster C2. Modelers in C3 show a rather slow start. After 10 segments, the adding curve is close to 0.2, which is similar to C1, but not to C2. Afterward, C3 outperforms C1 in terms of adding elements to the process model. The reconciliation curve follows the adding curve, not showing any major spikes in reconciliation activity.
Cluster validation
The average number of adding operations, the average number of deleting operations, and the average number of reconciliation operations are presented in Table 6. As for the first modeling task, cluster C2 and cluster C3 exhibit similar values, while cluster C1 sets itself apart by the adding, deleting, and reconciliation operations. The statistical analysis illustrated in Table 7 supports this observation by indicating significant differences between C1 and C2 and C1 and C3, but not between C2 and C3.
Table 6 Statistics per cluster NFL Draft task
Table 7 Significant differences for statistics NFL Draft task
The average values retrieved by calculating the measures introduced in Sect. 3.4 are listed in Table 8. The three clusters seem to be different when it comes to chunk size and the number of PPM iterations. C2 has the lowest number of PPM iterations and the highest chunk size. C1 is on the opposite side of the spectrum posting the highest number of PPM iterations and the lowest chunk size. The average share of comprehension is similar for all clusters. In terms of reconciliation breaks, C1 has the highest value and C2 posts the lowest value. Delete iterations do not hint at any difference.
Table 8 Measures per cluster NFL Draft task
The corresponding statistical analysis is illustrated in Table 9, revealing significant differences between all clusters in terms of the number of PPM iterations. Similarly, chunk size is significantly different when comparing C1 and C2 and when comparing C2 and C3.
Table 9 Significant differences for measures NFL Draft task
Interpretation of clusters
Similar to the clusters identified for the Pre-Flight process, C1 can be distinguished from C2 and C3 (adding operations, reconciliation operations, number of PPM iterations). Again, modelers in C1 seem to be less goal-oriented and spent a lot of time on reconciliation. However, we could not identify the significant differences in terms of share of comprehension we have observed for the first modeling task.
As for cluster C2, we do obtain significant differences regarding C3 only for iteration chunk size and the number of PPM iterations. This is in line with the first modeling task and suggests that modelers in C2 were very focused on executing the modeling task.
The PPM instances in C3 are longer compared to C2, but not as long as the PPM instances in C1. Modelers in C3 do not share the high number of reconciliation operations and the high number of deleting operations with C1. The overall picture drawn for C3 is similar to the Pre-Flight task. Thus, modelers in C3 can be seen as following a balanced modeling approach that is situated between the other two clusters.
Analysis of cluster representatives
Analyzing the representative PPM instance for C1 showed that it is structured by phases in which a certain model part is added and phases in which parts of a model are reconciled. We observed long phases of layouting that mainly relate to edges. Also, at the end, the model is refactored and layouting is improved. Long adding and reconciliation phases are also visible in the second representative.
The representative for C2 showed a very quick model creation. Also, the process was steady and the rate of adding elements appears to be constant. The PPM instance features only sparse reconciliation. Reconciliation seems to be avoided by considering the model layout when adding an element. If applied, layouting focuses on the elements last added. The second representative for C2 shows very similar characteristics. The only difference is that large sets of elements are added before they get connected.
For C3, the representative PPM instance follows a steady approach, but slower than the one for cluster C2. Reconciliation is more prominent than for C2, whereas the reconciliation phases are shorter than observed for C1. In addition, reconciliation relates to a rather large area of the canvas. The second representative follows the same approach.
These observations are largely in line with those obtained for cluster representatives for the Pre-Flight process. Again, the locality of operations appears to be important.
In sum, we were able to identify three significantly different clusters representing different modeling styles for each modeling task. Further, the cluster characteristics were similar in terms of number of adding operations, number of deleting operations, and the number of reconciliation operations for the two modeling tasks. Differences among the clusters in the number of iterations and chunk size were consistent over both modeling tasks.