The Influence of Using Collapsed Sub-processes and Groups on the Understandability of Business Process Models

Many factors influence the creation of business process models which are understandable for a target audience. Understandability of process models becomes more critical when size and complexity of the models increase. Using vertical modularization to decompose such models hierarchically into modules is considered to improve their understandability. To investigate this assumption, two experiments were conducted. The experiments involved 2 large-scale real-life business process models that were modeled using BPMN v2.0 (Business Process Model and Notation) in the form of collaboration diagrams. Each process was modeled in 3 modularity forms: fully-flattened, flattened where activities are clustered using BPMN groups, and modularized using separately viewed BPMN sub-processes. The objective was to investigate if and how different forms of modularity representation (used for vertical modularization) in BPMN collaboration diagrams influence the understandability of process models. In addition to the forms of modularity representation, the presentation medium (paper vs. computer) and model reader’s level of business process modeling competency were investigated as factors that potentially influence model comprehension. 60 business practitioners from a large organization and 140 graduate students participated in our experiments. The results indicate that, when these three modularity representations are considered, it is best to present the model in a ‘flattened’ form (with or without the use of groups) and in the ‘paper’ format in order to optimally understand a BPMN model. The results also show that the model reader’s business process modeling competency is an important factor of process model comprehension.

make use of process models, their target users should be able to comprehend them.
Process model understandability (or comprehension) can be defined as the degree to which information contained in a process model can be easily understood by a reader of that model. It is typically associated with the ease of use and effort required for reading and correctly interpreting a process model (Houy et al. 2014). Correct interpretation of business process models is particularly important when they are used for supporting communication and creating a collective understanding of the processes and functionality of software systems supporting them (Krogstie 2016).
The increasing complexity of real-life processes leads to an increase in size and complexity of the models that represent them. These two factors are known to impair understandability (Sanchez-Gonzalez et al. 2012; Recker 2012). Hierarchy resulting from the use of sub-processes has widely been considered as a practical means of dealing with the size and complexity of models (Reijers and Mendling 2008; Zugal et al. 2013), as sub-processes reduce the size and complexity of top-level process models by abstracting the details. This is referred to as vertical modularization (La Rosa et al. 2011b). Hierarchical structuring or vertical modularization in business process models that is achieved by means of sub-processes is considered to have many advantages. It may foster reuse of process models and increase maintainability (Leymann and Roller 1997; van der Aalst and van Hee 2002; Koschmider and Blanchard 2007), provide concurrent development possibilities (Leymann and Roller 1997; van der Aalst and van Hee 2002), and enable scalability as each sub-process can be deployed to a different BPM engine (Leymann and Roller 1997).
Many modeling languages allow for the design of hierarchical structures (e.g., vertical modularization through the use of sub-processes in BPMN and EPCs). The use of sub-models to hide less relevant information is expected to decrease the mental effort (cognitive load) needed to understand the model (Moody 2004). On the other hand, fragmentation due to modularization increases the mental effort by forcing the readers to divide their attention between different fragments [the so-called split attention effect (Zugal et al. 2013)]. In consequence, the discussions about the proper way of using modularity and its implications for the understandability of process models are not conclusive (Figl et al. 2013; Zugal et al. 2013). This also leads to a lack of theoretically grounded guidelines for modularizing process models into sub-processes. In particular, the influence of using different forms of vertical modularization in BPMN v2.0 (e.g., sub-processes, groups) on the understandability of process models has not been investigated.
Another factor that has not been addressed in the literature is the medium used to present the models to their audience. Although 'paper' is usually the preferred medium of presentation in practice (Reijers and Mendling 2008), the models are typically designed using software applications (particularly when the objective is process automation), and communicated through an online environment (e.g., web portal, company intranet) across the organization and beyond. Therefore, it is important to explore if using paper or a computer environment has any effect on model understandability.
Accordingly, the objective of this study is to investigate the influence of using different forms of vertical modularization and presentation medium on the understandability of processes modeled in BPMN. In addition, we aim to investigate the relation between model readers' level of competency regarding process modeling and notation, and their level of understanding of a process model. To achieve these goals, we conducted two experiments; the first experiment was conducted in a large organization with 60 participants, and the second experiment was conducted in a university with 140 graduate students who were enrolled in a business process management course. For the experiments, we used models of two real-life business processes of an organization, which are of comparable size and structure and can be considered large in scale.
We presented part of our initial findings from the first experiment in a prior publication (Turetken et al. 2016). In this paper, we present our findings for an extended set of factors and from both experiments to strengthen our conclusions. The additional experiment was a replication performed in a different setting, which also helped us to draw conclusions regarding the difference between practitioners and students. In this paper, we also investigate and provide evidence for the significant role that personal factors play in the understandability of process models.
The results from our study provide significant contributions to the body of knowledge of empirical BPM research, in particular of the factors influencing the understandability of business process models in BPMN. The BPMN has gained significant attention and broad acceptance by users in recent years (Chinosi and Trombetta 2012), and it is currently the most widely used process modeling language in practice (Harmon and Wolf 2016). The wide use of BPMN makes the research on the understandability of processes modeled in BPMN critically important.
The remainder of the paper is structured as follows. Section 2 discusses briefly the related work on the factors influencing process model understandability, focusing on the use of modularity in process models. Section 3 presents the research design including the research model that we tested, and Sect. 4 describes the design and execution of the experiments. In Sect. 5, we report and discuss the results, which are followed by the conclusions, limitations, implications, and future research directions.

Related Work
Understandability of process models has been investigated not only in the BPM field but also in the conceptual modeling research from a broader perspective, and as a core component of a number of conceptual modeling quality frameworks (Lindland et al. 1994; Nelson et al. 2012; Krogstie 2016). For example, the SIQ framework (Reijers et al. 2010) refers to three categories of process model quality: semantic, pragmatic, and syntactic quality. The pragmatic quality relates to whether a process model can be easily and correctly understood by people.
Although modularity in business process models is considered to possess benefits in various dimensions (Leymann and Roller 1997; van der Aalst and van Hee 2002), its influence on understandability has not been well understood (Zugal et al. 2011; Houy et al. 2012; Figl 2017; Dikici et al. 2018). We can distinguish three forms of modularization, each capturing different ways in which a process model is decomposed into modules (La Rosa et al. 2011b). Vertical modularization involves decomposing a model into modules at different hierarchical levels. Horizontal modularization partitions a process model into peer modules, while orthogonal modularization decomposes a model along the crosscutting concerns of the modeling domain, such as security or privacy. In this work, we focus our attention on vertical modularization, which aims at increasing the understandability of large process models by 'hiding' process details in sublevels (La Rosa et al. 2011b). However, the findings of empirical studies that investigate the effect of vertical modularization on understandability hardly converge into a validated set of practical guidelines for applying modularization in process modeling.
The works by Reijers and Mendling (2008) and  test the influence of using sub-processes on the understandability of two real-life processes that are modeled using Workflow Nets in two forms: modular and flattened. The participants (28 consultants) were asked to answer a set of (control-flow related) comprehension questions regarding these models (to measure effectiveness). For the first process model, the experiment did not result in a significant difference between the modular and flattened versions, but a positive influence of modularity on understandability was found for the second model. The authors attribute this to the difference in the degree of modularization applied in these models. As the second model had more sub-processes, they sparingly conclude that 'modularity appears to have a positive connection with process understanding'. Zugal et al. (2013) test the effect of modularization on the understandability of declarative process models. Four processes were modeled in two forms (modular and flattened) using the declarative language ConDec. The results suggest that modularization decreases perceived mental effort but has no influence with respect to the number of correct answers given to the comprehension questions. The limited number of participants (9 respondents) is reported to be a threat to the validity of the findings.
The technique used for modularizing process models also plays a role in the effect of modularity on understandability. Applying different modularization methods could yield different structures and in turn different levels of influence on comprehension. The study by Johannsen et al. (2014) uses eEPC process models and tests the use of Wand and Weber's five decomposition conditions (Wand and Weber 1989), which are considered to yield well-decomposed models. The models are modularized in three forms with respect to their level of adherence to these conditions. The results indicate that models that are structured in full adherence to these conditions are more understandable than those that violate them. However, the study does not compare the performance of modularized models against their flattened counterparts. Figl et al. (2013) used an expert evaluation approach (with 15 process modeling experts) to determine whether some visualization strategies provide a better fit for representing process model hierarchies than others. Accordingly, the experts prefer to navigate in the hierarchy with the help of an overview + detail strategy (where sub-processes are shown as separate models detached from the context of the higher-level model) instead of a focus + context strategy (where sub-processes are expanded in the higher-level model directly within their context). The 'overview + detail' view was considered to simplify the design and provide undistorted views of focus and context.
In the closely related domain of software modeling, Cruz-Lemus et al. (2009) present a family of experiments investigating the effect of hierarchy on the understandability of UML statechart diagrams. The results indicate insignificant or varied effects of hierarchy on understandability. Moreover, the understandability worsens with the increase of the nesting level (depth of hierarchy). The studies by Sanchez-Gonzalez et al. (2010) and Figl and Laue (2015) confirm this finding.
This diversity in the results can be attributed to the outcome of two opposing effects of modularization: abstraction (information hiding) and split-attention effect (browsing costs) (Zugal et al. 2012). Using sub-processes might increase a reader's understanding of a complex model by abstracting less relevant information (and thereby reducing complexity). However, additional costs (increased cognitive load) incurred by browsing through and integrating fragmented pieces of models can counter-balance this gain (Figl et al. 2013).
The existing research, as discussed above, calls for further empirical studies to contribute to a better understanding of the impact of modularization. In particular, there is a lack of studies on the effect of modularity that involve BPMN, the de facto process modeling notation in practice (Harmon and Wolf 2016). BPMN v2.0 has specific elements and techniques for representing modularity (e.g., collapsed/expanded sub-processes, groups) which have not been addressed in the research concerning process model understandability.
In addition, to the best of our knowledge, no empirical work has studied the effect of the presentation medium on the understandability of process models. Yet, the medium that is used to present modularized models, for instance, may differ significantly. When the models are presented in paper form, sub-processes are typically presented separately on different paper sheets, and the user has to physically locate the relevant sub-process model among other models. In a computer environment, on the other hand, the model reader can be provided with a sub-process model using various ways (e.g., in a pop-up window when the user hovers over or clicks on the collapsed task on the main model). In this case, the reader typically spends less effort in retrieving the right process model, which can influence the efficiency of using the resources (e.g., time) for understanding the model.

Research Model and Hypotheses
Aligned with our research objective, we developed a research model as depicted in Fig. 1. The model proposes that the understandability of process models (in terms of 'understandability task effectiveness' and 'understandability task efficiency', as well as 'perceived usefulness' and 'perceived ease of understanding') is influenced by the vertical modularity technique applied in modeling the process, the medium used for its presentation, and the model reader's level of BP modeling competency.
Accordingly, we propose two independent variables, namely the use of vertical modularity in modeling the processes, and the medium used to present the models to their readers. As for the dependent variables, we distinguish two categories of factors that are applied to refer to the concept of process model understandability. The first category indicates the objectively measured understandability and comprises two factors, namely understandability task effectiveness and understandability task efficiency, which are the most commonly used indicators of model understandability (Figl et al. 2013). The second category is the perceived understandability and involves the factors of perceived usefulness for understandability and perceived ease of understanding (Dikici et al. 2018). The research model also incorporates the personal factor of model reader's business process (BP) modeling competency as a confounding variable. This variable is assumed to influence the dependent variables, but unlike independent variables, it is not controlled in our experiments. Later in this section, we describe these variables in detail including the way they are operationalized.
Based on the research model, we can draw our hypotheses regarding the effects of independent and confounding variables. Our first group of hypotheses relates to the use of vertical modularity. As the research on the influence of the use of modularity in process models is not conclusive, we adopt an exploratory approach and do not indicate a direction (positive or negative) for the potential influence of the use of modularity. Accordingly, we formulate the following group of hypotheses:

H1
The use of vertical modularity will have a significant impact on process model understandability, i.e., (a) understandability task effectiveness, (b) understandability task efficiency, (c) perceived usefulness for understandability, and (d) perceived ease of understanding.
Our second group of hypotheses addresses the presentation medium. Similar to the first group of hypotheses, we do not assume a particular direction for the influence of the medium used to present the models to the model readers. Accordingly, the second group of hypotheses can be stated as:

H2
The presentation medium will have a significant impact on process model understandability, i.e., (a) understandability task effectiveness, (b) understandability task efficiency, (c) perceived usefulness for understandability, and (d) perceived ease of understanding.
Finally, we consider a model reader's level of BP modeling competency as an important factor for model understandability. The literature supports a positive influence of this factor on understandability (Mendling et al. 2012; Turetken et al. 2017). Hence, we hypothesize that the model readers with higher levels of theoretical knowledge on BP modeling and related notations will achieve higher levels of understanding of the process models presented to them. When reading the process models, this prior knowledge will reduce the cognitive load required for interpreting the models, and will ease and improve their understanding of the models. Here, we are interested only in the objectively measured understandability (model readers' effectiveness and efficiency in understanding a model, rather than their perception of how understandable a model is). Accordingly, we draw the following group of hypotheses:

H3
The model readers with higher levels of BP modeling competency will have significantly higher (a) understandability task effectiveness and (b) understandability task efficiency.
In the sections that follow, we explain the details regarding the design of the experiments including the process models used for the experiments, the independent variables (forms of vertical modularity representation and presentation medium), the confounding variable (BP modeling competency), the dependent variables of model understandability, and the operationalization of these variables.

Experiment Design and Execution
To test our hypotheses, we conducted two experiments following the established guidelines for designing and executing experiments and reporting results (Field and Hole 2003). The first experiment involved 60 practitioners working in a large corporation. The second one was a replication of the first and involved 140 graduate students of a university. The experiments were designed in such a way that the participants acted as model readers who were given a number of process models and a set of related comprehension questions that can be answered based on the process models. The participants were also expected to answer additional questions regarding their perception of the process models' understandability and to take a test to identify their level of BP modeling competency. We used a between-group design for the experiments, where separate groups of participants for each of the different conditions in the experiments were tested once only. This is mainly to avoid experimental bias in participants and test multiple variables simultaneously. Due to these advantages, the between-group design is widely used in experiments in several fields including management, social, and natural sciences (Field and Hole 2003).

Process Models Used for the Experiments
We used two process models as the objects of our experiment. The processes that were modeled took place in a large corporation headquartered in Europe, which employs over 115,000 staff and operates in over 100 countries worldwide. Among several processes in the quality management system of the company, two processes of similar size and nature were selected by the company representatives on grounds of their critical importance for the business domain in which the company operates. The processes cover several divisions and departments of the company and can be considered as large and rich in terms of the interaction taking place between these units.
The selected processes were initially modeled in BPMN v2.0 using sub-processes where applicable (based on the existing process documentation and interviews with process owners and participants). The resulting models are BPMN collaboration diagrams, where the interaction between process participants is explicitly modeled using message flows (Signavio v10.11 was used for modeling these processes; however, only static images of the models were used for the experiment, as explained in Sect. 4.7). The models were subsequently reviewed by two process modeling experts for syntactical correctness and validated for their correctness (including the choice of modularization) by two domain experts of the company. The basic metrics used to measure the structural properties of process models show that these models are comparable in terms of size and complexity (see Table 1). (Figure 3 shows different versions of these models.)

Independent Variable: Use of Modularity
The verified and validated models were subsequently restructured into two other forms, leading to three forms of vertical modularity representations to be tested (in the remainder of this paper, we use the term modularity representations to indicate the representation types in BPMN that we use for vertical modularization). Figure 2 illustrates these forms. The first form (Repr1) is the fully-flattened representation of the process models. This type acts as the reference model which offers the possibility to draw conclusions about whether the use of any modularity technique has an influence on the understandability. (Note that re-structuring models does not change the business logic in a semantic sense but may influence the extent of information provided in the models. For instance, the sub-process information disappears in the fully-flattened models.) The second form of representation (Repr2) combines the fully-flattened form with groups, which are used in BPMN to visually (and informally) cluster a set of logically related model elements (La Rosa et al. 2011a). We used groups in a way similar to the practice of 'expanded sub-processes' in BPMN (but without the additional start/end events for each sub-process). This form shows some characteristics of a 'focus + context' view [as in Figl et al. (2013)], which is considered to require less cognitive load of the user, who usually has to integrate model parts again when sub-processes are extracted from the main model as separate models (i.e., in 'overview + detail' view). However, in this form, the complexity of the fully-flattened model is inherited and amplified by the additional information on process groupings.
The third form (Repr3) is the original representation, which uses collapsed sub-processes in BPMN. The sub-processes are hidden in the higher-level (main) process model, but each can be accessed as a separate model whenever the user is interested in the information it contains. Figure 3 shows example models of the processes A and B in two representation forms (Repr2 and Repr3), respectively. (Note that the figure is provided to give an indication of the size and structure of the models, and that labels of all process elements that existed in the experiment are removed.)

Independent Variable: Presentation Medium
We experimented with two alternative presentation media: paper and computer. Half of the participants were provided with the models on A3 size sheets of paper, which allowed for adequate readability (A4 sized sheets were inadequate for providing sufficiently readable models). The sub-processes in Repr3 were also printed on separate A3 size sheets with 6 sub-processes on each. The other half of the participants received the models in a computer environment through an online website developed for the experiment (see also Sect. 4.7 for the details of the questionnaire). The models with Repr1 and Repr2 (fully-flattened, and flattened with groups) were displayed as images, which can be zoomed and navigated in all directions. For the models with Repr3 (with separate sub-process models), the sub-process models pop up when the mouse pointer hovers over the collapsed sub-process element in the main model.

Confounding Variable: BP Modeling Competency
To investigate participants' level of (theoretical) knowledge of business process modeling, we constructed the Business Process Modeling Competency (BPMC) test. In constructing the test, we employed the questions used in ) as the basis. The original questions follow a notation-agnostic view and involve only control-flow related aspects of process models. We took 4 questions from this original set and incorporated 8 additional questions that are related to the common process modeling practices and basic constructs of BPMN 2.0 (e.g., how basic gateways work, how loops can be defined). Ultimately, we developed 12 questions that relate also to other process perspectives and to BPMN.
The participants of the experiments were expected to answer each question by selecting one of the three options: 'true', 'false', or 'I don't know'. Their level was measured as the total of correctly answered questions and categorized into 6 groups with the following scheme: level 1 with 0, 1, or 2 correct answers, level 2: 3 or 4, level 3: 5 or 6, level 4: 7 or 8, level 5: 9 or 10, and finally level 6 with 11 or 12 correct answers. Figure 4 shows two questions from the test (the complete set of questions is available at https://goo.gl/eDw5zh).
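The categorization scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `score_bpmc` and `bpmc_level` and the representation of answers as strings are hypothetical and not taken from the study materials.

```python
def score_bpmc(responses, answer_key):
    """Count correctly answered questions.

    `responses` and `answer_key` are equal-length lists of strings
    (hypothetical encoding: 'true', 'false', or "I don't know").
    An "I don't know" response never matches the key, so it scores 0.
    """
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    return sum(1 for given, expected in zip(responses, answer_key) if given == expected)


def bpmc_level(correct_answers):
    """Map a score of 0-12 correct answers to competency level 1-6.

    Scheme from the text: 0-2 -> 1, 3-4 -> 2, 5-6 -> 3,
    7-8 -> 4, 9-10 -> 5, 11-12 -> 6.
    """
    if not 0 <= correct_answers <= 12:
        raise ValueError("the test has 12 questions")
    return max(1, (correct_answers + 1) // 2)
```

For example, a participant with 7 correct answers falls into level 4, and one with 2 correct answers into level 1.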

Comprehension Questions
In order to evaluate participants' level of understanding of the processes, we developed 9 questions for each process by following an iterative approach with the domain experts working in the company where the processes were executed. The expert involvement is assumed to assure that each question can be used in a representative and valid way to assess someone's understanding of the processes.
Since the quality of these questions has significant influence on the validity of the findings (Laue and Gadatsch 2010), particular attention was paid to develop a set of questions that is balanced in relation to different process perspectives (i.e., control flow, resource, and information/data), and different scopes (i.e., global and local). Accordingly, a local question can be answered within the scope of a single sub-process, while information available in the modularized (high-level) main model is sufficient to answer a global question. The third type are the global-local questions which require information available not only in the modularized model but also in one or more sub-processes. The use of these three types of questions is important particularly for the investigation of the potential influence of vertical modularity. Out of 9 questions (for each process), there were 3 global, 3 local, and 3 global-local questions.
The distribution of questions with regard to process perspectives is as follows: for process A, out of 9 questions 3 relate to all process perspectives, 2 only to the control flow, 1 both to the control flow and resource, and 3 both to the resource and information perspectives. A very similar configuration is maintained also for process B.
Each question has a multiple-choice design, where respondents are provided with 5 choices, the last one always being 'I don't know' (i.e., unable to tell). An example question for process A is given below. For instance, this question is a global-local question that relates to both resource and information/data perspectives. In total, we developed 18 comprehension questions (9 for each process model, A and B). These questions are presented in the Appendix, Part 2 and 4 (available online via http://springerlink.com).
Fig. 3 The process models in two forms of representation: (a) process A in Repr2 (flattened with groups of activities), (b) process B in Repr3 (with collapsed sub-processes), (c) a few of the sub-process models of process B in Repr3. (The process models used for the experiment are available online at http://goo.gl/MwFqMG). The questionnaire is given in the Appendix (available online via http://springerlink.com)

Dependent Variables
As illustrated in our research model (in Fig. 1), we identified four dependent variables concerning process model understandability. The first two relate to the (objectively measurable) level of understanding that the participants can demonstrate with respect to each process model. These are the most commonly used indicators in this research field (Houy et al. 2012):
• Understandability Task Effectiveness is operationalized by the understandability test score, i.e., the number of correctly answered comprehension questions (Dikici et al. 2018). Each correctly answered question counts as 1 point for the score, totaling 9 points max for each process model.
• Understandability Task Efficiency indicates the degree of cognitive resources employed by the reader for understanding the model. It is operationalized by dividing the test score by the total time spent by a participant for the questions that he/she correctly answered. This formulation relies on the view that a better understanding may be compromised by a faster understanding (Bodart et al. 2001). From this perspective, understandability task efficiency can be considered as a productivity measure (Poels 2011).
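The two objective measures can be illustrated with a short Python sketch. It follows the operationalization stated above (score = number of correct answers; efficiency = score divided by the time spent on the correctly answered questions). The `Response` record, the use of seconds as the time unit, and returning 0.0 when no question was answered correctly are assumptions made for the example, not details reported in the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Response:
    """One answered comprehension question (hypothetical record layout)."""
    correct: bool
    seconds: float  # time spent on this question; the unit is an assumption


def task_effectiveness(responses: List[Response]) -> int:
    """Number of correctly answered questions (at most 9 per process model)."""
    return sum(1 for r in responses if r.correct)


def task_efficiency(responses: List[Response]) -> float:
    """Score divided by the total time spent on correctly answered questions."""
    score = task_effectiveness(responses)
    time_on_correct = sum(r.seconds for r in responses if r.correct)
    if time_on_correct == 0:
        # Assumption: with no correct answers the ratio is undefined; report 0.0.
        return 0.0
    return score / time_on_correct
```

With two correct answers that took 60 and 40 seconds and one incorrect answer, the effectiveness is 2 and the efficiency is 2 / 100 = 0.02 correct answers per second; the time spent on the incorrect answer does not enter the ratio.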
The remaining two variables are based on the two constructs of the Technology Acceptance Model (TAM) (Davis 1989) and concern users' perception of the models in terms of their usefulness for understandability and ease of understanding:
• Perceived Usefulness for Understandability (PUU) indicates users' perception of the utility of a process model structured in a particular form in providing gains to the user in terms of understandability.
• Perceived Ease of Understanding (PEU) indicates the degree to which a person believes that understanding a model is free of mental effort [as also used in Houy et al. (2012)].
TAM and its derivatives (e.g., Venkatesh et al. 2003) are commonly referenced theories that predict and explain the acceptance and use of design artifacts, such as IS methods and models (Moody 2003; Recker et al. 2011). In TAM, the two constructs (perceived usefulness and ease of use) are believed to be strong determinants of users' intentions to apply a design artifact. For the experiment, the adopted variables are operationalized using multiple indicators (scale items), which have been evaluated for reliability and validity in previous research (Davis 1989; Moody 2003; Turetken et al. 2018). Following Venkatesh et al. (2003), we used 4 items for each construct, with modified wording of the items to accommodate this research. The participants expressed their level of agreement with each statement on a 7-point Likert scale, ranging from 1 (strongly disagree) to 7 (strongly agree). The scale items for each factor are given in the Appendix (Part 3: User Perception).
Fig. 4 Example questions from the test on the (theoretical) knowledge on BP modeling and BPMN 2.0

Experiment Blocks and Questionnaire
The between-groups experiment is designed to contain six blocks (as shown in Table 2). Each participant goes through a single block, where he/she is given one variant of two process models (A and B) in sequence. In each block, the models were shown using different forms of representation, i.e. either on paper or in a computer environment.
The questionnaire for the experiment was provided through an online web environment, which was developed using a software application available for creating online surveys (Sawtooth Software SSI WEB 8.4.8). The questionnaire consisted of 6 parts (depicted in Fig. 5). In the first part of the questionnaire (P0), we asked participants to indicate their experience in process modeling, the frequency with which they encounter process models (intensity), their view on the level of knowledge they have of process modeling and BPMN 2.0, and their familiarity with the domain and relevant processes used in the experiment. For the first two factors (experience and intensity), we adopted the questions from Mendling et al. (2012).
The second part of the questionnaire (P1) is the BP Modeling Competency Test, to objectively assess participants' level of knowledge on process modeling and BPMN 2.0. As discussed in Sect. 4.4, the test was developed based on the questions in ). However, this part was not available in the first experiment and only the participants of the second experiment went through this test. In the first experiment, instead of this test, we asked the participants generic questions to obtain their view on the level of experience and knowledge they have on process modeling and BPMN.
Parts 2 and 4 of the questionnaire were designed to measure participants' level of model understanding for two process models (A and B, respectively). In these parts, the participants were expected to answer 9 comprehension questions related to each of these models. Each question was placed on a separate online webpage. In the blocks where computers were used, the process models were embedded in the questionnaire environment in such a way that the question and model were presented on the same page.
Parts 3 and 5 of the questionnaire ascertain participants' perceptions of the particular representation form and medium used to represent the model for process A and B, respectively.
The questionnaire items for all parts are presented in the Appendix.
All participants (whether they received the models on paper or on computer) received the questions through the online environment. This was particularly necessary for accurately tracking the time it took for participants to answer each understandability question, and for computing metrics regarding the understandability task efficiency. The participants were informed upfront that they were time-tracked.
Fig. 5 Parts of the questionnaire
Before the actual experiment took place, the questionnaire was pre-tested as a final step by 6 graduate students. This also gave an indication about the required time-frame for the experiment (1.5-2 h). As a result of the pre-test, several ambiguities and minor mistakes were corrected in the final version.

Participants of the Experiments
The first experiment took place in June 2015 in a division in the headquarters of the company from which the process models used in the experiments originate. The company representatives initially selected 74 employees as candidates who worked in 13 departments of the division and had already taken part or might potentially take part in the execution of one of these processes. The participation was on a voluntary basis. Ultimately, 60 employees participated, leading to a response rate of around 81%. All participants have at least a university degree, the majority with an engineering background. Out of 60, 26 employees had previously taken part in the execution of one of these processes or were moderately familiar with their execution. The participants were randomly assigned to each experiment block with the exception of the 26 employees that had a certain degree of familiarity with the domain and process models. These were evenly assigned to the blocks (4 or 5 participants per experiment block). Each participant received an invitation with practical guidelines for accessing the online experiment site, including a username which also determined the experimental block that the participant was assigned to.
The second experiment took place in a single-location setting in a university in January 2016. The participants were graduate students of a number of engineering programs, the majority of whom were in operations management (51%), information systems (14%), and innovation management master programs (17%). These students were enrolled in the same master level course on business process management (BPM), where they participated in the experiment a few days before the final course examination. The participation in this experiment was also on a voluntary basis; however, the students were offered 0.5 bonus points (out of 10) to their final course grade to offer a certain level of motivation for participation. Among 208 students, 140 participated (67%). The participants were randomly assigned to each experiment block.

Results and Discussions
In this section, we present the descriptive statistics for the variables, and discuss their correlations. Next, we proceed to testing the hypotheses.

Descriptive Statistics and Correlations
As each participant tested two process models in different forms, the experiment led to 400 observations from 200 participants, which are distributed in a uniform way over different modularity representations and presentation mediums. Table 3 presents the descriptive statistics for these independent variables tested in the experiment.
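The participation figures reported for the two experiments can be checked with a few lines of arithmetic. The variable and function names below are illustrative only; the numbers are those given in the text.

```python
def response_rate(participated: int, invited: int) -> float:
    """Share of invited candidates who took part in the experiment."""
    return participated / invited


# Experiment 1: 60 of 74 selected employees; Experiment 2: 140 of 208 students.
rate_exp1 = response_rate(60, 74)
rate_exp2 = response_rate(140, 208)

# Each participant worked on two process models (A and B).
participants = 60 + 140
observations = participants * 2

if __name__ == "__main__":
    print(f"Experiment 1: {rate_exp1:.1%}, Experiment 2: {rate_exp2:.1%}")
    print(f"{participants} participants, {observations} observations")
```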
The descriptive statistics for the confounding variable of BP modeling competency are presented in Table 4. Due to some practical problems in the first experiment, the BP Modeling Competency test was available only in the second experiment that we performed with 140 students. Hence, we have 140 data points in total regarding the BP modeling competency factor, which is, however, sufficient to derive valid inferences about this factor. In addition, we used the individual total understandability task effectiveness score and efficiency values that each participant obtained in answering 9 understandability questions for each process model (in total 18 questions). As shown in Table 4, there are very few participants at the two ends of the competency level spectrum, i.e., levels 1 and 6. However, the number of participants at levels 2-5 can be considered appropriate for further statistical analyses.
As mentioned above, the participants of the first experiment did not go through any test to measure their level of BP modeling competency. However, in order to gain a general understanding of their level of knowledge and experience, we asked participants for their opinion on the level of experience and knowledge they have of process modeling and BPMN. About 72% of the participants of the first experiment stated that they are knowledgeable or somewhat knowledgeable concerning process modeling. However, they had no or limited knowledge about BPMN. Overall, we can consider the majority of the participants in the first experiment to be fairly inexperienced in terms of general BPM skills and capabilities.
We performed a correlation analysis between the independent variables of modularity representation and presentation medium, and the confounding variable of BP modeling competency. As depicted in Table 5, the analysis shows no significant correlation between these variables, which suggests that we can run the tests for our hypotheses independently.

Hypothesis Testing
In order to identify the appropriate statistical tests that can be used to test our hypotheses, we analyzed the data to check if it is conformant with the assumptions of each feasible statistical test. The results of our initial analysis showed that there are clear deviations from normality for the measures of all dependent variables over independent variables (Kolmogorov-Smirnov test of normality (Field 2013), all with p = 0.01). Therefore, we forwent the predictive power of parametric tests and applied their nonparametric counterparts, in particular the Kruskal-Wallis test with pairwise multiple comparison. The Kruskal-Wallis test is the generalization of the Mann-Whitney test, but for the analysis of more than two independent groups (Field 2013).
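For readers unfamiliar with the test, the following is a minimal pure-Python sketch of the Kruskal-Wallis H statistic with the standard correction for tied ranks. It is a textbook formulation written for illustration, not the SPSS procedure used in the study, and the function names are assumptions.

```python
from collections import Counter
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Tie-corrected H statistic for two or more independent groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over the groups.
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (N^3 - N) over groups of tied values.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


if __name__ == "__main__":
    print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

With k groups, H is commonly compared against a chi-square distribution with k - 1 degrees of freedom, which matches the H(2), H(1) and H(5) notation used for the three modularity representations, two media, and six competency levels.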
Hypothesis testing was performed individually for each of the independent and confounding factors, using SPSS v23. As is common practice in experimental studies, we used 0.05 as the standard level of significance.
Notes to Table 3: The rows in italic show the higher level concepts and numbers aggregated from the rows just below them. (a) Each correctly answered question counts for 1 point for the Score, totaling to 9 points max for 9 questions. Higher mean values indicate better understandability in terms of task effectiveness. (b) Higher mean values indicate better understandability in terms of task efficiency. (c) Four items to be answered in a 7-point Likert scale, totaling to a min value of 4, max value of 28 (4 × 7). Higher mean values indicate better understandability as perceived by the participants. (d) The experimental block-design given in Table 2 is optimum for testing the modularity representations but not ideal for the presentation medium (due to the inadequate number of representation pairs followed by the participants for each presentation medium). Therefore, in examining the influence of the presentation medium, only the responses gathered for process model A are considered.
Notes to Table 4: The rows in italic show the higher level concepts and numbers aggregated from the rows just below them. (a) Each participant answers 9 understandability questions for each process model, totaling to max 18 points for the total score of understandability task effectiveness. (b) Only the participants of the second experiment performed the BP Modeling Competency test, leading to 140 data points. Therefore, the overall averages in this table reflect the data from the second experiment only.

Hypothesis Testing for the Use of Vertical Modularity
Our first group of hypotheses (H1) argued for the significant influence of the use of different modularity representations on process model understandability. Figure 6 presents the boxplot diagrams for the understandability indicators over the modularity representations. The results of the Kruskal-Wallis tests are presented in Table 6. Accordingly, a significant impact of the modularity representation is observable only for one of the four indicators of model understandability, i.e., the perceived ease of understanding.
In the next sub-sections, we discuss the results with respect to each process model understandability indicator. Although the boxplot diagram in Fig. 6a shows a lower mean for Repr3 for understandability task effectiveness (score), the results of our statistical tests (Table 6) do not indicate a significant difference between the three forms of modularity representation.
In order to investigate if the effectiveness scores obtained from different types of comprehension questions show any major difference, we performed further statistical tests. As described in Sect. 4.5, we distinguished between global, local, and global-local types of understandability questions. The results indicate that the scores the participants obtained from the local questions (which can be answered by looking only at sub-processes) are significantly different across the forms of modularity representation.
Based on these results, we can infer that for local questions, vertical modularization reduces effectiveness when the overview + detail strategy is used (as in Repr3, where sub-processes are shown separately, detached from their context). This is probably due to the increased browsing costs (split-attention effect) in Repr3 and insignificant costs of complexity in flattened models (Repr1 and Repr2) even with the group information (Repr2). It may further indicate that the context in which a sub-process (or the part of the process that can be grouped as a sub-process) takes place can play an important role in understanding process information.
For the global questions (where answering requires information only about the main/modularized model) and global-local questions (where answering requires information about both modularized model and one or more sub-processes), the differences in the scores for each form of modularity representation are not significant (p = 0.27 and p = 0.69, respectively). Accordingly, vertical modularization does not have a significant effect on effectiveness for global and global-local questions. This implies that the understandability gain acquired in abstracting less relevant information through vertical modularization is insignificant in these types of process models.
With regard to understandability task efficiency, our statistical analysis does not indicate a significant difference between the three forms of modularity representation [H(2): 0.67, p = 0.72]. A relatively high dispersion of the efficiency values both for Repr1 and Repr3 is also worth mentioning. The same holds for the efficiency obtained for questions concerning different process perspectives and scopes (i.e., there is no significant difference with respect to the forms of modularity representation).
Although Repr3 has the lowest ratings as to how useful the participants consider the model to be for facilitating understanding (as depicted in Fig. 6c), our statistical analysis indicates no significant difference between the three modularity representations in terms of perceived usefulness for understandability [H(2): 4.21, p = 0.12].
For perceived ease of understanding, on the other hand, the attitude towards the ease of understanding differs significantly with respect to the forms of modularity representation [H(2): 10.30, p = 0.01]. Pairwise comparisons indicate that Repr1 is considered significantly easier to understand than the modular form of Repr3 (p = 0.01). This shows that fully flattened models in BPMN (collaboration) diagrams are regarded as easier to understand than models with sub-processes in separate views.

Hypothesis Testing for the Presentation Medium
In our second group of hypotheses (H2), we argued that the medium used to present process models has a significant influence on their understandability. The boxplot diagrams for the understandability indicators over the presentation mediums are presented in Fig. 7. Table 7 presents the results of the Kruskal-Wallis tests. The results confirm the boxplot diagrams in that the presentation medium has a significant influence on the understandability task effectiveness and is regarded as critical from the users' point of view.
A key result from the analysis is the significantly higher understandability task effectiveness scores achieved by participants that analyzed the process models on paper [H(1): 6.29, p = 0.01]. Hence, understandability task effectiveness would be positively impacted by the use of paper instead of static process models presented on computers.
As for the understandability task efficiency, the statistical tests indicate that the use of paper or computer for presenting process models does not lead to a significant difference regarding efficiency [H(1): 0.71, p = 0.40].
With regard to perceived usefulness for understandability and ease of understanding, the participants considered models presented on paper easier to understand and more useful.
The analysis of the effect of presentation media indicates that using paper or computer influences process model understandability (as measured by three of the four indicators, i.e., understandability task effectiveness, perceived usefulness for understandability, and perceived ease of understanding) when it comes to the models of this type, structure and complexity. We observed that the participants that received models on paper studied them using their fingers, which can be more difficult on the screen. However, very few participants made notes directly on the printed models.
We also recognize that the effect of the presentation medium on the understandability is likely to depend heavily on the size of the process model, a factor that we did not control in our experiments. For those models that can be fully fitted to the computer screen and still be sufficiently legible to the reader, one can argue that the difference in the understandability level due to the use of different presentation media might diminish.

Hypothesis Testing for the Model Reader's BP Modeling Competency
Our third group of hypotheses (H3) argued that the model readers with higher levels of BP modeling competency will have significantly higher understandability task effectiveness and efficiency. We present the boxplot diagrams and the results of our statistical tests in Fig. 8 and Table 8, respectively. The results support only the first part of H3 regarding understandability task effectiveness.
The results indicate that model readers with higher levels of knowledge of BP modeling and related notations achieve higher scores for understandability task effectiveness [H(5): 14.83, p = 0.01]. This is reflected in the boxplot diagram in Fig. 8a. According to the Kruskal-Wallis pairwise multiple comparison tests, participants at level 5 scored significantly higher than those at level 2 (adjusted significance p = 0.023) (we ignore the values at Levels 1 and 6 due to the small number of participants at these levels).
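The "adjusted significance" of a pairwise comparison can be illustrated with a Bonferroni-style adjustment, in which each raw p-value is multiplied by the number of pairwise comparisons. That the study's pairwise tests were adjusted in exactly this way is an assumption made here for illustration, and the p-values in the usage lines are hypothetical, not results from the experiments.

```python
from math import comb


def n_pairwise_comparisons(n_groups: int) -> int:
    """Number of distinct group pairs, e.g. 15 pairs for 6 competency levels."""
    return comb(n_groups, 2)


def bonferroni_adjust(p_value: float, n_groups: int) -> float:
    """Multiply the raw p-value by the number of pairs, capped at 1.0."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    return min(1.0, p_value * n_pairwise_comparisons(n_groups))


if __name__ == "__main__":
    # Hypothetical raw p-values, chosen only to show the effect of the adjustment.
    print(bonferroni_adjust(0.001, n_groups=6))   # six competency levels
    print(bonferroni_adjust(0.02, n_groups=3))    # three modularity representations
```

The more groups are compared, the stricter the adjustment becomes, which is one reason why a significant overall test can yield only a few significant pairs.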
However, the results indicate no significant influence of model readers' BP modeling competency on understandability task efficiency [H(5): 4.11, p = 0.53]. Although the boxplot diagram in Fig. 8b shows a slight negative relation between competency level and efficiency, this relation is not statistically significant.  The results partially confirm the importance of prior knowledge (competency) of process modeling and notation for understanding a process model. This is reflected in the effectiveness scores obtained by the participants. Although the relation between the knowledge level and efficiency is insignificant, we see that participants at lower levels tended to spend less time on answers at the expense of decreasing effectiveness. This can be explained by a likely tendency of participants with lower levels of knowledge on process modeling and notation to engage less actively in a thorough deliberation of the models and tasks given to them.
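The reported H statistics and significance values can be cross-checked against each other. The sketch below assumes that the p-values stem from the usual chi-square approximation with the degrees of freedom shown in parentheses; `chi2_sf` is a helper written for this illustration (closed-form upper-tail probability for integer degrees of freedom), not a function of any statistics package.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1 or x < 0:
        raise ValueError("df must be >= 1 and x must be >= 0")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_j x^j / (1*3*...*(2j+1))
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    total = 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return tail + total


# (label, H statistic, degrees of freedom, reported p) as given in the text.
REPORTED = [
    ("H1 task efficiency", 0.67, 2, 0.72),
    ("H1 perceived usefulness", 4.21, 2, 0.12),
    ("H1 perceived ease", 10.30, 2, 0.01),
    ("H2 task effectiveness", 6.29, 1, 0.01),
    ("H2 task efficiency", 0.71, 1, 0.40),
    ("H3 task effectiveness", 14.83, 5, 0.01),
    ("H3 task efficiency", 4.11, 5, 0.53),
]

if __name__ == "__main__":
    for label, h, df, reported_p in REPORTED:
        print(f"{label}: H({df}) = {h}, computed p = {chi2_sf(h, df):.3f}, "
              f"reported p = {reported_p}")
```

Under this assumption, the computed values agree with the reported ones to two decimal places.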

Differences Between the Participants of Two Experiments
In addition to the modularity representation, presentation medium, and BP modeling competency, we were also interested in the difference between the two experiments with regard to the understandability task effectiveness and efficiency. We were particularly interested to see if there is any difference between the performance of practitioners of the first experiment and that of the graduate students of the second experiment.
Although we were not able to measure the level of BP modeling competency of the practitioners using the test, we expected them to have less theoretical knowledge on process modeling (particularly with respect to the notation used) than the graduate students, who had enrolled in (and almost completed) a BPM course. Therefore, we expected graduate students to be more effective and efficient than practitioners. Figure 9 and Table 9 present the boxplot diagrams and the results of our statistical tests, respectively. Accordingly, practitioners in Experiment 1 were significantly more effective than the students in Experiment 2 [H(1): 6.86, p = 0.01], while the students were significantly more efficient (they spent less time on each correct answer, as depicted by higher scores of understandability task efficiency) than the practitioners [H(1): 36.96, p = 0.01]. In that respect, the results of our analysis were unexpected.
We bring forward two possible explanations for this result. First, the practitioners were working in the business environment where these processes were executed (although a limited number of practitioners were only marginally involved in the execution of these processes). Therefore, their familiarity with the domain and terminology might have helped them in gaining a better understanding of the process and related experimental tasks. Second, it is probable that the practitioners took the experimental tasks more seriously assuming that these tasks were a part of an organizational training program or a job-related assignment (although they were explicitly told that their individual performance would not be communicated to any party, and a result would not be traced back to any individual).

Analysis of Other Confounding Factors
The first part of our questionnaire consisted of questions about participants' opinion of their level of experience in process modeling (1), the frequency with which they encounter process models, i.e., intensity (2), level of knowledge in process modeling (3) and BPMN (4), as well as their level of familiarity with the domain (5) where the processes were taking place (relevant items are available in Appendix, Part 0A-0C).
As the participants of the second experiment were students, we did not expect to have a sufficiently dispersed group here. However, in the first experiment with practitioners, the participants differed to a certain extent with regard to these factors. Therefore, we performed a series of analyses to investigate if any of these factors had a significant influence on understandability (as operationalized in our study). Our statistical analyses did not indicate a significant influence of any of these factors (per experiment or in the combined dataset). This can be attributed to the relatively limited number of participants in the first experiment (with practitioners), as well as to the accuracy and appropriateness of the items we used to operationalize them. For instance, while the perceived level of knowledge of process modeling and BPMN did not indicate any influence, the participants' process modeling competency as measured using the BP Modeling Competency test revealed a significant influence, as discussed in Sect. 5.2.3.

Conclusions
Business process models are important artifacts in various phases of the BPM lifecycle. Therefore, it is necessary that the intended target audience of these models is able to understand the models correctly and in a timely manner. In this paper, we have described the design and conduct of an experimental study to investigate a set of factors that potentially influence process model understandability. We examined if and how different forms of modularity representations for vertical modularization and the medium used for the presentation influence the understandability of process models that are in the form of BPMN collaboration diagrams. In addition, we investigated the relation between the BP modeling competency level of model readers and their BP model understanding. To contribute to the generalizability of our findings, we used two models of real-life processes as the objects of our experiment. We conducted two experiments. In the first one, 60 employees of a large organization participated; they belonged to the target group addressed by the models used for testing. The second experiment involved 140 graduate students, who participated in the experiment as a voluntary task of a BPM course that they were enrolled in.
Table 10 summarizes our hypotheses and findings. From the measurements using task effectiveness, we can conclude that transforming flattened models in BPMN using sub-processes, which are shown as separate models, does not contribute to the models' understandability. On the contrary, for tasks which involve gathering and understanding information that is located in sub-processes, the effectiveness is significantly lower for vertically modularized process models. Hence, using separately shown sub-processes in BPMN may negatively influence effectiveness without contributing to task efficiency (when compared with models that are flattened or vertically modularized using groups). In addition, flattened models are considered easier to understand than models with sub-processes shown separately. Therefore, if vertical modularization is necessary and the comprehension of the models is critical, the use of context-aware groups (to indicate the process elements that can be combined into a sub-process) should be preferred to separately shown sub-processes.
The results also show that paper is practitioners' preferred choice of medium to facilitate understandability and ease of understanding of process models. In addition, we found that the task effectiveness of model readers is higher when the models are presented to them on paper.
Our study also confirms the role of model readers' level of knowledge on BP modeling and notations as an important factor of process model comprehension. We found that model readers with higher levels of BP modeling competency achieve higher effectiveness scores, without any significant decrease of efficiency.

Limitations and Future Work
Our work has a number of limitations as the results are confined by threats to validity, in particular to construct, internal, external and conclusion validity (Wohlin et al. 2012). Below, we discuss these threats, how we addressed them during the design and execution of the experiments, and highlight the improvements to be carried out as future work.

Threats to Internal Validity
The specific choice for the vertical modularization of two processes can also be regarded as a threat to the internal validity of our findings. It is difficult to verify that the choices of the parts that are structured as sub-processes are optimal (but not arbitrary, which may lead to a flawed modularization). We addressed this threat by requesting domain experts, who were also acting as process modelers/owners in the company, to validate the models including their modularity structures. However, future research should examine the effect of different types of modularity when other (theoretical) modularization approaches are employed, such as Wand & Weber's (1989) as in (Johannsen et al. 2014), heuristics in (Milani et al. 2015), or role-based approaches (Turetken and Demirors 2013; van den Hurk et al. 2015).

Table 10 Summary of the hypotheses and findings
Hypothesis | Result | Interpretation
H1. The use of vertical modularity has a significant impact on:
(a) Understandability task effectiveness | Partially supported | Effectiveness is higher with flattened BPMN models (with or without groups, i.e., Repr2 and Repr1, respectively) than with vertically modularized models with sub-processes (Repr3) for certain types of understandability related tasks
(b) Understandability task efficiency | Not supported | Efficiency does not differ with the use of models in any form (flattened or vertically modularized using groups or sub-processes)
(c) Perceived usefulness for understandability | Not supported | Using a different form of modularity representation does not have a significant effect on usefulness in terms of facilitating understanding
(d) Perceived ease of understanding | Supported | Fully-flattened models are perceived easier to understand than models that are vertically modularized (using groups or sub-processes)
H2. The presentation medium has a significant influence on:
(a) Understandability task effectiveness | Supported | Effectiveness of model readers is higher when they are presented with the models on paper (rather than on computer)
(b) Understandability task efficiency | Not supported | The medium used for presenting models (paper or computer) does not influence efficiency significantly
(c) Perceived usefulness for understandability | Supported | As a presentation medium, paper is considered more useful (than computer) in terms of facilitating understanding
(d) Perceived ease of understanding | Supported | The models on paper are considered easier to understand than models on computer
H3. Model readers with higher levels of BP modeling competency will have significantly higher:
(a) Understandability task effectiveness | Supported | Model readers with higher levels of knowledge of BP modeling and related notations achieve higher effectiveness scores
(b) Understandability task efficiency | Not supported | The efficiency is not significantly correlated with model reader's level of knowledge on BP modeling notation

Threats to Construct Validity
Following a rigorous method in developing, verifying and validating the understandability questions contributes to the accuracy by which the understandability factors are operationalized. This reinforces the construct validity of our work. Performing the experiments in a single location and time-setting and using automated means to collect detailed data regarding the duration spent on each part of the questionnaire by each participant are other factors that also contributed to the construct validity of this work.
It is plausible to assume that the computer screen size and resolution influence the results regarding the presentation medium. The participants of the first experiment performed the experiment in their business settings where they were provided with standard computer facilities (i.e., a desktop and a standard-sized monitor). Thus, the potential effect of using computer environment with different display size and resolutions was reduced. However, in the second experiment, the students used their personal notebooks with different configurations. As these participants were subject to different display sizes and resolutions, the external validity of the findings is threatened regarding the presentation medium. We have recorded the screen-resolution information of all participants (automatically through the survey application) and analyzed it to see if the results differ for different groups of resolution values. The results did not indicate any significant difference. However, future work should investigate this factor in a (better) controlled setting (i.e., only when a standard computer environment is guaranteed for all participants).

Threats to External and Conclusion Validity
The research design based on a single experiment with a replication poses threats to the external (and conclusion) validities of the results that we achieved. However, experimenting with real-life processes and business practitioners helps to alleviate these threats. This allows us to better generalize the results towards practical implications. Although these practitioners were working in a single company, they were from 18 different departments of this large corporation. Yet, future research should consider involving more practitioners with different backgrounds and working in different business domains.
Aside from the modularity representations that we tested in our experiments, there are a number of other important modularity representation or presentation styles that we have not experimented with. For instance, using expanded sub-processes, or using a combination of collapsed and expanded sub-processes together in a single model are two options that are also commonly used for vertical modularization. Future research should consider testing these options, also in cases where there is a certain level of reusability in the process. This was not the case in the process models that we experimented with. However, in such cases, a sub-process can be modeled in its expanded form once and be re-used later in another part of the process model in a collapsed form. This will allow model readers to study the sub-process information only once by means of the expanded sub-process and spend less time on the collapsed sub-processes that appear later. This may provide a positive effect particularly on the efficiency dimension of the understandability.
Our findings are valid only for BPMN collaboration diagrams, where a number of pools are used (each with a single control-flow). To understand the potential effect of using this type of BPMN models, future work should consider experimenting also with BPMN models where a single main control-flow is present (i.e., a single pool potentially with multiple lanes). The set of BPMN constructs used for the models (zur Muehlen and Recker 2008) is another factor to be experimented on. Future work should also use processes of different size, complexity, and applied level of vertical/horizontal modularity to better understand the interplay between these factors and thereby contribute to the development of guidelines for applying modularization in business process modeling.

Implications for Research and Practice
Our work contributes by empirically investigating the relevant influences of three factors on the understandability of business process models. It extends the body of knowledge in the field and contributes to the practice of more effective and efficient business process modeling. This, in turn, will increase the benefits of process modeling in organizations.
The results of our experiments challenge the general view and assumptions on the use of vertical modularization in process modeling. Thus, the process modeling community needs to rethink the implications of vertical modularization when the understandability of BPMN models is of concern. Given the increasing popularity of BPMN as a modeling notation, the research community should continue seeking empirical evidence for the conditions and cases where using modularity can help increase, or where it hinders, the understandability of process models. The benefits gained in using modularization (e.g., ease of process model reuse) should outweigh its potential drawbacks for understandability.
Our findings also emphasize the importance of the presentation medium, which is a factor that has not been studied in previous work. Presenting models in a digital environment or on paper in a form that provides an overall picture has a considerable effect on understandability, favoring paper as the preferred choice of medium. As such, the BPM systems and process modeling tools that publish process models in digital forms should consider offering additional features to the users (e.g., animations, dynamic representations, search functions) to try to counterbalance this drawback.
Our results confirm the importance of the model reader's prior knowledge of the general practices and notations used for business process modeling. This implies that it is essential to provide formal training in the theoretical concepts of business process modeling and notation to the employees across the organization. This will ensure a better level of understanding of process models.
Our empirical work on the understandability of process models also points to a need for a set of guidelines that provide standards and rules for planning, conducting and reporting on such empirical work. The tests and guidelines (such as the BP modeling competency test and the guidelines for developing understandability questions that adequately represent various process perspectives and dimensions) would help to establish valid experiments and to report on them in a systematic way. This, in turn, would contribute to the accurate measurement of the constructs and to the validity of the findings. The study of factors influencing process model understandability needs to mature through more empirical studies in order to bring BPM research closer to practice.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.