1 Introduction

Simulation modeling has become increasingly important in studying organizational behavior (Carley 2002; Harrison et al. 2007). Among the several simulation approaches available, agent-based models (ABMs) play a special role (Prietula and Carley 1994). Since early classics such as the Garbage Can Model (Cohen et al. 1972), researchers have increasingly used agent-based simulation techniques to address relevant organizational, strategic and operational questions (Prietula et al. 1998; Luo et al. 2018; Barnes et al. 2020). As Anderson (1999) emphasizes, ABMs allow researchers to examine open systems–common in management situations–whose behavior cannot be described analytically by equations derived from energy conservation principles or decision-theoretical axioms.

We can broadly define ABMs as computational models in which aggregate outcomes emerge from agents’ properties, behaviors and interactions, without the imposition of any top-down constraint. This makes ABMs extremely flexible, as it is relatively easy to vary the building blocks of an ABM. Yet the flexibility of ABMs does not come for free: results are often hard to interpret (Rahmandad and Sterman 2008). Lacking a closed-form expression that links inputs to outputs, agent-based modelers often struggle to assess whether the results of their ABMs are robust and their conclusions are valid.

Addressing these issues is one purpose of sensitivity analysis. However, as our literature review shows, sensitivity analysis is often omitted or only performed in a partial, ad-hoc manner in work that employs ABMs. This is understandable as ABMs raise a number of challenges for sensitivity analysis. For instance, the flexibility of ABMs makes it possible to vary not only parameters but also agents’ behavioral rules. While sensitivity analysis with respect to parameters is relatively straightforward, it is less clear how to perform sensitivity analysis on elements that do not belong to a well-defined numerical space. Yet varying non-parametric elements, not just parametric elements, is surely essential in any sensitivity analysis of an ABM. After all, ABMs are often regarded as axiomatic systems that generate outcomes which should be regarded as propositions or theorems (Gallegati and Richiardi 2009). We cannot test the robustness and stability of ABMs’ theoretical findings without varying their non-parametric elements. (Henceforth, we use the term non-parametric elements to indicate elements of an ABM simulator other than parameters; thus, we do not use the term non-parametric in the sense of models that are free of assumptions about the frequency distributions of the variables being assessed, a usage found in statistics.)

To strengthen future work that employs ABMs, we develop a systematic process for conducting sensitivity analysis on ABMs. Our first contribution is to propose a general conceptual structure for ABMs that identifies their “moving parts”—the elements that can be subjected to a sensitivity analysis. This conceptual structure is helpful for researchers to catalogue all the assumptions that underlie their model and to assess the breadth of their sensitivity analysis.

It is a common misconception that sensitivity analysis is just about showing that the main conclusions of a paper are robust to a range of assumptions. Robustness is only one of several goals. Sensitivity analyses can reveal which elements of a model, or combinations of elements, have the greatest impact on the results, and how strongly various elements interact with each other to influence model outcomes. In our second, more technical, contribution, we propose a design that allows the simultaneous variation of parametric and non-parametric ABM elements, thereby addressing multiple aims of a sensitivity analysis at once.

A further challenge of sensitivity analysis, which usually involves running many variants of a core model, is the visualization of results. Consequently, we devote special attention to graphical representation. In this context, a third contribution of our paper is a modification of the well-known individual conditional expectation (ICE) plots that accounts for the stochastic nature of an ABM’s response by adding a test on the difference of mean values to validate the statistical significance of insights regarding the direction of change. We call the modified plots “stochastic individual conditional expectation” plots, or S-ICE (see Appendix 3 for further details).

To illustrate our approach, we apply our sensitivity analysis process to the Garbage Can Model (GCM) in the implementation of Fioretti and Lomi (2008, 2010). We choose this ABM because it is well-known in the management sciences and the software implementation is publicly available. We show that a non-parametric element is the most important driver of the results, demonstrating that a sensitivity analysis focused only on parameters misses important aspects of ABMs. We then show that interactions between model elements are relevant and offer new insights on the managerial interpretation of the findings. These insights would be missed by approaches to sensitivity analysis that vary “one parameter at a time.” As we illustrate with our analysis of the GCM, sensitivity analysis can reveal new research insights and point to fruitful extensions for future modeling.

2 Related literature

Our paper builds on two related literatures: one on agent-based modeling and one on sensitivity analysis. These subjects are vast, and this review cannot claim to be exhaustive. In this section, we provide concise overviews of these fields in order to highlight the research gap that we address. Appendix 1 provides a more comprehensive review of the literature, with several additional details.

2.1 Agent-based models in the management sciences

ABMs have been used to address a number of topics in the management sciences. A first use of ABMs is for theory development. An early classic is the garbage can model (GCM). The model arose in association with the “garbage can” theory of organizational decision-making (Cohen et al. 1972). In the garbage can theory, organizations are viewed as “collections of choices looking for problems” (Cohen et al. 1972). Each opportunity for choice is like a garbage can into which problems, solutions, and decision makers have been dumped, and what emerges from the collection is “organized anarchy.” To any scholar who has spent time in real organizations, the garbage can is a welcome complement to traditional models. Over the years, it was noted that the computer model presented in the original article did not reflect the corresponding verbal theory (Bendor et al. 2001). Several authors have reformulated or extended the initial simulation model, replicating and generalizing the original results (Masuch and LaPotin 1989; Fioretti and Lomi 2008, 2010; Lomi and Harrison 2012; Troitzsch 2012). Though a classic, the GCM remains an influential work in the management sciences (Glynn et al. 2020), with substantial spillovers into other disciplines (Simshauser 2018). Another classical group of ABMs with a similar focus is the family of NK models (Levinthal 1997; Rivkin and Siggelkow 2003; Baumann et al. 2018), which applies techniques first developed in evolutionary biology to study the interrelationship between organizational design and market selection forces. ABMs have also been used to support theoretical investigations in fields that range from innovation diffusion (Garcia and Jager 2011; Fibich and Gibori 2010), knowledge transfer (Levine and Prietula 2012), and organizational learning (e.g., Levinthal 1997) to management accounting (e.g., Wall 2016) and organizational design (e.g., Dosi et al. 2003; Clement and Puranam 2018).

A second use of ABMs is as simulation tools for reproducing the behavior of actual complex systems. Examples of this vast literature are works such as Amini et al. (2012) and Stummer et al. (2015), in which ABMs are used to simulate product diffusion, Utomo et al. (2018), on modeling agri-food supply chains, and Barnes et al. (2010), Ayer et al. (2019), and Barnes et al. (2020), on modeling disease transmission. Often, agent-based models are also part of hybrid simulations; we refer to the reviews of Brailsford et al. (2019), Robertson (2019), and Currie et al. (2020) for additional details. Further discussion can also be found in Appendix 1.

2.2 Sensitivity analysis

Broadly speaking, sensitivity analysis can be thought of as the exploration of a mathematical or numerical model. The model is typically regarded as a black box that processes a set of inputs and calculates one or more quantities of interest (outputs). Thus, the exploration is not performed by a direct inspection of the model. Instead, the properties of the model are obtained indirectly, by investigating how the output changes given variations in the inputs.

Despite much theoretical progress [see the handbook by Ghanem et al. (2016)], sensitivity analysis is often a neglected task (Saltelli et al. 2020). The recent investigation by Saltelli et al. (2019), who examine how sensitivity analysis is used to support scientific modeling across several disciplines, shows that sensitivity analysis is either not applied or applied unsystematically. Reviews focused on sensitivity analysis for ABMs reach similar conclusions: in a survey of papers published in Ecological Modelling and in the Journal of Artificial Societies and Social Simulation, Thiele et al. (2014) find that 88% and 76% of the papers published in the years 2009 and 2010, respectively, do not include a serious sensitivity analysis. Similarly, Utomo et al. (2018) survey agri-food supply chain ABMs, finding that 28% of papers do not incorporate any form of sensitivity analysis, 68% perform a basic analysis, and only 4% apply a more systematic approach.

We have also performed a closer investigation of the works that have applied some form of sensitivity analysis in agent-based modeling; it is reported in Appendix 1. The analysis shows that the majority of authors use sensitivity analysis to check whether the main qualitative conclusions of their work are robust when some pre-selected parameters are set to values different from those included in their baseline scenario. To assess robustness, modelers typically vary one parameter at a time, plot the output variable against different parameter values, and show qualitatively whether conclusions are robust. Only a minority of authors consider goals beyond robustness, vary multiple parameters at the same time, and use more sophisticated quantitative sensitivity methods. However, realistic ABMs would benefit substantially from a systematic method to identify the importance of computationally expensive assumptions. For example, if the outcome of the model were not sensitive to replacing an expensive assumption with a computationally cheaper one, this would be highly valuable information for model building. At the same time, research on ABMs with a theoretical focus could benefit from the systematic approach to sensitivity analysis that we propose, as it would allow a researcher to develop new insights about the importance and interaction of model elements. Yet authors are not rigorous about mapping goals to methods, and do not vary model elements other than parameters (usually, only a subset of parameters is pre-selected, without a formal procedure), possibly due to the lack of a systematic approach. We aim to fill these gaps in the remainder of our work.

In this respect, our work is related to Lorscheid et al. (2012). The authors propose a systematic approach to opening the black box of simulation models. They are interested in the whole pipeline of model analysis, including the formulation of the research question, whereas our work focuses on the sensitivity analysis of ABMs. Given this difference in goals, the two papers emphasize different aspects. Lorscheid et al. (2012) include a pre-experimental phase to decide the number of simulations that need to be run to achieve stable results, and they also consider an iterative approach to select interesting parameter ranges. This can be seen as a premise to our approach, which addresses the goals that sensitivity analysis can answer: we elaborate on the nature of ABMs by creating a conceptual map of the elements of an ABM that can be subjected to sensitivity analysis; we exploit recent results showing that new sensitivity measures can be extracted from the design; and we consider new visualization methods that address different sensitivity goals.

3 The elements of agent-based models

In this section, we propose a conceptual structure for ABMs, with the goal of facilitating sensitivity analyses. Our structure classifies the elements of an ABM into a number of sets and subsets. Admittedly, it is not the only conceptual framework that can describe an ABM, and sometimes the distinction between elements might be blurred, subjective or dependent on the specific model under scrutiny. Nonetheless, our aim is to classify the “moving parts” of an ABM that can be subjected to sensitivity analysis. We first describe the conceptual structure at a theoretical level, and then illustrate our framework using the Garbage Can Model.

3.1 Theoretical structure

Our conceptual framework is represented in the diagram of Fig. 1.

Fig. 1
figure 1

We identify four types of elements: principles, assumptions, parameters, and procedures. Some parameters are associated with agents, as are some procedures. The capital letters in the diagram identify relevant sets discussed in the text

All elements of the ABM are contained in the outer rectangle. They can be classified into six sets and subsets:

  • Principles (A) are high-level elements that define the nature of the ABM. Principles do not include specific algorithmic implementations; rather, they are conceptual guidelines that influence the modeler in formulating specific procedures or in choosing certain parameters. This is illustrated in Fig. 1, in which principles are represented outside of the rectangle containing all practical assumptions of an ABM. We consider principles to be out of the scope of sensitivity analysis, in the sense that varying principles would lead to a different ABM rather than a sensitivity analysis of a given ABM.

  • Assumptions are low-level elements that define a specific implementation of an ABM. As such, we would like to include as many assumptions as possible within the scope of a sensitivity analysis. While most assumptions are instances of procedures and parameters, some are not. Such assumptions are identified with the letter B in Fig. 1. Distinguishing principles from assumptions is not always straightforward in practice: changing some assumptions may lead to analyzing a new model. We address this issue further in the discussion section.

  • Parameters are a specific subset of assumptions. They are cardinal quantities that influence the evolution of the model but are determined outside of the simulation run, either at the initialization stage or through a direct intervention of the modeler. Some parameters (C) characterize the environment or define general properties of the simulation (e.g., the number of time steps), while other parameters are closely associated with agents and determine their properties (D).

  • Procedures are also a subset of assumptions. They can be defined as algorithmic prescriptions that determine the time evolution of the ABM. Unlike parameters, procedures are not cardinal, nor are they drawn from an easily defined set of possibilities. Procedures and parameters are sometimes intertwined, in the sense that certain procedures regulate the distribution of parameters, and certain parameters co-determine the effects of procedures. Certain procedures define the main mechanisms through which the model works or through which agents’ attributes are initialized (E), while other procedures define what an agent “does”, and are also commonly known as behavioral rules (F).

Changing the assumptions that characterize agents makes it possible to conduct a sensitivity analysis “at the agent level”, that is, to evaluate the impact of agents on simulation outcomes.Footnote 1

3.2 Illustrating the structure: the garbage can model

To illustrate the conceptual structure outlined in Sect. 3.1, we consider the Garbage Can Model (GCM). We first provide a summary description of the GCM and then map out the elements of the model following the conceptual structure of Fig. 1. We focus on the Fioretti and Lomi (2008, 2010) implementation, which takes the most careful steps to match the ABM to the original verbal theory of Cohen et al. (1972). In Appendix 2 we include more details on the software implementation; we refer the reader to Fioretti and Lomi (2008, 2010) for a full description.

3.3 Description of the GCM

Fioretti and Lomi model four classes of agents: participants, who can make decisions; choice opportunities, openings for participants to make decisions; problems concerning conditions or people inside or outside the organization; and solutions in search of problems.Footnote 2 The organization is depicted as cells on a grid, and over time, participants, choice opportunities, problems, and solutions move–randomly and independently–on the grid. When at least one choice opportunity, at least one participant, and at least one solution happen to be collocated in a cell, the participants can make a decision. This is easy if, by chance, no problem is in the cell. The participants just declare a decision, even though it solves no problem! Cohen et al. (1972) refer to this as a decision by oversight.

Matters are more challenging when one or more problems are in the cell along with at least one choice opportunity, participant, and solution. Then participants can make a decision only if the solution is good enough to solve the problem(s). In particular, participants are characterized by a level of ability, a cardinal variable that takes a value between two extremes (minimum and maximum ability). Likewise, solutions have a certain efficiency, and problems have a certain difficulty, both taking values between a minimum and a maximum. If the sum of the abilities of the participants who are present multiplied by the efficiency of the most effective solution in the cell is greater than the sum of the difficulties of problems in the cell, then the participants can make a decision that solves the problems. Cohen et al. (1972) refer to this as a decision by resolution. A decision by resolution is the most desirable outcome, as it associates a solution with a problem.

If the participants in a choice opportunity lack a solution good enough to solve their problems (because their ability is too low and/or the efficiency of the solutions is too low), they are blocked–unable to make a decision or to move on. They remain stuck until another choice opportunity, moving randomly, happens into their cell. The newly-arrived choice opportunity then grabs the most difficult problem and wanders off with it, freeing up the participants and solutions. Cohen et al. (1972) refer to this as a flight. In subsequent wandering, the now-freed participants might very well stumble onto the same vexing problem again.
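The decision logic described above can be sketched in a few lines of code. This is a simplified illustration, not the Fioretti-Lomi implementation: the function and variable names are ours, and the sketch assumes a choice opportunity is already present in the cell.

```python
# Simplified sketch of what happens when agents meet in one grid cell,
# assuming a choice opportunity is present. Names are ours, for illustration.

def decide_in_cell(participant_abilities, solution_efficiencies, problem_difficulties):
    """Classify the cell outcome: oversight, resolution, blocked, or none."""
    if not participant_abilities or not solution_efficiencies:
        return "none"          # a decision also needs a participant and a solution
    if not problem_difficulties:
        return "oversight"     # a decision is declared, but it solves no problem
    best_solution = max(solution_efficiencies)
    if sum(participant_abilities) * best_solution > sum(problem_difficulties):
        return "resolution"    # the solution is good enough for the problems
    return "blocked"           # participants are stuck until a flight occurs

print(decide_in_cell([0.6, 0.7], [0.9], []))          # oversight
print(decide_in_cell([0.6, 0.7], [0.9], [0.5, 0.4]))  # resolution: 1.3 * 0.9 > 0.9
print(decide_in_cell([0.2], [0.3], [0.8]))            # blocked: 0.06 < 0.8
```

The flight mechanism would then remove the most difficult problem from a blocked cell when a new choice opportunity arrives.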

A final twist in the model concerns organizational structure. The model described so far corresponds to an anarchy setting. In their ABM, Fioretti and Lomi (2008, 2010) also allow a hierarchy setting. In a hierarchy, choice opportunities are always assigned a ranking from most important to least important. Participants can also be ranked relative to one another, as can problems and solutions. The rankings are then used as follows:

  • When hierarchy is applied to participants (that is, the participant structure is hierarchical rather than anarchic), each participant is allowed to be part of choice opportunities as important as, or less important than, herself. So the most important participant can take part in all choice opportunities, but less important participants can take part only in less important choice opportunities.

  • When hierarchy applies to problems (i.e., the problem structure is hierarchical), each problem can be considered in choice opportunities as important as, or less important than, itself. The most important problems can be considered in all choice opportunities, but less important problems are on the table only in less important choice opportunities.

  • When hierarchy applies to solutions (i.e., the solution structure is hierarchical), each solution can be considered in choice opportunities as important as, or less important than, itself. The most important solutions can enter all choice opportunities, but less important solutions are up for consideration only in less important choice opportunities.

The model allows researchers the flexibility to apply hierarchy rather than anarchy just to participants, just to problems, just to solutions, or to any combination of the three.Footnote 3

3.4 Elements of the GCM

To illustrate our conceptual structure of Fig. 1, in this section we map various elements of the GCM.

  • Principles (A): A key principle of the GCM is organized anarchy: decision makers work on problems they have stumbled upon, using solutions that happen to be available. This can be contrasted with game-theoretic models of industrial organization, in which agents with clear preferences seek rational solutions to well-defined problems. Another distinguishing principle of the GCM is independence between the objects of decision-making. Indeed, choice opportunities, participants (decision-makers), solutions and problems exist independently of one another. This principle distinguishes the GCM from other models where, instead, solutions exist only attached to problems.

  • Assumptions that are neither parameters nor procedures (B): The assumption that agents move on a regular square grid as opposed to a more sophisticated structure such as a network fits into this class.

  • Parameters that are not agent parameters (C): A first example of an element in this class is the number of choice opportunities, as it determines how many agents of this type exist, but not their properties. Note that the number of choice opportunities is an assumption rather than a principle: in the original implementation of the GCM (Cohen et al. 1972) there was only one choice opportunity, but some scholars have since argued that multiple choice opportunities are more in line with the verbal theory of the GCM (Bendor et al. 2001; Fioretti and Lomi 2010). Thus, the number of choice opportunities does not seem to lead to conceptually different models. Another example of an element in class C is the size of the grid on which agents wander.

  • Agent parameters (D): The minimum and maximum levels of ability, efficiency and difficulty determine key attributes of the agents, so we consider them as “agent parameters”.

  • Procedures that are not agent procedures (E): As an example of a procedure in this class, we consider the rule for assigning values to ability, efficiency and difficulty. This procedure assigns these values to agents by sampling uniformly at random from the interval delimited by the minimum and maximum values above. While it determines an agent’s parameter, it does not define what an agent “does”, and so we do not think of it as an agent procedure.

  • Agent procedures—or behavioral rules (F): The participant, problem and solution structures determine the access of participants, problems and solutions to choice opportunities, and so clearly influence how agents behave. Thus, we consider these procedures as behavioral rules.

4 A systematic process for sensitivity analysis: six steps

In this section, we outline a process in six steps to make sensitivity analysis of ABMs systematic. We deliberately keep the discussion as general as possible, leaving all mathematical details and most references to Appendices 2 and 3. Also, for illustration purposes, we shall focus on traditional parametric elements in this section, leaving the extension to non-parametric elements to Sect. 5. We visualize the steps in Fig. 2.

Fig. 2
figure 2

Six steps for sensitivity analysis of agent-based models

4.1 Output of interest

The first step of the process is choosing the output(s) of interest. Indeed, as noted in Lee et al. (2015), ABMs usually produce a multiplicity of outputs. One or more of these outputs can be quantities of interest, as long as they are deemed relevant by the analyst or by the decision maker. Moreover, the ABM response is frequently stochastic. For instance, in the Garbage Can ABM, the number of decisions by resolution or by oversight is stochastic with respect to the simulation inputs. Formally, in stochastic simulators the output Y is the conditional distribution of the quantity of interest given inputs X. Not infrequently, analysts are interested in one or more summary statistics or functions of this distribution. For example, quantities of interest can be a moment (the expected value or the variance of Y), a quantile (the median, the 95th or 99th percentile), or the probability that Y is above a given threshold. All these quantities can be the target of a sensitivity analysis.
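As a toy illustration of these summary statistics, consider a stand-in stochastic simulator (a Gaussian response, chosen purely for illustration, not the GCM) whose output distribution given the inputs is estimated from repeated runs:

```python
import random
import statistics

def run_model(x, rng):
    """Hypothetical stochastic simulator: output Y given input x plus noise."""
    return x + rng.gauss(0.0, 1.0)

rng = random.Random(42)
x = 2.0
ys = [run_model(x, rng) for _ in range(10_000)]   # replications at fixed input

mean_y = statistics.fmean(ys)                     # a moment: E[Y | x]
var_y = statistics.variance(ys)                   # a moment: Var[Y | x]
q95 = statistics.quantiles(ys, n=100)[94]         # a quantile: 95th percentile
p_above = sum(y > 3.0 for y in ys) / len(ys)      # P(Y > threshold | x)

print(f"E[Y|x]={mean_y:.2f}  Var[Y|x]={var_y:.2f}  q95={q95:.2f}  P(Y>3|x)={p_above:.3f}")
```

Any of these four numbers could serve as the quantity of interest in the steps that follow.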

4.2 Goal

After the analyst has determined the quantities of interest, she decides on the goal of the sensitivity analysis (this is also known as the setting in the sensitivity analysis literature). A variety of goals are possible. The analyst may wish to increase her own understanding of the model behavior, might be asked to test theoretical aspects as part of a broader research investigation, or may be required to deliver robust managerial insights to a stakeholder. Broadly, one may be interested in the relative importance of alternative model elements (factor prioritization), or in whether they increase or decrease the quantity of interest (direction of change), or in how they interact (interaction quantification), or in whether conclusions drawn from the model are robust with respect to variations in the inputs (robustness analysis).

4.3 Elements

A third step is to decide which elements of the ABM to vary. This phase requires a critical review of the model, its main principles, assumptions, parameters, and procedures. For instance, varying a given element may not make managerial sense in a given application, and in this case it would be safe to exclude that element from the sensitivity analysis. The structure laid out in Sect. 3 is a useful reference for classifying the element(s) at hand. It helps the researcher judge which elements are left out of the analysis (e.g., it is hazardous to focus entirely on parameters and to ignore procedures—see Sect. 7 for further discussion).

4.4 Sensitivity method/design

A fourth step is to choose the most appropriate method for each goal-element combination, and the related experimental design, that is, the choice of how to sample points in the input space. The choice of the method also implies the choice of the scale, local or global. Let us start with a local sensitivity analysis, namely a sensitivity analysis around a particular point in the input space (also known as a scenario). A natural choice for agent-based models is to use finite difference methods. Finite differences are given by the difference between the output at a base scenario and the output at alternative scenarios, obtained by varying one or more of the inputs. The relevant sensitivity measures are main, interaction and total effects (mathematically defined in Appendix 2). They capture, respectively, the individual influence of each input, the influence of an input when varied together with other inputs net of its individual effect, and the sum of the two. Total effects can answer a factor prioritization goal, while main and interaction effects can be used to address direction of change and interaction quantification. Also, the examination of model output across scenarios addresses robustness.
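The local finite-difference logic can be illustrated on a toy deterministic response with two inputs (the function and scenario values below are hypothetical; the exact definitions of the effects are in Appendix 2):

```python
# Toy local sensitivity analysis: main, interaction and total effects
# from finite differences between a base scenario and shifted scenarios.

def f(x1, x2):
    return x1 + 2 * x2 + 0.5 * x1 * x2   # hypothetical model response

base = (1.0, 1.0)                         # base scenario
alt = (2.0, 3.0)                          # alternative input values

f00 = f(base[0], base[1])                 # base case
f10 = f(alt[0], base[1])                  # shift x1 only
f01 = f(base[0], alt[1])                  # shift x2 only
f11 = f(alt[0], alt[1])                   # shift both inputs together

main1 = f10 - f00                         # individual influence of x1
main2 = f01 - f00                         # individual influence of x2
interaction12 = f11 - f00 - main1 - main2 # joint effect net of individual effects
total1 = main1 + interaction12            # total effect of x1
total2 = main2 + interaction12            # total effect of x2

print(main1, main2, interaction12, total1, total2)
```

Here the nonzero interaction term comes from the product term in f; a purely additive model would give an interaction effect of zero.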

The evaluation of the model at several locations in the input space is the basis for a global sensitivity analysis. Unlike a local analysis, a global sensitivity analysis considers the response of the model with inputs sampled across the entire input space, or at least across the portion of the input space that the analyst deems most relevant (Saltelli et al. 2008)—see Appendix 2 for mathematical aspects. This exploration is often performed through various forms of Monte Carlo experimental designs. The selection of the global sensitivity method depends on the goal. For factor prioritization, variance methods are a classic set of tools. According to these methods, the most important inputs are those that explain most output variance. Moment-independent methods are a more recent approach that exploits the full distribution of the output given the inputs. For direction of change, popular techniques are, among others, gradients computed at randomized locations and partial dependence functions. For interaction quantification, the analyst can use high-order variance-based methods: instead of looking at which individual inputs explain variance the most, these methods consider the contribution to variance of two or more inputs that are varied together. Finally, the examination of the output behavior across the Monte Carlo-generated scenarios helps the analyst in conducting a robustness analysis.
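As one concrete example of a variance-based method for factor prioritization, the following sketch estimates total-effect indices with Jansen's estimator (Saltelli et al. 2008) on a toy model. The model, sample sizes, and names are ours; in practice f would be the ABM's quantity of interest.

```python
import random

def f(x1, x2):
    """Hypothetical model: x1 should dominate the output variance."""
    return x1 + 0.2 * x2

rng = random.Random(0)
N = 20_000
# Two independent Monte Carlo samples over the unit square.
A = [(rng.random(), rng.random()) for _ in range(N)]
B = [(rng.random(), rng.random()) for _ in range(N)]

ys = [f(x1, x2) for x1, x2 in A]
mean_y = sum(ys) / N
var_y = sum((y - mean_y) ** 2 for y in ys) / N

def total_effect(i):
    """Jansen's total-effect estimator for input i."""
    num = 0.0
    for a, b in zip(A, B):
        mixed = list(a)
        mixed[i] = b[i]                    # resample only input i
        num += (f(*a) - f(*mixed)) ** 2
    return num / (2 * N * var_y)

print(total_effect(0), total_effect(1))    # index for x1 is much larger
```

For this additive toy model the analytical values are roughly 0.96 for x1 and 0.04 for x2, so the estimator correctly prioritizes x1.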

4.5 Assignment of values

Once the goal and the method/design of the analysis have been identified, the researcher selects numerical values for the parameters. The type of assignment depends on the design. For instance, if the analyst is considering a local analysis, she needs to assign reference values that form a basic scenario (usually called base case) and variation ranges/levels to the inputs to reach other scenarios (sometimes called best case or worst case). Conversely, if the analysis is global and the design is probabilistic, the analyst needs to specify values in the form of ranges (supports) and distributions for the inputs. Value assignment is a delicate step in ABMs: One may have constraints on parameter changes dictated by procedures or other model elements. That is, a choice for a procedure may limit the range of “reasonable” values that a parameter can take—Sect. 7 provides further discussion.

4.6 Results communication/visualization

The last step is the choice of how to visualize the results of a sensitivity analysis. For any combination of goal/element/method, there may be one or more appropriate visualization tools. For example, for factor prioritization and interaction quantification with parametric elements and calculation of main, total and interaction effects (local analysis), bar charts or tornado diagrams (Eschenbach 1992) are suitable. If instead one is interested in direction of change, a variety of tools are available, ranging from spiderplots (Eschenbach 1992) to ICE plots (Goldstein et al. 2015). In this respect, we note that ICE plots have been defined for simulators whose output is deterministic, whereas most ABMs have a stochastic response. We therefore propose a modification of ICE plots that takes the stochastic nature of the response into account. Specifically, because the response of a (stochastic) ABM simulator is, in fact, a distribution, we introduce a test for the statistical significance of the difference in mean values. The test allows us to distinguish whether a small but non-null change in a mean (or conditional mean) is indeed significant or just the consequence of numerical noise.
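The significance check underlying this idea can be sketched as follows. This is an illustrative Welch-type statistic with a large-sample normal approximation, applied to a Gaussian stand-in simulator; the exact test used for S-ICE plots is described in Appendix 3.

```python
import math
import random
import statistics

def significant_mean_shift(sample_a, sample_b, z_crit=1.96):
    """Welch-type test (normal approximation): is the difference in means
    between two sets of simulation runs statistically significant?"""
    ma, mb = statistics.fmean(sample_a), statistics.fmean(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    se = math.sqrt(va / len(sample_a) + vb / len(sample_b))
    return abs(ma - mb) / se > z_crit

rng = random.Random(1)
runs_low = [rng.gauss(10.0, 2.0) for _ in range(2000)]    # output at input level x
runs_high = [rng.gauss(10.5, 2.0) for _ in range(2000)]   # output at level x + delta
rerun_low = [rng.gauss(10.0, 2.0) for _ in range(2000)]   # same level, re-run

print(significant_mean_shift(runs_low, runs_high))  # True: a genuine mean shift
print(significant_mean_shift(runs_low, rerun_low))  # usually False: numerical noise
```

In an S-ICE plot, a segment of the conditional-mean curve would be drawn as a significant change only when such a test rejects the null of equal means.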

5 An experimental design for including non-parametric elements

Local and global methods have mainly been conceived for inputs that are continuous or discrete real numbers. However, ABMs also include elements, such as behavioral rules and interaction procedures, that do not belong to a well-defined numerical space.

While it is not always necessary to vary all the elements of an ABM simultaneously, it is important to stress that a researcher who is performing a sensitivity analysis involving only parameters is implicitly fixing a substantial portion of an agent-based simulation. Suppose one performs a global sensitivity analysis on all of the model parameters. As Fig. 1 highlights, only a fraction of the ABM would be tested (subsets C and D). Using the example of the GCM, the effect of the participant, solution and problem structures on simulation outcomes would be ignored.

To address these issues, we treat the alternative specifications of a non-parametric element as the levels of a categorical variable (see Appendix 3 for all technical details). The levels may or may not have an ordinal meaning. We then propose to perform a full factorial design that comprises all possible combinations of (a) the levels of the categorical variables associated with non-parametric elements and (b) two or more discrete values for each parameter (the choice of a full factorial design is not restrictive; computationally cheaper designs can also be exploited—see Sect. 7).
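A minimal sketch of such a design, assuming two illustrative categorical elements with levels "A"/"H" and two parameters with arbitrary levels (the names echo the GCM example, but the values are our own):

```python
from itertools import product

# Hypothetical example: two non-parametric elements, each encoded as a
# categorical variable with levels "A" (anarchy) and "H" (hierarchy),
# plus two parameters with two and three discrete values respectively.
factors = {
    "participant_structure": ["A", "H"],  # categorical (non-parametric element)
    "problem_structure":     ["A", "H"],  # categorical (non-parametric element)
    "num_opportunities":     [25, 40],    # parameter, two levels
    "min_ability":           [0, 1, 2],   # parameter, three levels
}

# Full factorial design: every combination of levels across all factors.
names = list(factors)
design = [dict(zip(names, combo)) for combo in product(*factors.values())]

n_scenarios = len(design)  # 2 * 2 * 2 * 3 = 24 scenarios
```

Each entry of `design` is one scenario at which the simulator would then be evaluated (with replications, given the stochastic response).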

This design makes it possible to address sensitivity analysis goals in a way that is in between a local and a global approach. In particular, it allows the researcher to compute finite difference measures (main, interaction and total effects) at several locations of the input space and to obtain global measures as appropriate functions of these local effects.

For example, as we show in Appendix 3, from the variance of the main effects it is possible to estimate the total-order variance-based sensitivity indices, a recommended global importance measure (Saltelli et al. 2008). This allows the researcher to address the factor prioritization goal. The same sample also allows the calculation of moment-independent global sensitivity measures. We should emphasize that obtaining global measures as appropriate functions of local measures is by no means novel in sensitivity analysis (Morris 1991). Nonetheless, our approach extends this idea to non-parametric elements, making it possible, for example, to assess the contribution of a behavioral rule to output variance.
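To illustrate the idea on a toy model, the following sketch estimates total-order indices from the finite differences (main effects) observed at matched pairs of a full factorial sample. The formula used is the binary-input specialization of Jansen-style estimators and may differ in detail from the construction in the paper's Appendix 3:

```python
from itertools import product
import numpy as np

# Toy additive-plus-interaction model on three binary inputs, illustration only.
def model(x1, x2, x3):
    return 2.0 * x1 + 1.0 * x2 + 1.5 * x1 * x3

# Enumerate the full factorial over the three binary inputs.
points = np.array(list(product([0, 1], repeat=3)), dtype=float)
y = np.array([model(*p) for p in points])

def total_index(points, y, i):
    """Estimate the total-order index of input i from the finite differences
    (main effects) at all matched pairs that differ only in input i.
    For a binary input with equally likely levels, the conditional variance
    over its levels equals (Delta/2)^2, so T_i = E[Delta^2] / (4 Var(Y))."""
    diffs = []
    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            differs = points[a] != points[b]
            if differs[i] and differs.sum() == 1:  # only x_i changes
                diffs.append(y[b] - y[a])
    diffs = np.array(diffs)
    return float(np.mean(diffs ** 2) / (4 * np.var(y)))

T = [total_index(points, y, i) for i in range(3)]
```

On this toy model the estimates match the analytical variance decomposition: the interaction between `x1` and `x3` shows up in both of their total-order indices.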

Our design also allows the researcher to address additional goals besides factor prioritization. Regarding direction of change, for a given input, the design allows one to consider all pairs of points that are only different because of that input. One can then compute the resulting Newton quotients at all such pairs of points. We propose to visualize this information in Stochastic Individual Conditional Expectation (S-ICE) plots that take into account the stochastic nature of the ABM response. We will describe these plots when presenting our example of the GCM in Sect. 6 (Appendix 3 reports additional details).
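The pairing logic behind such plots can be sketched as follows for a deterministic toy response (the stochastic case would replace each evaluation with a mean over replications, as in Sect. 6); the response function and levels are purely illustrative:

```python
from itertools import product

# Toy deterministic response over one ordered parameter and one categorical
# element with levels "A"/"H"; purely illustrative.
levels = {"x": [0.0, 0.5, 1.0], "structure": ["A", "H"]}

def response(x, structure):
    return x ** 2 + (0.3 if structure == "H" else 0.0)

scenarios = [dict(zip(levels, c)) for c in product(*levels.values())]

def newton_quotients(scenarios, target):
    """First-order Newton quotients for `target`: (f(b) - f(a)) / (x_b - x_a)
    over all pairs of scenarios that differ only in `target`."""
    out = []
    for i, a in enumerate(scenarios):
        for b in scenarios[i + 1:]:
            diff_keys = [k for k in a if a[k] != b[k]]
            if diff_keys == [target]:
                out.append((response(**b) - response(**a))
                           / (b[target] - a[target]))
    return out

q = newton_quotients(scenarios, "x")
```

Each quotient corresponds to one line segment in an ICE-style plot; collecting them across all scenarios is exactly the information an S-ICE plot visualizes.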

Our design also allows the researcher to quantify interactions of a parameter and a procedure. To do so, the researcher evaluates the simulator response at all pairs of points that differ in two or more inputs. It is then possible to calculate all corresponding interaction effects as well as second-order Newton quotients. We plot them as histograms to facilitate visualization. Such visualization shows heterogeneity of interactions across scenarios, enriching the insights coming from reporting a single number that represents a global interaction between inputs.
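For ordered inputs, the second-order Newton quotient is the interaction analogue of a mixed second derivative. A toy sketch, with a deliberately bilinear response so that the quotient recovers the interaction coefficient exactly (the function and coefficients are illustrative, not from the GCM):

```python
# Toy response with a genuine interaction between x and y; the third input z
# stands in for "all remaining inputs" that define the scenarios.
def f(x, y, z):
    return 1.0 * x + 2.0 * y + 3.0 * x * y + 0.5 * z

# Second-order Newton quotient on the rectangle (0,1) x (0,1):
# [f(1,1) - f(1,0) - f(0,1) + f(0,0)] / ((1-0) * (1-0)),
# computed once for each setting of the remaining inputs (here, z).
quotients = []
for z in [0.0, 1.0]:
    num = f(1, 1, z) - f(1, 0, z) - f(0, 1, z) + f(0, 0, z)
    quotients.append(num / ((1 - 0) * (1 - 0)))
```

Collecting these quotients across all scenarios and plotting their histogram is what reveals heterogeneity of interactions; in this additive-in-`z` toy the histogram would collapse to a single spike at the interaction coefficient.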

Finally, regarding robustness, we show how S-ICE plots can be used to test the robustness of conjectures on the model (we refer the reader to the example of the GCM in the next section).

6 The garbage can model as an illustration

This section offers an illustration of our approach through its application to the GCM.

6.1 Applying the six steps

6.1.1 Output of interest

Recall from Sect. 3.3 that decisions in the GCM can happen by oversight, flight or resolution. A notable outcome of many simulations is that decisions by resolution are far from ubiquitous. That is, decisions as envisioned in more traditional models of organizations, in which participants gather to apply a solution directly to a problem, are by no means the only thing that happens. Quite often, simulated participants make empty, symbolic decisions that solve no problems (oversight) or get bogged down in intractable issues (flight). The portion of decisions by resolution is therefore a natural quantity of interest in GCM simulations and a focus of previous literature (Takahashi 1997; Fioretti and Lomi 2008, 2010). In this section, our sensitivity analysis focuses solely on the fraction of decisions by resolution for reasons of space; the same analysis can be conducted for other outputs. To deal with stochasticity, for each set of inputs we perform 100 simulations with different random seeds and report the average fraction of decisions by resolution across these simulations. Each simulation lasts 1000 time steps; this is enough for results to be sufficiently stable, in the sense that the output of interest shows little variability across simulation runs. Concerning this issue, recent work by Vandin et al. (2021) proposes a way to automatically select the number and duration of simulation runs, provided certain requirements are met.
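The replication scheme can be sketched as follows; `run_simulation` is a hypothetical stand-in for a 1000-step GCM run, not the actual simulator:

```python
import random

# Minimal sketch of replication handling: run a (hypothetical) simulator with
# 100 different seeds and report the mean output across replications.
def run_simulation(seed, steps=1000):
    rng = random.Random(seed)
    # Placeholder dynamics: fraction of "resolution" events over `steps` draws;
    # a real run would execute the ABM's agent logic instead.
    resolutions = sum(1 for _ in range(steps) if rng.random() < 0.1)
    return resolutions / steps

outputs = [run_simulation(seed) for seed in range(100)]
mean_share = sum(outputs) / len(outputs)
```

Keeping the per-seed outputs (not just the mean) is important later: the significance test behind the S-ICE coloring needs the full sample of replications for each scenario.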

6.1.2 Goal

We consider obtaining insights regarding all four main goals: factor prioritization, direction of change, interaction quantification and robustness analysis.

6.1.3 Elements

We select some elements of the GCM to illustrate the range of types of elements shown in Fig. 1. Of the elements that we consider (see Table 1), the number of opportunities and the minima and maxima of ability, efficiency and difficulty are parametric elements (D and F in Fig. 1, respectively), while participant, solution and problem structures (E) are procedures and thus non-parametric elements.

In contrast, we decide not to vary other elements of the GCM: these are the principles (A), which by definition are not candidates for sensitivity analysis; the size of the grid (D), which comprises 195 cells (15 in one direction and 13 in the other direction), and the number of participants, solutions and problems (D), which are 25 each; and the assignment rule of ability/difficulty/efficiency (C). We also do not vary the structure of the grid—a regular square lattice (B).

Looking back at Fig. 1, we notice that we vary elements in all main sets of our classification, suggesting that our exploration of the model touches all main classes of elements. Our conceptual structure also suggests which parts of the model are left untested. Since we do not subject the assignment rule of ability/difficulty/efficiency to our sensitivity analysis, we are not varying any non-agent procedure (C in Fig. 1).

6.1.4 Sensitivity method/design

We use two variance-based indicators (first-order and total-order indexes) and two moment-independent methods, namely the \(\delta\) and \(\beta ^{Ku}\) importance measures (Baucells and Borgonovo 2013) (see Appendix 2 for the definitions). Using alternative measures has the advantage that they examine different properties of the output distribution. If alternative measures concur in indicating an input as important, this indication is more robust than one coming from a single sensitivity measure.

Table 1 Inputs for the GCM sensitivity analysis

6.1.5 Assignment of values

We assign participant structure, problem structure, and solution structure (which are non-parametric elements) to categorical variables \(X_1\), \(X_2\), and \(X_3\). These can take two values: \(X_i \in \{ A,H \}\), with \(X_i = A\) representing anarchy and \(X_i = H\) representing hierarchy (\(i=1,\,2,\,3\)). Considering only these two options is consistent with the organizational structures of the original Garbage Can Model and with the findings of Fioretti and Lomi (2008), who identified anarchy and hierarchy as the most relevant structures. The remaining elements that we test are parametric. According to our design, we select a few discrete values that the parameters can take. In particular, the number of choice opportunities can take a base value of 25 (as for all other classes of agents) and an incremented value of 40, which is sufficiently different from the baseline to generate a substantial change in the output. We assign three values to each of the remaining parametric variables: the two extremes and the central value of their respective ranges—see Table 1. Ranges are selected to avoid overlap between the minimum and maximum values of the attributes.
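The resulting value table can be recorded compactly. The level values below follow the ranges stated in the text, while the mapping of \(X_5:X_{10}\) to specific attributes is given in Table 1 and left generic here:

```python
# Levels for the ten inputs of the sensitivity analysis. X1:X3 are the
# categorical structure variables; X4 is the number of choice opportunities;
# X5:X10 are minimum/maximum attribute parameters (see Table 1 for the
# attribute-to-variable mapping, which we leave generic here).
levels = {
    "X1": ["A", "H"], "X2": ["A", "H"], "X3": ["A", "H"],  # structures
    "X4": [25, 40],                                        # opportunities
    "X5": [0, 1, 2],       "X6": [8, 9, 10],               # a min/max pair
    "X7": [0.0, 0.1, 0.2], "X8": [0.8, 0.9, 1.0],          # a min/max pair
    "X9": [0, 1, 2],       "X10": [8, 9, 10],              # a min/max pair
}

# Check the no-overlap requirement: every minimum-value level must stay below
# every level of the matching maximum-value parameter.
pairs_ok = all(max(levels[lo]) < min(levels[hi])
               for lo, hi in [("X5", "X6"), ("X7", "X8"), ("X9", "X10")])
```

Encoding the design this way makes the no-overlap constraint between minima and maxima checkable mechanically rather than by inspection.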

We consider all possible combinations of elements, giving rise to a full factorial design with 11,664 input configurations (scenarios). Evaluating the ABM at these scenarios allows one to compute 5,832 main effects for each element taking two values, \(X_1:X_4\), and 11,664 for each element taking three values, \(X_5:X_{10}\). Also, it is possible to calculate a total of 664,848 pairwise interaction effects.
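The scenario and main-effect counts can be reproduced with a short combinatorial check (we do not recompute the pairwise-interaction total, since that count depends on the convention used for enumerating pairs of pairs):

```python
from math import comb

n_binary, n_ternary = 4, 6                   # X1:X4 binary, X5:X10 ternary
n_scenarios = 2 ** n_binary * 3 ** n_ternary  # full factorial size

# A main effect for an input is a difference between two scenarios that share
# the settings of all other inputs: one per unordered pair of the input's
# levels, for each configuration of the remaining inputs.
main_effects_binary = comb(2, 2) * n_scenarios // 2   # per binary input
main_effects_ternary = comb(3, 2) * n_scenarios // 3  # per ternary input
```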

6.1.6 Results communication/visualization

For result visualization we use the tools discussed in Sect. 5.

6.2 Results of the numerical experiments

Across the 11,664 scenarios, we obtain an average of \(10.3\%\) decisions by resolution, \(65.4\%\) by oversight and \(24.3\%\) by flight, corroborating the original insights of the GCM. From now on, we focus on the share of decisions by resolution.

Fig. 3

Ranking of the GCM inputs according to the first- and total-order variance-based measures and the \(\delta _i\) and \(\beta ^{Ku}_i\) moment-independent measures. Sorting is based on the total-order indexes \(\tau _i^2\)

6.2.1 Factor prioritization

Figure 3 displays a Pareto chart visualizing the ranking of the elements. Consistently across the four sensitivity measures, problem structure is the most important element. The number of choice opportunities is the second most important element in affecting the share of decisions by resolution, while solution and participant structures are both ranked third, with similar importance.

Overall, assumptions regarding non-parametric elements impact the output of the GCM more than assumptions regarding parameters. One may argue that this result is driven by the intervals we chose for the parameters—that is, between 0 and 2 (or 0.0 and 0.2) for the minimum values, \(X_5, X_7, X_9\), and between 8 and 10 (or 0.8 and 1.0) for the maximum values, \(X_6, X_8, X_{10}\). The argument would be that choosing relatively narrow intervals in which parameters are varied could artificially make procedures look more important than parameters. To test the robustness of our result, we assigned alternative ranges to the minimum and maximum values, \(X_5:X_{10}\). In particular, we repeated the calculations letting \(X_5\) and \(X_9\) take values between 0 and 4 and \(X_6\) and \(X_{10}\) take values between 6 and 10 (we also let \(X_7\) and \(X_8\) take values between 0.0 and 0.4, and 0.6 and 1.0, respectively). These intervals are almost as wide as they can be relative to each other, as it would not make sense for a minimum value of, say, ability to be larger than its maximum value. Re-running the analysis, the findings remained unchanged.

Fig. 4

S-ICE plots for the share of decisions by resolution as a function of the elements. Red: negative change. Blue: positive change. Grey: change is not statistically significant. (Color figure online)

6.2.2 Direction of change

Figure 4 displays the S-ICE plots for the inputs. These graphs can be read as follows. Consider the top left panel as a reference. The horizontal axis reports participant structure (\(X_1=A\) is anarchy, while \(X_1=H\) is hierarchy). The vertical axis reports the expected share of decisions by resolution conditional on the value of \(X_1\) in each scenario (because there are many scenarios, the small black dots are hard to distinguish and instead appear as a continuous vertical black line). The large black dot on each vertical line represents the average percentage of decisions by resolution across all scenarios. For instance, given a participant structure that is anarchic (A), on average about \(10\%\) of the decisions are taken by resolution. This fraction increases on average to about \(11\%\) when the participant structure equals H—technically, these two dots form the graph of the corresponding partial dependence function, which, in our design, is discrete. This information is complemented by the lines joining the smaller black dots at \(X_1=A,H\), whose purpose is to show the change in the portion of decisions by resolution, scenario by scenario. These lines join pairs of small black dots such that an input in a given scenario is switched to an alternative value in the other scenario, with all other elements unchanged. More specifically, in the first panel, the lines join pairs of conditional expectations obtained when the participant structure is A in one scenario and H in the other, with all other elements fixed. As mentioned, for each scenario we run 100 simulations to estimate the conditional mean of the share of decisions by resolution in that scenario. Hence, the value we are dealing with is a sample mean; to assess whether the observed change is different from zero, we need to perform a statistical test. We use a two-sample t-test at the 5% significance level.
If the null hypothesis is not rejected, the corresponding line in Fig. 4 is grey; otherwise, it is blue or red depending on whether the change is positive or negative. In the first panel, we observe a majority of blue lines, indicating that the switch of participant structure from anarchy to hierarchy increases the portion of decisions by resolution in the majority of scenarios. At the same time, not all lines are blue: there are scenarios in which the opposite occurs, and thus the output is not monotonic in this variable.
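The coloring rule can be sketched as follows. The replication samples below are synthetic stand-ins for the 100 simulated shares per scenario, and the Welch variant of the two-sample t-test is one reasonable choice (the text does not specify whether equal variances are assumed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two sets of 100 replications (one per scenario in a matched pair); synthetic
# stand-ins for the simulated shares of decisions by resolution.
sample_a = rng.normal(loc=0.10, scale=0.01, size=100)  # scenario with X1 = A
sample_h = rng.normal(loc=0.11, scale=0.01, size=100)  # scenario with X1 = H

def line_color(a, b, alpha=0.05):
    """Color an S-ICE line: grey if the mean change from a to b is not
    statistically significant, blue if significantly positive, red if
    significantly negative (Welch two-sample t-test)."""
    t, p = stats.ttest_ind(b, a, equal_var=False)
    if p >= alpha:
        return "grey"
    return "blue" if np.mean(b) > np.mean(a) else "red"

color = line_color(sample_a, sample_h)
```

With the synthetic means above, the change from A to H is clearly significant and the line would be drawn blue.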

We now comment on the other panels. The solution structure panel (first row, second panel in Fig. 4) delivers insights similar to those of the participant structure panel. The problem structure panel (first row, third panel) yields a different message. When the problem structure is A (anarchy), we expect about \(11.5\%\) decisions by resolution; this number decreases to about \(9\%\) when the problem structure is hierarchical. Thus, the switch of problem structure from A to H leads to a decrease (substantial in percentage terms) in the average portion of decisions taken by resolution. Note that virtually all lines are red: for almost all combinations of the other inputs, the switch of problem structure from anarchy to hierarchy decreases the portion of decisions by resolution. (See the paragraph on robustness below for additional comments on this property.) Moving to the next panel, the number of choice opportunities has, on average, a positive effect on decisions by resolution; however, the plot shows two well-clustered sets of positive and negative realizations. This is a sign that this element is involved in interactions with other inputs. In fact, it can be shown (see Appendix 3) that if an input is binary and the model is additive in that input, then there is only one possible slope for the one-way lines in an S-ICE plot; thus, changes in slope denote non-additivity. Finally, the panels for the minimum and maximum values of ability, efficiency and difficulty (\(X_5:X_{10}\)) exhibit a weaker but regular impact, with the ability and efficiency parameters having on average a positive effect on decisions by resolution, and the difficulty parameters a negative one. Overall, the panels in Fig. 4 signal a non-monotonic behavior of the quantity of interest as a function of the inputs. Traditional local sensitivity analysis methods that vary one input at a time would thus be inadequate to study this output.

6.2.3 Interaction quantification

We now come to our third goal: quantifying interaction effects. Overall, there are 45 possible pairwise interactions among the 10 variables we focus on. The first question we address is whether pairwise interactions are significant in determining the model outcome. To answer this question, we fit a linear regression model with all 45 interaction terms on the input-output data (we use the subroutine fitlm.m available in Matlab). The results show that 43 pairwise interactions are statistically significant at the \(1\%\) threshold for decisions by resolution. Thus, interactions do matter in the GCM. Another indication in this sense comes from the sensitivity measures in Fig. 3. A visual inspection of the first- and total-order variance-based sensitivity indexes shows several cases with a large discrepancy between these measures. For example, the values of the first- and total-order indexes for the number of opportunities and participant structure in the Pareto chart of Fig. 3 reveal low first-order indexes and large total-order indexes, suggesting that these inputs owe a significant part of their importance to their interactions with the remaining inputs.
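An analogous regression with all pairwise interaction terms can be sketched in Python with ordinary least squares (the data below are synthetic, with one true interaction; Matlab's fitlm additionally reports the p-values used for the significance screening, which we omit here):

```python
import numpy as np
from itertools import combinations, product

rng = np.random.default_rng(1)

# Synthetic input-output data standing in for the factorial sample: three
# binary inputs, replicated, with one true pairwise interaction (x0 * x2).
X = np.array(list(product([0.0, 1.0], repeat=3)))
X = np.repeat(X, 20, axis=0)                      # 20 replications per scenario
y = X[:, 0] + 2 * X[:, 1] + 3 * X[:, 0] * X[:, 2] + rng.normal(0, 0.05, len(X))

# Design matrix: intercept, main-effect terms, and all pairwise interaction
# terms (the analogue of fitting an 'interactions' model).
cols = [np.ones(len(X))] + [X[:, i] for i in range(3)]
names = ["const", "x0", "x1", "x2"]
for i, j in combinations(range(3), 2):
    cols.append(X[:, i] * X[:, j])
    names.append(f"x{i}:x{j}")
D = np.column_stack(cols)

beta, *_ = np.linalg.lstsq(D, y, rcond=None)
coef = dict(zip(names, beta))
```

On this synthetic sample, the estimated coefficient of the `x0:x2` column recovers the true interaction strength while the spurious interaction terms stay near zero.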

Fig. 5

The five strongest interactions for decisions by resolution

We gain information about the signs of interactions through second-order finite-difference interaction effects. In Fig. 5, we report the histograms of the normalized second-order Newton quotients for the five most important pairwise interactions. For each interaction, the legend also displays the mean value, the standard deviation, and the percentage of scenarios that lead to a positive interaction. Note that an interaction is considered positive when increasing both inputs together increases the output by more than what could be expected from raising the two inputs individually. This definition applies only to elements that have a notion of ordinality; for categorical variables, the interpretation does not carry over directly. To study interactions between numerical and categorical elements, by convention we consider \(\{ A,H \}\) as an ordered set in which A is lower. Therefore, a positive interaction between a parameter and a categorical variable means that the output increases when the parameter is increased while one of the problem/solution/participant structures switches from anarchy to hierarchy across two scenarios.

We find that the number of opportunities is involved in three of the five most important interactions. Specifically, the strongest interaction is between the number of opportunities and problem structure. The interaction is positive in \(98.8\%\) of the cases and has an average effect of \(+2.8\%\) on the share of decisions by resolution, a large impact considering that the average share of decisions by resolution is \(10.4\%\). The second most important interaction is between participant structure and problem structure. This interaction is mostly positive (in \(96\%\) of the scenarios). Note that, had the analyst focused only on parameters, the sensitivity analysis would not have revealed the most important interactions.

6.2.4 Robustness

Let us suppose for a moment that the experiments on the GCM were carried out with the goal of validating the following two conjectures:

  1. C1 A hierarchical problem structure decreases efficiency in decision-making.

  2. C2 A higher number of opportunities increases efficiency in decision-making.

Here, what we mean by an increase in efficiency is an increase in the fraction of decisions taken by resolution.

We can check whether these conjectures hold by considering each line in the S-ICE plots. Conjecture C1 is confirmed by the experiments on the ABM. Indeed, in the third panel of the top row of Fig. 4, the lines are either red or grey, and none are blue. There are a few cases in which the change in problem structure from anarchy to hierarchy slightly increases the expected number of decisions by resolution. However, none of these increases is statistically significant, and these lines are accordingly colored grey in our representation. These cases are a byproduct of the stochasticity of the model, and we expect them to vanish for a large enough number of replications. In sum, we cannot find any statistically significant case in which a hierarchical problem structure increases the share of decisions by resolution. Consequently, there is not enough statistical evidence to reject the first conjecture.

Conversely, conjecture C2 is not robust. Indeed, the cluster of red lines in the fourth panel of Fig. 4 shows that the opposite of the conjecture occurs in 26.5% of the scenarios, and many of these cases are statistically significant. In particular, we find that this behavior depends critically on the interaction with problem structure. When the problem structure is hierarchic, the number of opportunities has a consistently positive effect (in 100% of the scenarios); yet the effect is mixed when the problem structure is anarchic, with a positive effect in 47% of the cases and a negative effect in the remaining ones.
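This kind of conditional reading of the effects can be sketched as follows, with a deliberately simplified toy response in which the sign of the number-of-opportunities effect flips cleanly with problem structure (in the actual GCM the anarchic case is mixed rather than uniformly negative):

```python
from itertools import product

# Toy response: the effect of `n_opp` on the resolution share depends on
# `problem_structure`; purely illustrative stand-in for the GCM behavior.
def share_resolution(n_opp, problem_structure, other):
    interaction = 0.002 if problem_structure == "H" else -0.001
    return 0.10 + interaction * (n_opp - 25) + 0.01 * other

# Group the effect of raising n_opp from 25 to 40 by problem structure,
# across settings of a remaining input (`other`).
effects_by_structure = {"A": [], "H": []}
for ps, other in product(["A", "H"], [0, 1, 2]):
    delta = share_resolution(40, ps, other) - share_resolution(25, ps, other)
    effects_by_structure[ps].append(delta)

all_positive_under_H = all(d > 0 for d in effects_by_structure["H"])
all_negative_under_A = all(d < 0 for d in effects_by_structure["A"])
```

Grouping the matched-pair effects by the level of a second input is exactly how one detects that a conjecture holds only conditionally.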

6.3 Some managerial insights

Managerial insights are obtained when a question of interest for an operations research/management science investigator is answered by interrogating the ABM at hand. To illustrate, consider a researcher who is using the GCM to understand the combination of elements necessary to achieve a higher share of decisions by resolution (this number might be considered a proxy for the effectiveness of an organization). As is observable in the problem structure panel of Fig. 4, while the impact of a hierarchic problem structure on decisions by resolution is almost always negative, sometimes this effect is milder. This is because of an interaction effect: the negative impact can be mitigated by an increased number of opportunities and anarchic participant and solution structures (Fig. 5). Thus, the combination that leads to the highest share of decisions by resolution is the one with hierarchic participant and solution structures, an anarchic problem structure, and a low number of choice opportunities. Note that, had the researcher performed a simple series of one-at-a-time sensitivity analyses (equivalent to considering only individual effects), she would have naively picked as optimal the structure in which problems and solutions float freely in the organization.

More generally, interaction analysis reveals a further insight: the strongest interaction effect is the one between number of opportunities and problem structure, which can be seen as complementary variables given the sign of their interaction. Suppose that an organization has a hierarchic problem structure and a large number of choice opportunities. Because the interaction between these two variables is positive, the decrease in the decision-by-resolution share generated by the hierarchic problem structure is mitigated by the simultaneous variation of the number of opportunities.

A further insight for a researcher using the GCM concerns what determines the share of decisions by resolution. All in all, our analysis shows that, since the presence of a problem is the defining difference between a decision by resolution and a decision by oversight, it is problem hierarchy that leads to the greatest decline in decisions by resolution. An organization that wants to avoid a proliferation of problem-free decisions should take care to spread problems out widely and avoid having many problem-poor, solution-rich decision makers looking for work.

To learn a broader lesson from these results, our sensitivity analysis approach can suggest where further modeling efforts could be fruitful. In our example, given the importance of problem structure, one may think of building sub-models that simulate access of problems to choice opportunities at a finer granularity than the anarchy-hierarchy procedures in the current GCM implementation.

7 Discussion

We now discuss in depth a number of possible issues that we mentioned in the previous sections.

7.1 The case of computationally heavy models

Thanks to the fast execution time of the GCM, we were able to explore all input combinations, but our approach may also help the analyst when this is not possible. On the one hand, consider an analyst wishing to apply our design to a time-consuming model. Given a budget of available simulator runs, she can reduce the number of model evaluations in several ways, for instance by considering groups of inputs rather than individual assumptions (Saltelli et al. 2008); alternatively, she might apply a so-called fractional factorial design, which explores the model input space at a smaller number of locations than a full factorial design. In using fractional factorial designs, the recommended procedure is for the investigator to pre-identify the potential interactions of interest (for instance, all second-order interactions) and then choose the design that allows the desired level of resolution. Here, the analyst has a variety of choices available, based on orthogonal arrays (Morris et al. 2008) or supersaturated designs (Lin 2000). Moreover (or alternatively), the analyst may fit the model response with an emulator and focus on parametric sensitivity in a predetermined region of the model input space. If, according to statistical performance measures, the emulator fit is accurate, then the original (and time-consuming) mathematical model can be replaced by the fast emulator, removing the computational-burden restrictions (see Appendix 1 for works applying emulation in agent-based modeling).
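As a minimal illustration of the emulation route, one can fit a cheap surrogate to a modest number of runs of an expensive simulator and validate it on held-out points before trusting it for sensitivity exploration. The simulator and the quadratic polynomial form below are purely illustrative choices, not the emulators used in the literature cited above:

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-in for an expensive simulator: in practice each call would be a
# full ABM run (possibly averaged over replications).
def expensive_simulator(x1, x2):
    return np.sin(3 * x1) + 0.5 * x2 ** 2

# Small training sample in the region of interest.
train = rng.uniform(0, 1, size=(40, 2))
z = np.array([expensive_simulator(a, b) for a, b in train])

# Quadratic polynomial emulator fitted by least squares: a cheap surrogate
# used in place of the simulator for sensitivity exploration.
def features(X):
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2,
                            x1 * x2, x1 ** 2, x2 ** 2])

beta, *_ = np.linalg.lstsq(features(train), z, rcond=None)

# Check emulator accuracy on held-out points before trusting it.
test_pts = rng.uniform(0, 1, size=(200, 2))
pred = features(test_pts) @ beta
truth = np.array([expensive_simulator(a, b) for a, b in test_pts])
rmse = float(np.sqrt(np.mean((pred - truth) ** 2)))
```

The held-out accuracy check is the "statistical performance measure" step: only if the error is acceptable should the surrogate replace the simulator.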

On the other hand, the methodological part of our approach can provide guidance to analysts dealing with models whose computational burden does not allow one to apply the proposed design. Under tight resource constraints, it becomes even more important to carefully examine the element/goal/method triplets to find the combination(s) that maximize the insights into model behavior and aid managerial intuition. The analyst could also apply our sensitivity analysis iteratively, in conjunction with model building. This would make it possible to avoid computationally heavy assumptions if cheaper assumptions lead to similar results with respect to the goals of interest.

7.2 When is it sensitivity analysis and when is it a new model?

In Sect. 3.1, we defined principles as those elements of an ABM that characterize it: changing principles would lead to a new model. Since it can be difficult to tell principles from assumptions, especially when considering alternative procedures, the analyst may find herself at the border between exploring a new model and performing a sensitivity analysis of the same model. If the use of different procedures corresponds to the use of different theories, then, indeed, the analyst might be interpreting results that lie between a sensitivity analysis of one model and a comparison of two alternative models. To be concrete, consider the choice in the GCM between anarchy and hierarchy for, say, participant structure. Both anarchic and hierarchic participant structures are consistent with the principles of the GCM listed in Sect. 3.4. However, a procedure that computes the optimal assignment of participants so as to maximize the number of decisions by resolution would conflict with the first principle from Sect. 3.4, and would thus lead to a new model. Our investigation suggests that it is not possible to draw a sharp line for all ABM elements and all ABMs. Telling whether an assumption conflicts with a principle depends on domain knowledge and is outside the scope of sensitivity analysis.

7.3 Selection of elements

Choosing which elements are subjected to sensitivity analysis and which are ignored depends on many aspects. First, the analyst could distinguish between elements that are central to the research question, which we call elements of interest, and elements that are needed for the internal consistency of the model, which we call incidental. Under this distinction, the analyst could focus the sensitivity analysis on the elements of interest. For instance, in the GCM, anarchy vs. hierarchy is certainly an element of interest, while the assignment of ability/efficiency/difficulty (our example element in class C in Sect. 3.4) is an incidental element, which we do not vary.

Other classifications of elements are possible. In terms of parameters, Smith and Rand (2018) suggest four possible criteria for selecting the ones appropriate for a robustness test. Their analysis suggests that model-altering parameters and parameters whose values the researcher is uncertain about should be part of a sensitivity analysis. Additionally, one can add controlling parameters, which have policy-intervention potential, and environmental parameters. The researcher decides which subset of these parameter groups should be included in the analysis, depending on the research question.

Finally, some combinations of elements may yield identical results simply through the mechanics of the model. For example, in the GCM, for values of the grid size and the number of agents around the current assignment, it turns out to be equivalent to increase the grid size or to decrease the number of agents of each type, as what determines the number of potential decisions is the “density” of agents. (We actually performed additional experiments increasing the number of agents from 25 to 40 while keeping the density fixed, and the results confirmed this equivalence.) However, testing this effect for a larger number of agents might yield different insights: for instance, the model might show the emergence of self-organizing behavior. More generally, understanding these equivalences requires detailed knowledge of the simulator, which may not be feasible for large and complicated ABMs.

7.4 Assigning values to input variables

To carry out a sensitivity analysis, the analyst is asked to decide which element(s) to vary and which values to assign to the element(s) under scrutiny. This is a critical step, because the indications of the model, as well as the results of the sensitivity analysis, depend on these quantitative assumptions. However, this is a common problem for any sensitivity analysis method. We can distinguish alternative situations. In a modeling phase in which information collection on the inputs is partial, the analyst may be willing to assign wide or conservative (albeit subjective) ranges to the inputs, to gain preliminary insights about, among other things, the correctness of the model behavior. In a second phase, the researcher might home in on parameter ranges in which the model changes its behavior, e.g., where a previously found result switches sign. For complex ABMs that attempt to replicate real-world phenomena, relevant parameter ranges might be suggested by field experts. Varying procedures might further lead to the observation that the domain of procedures is actually infinite; and while an infinite domain for a parameter is tractable, such a domain is intractable for procedures. In this respect, our approach can even help the analyst recognize that she is not in a position to make decisions concerning numerical assignments. Then, the quantitative part of our approach would not be applicable. It remains an open research question whether such an impasse points the analyst towards an alternative sensitivity analysis approach or towards the additional modeling efforts or information collection needed before a sensitivity analysis can be fully informative.

Also, the assignment of values cannot be separated from the structure of the model: constraints may require binding certain elements together, limiting the magnitude of relative changes, or even preventing inputs from varying individually. In certain situations, one might not be able to disentangle individual effects by varying each input separately. These structural constraints then affect the designs that can be chosen for a meaningful sensitivity analysis.

Our approach also opens further research questions. First, while sensitivity analyses on parameters mostly lead to further data collection, sensitivity analyses on procedures and behavioral rules can guide further research on how agents interact or behave. In this respect, our method could increase the synergy between lab experiments and ABMs highlighted in the recent work of Smith and Rand (2018). Second, the issue of the border of an ABM, i.e., whether changing an element leads to a new model, raises the question of what the border of a sensitivity analysis is.

8 Conclusion

This work contributes to the use of ABMs by studying their sensitivity analysis. We have proposed a general conceptual structure that classifies the main elements of an ABM and a six-step approach that allows the researcher to carry out an element/method-oriented analysis. We have studied a design that enables the variation of potentially all ABM elements simultaneously. Our approach borrows from the literature on the design of computer experiments and adds the calculation of a variety of sensitivity measures; in particular, the design allows the randomized evaluation of finite-change sensitivity indices for determining individual and interaction effects. For direction of change, we have introduced a convenient graphical representation that modifies the well-known ICE plots to account for the stochastic nature of the ABM response.

To illustrate our method, we have carried out a thorough sensitivity analysis of the GCM, varying three procedures and seven parameters. Among other results, the analysis reveals that the most important element is non-parametric and that a strong interaction occurs between a parameter and a non-parametric assumption; this interaction would have been overlooked had the sensitivity analysis focused only on parameters. Limitations of the approach and directions for future research have been underlined in the discussion section.