An Application of Calibration and Uncertainty Quantification Techniques for Agent-Based Models

In this chapter, a step-by-step application of calibrating an agent-based model is presented. In particular, an agent-based model for small-scale PV adoption was calibrated on the historical data for the small-scale solar PV capacity additions that took place in Greece from January 2010 to February 2013. The process of the model calibration allowed to (a) quantify and take into consideration uncertainties that are related to the characteristics and the decision-making criteria of the agents (i.e. independent PV power producers), in contrast to the more obvious uncertainties, such as technology costs, and (b) use the calibration results to explore the plausible—given the historical data—behaviour of the potential PV adopters in Greece under the new net-metering scheme (in effect as of mid-2015).


Introduction
Models try to narrow the differences between decision-makers' thinking, reasoning, representation and computing (Doukas 2013). Anytime we decide to develop a model, we have two choices. The first one is to rely on physical laws that we know they hold true or mathematical relationships that are valid as long as the underlying theory is valid too. The second choice is to come up with a plausible model that captures relationships and dynamics that we expect to be true and acknowledge the fact that many parts of the model will inevitably be arbitrary.
Agent-based models (ABMs) fall into the second choice (Flamos 2016). This means that developing a new ABM is an interesting endeavour, but unless we identify the uncertainty that governs the applicability of the model, it is just a modelling exercise. While ABMs give us increased flexibility in modelling complex systems driven by agent actions and interactions, they give rise to the question of when an ABM represents reality well enough to extrapolate into the future.
A common approach to validating the extent at which an ABM is realistic is through calibration using historical data. If done right, this process is an opportunity to deal with model uncertainty, which is the specific type of uncertainty that stems from the fact that there exist many variations of an ABM that are all plausible under the same set of historical data.
Agreement between model and data does not imply that the modelling assumptions accurately describe the processes producing the observed behaviour; it merely indicates that the model is one (of maybe several) that is plausible. As a result, dealing with model uncertainty during calibration allows us to address the trade-off between a very flexible model that can easily fit on the historical data and model uncertainty.
In this chapter, we present a method for quantifying model uncertainty. The ABM used aims at replicating the dynamics of small-scale PV adoption in Greece. Its outputs are determined by the values of its ten free parameters. As a result, before the model can be used for forward-looking simulations, these parameters must be constrained into a subspace of plausible values. Accordingly, the model was calibrated on the data for the period from 2010 to 2013. This period was the period with the largest additions in small-scale PV capacity (Anagnostopoulos et al. 2017;Papadelis et al. 2016). The calibration data corresponds to: a) Past prices and market conditions (prices for small-scale PV systems and the relevant Greek feed-in tariff (FIT) prices) b) The requests for grid connections, the records for which are available at the individual request level 1 The chapter is structured as follows: Sect. 2 presents a brief description of the employed ABM, Sect. 3 presents the concept of Gaussian process emulators that play a central role in the calibration method that was applied, Sect. 4 presents the details of fitting a Gaussian process emulator on the ABM results and Sect. 5 presents the method and results of the model calibration. The chapter concludes with a discussion.

The ABM for the Diffusion of Small-Scale Solar PV
The parameters that govern the model's behaviour are as follows: 1. Agent initial beliefs. Each agent has a private initial belief for the expected annual cash inflows from investing in 300 Wp of solar PV. This belief is expressed as a Gaussian distribution with a mean value μ CF and a precision ρ CF . Low values of ρ CF reflect flexibility in beliefs (i.e. little evidence is enough to shift the agent's beliefs). Low values of μ CF reflect pessimism about the payoffs of the investment. The initial beliefs of each agent are randomly drawn from two global probability distributions-one for the mean value μ CF and one for the precision ρ CF of each agent-while the parameters of both distributions are regarded as parameters that need to be set through calibration. 2. Social learning. To capture the effects of social learning, each agent receives information from the agents in its social circle that have already invested in PV. This information concerns the actual profitability of their investments so far, and it is used to update 2 the agent's beliefs. Accordingly, each agent has a social circle. This condition is modelled as a "small-word" network (Watts and Strogatz 1998). 3. Resistance towards PV investments. Each agent is characterized by their resistance towards investing in solar PV. Resistance is defined as a weighted sum of two parameters: (a) The profitability of the investment expressed by its payback period, so that the larger the profitability, the shorter the payback period, and, thus, the lower the resistance. We assume that agents are able to use their beliefs regarding the expected cash inflows to estimate the profitability of investing. Given the fact that the agents' beliefs are expressed probabilistically, we can calculate the following z i value, where i ¼ 1,2,. . . represents the years from the simulation's "now": where capex is the capital expenditure and d the discount rate. The term z i gives us the probability that the (discounted) payback period of the investment is equal to i. Making the assumption that all agents evaluate an investment based on when it will have paid itself back with probability 90%; this formula gives us the respective payback period i.
(b) The difference between the total number of agents in the simulation and the number of them that has already invested in PV-so that the larger the installed base, the smaller the resistance. Baranzini et al. (2017) provide evidence that both learning and imitation are important components of social contagion for the adoption of solar PV technology. By making attitude towards PV a function of the installed base, we aim to capture the imitation (social influence) aspect.
The weights of this sum are derived from a global probability distribution, the parameters of which are regarded as parameters that need to be set through calibration.
4. Propensity to invest. Each agent has a threshold value for their resistance parameter. When the latter falls under the threshold value, action could-but not necessarily will-be induced. For the calibration phase, we assumed that when agents decide to invest in a PV system, its size is given by the empirical probability distribution that was derived from the available historical data (Fig. 1).
Finally, Young (2009) notes that inertia is the simplest reason why innovations take time to diffuse: people delay in acting on new information. Accordingly, we have included inertia in the model by defining a global parameter (i.e. it has the same value for all agents in the model) that represents the probability that any agent would actually invest if their resistance threshold is crossed.

The Concept of Emulators
An emulator is a probabilistic approximation of the ABM. The probabilistic nature of the emulators makes them ideal for the quantification of the uncertainty regarding their estimations, as well as of the way the uncertainty of the ABM parameters gets propagated into its results. In addition, emulators can be evaluated significantly faster in comparison to the actual model, which allows us to employ Monte Carlo sampling methods that would be otherwise prohibitively expensive in terms of computation resources. In the application presented in this chapter, Gaussian processes are used as emulators. Accordingly, a brief description of Gaussian processes is provided.

Gaussian Processes for Regression
Suppose that we have some function f(x) and we want to estimate the value of this function at the input points x 1 , x 2 and x 3 . As long as we believe that a suitable prior of each variable f(x) could follow a Gaussian distribution, the marginalization property of the Gaussian allows us to write where m f (x i ) is the mean value of f(x i ) and σ 2 f x i ð Þ its variance. If we also believe that the values of this function are related to each other (i.e. knowing one can tell us something about the others), we can assume that the function values follow a joint Gaussian distribution that imposes covariance between these values. Covariance is introduced by defining a covariance function k f (x, x * ) that models the covariance between the function values at different inputs x and x * : For the case of the three-dimensional Gaussian distribution of this example, the prior distribution becomes now of the form: (3) can be written in a more compact form as: where the mean function m f (x) reflects the expected function value at input x: A Gaussian process (GP) generalizes the aforementioned multivariate normal to infinite dimensions. It is defined as a collection of random variables f, any finite number of which has a joint Gaussian distribution (Rasmussen and Williams 2006). If we assume a GP prior over a set of function values f, we actually assume that: The covariance function k f is called the kernel of the GP.
2. Their joint distribution is also Gaussian. For a sample X ¼ {x i , i ¼ 1, . . ., n} consisting of n pairs of inputs, the probability density function of the f 's values is where K ff (X) is the covariance matrix obtained by evaluating the kernel k f on all data points in X: Once a mean function and a kernel have been chosen, we have actually decided on a prior for the function f. This is a distribution over possible functions. Since it is not conditioned on any observed data, it represents our prior assumption of the function space from which the data may have been generated.
When we do observe some actual data D ¼ {(x i,: , y i ), i ¼ 1, . . ., n}, the GP prior is conditioned on D to obtain a posterior GP process that represents all functions that are consistent with both the observed data and the prior. This posterior GP process p ( f | D) is used as a proxy for the original ABM.
The main advantage from using a GP is that it becomes analytically tractable to estimate the values of the approximated function on inputs that are not part of the observed data. In particular, we can make predictions for new inputs X * by drawing f * from the posterior p( f | D). This posterior predictive distribution can be written as 3 : This means that we can fit a GP on the results from a small number of parameter combinations for the ABM and then use the GP's posterior predictive distribution to estimate the model's results for other parameter combinations without a need to re-run the ABM.

Benefits of Using Gaussian Processes as Emulators
An important implication is that we have a formula that calculates the predictive uncertainty at each test input X * : This allows us to employ a sequential sampling scheme: 1. The GP emulator is fitted on the results from a small number of parameter combinations at which the ABM is run. 2. Additional parameter combinations for the ABM can be chosen corresponding to regions where the emulator's predictive uncertainty is high. 3. The GP emulator is updated to include the new ABM results.
The function Var( f | X * ) can accept matrices as inputs. This allows us to evaluate it on a large set of possible inputs and then either select only the inputs (i.e. matrix rows) that correspond to a predictive uncertainty that is higher than a predetermined threshold or select a predetermined number of inputs that correspond to the highest predictive uncertainty.

Options for the Emulator Form
Following Haylock and O'Hagan (1996), the general form for the emulator to fit on the model output j is The first term is a regression term, where h i (•) are deterministic functions of the model parameters and β i the regression coefficients. The second term is a GP with a zero mean function and an appropriately chosen kernel. One way to interpret this formulation is that the ABM is approximated by a regression term, which is then improved by a GP. Alternatively, this formulation can be interpreted as an emulator that follows a GP that is given by: where θ denotes the parameters of the kernel. The most common approach for estimating the parameters of the GP emulator (i.e. the coefficients β i and the parameters θ of the kernel) is the "plug-in" approach of Kennedy and O'Hagan (2001), where the parameters are estimated using maximum likelihood and, then, treated as fixed during the calibration of the model.
As far as the GP kernels are concerned, the most popular choice is the squared exponential kernel, which implies that the function to approximate has infinitely many derivatives. If the smoothness assumption of the squared exponential kernel is unrealistic for a specific modelling case, the Matérn kernel class can be used as an alternative (Genton 2002).
Both the squared exponential kernel and the kernels of the Matérn class include the following term: The length scale ' determines how close two data points x i , x j have to be to influence each other significantly. If we allow for different values of ' per input dimension, then we achieve automatic relevance determination, so named because estimating the length scale parameters ' 1 ,' 2 ,. . .,' q implicitly determines the "relevance" of each dimension. Input dimensions with relatively large length scales imply relatively little variation/impact on the function being approximated.

Fitting the GP Emulator
In the application presented in this chapter, a GP emulator-the STatistical approximation-based modEl EMulator (STEEM) of the Technoeconomics of Energy Systems Laboratory (TEESlab)-was fitted on the results from a number of parameter combinations at which the ABM was run. This means that the parameter combinations were the input data and the ABM results were the output data for the GP emulator. The mean function of the emulator was set to zero, while the Matérn 3/2 kernel was used as the covariance function.
To generate the input data, preliminary ranges for the value of each model parameter were derived, and then the ABM was run for 150 different parameter combinations. These initial combinations were chosen using a maximin Latin hypercube design, to fill the entire input space by maximizing the minimum distance between the points generated.
The initial ranges for the ABM parameters were defined as presented in Table 1. There are two possible caveats when selecting the initial ranges for the model parameters to be calibrated: 1. The ranges are wider than necessary. Sensitivity analysis will be carried out to test if and for which parameter this is true. 2. The ranges are too narrow. At the end of the calibration, one should revisit these ranges.
The corresponding PV capacity pathways were scaled to the [0, 1] range. This allowed the comparison of the simulated pathways with the-also scaled-pathway of the actual PV investments during the calibration period.
The ABM calibration was conducted with the assumption that the model represents reality well if it can replicate the historical growth rates for PV installations. In other words, calibration looked for similar shapes of the cumulative capacity of PV installations, while the scale is inevitably different. At the same time, calibrating based only on the shape of the simulated pathways creates a problem of non-identifiability; we need a final target to use as a point of reference. To this end, the median of the capacities achieved at the end of the simulated period was mapped to the value of 1 during the scaling. In addition, the input data was normalized. The ABM results were split into four periods (mid-2010, end-2010, end-2011, end of simulation period), and four GP emulators (four implementations of the STEEM) were fitted on the respective results. The hyperparameters of each emulator were estimated using likelihood maximization, and the Python package GPy 4 was employed for this purpose.

Bastos and O'Hagan (2009) present different validation methods for GP emulators.
In this chapter, the generalization capability of the fitted emulator was judged by performing k-fold cross-validation on the emulator that corresponds to the results at the end of the simulated period. K-fold cross-validation was selected to make sure that there is no region in the input space that the emulator has been poorly fitted. As a metric for the validation, the coefficient of Nash-Sutcliffe efficiency was used, which is given by: where m represents the number of data in each fold, y i are the respective outputs of the ABM, y is the mean of these outputs and y i,GP are the estimations of the GP emulator for the same parameter inputs. This coefficient has an optimal value of 1; a value of 0 indicates that a forecast using the mean of the ABM outputs will have the same utility as the result of the GP emulator.

Sensitivity Analysis
The preliminary ranges for the value of each model parameter were chosen arbitrarily with the goal of fitting a GP emulator on a reasonably large input space. However, after the emulator is fitted, it is useful to revisit the ranges by performing a sensitivity analysis that is facilitated by the emulator. Sensitivity analysis explains which of the d features in the input dataset X ¼ {x j , j ¼ 1, . . ., d} are most responsible for the uncertainty in the model's result. It makes sense to restrict the range of the parameters that have only a small impact on the uncertainty of the model's outputs.
For each feature j : 1,2,. . .d, we split the dataset X into two parts, one including the selected feature and the other the remaining ones: Then, we can assess the sensitivity of the model's output to the uncertainty regarding x j through the expected reduction in the output's variance if the true value of x j is learnt. Saltelli et al. (2010) provide the following formula for the calculation of the expected reduction in variance: A and B are two independent N Â d matrices that contain samples of the model's inputs. Following Saltelli et al. (2010), a Sobol sequence was used to derive these matrices. Sobol sequences were designed to cover the unit hypercube with lower discrepancy than completely random sampling. The index j runs from 1 to d, where d is the number of parameters. The index i runs from 1 to N, where N is the number of input samples. The term f(A) i represents the ith element of the vector that is the output of the GP emulator when evaluated at X * ¼ A. The term A j ð Þ B represents a matrix where column j comes from matrix B and all other columns come from matrix A. The matrices A and B can be generated from a Sobol sequence of size N Â 2d: A is the left half of the sequence and B is the right part of it.
Given Var x j E y j x j À Á À Á , we can compute the first-order sensitivity coefficient S j that captures the main effect of x j on y as: As it turns out, the ABM's output variance is mainly driven by: 1. The shape parameter of the global distribution that assigns each agent's threshold value for their resistance parameter 2. The shape parameter of the global distribution that assigns the weight of the profitability to each agent's resistance 3. The shape parameter of the global distribution that assigns μ CF to each agent in the model Accordingly, all other parameters were fixed to the mean value of their initial ranges.

The History Matching Method
The calibration of the model was based on the history matching method (Craig et al. 1997). History matching works by excluding subsets of the model's parameter space, which are unlikely to provide a good match between the model outputs and the observed reality. As the plausible space becomes smaller and smaller, emulators become smoother and more accurate, allowing us to zoom in the parameter space we explore. As a result, the implausible parameter values are excluded in iterations, known as waves, where new emulators are built after each wave. History matching has been successfully applied across a wide range of scientific domains, including the case of calibrating ABM (Andrianakis et al. 2015).
A central concept of the history matching method is the quantification of the major uncertainties that have an impact on the calibration process (Kennedy and O'Hagan 2001): 1. Ensemble variability (V es ). This uncertainty represents the stochastic nature of the ABM; different runs with the same parameters will give different outcomes.
To calculate the ensemble variability, one combination for the model's parameters was selected-it was assumed that the variability is independent from the inputs, so one combination will suffice-and run the model K ¼ 25 times to capture the variability of the results. Accordingly, the ensemble variability was calculated to be V es ¼0.002. 2. Emulation uncertainty (V em ). This type of uncertainty originates in the possible discrepancy between the ABM and the GP-based STEEM. To calculate the emulation uncertainty, the model runs used for the calculation of the ensemble uncertainty were utilized. The emulation uncertainty was given by: where f j,k (x) is the output of the ABM for the period j at the kth run and g j (x) is the emulator's result. Accordingly, the emulation uncertainty was calculated to be V em ¼ 0.002.
Having specified the aforementioned uncertainties, the implausibility of a specific parameter combination x is given by (Vernon et al. 2010): where z j is the observed data for the period j.
A natural cut-off value for implausibility is 3, i.e. any parameter combination x with I(x) > 3 should be considered implausible.
For the calculation of the implausibility function, the following steps took place: 1. z j ! The actual data was scaled to the [0, 1] range. 2.
x ! A new and much larger (20,000) set of combinations was derived through Latin hypercube design, this time taking random points within the sampling intervals.
3. g j (x) ! We predicted the cumulative capacity additions at the end of the period j for each combination. 4. I j (x) ! The implausibility function was used as an indicator function to split the samples into plausible and implausible ones through the Patient Rule Induction Method (Friedman and Fisher 1999).

The Patient Rule Induction Method
The Patient Rule Induction Method (PRIM) is a heuristic algorithm that aims to identify rectangular partitions (called boxes) of the input space where the average response is much larger than the average response across the whole input space. When the responses correspond to classification classes, the goal is to identify partitions with high homogeneity or, equivalently, low impurity as measured by the Gini index: where p i is the fraction of responses labelled with class i in the partition. For binary responses (as is our case), both aforementioned goals are equivalent if the responses of no interest (i.e. with implausibility function values greater than 3) are assigned the zero value.
The PRIM algorithm proceeds as follows: 1. Identify a rectangle box that includes all input data, B 1 . Identify a dimension j that when removing the slice of data below the a quantile or above the 1 À a quantile of the corresponding variable's distribution, we achieve the greatest decrease in the impurity of the remaining data. The parameter a is user-defined.
The objective function that we used divides the gain in purity by the loss in mass: where GI B1 is the Gini index of the initial box B 1 and GI B1 À j the Gini index of the box resulting from removing the amount of β B1 À j data from dimension j.
2. Repeat step 2 until a minimum number β of data remains in the rectangle. This process is called "peeling". The parameter β is also user-defined. 3. Reverse the peeling process by expanding the rectangle in the dimension where adding a slice of data leads to the greatest decrease in the impurity of the expanded partition.
4. Repeat step 4 until no decrease in impurity is possible. This process is called "pasting" and aims to refine the box boundaries.
Based on the aforementioned, PRIM was applied so as to find boxes that contain plausible parameter combinations (i.e. with I(x) 3). The quality of the derived partition was judged by measuring its density. Density is the ratio of the total number of plausible combinations inside the box to the number of all combinations inside this box.
In addition, the partition boxes were evaluated according to their coverage. Coverage is the ratio of the total number of plausible combinations inside the box to the total number of plausible combinations found across the whole dataset. If the coverage is low, it is possible that either the data has high levels of noise or there are additional partitions with high density as well.
To explore the latter possibility, the method of Guivarch et al. (2016) can be used, according to which the data in the identified box are marked as uninteresting (i.e. they are assigned the zero value), and the PRIM algorithm is re-run so as to identify a new box if possible.

Calibration and Extrapolation Results
After considering all periods, the PRIM constrained the parameters 1, 5 and 9 of Table 1 as presented in Table 2.
Subsequently, the calibrated model was used to explore the expected effectiveness of the Greek net-metering scheme in driving investments in small-scale PV during the period 2018-2025. To this end, a set of scenarios for different plausible values of the model parameters was run, assuming that the retail prices remained unaffected and taking into consideration the provisions of the Greek net-metering scheme.
The ABM that was used for the forward-looking simulations differs from the one used for calibration in the following ways: 1. Available options. When the agents decide to invest in solar PV, they can choose one option from a limited set of different ones, all of which are available to all agents. These options concern the size of the PV system installed: 2.4 kWp, 4.8 kWp or 9.6 kWp. 2. The option selection rule. If more than one option is evaluated favourably by an agent, the selection is based on the softmax rule; the probability of investing in option j is related to the agent's resistance to it (r j ) according to: The results are presented in Fig. 2.

Discussion
The ambition of this chapter was to highlight the fact that when calibrating ABMs to historical data, we should aim to explicitly quantify model uncertainty. For the model under study, this uncertainty can be regarded as uncertainty that is related to the characteristics and the decision-making criteria of the agents (i.e. independent PV power producers), and it is evident by the range of the different plausible parameters of the model and the range in the results that they produce. The main takeaway is that model uncertainty exists whether we choose to quantify it or not. However, by quantifying it, we are able to understand the utility and the limits of our model. In this case, we can compare the future scenarios with the historical data (Fig. 3). This reveals that the expected results from the net-metering scheme in Greece are positive but not as strong as the results of the FIT period; even with the most favourable of the plausible parameter combinations, it will take 7 years to achieve the same PV capacity additions that the FIT scheme achieved from 2010 up to the end of 2012. In addition, highlighting model uncertainty makes it sensible that a policy design process that utilizes ABMs is structured around the concept of adaptability; as new data on the actual decisions of the relevant actors is accumulated, the initial policy should adapt, the same way as it should adapt to changes in its environment, such as technology costs.
Finally, ABMs can help policies succeed by focusing on directly affecting the agents' characteristics. As an example, dissemination of trusted information can help align agents' beliefs, while marketing campaigns and appropriately devised narratives can help increase the degree of influence by social learning and imitation. Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.