To investigate the drivers and constraints of model acceptance in policy, we conducted fieldwork on eight different models that are currently in use (see Table 1). Three case studies were conducted in the Netherlands and five in the UK. The cases were selected on the basis of maximum variation sampling. This sampling strategy consists of including as much diversity in the cases as possible. It generally results in a heterogeneous set of cases, which allows the researcher to identify commonalities that cut across the variations (Patton 2002). If such commonalities exist amongst divergent cases, they provide a basis for theory formation.
The maximum variation sampling of cases was guided by a set of selection criteria. The first selection criterion is novelty, that is, the number of years since the model was first completed. If a model has been around for a long time, those working with it may have developed familiarity with it, affecting its perceived user-friendliness and interpretability. This idea is supported by findings on the adoption of information technology in general (Venkatesh and Bala 2008).
The second selection criterion is model technique. Model characteristics such as performance and the user interface are likely to be related to the technique used in the model. For instance, a particular modelling technique might be computationally intensive or limit the options for an interface. Moreover, Boulanger and Bréchet (2005) demonstrate that different modelling techniques have different strengths and weaknesses for use in policy. Although their work does not claim a link between these strengths and weaknesses and the way in which models are used, it does support the connection between model technique and other model characteristics.
Besides model characteristics, the context in which models are used for policy has been argued to impact how a model is used. However, there has been little work that provides a typification of the contexts of model use, and the relative importance of different aspects of these contexts has not been established. To permit sampling for maximum variation given these limited theoretical foundations, multiple selection criteria were used. By aiming for variation on three selection criteria, some level of variation in terms of contexts could be achieved. The first criterion for the context of model use is model purpose, classified according to the United Kingdom Treasury’s seven purposes (Treasury 2013).
The second and third contextual selection criteria are based on the assumption that model use is subject to cultural influences. Diez and McIntosh (2011) put forward this idea and suggest that model use can differ across geographical contexts. For this study, models were sampled from two countries: the UK and the Netherlands. Sampling was limited to these two locations for practical reasons. Following the premise that model acceptance is affected by norms and institutions, the host organization was used as a third selection criterion for context.
The two dimensions and the five selection criteria they encompass are a means towards accomplishing maximum variation in the sample of models for this study. However, there exists no exhaustive inventory of all models that are used to inform policy. In more technical terms, this constitutes the absence of a sampling frame to which the selection criteria can be applied.
To limit sampling bias and to allow for the application of the sampling criteria, a tentative sampling frame was created by combining manual web searches with online archives of models, such as the LIAISE Toolkit (LIAISE 2014), and the annex to the UK government review of analytical models (Treasury 2013). This resulted in an overview of 660 models. Still, some bias in the sampling could not be prevented, as more controversial models might not be publicly advertised. Twenty models were targeted for inclusion in this study, but access was only attained to eight.
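As an illustration of how selection criteria could be applied to such a sampling frame, the following is a minimal sketch of one possible greedy approach to maximum variation sampling. The study does not describe a selection algorithm, so the function name, the field names and the placeholder values are all hypothetical, and the greedy rule is only one of several ways to operationalize "as much diversity as possible".

```python
def greedy_max_variation(frame, criteria, k):
    """Pick up to k models from the frame.

    At each step, select the model that adds the most not-yet-covered
    (criterion, value) pairs. Ties are broken by order in the frame.
    This is an illustrative heuristic, not the procedure used in the study.
    """
    selected = []
    covered = set()
    remaining = list(frame)
    while remaining and len(selected) < k:
        def gain(model):
            # Number of new (criterion, value) pairs this model would add.
            return len({(c, model[c]) for c in criteria} - covered)

        best = max(remaining, key=gain)  # first maximal element wins ties
        selected.append(best)
        covered |= {(c, best[c]) for c in criteria}
        remaining.remove(best)
    return selected


# Hypothetical sampling frame with placeholder values for the five criteria.
criteria = ["novelty", "technique", "purpose", "country", "host"]
frame = [
    {"name": "M1", "novelty": "recent", "technique": "technique_a",
     "purpose": "purpose_a", "country": "UK", "host": "host_a"},
    {"name": "M2", "novelty": "recent", "technique": "technique_a",
     "purpose": "purpose_a", "country": "UK", "host": "host_a"},
    {"name": "M3", "novelty": "established", "technique": "technique_b",
     "purpose": "purpose_b", "country": "NL", "host": "host_b"},
    {"name": "M4", "novelty": "recent", "technique": "technique_c",
     "purpose": "purpose_a", "country": "UK", "host": "host_a"},
]

picked = greedy_max_variation(frame, criteria, k=2)
picked_names = [m["name"] for m in picked]
print(picked_names)
```

In this toy frame, M2 duplicates M1 on every criterion and is therefore skipped in favour of M3, which differs on all five. In practice, access constraints such as those noted above would further restrict which of the selected models can actually be studied.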
Over a period of 2 years, data were collected on these eight models in the form of interviews, documents and observations. The data included 34 semi-structured interviews with model developers and policy analysts. To ensure interview comparability, an interview guide was used. The interview data were supplemented by an archival study of 41 documents such as minutes, model documentation and green papers that reference the model. In addition, in situ observations of model developers and policy analysts were conducted. Taken together, these three methods allowed for validation of the findings through triangulation.
For the analysis of the collected data, we employed a format that is based on the grounded theory paradigm put forward by Strauss and Corbin (1998). The approach consists of open coding each interview and attributing descriptors to fragments of text. In a subsequent stage, these open codes are refined and recombined into more clearly defined concepts. These concepts are then used to recode the interviews and other data for each case. An overview of all the extracted codes was written up in report form for each case study. The final stage of analysis consisted of merging the codes that were set out in the respective case studies. During this phase, the codes with a direct relation to model acceptance were extracted and a list of text excerpts was created for each observed criterion of model acceptance.
Observed model acceptance criteria
In this section, we identify and define the criteria that contributed to the acceptance of each model. The grounded theory-based process of coding resulted in the eleven criteria of model acceptance in government shown in Table 2. This table also shows which criteria were observed in each particular case. A limitation of such binary categorization is that it reduces the nuances or contrasts that may exist in the data. To mitigate this and to structure the categorization, we employed a simple heuristic. A criterion was considered to be “observed” and assigned a value of “1” if the code for that criterion occurred in two separate sources for the same case. Nine of the eleven criteria that emerged from the data can be divided into three groups: model characteristics, organizational characteristics and supporting infrastructure. The remaining two criteria could not be aggregated: reputation and participation in development.
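The heuristic described above can be sketched in a few lines of code. This is a minimal illustration, not the procedure used to produce Table 2: the tuple layout, the function name and the example records are hypothetical, and the sketch assumes that "two separate sources" means at least two distinct sources.

```python
from collections import defaultdict


def observed_matrix(coded_excerpts, min_sources=2):
    """Build a binary case-by-criterion matrix.

    coded_excerpts: iterable of (case, source_id, criterion) tuples.
    A criterion is marked 1 for a case if its code occurs in at least
    min_sources distinct sources for that case (assumed reading of the
    heuristic), and 0 otherwise.
    """
    excerpts = list(coded_excerpts)
    sources = defaultdict(set)
    for case, source_id, criterion in excerpts:
        # A set ensures repeated mentions in one source count only once.
        sources[(case, criterion)].add(source_id)

    cases = sorted({case for case, _, _ in excerpts})
    criteria = sorted({criterion for _, _, criterion in excerpts})
    return {
        case: {
            criterion: int(len(sources.get((case, criterion), set())) >= min_sources)
            for criterion in criteria
        }
        for case in cases
    }


# Hypothetical coded excerpts, for illustration only.
excerpts = [
    ("Case A", "interview_01", "quality"),
    ("Case A", "interview_01", "quality"),    # same source, counted once
    ("Case A", "document_03", "quality"),
    ("Case A", "interview_02", "efficiency"),
    ("Case B", "interview_04", "quality"),
]

matrix = observed_matrix(excerpts)
print(matrix)
```

Counting distinct sources rather than raw mentions reflects the requirement that a criterion be corroborated across sources, which is in keeping with the triangulation of interviews, documents and observations.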
Model characteristics are attributes of a particular model. Of the eleven criteria, quality, tractability, efficiency and flexibility belong in this category.
We define quality as the degree to which the model is perceived to be valid. In all eight cases, validation was reported to be a key consideration in driving acceptance of the model. For the Forecasting model, concerns about the validity of the previous model motivated the organization to seek development of a new version. For Quitsim, validation was prioritized in the development process in order to create wider support for the model. In the SAFFIERII case, model results were monitored and validated on a rolling basis in order to ensure the quality of the outcomes.
The quality of the underlying data played an important role in all of the cases. In the Forecasting model case, the participants noted that it was important for the data to originate from reputable sources. In the UKTimes and Quitsim cases, interviewees suggested that the stakeholders’ validation of the data had contributed to their acceptance of the model.
The capacity of users to understand the model was reported to be of significance in five cases. In the SAFFIERII case, the tractability of the model was a key motivation for it to be accepted. The participants pointed out that the model could not be too complex, because otherwise no single person would be able to comprehend it, rendering the model unusable. In the UKTimes case, the complexity of the previous model was noted by participants to be an important motivation in the development and acceptance of the new one. In the Retail Risk Index case, the intuitive tractability of the chosen modelling technique was reported to have contributed to the acceptance of the model. Policymakers did not wish to understand fully how the model worked, but rather wanted to understand it on an abstract level. One participant in the Retail Risk Index case illustrated this with a metaphor, pointing out that “in order to drive a car, one doesn’t need to know how the engine works. It suffices to know that pressing the accelerator pedal will move the car and the steering wheel can be used to move it in a particular direction”. In the case of the 2050 calculator, participants suggested that exposing policymakers to the mechanics of the model could actually adversely affect organizational acceptance because they were sceptical of the simplifications involved.
In the SAFFIERII, EUPA model and Quitsim cases, the need to decrease the time taken to run a single iteration of the model was a reason to revise the model itself or the structure of model use. In the Pensim2 and SAFFIERII cases, participants noted that it was important for model outcomes to be produced in a timely fashion; the model itself needed to produce figures as fast as possible. In the Quitsim case, the way in which the use of the model was organized was perceived as a main impediment. The interactions between policymakers and model analysts were considered to be time-consuming and detrimental to the acceptance of the model. In the remaining cases, the participants did not suggest that model speed was of importance. However, it should be noted that in these cases the run-time of the model was reported to range from seconds to ten minutes.
Flexibility is defined as the potential ease with which a model can be adapted to inform new questions. In the Pensim2 case, it was suggested that the capacity to adapt the model to the changing requirements of the policymakers was important. More specifically, the ability to include particular policy measures in the model was considered critical. A similar point was made about the SAFFIERII model.
The three supporting infrastructure criteria, compatibility, transparency and consistency, refer to the existing or required infrastructure for the use of models. An absence of transparency will not render a model useless, but will impact its acceptance. Likewise, compatibility with existing systems and the consistency of model outputs will impact the perception of whether a model can be used with ease.
Compatibility is the degree to which a model is implemented in a programming language or software platform that the user is familiar with. In the SAFFIERII case, interviewees suggested that the programming language the model was developed in corresponded with the one predominantly used at the organization. They argued that this permitted the long-term maintainability of the model.
A similar point about compatibility was raised by participants in the Pensim2, Forecasting model, EUPA model, and 2050 calculator cases. In these cases, the interviewees suggested that the fact that the model was implemented in a particular spreadsheet software aided its acceptance. Because model analysts were familiar with this type of interface, they could easily interact with the model. In the Retail Risk Index case, the model was implemented in an existing proprietary software platform. The participants reported that the fact that they were familiar with this software aided their acceptance of the model. The Quitsim model was also implemented in a programming language that was familiar to its users.
The transparency of the model was considered to be of importance in the SAFFIERII, EUPA model, 2050 calculator and UKTimes cases. The interviewees considered transparency to consist of the ability to review the mechanics of the model and its underlying assumptions. In the Pensim2, SAFFIERII, EUPA model and UKTimes cases, participants suggested that transparency did not necessarily mean that all users should understand the mechanics of the model, but rather that experts reviewing the model should not have any difficulty in doing so.
Consistency refers to the degree to which the model’s outcomes are perceived to align with previous figures. In the SAFFIERII, EUPA model, UKTimes and Quitsim cases, participants pointed out that inconsistency of model results could lead to confusion and scepticism about the validity of the model, thereby adversely affecting the likelihood of its acceptance.
Two organizational factors were shown to have an impact on model acceptance.
In the case of the 2050 calculator model, the participants reported that one of the main reasons for the model being adopted was that the senior policymakers felt that, at that particular time, they could not be blamed if implementation of the model failed. If it succeeded, however, they would be able to benefit from this. In the case of the SAFFIERII model, participants suggested that acceptance of the model had to occur in a context where analysis had to be ongoing. Failure to implement the model in a timely fashion would have had adverse consequences for the progress of the policy process.
In the Forecasting model case, the participants suggested that the acceptance of the model was spearheaded by an advocate within the organization. This advocate advised in favour of using a model and the organization followed suit. A similar process was reported by participants in the 2050 calculator case. They argued that the support from one renowned individual contributed strongly to the acceptance of the model within the organization.
The two remaining criteria, reputation of the model developer and participation in model development, cannot be grouped under any of the previous headings.
Reputation refers to the reputation of the developer of the model. In the SAFFIERII, Forecasting model, EUPA model, Retail Risk Index, UKTimes and Quitsim cases, development of the model was outsourced to an external party. The participants reported that the reputation of the external developers was of benefit to the acceptance of the model. In the Quitsim case, the participants suggested that the developer first needed to establish its reputation by providing validation of the model’s results.
Participation in development
With the exception of the 2050 calculator and Retail Risk Index cases, participants suggested that the opportunity to be involved in the process of developing the model had contributed to their acceptance of the model. More specifically, they referred to involvement in defining the mechanisms of a model on a conceptual level. In the Retail Risk Index case, the model developers suggested that the fact that users were not involved in the development process negatively affected their initial willingness to accept it. They argued that the lack of participation might have resulted in a degree of scepticism towards the model.